This is my plan for reviews, etc. this week. I am putting it here to make it visible and to keep myself organized.
- Parquet Filter Pushdown Performance: Complete ClickBench benchmark: Add `arrow_reader_clickbench` benchmark arrow-rs#7470
- Parquet Filter Pushdown Performance: Refactor selection logic into its own structure: Introduce `ReadPlan` to encapsulate the calculation of what parquet rows to decode arrow-rs#7502
- Parquet Filter Pushdown Performance: Create POC to save filtered results: POC: Sketch out parquet cached filter result API arrow-rs#7513
- Parquet Filter Pushdown Performance: review additional code from ClickBench derived benchmarks with @zhuqi-lucas
- DataFusion performance: review per-file pruning with @adriangb: Add late pruning of Parquet files based on file level statistics #16014
- DataFusion performance: review blocked aggregate PR from @Rachelint: Intermediate result blocked approach to aggregation memory management #15591 (comment)
- DataFusion performance: Projection Pushdown: review suggestion from @adragomir: Support Push down expression evaluation in `TableProviders` #14993 (comment)
- Parquet Variant: Create `parquet-variant` crate skeleton PR: [Variant] Add (empty) `parquet-variant` crate, update `parquet-testing` pin arrow-rs#7485
- Parquet Variant: review @PinkCrow007's Add API for Creating Variant Values arrow-rs#7452
- Parquet Variant: get the variant encoder/decoder into the `parquet-variant` crate with @PinkCrow007
- Parquet Variant: Try to fix variant example files with @mapleFU: `primitive_int64.value` maybe an int32 type parquet-testing#82
- DataFusion: Metadata handling / extension types: review @timsaucer's feat: metadata handling for aggregates and window functions #15911
- DataFusion Feature: Update example of using multiple threadpools with object store: Example for using a separate threadpool for CPU bound work (try 2) #14286 (comment)
- DataFusion Feature: async user defined functions with @goldmedal: Introduce Async User Defined Functions #14837
- Arrow bug with concatenating dictionaries, from @davidhewitt: support merging primitive dictionaries in interleave and concat arrow-rs#7468
- DataFusion perf script from @logan-keede: Shell script to collect benchmarks for multiple versions #15144
- DataFusion perf script draft: feat(benchmark): collect benchmarks for last 5 versions in line protocol format #15846
- DataFusion PR about pruning ordering: pipe column orderings into pruning predicate creation #15821
- DataFusion Sorting: test: add fuzz test for doing aggregation with larger than memory groups and sorting with limited memory #15727
- Arrow Dictionary ID next steps from @brancz: Remove dict_id from schema and make it an IPC concern only arrow-rs#7467
- Arrow Variant: Expose tape decoder: Add custom decoder in arrow-json arrow-rs#7442
Nice to have (it would really be great to have someone help review these):
- DataFusion: Aggregate UDFs in FFI: feat: Add Aggregate UDF to FFI crate #14775
- Arrow: Avro cleanup: Avro codec enhancements arrow-rs#6965
- Arrow: Avro Utf8View: Support Utf8View for Avro arrow-rs#7434