Closed
Description
This is my plan for reviews, etc. this week. I am putting it here to make it visible and to keep myself organized.
- DataFusion: review spark functions feat: Add `datafusion-spark` crate #15168 from @shehabgamin
- DataFusion review partition statistics PR from @xudong963, there is a newer PR about statistics API: Feat: introduce `ExecutionPlan::partition_statistics` API #15852
- DataFusion / statistics: Map file-level column statistics to the table-level #15865 from @xudong963
- DataFusion Bug: [DISCUSSION] Sorts being removed from subqueries #15886
- arrow: file ticket about boolean based row selection: [Parquet] Add BooleanArray based row selection arrow-rs#6624 -- and discord discussion; https://discord.com/channels/885562378132000778/1363995762182193373/1366410521066078349
- arrow filter pushdown: review existing PRs and file organizational tickets
- arrow filter pushdown find benchmark discrepancy: arrow_reader_row_filter benchmark doesn't capture page cache improvements arrow-rs#7460
- sqlparser -- prepare release: Release sqlparser-rs version `0.56.0` around 2024-04-20 datafusion-sqlparser-rs#1756
- DataFusion: Spark Merge feat: Add `datafusion-spark` crate #15168 and file follow on organizational epic
- DataFusion: review filter pushdown APIs: refactor filter pushdown apis #15801
- DataFusion Dynamic Filter pushdown: Implement Parquet filter pushdown via new filter pushdown APIs #15769
- Arrow Variant: Apply feedback to Add example binary variant data and regeneration scripts parquet-testing#76
- Arrow Variant: Review Creation API: Add API for Creating Variant Values arrow-rs#7452 from @PinkCrow007
- DataFusion: aggregate performance PR from @Rachelint Intermediate result blocked approach to aggregation memory management #15591
- Arrow filter pushdown: bitmap / range: Poc for adaptive parquet predicate pushdown(bitmap/range) with page cache(3 data pages) arrow-rs#7454
- object_store: fix/merge thread pool PR feat: Add `SpawnService` and `SpawnedReqwestConnector` for running requests on a different runtime arrow-rs-object-store#332
- Arrow Variant: Create `parquet-variant` crate skeleton PR and basic reader API
- Arrow Variant: Expose tape decoder Add custom decoder in arrow-json arrow-rs#7442
- DataFusion perf script from @logan-keede : Shell script to collect benchmarks for multiple versions #15144
- DataFusion perf script draft: feat(benchmark): collect benchmarks for last 5 versions in line protocol format #15846
- DataFusion: Min/Max for lists/ nested types: feat(datafusion-functions-aggregate): add support for lists and other nested types in min and max #15857
- DataFusion PR about pruning ordering: pipe column orderings into pruning predicate creation #15821
- DataFusion Sorting: test: add fuzz test for doing aggregation with larger than memory groups and sorting with limited memory #15727
Nice to have (really would be great to have someone help review):
- DataFusion: Aggregate UDFs in FFI: feat: Add Aggregate UDF to FFI crate #14775
- Arrow: Avro cleanup: Avro codec enhancements arrow-rs#6965
- Arrow: Avro Utf8View: Support Utf8View for Avro arrow-rs#7434