Closed
Description
This is an attempt to organize myself and make what I plan to work on more visible
Weekly High Level Goals (in order)
- Arrow release: Release arrow-rs / parquet major version `55.0.0` (Apr 2025) arrow-rs#7084
- Start working on testing the 47.0.0 release: Release DataFusion `47.0.0` (April 2025) #15072
- TopK Pushdown: Dynamic pruning filters from TopK state (optimize `ORDER BY LIMIT` queries) #15037 with @adriangb
- Avoid resorting/merging: [EPIC] Avoid sort for already sorted Parquet files that do not overlap values on condition #6672 with @wiedld @suremarc and @xudong963
- Get Enable parquet filter pushdown by default #3463 ready for merge with @XiangpengHao
- Work on integrating the TPCH data generator with @clflushopt: Make it easier to run TPCH queries with datafusion-cli #14608
Other projects I plan to review
- Bug fixes
- Performance improvements
- Complete insta test migration: [Epic] Add snapshot tests (migrate to `insta` for tests) #15178 with @blaginin @shruti2522 @qstommyshu and others
- Hardening external sorts: A complete solution for stable and safe sort with spill #14692 with @2010YOUY01
- Set up Spark function library pattern: feat: Add `datafusion-spark` crate #15168 with @shehabgamin and @andygrove
- Use UTF8 view by default: Change mapping of SQL `VARCHAR` from `Utf8` to `Utf8View` #15096 with @zhuqi-lucas
Background
I am putting this list on GitHub because:
- I like how GitHub renders checklists w/ PR titles so it is easy to track (I currently have a local text file...)
- I thought others might be interested in seeing what I am doing / planning to do
- It makes me feel better that I don't have time to review all the PRs 😭
I am trying to prioritize PRs in the following order:
- Bug fixes
- Documentation / UX / API improvements (things that make DataFusion easier/better to work with)
- Performance improvements
- New features with wide appeal
- New functions
Note that new features and functions are deliberately at the bottom.