Description
Is your feature request related to a problem or challenge?
As DataFusion matures and gains more features and tests, the time it takes to build and test the system has been increasing.
This has several downsides:
Barrier to new contributors is higher
The resources required to build and link DataFusion are now very large, which means some people may not be able to build or run the tests. For example, @Alexhuszagh reports on #13693 (comment):
I only tested this locally with the failing datasource::physical_plan::json::tests::test_chunked_json tests, but it should be the same with all the remaining errors. My test machine doesn't have enough RAM to link the entire core test suite,
We are likely wasting lots of resources running CI tests
By my count, the CI checks on the most recent checkin (link) require over 200 hours of runner time.
We are lucky the ASF gets many runner credits from GitHub, but in my opinion this level of usage is very wasteful and likely unsustainable as we contemplate adding additional testing such as
Larger binary size
I have noticed that the datafusion-cli binary is now almost 100MB; it used to be 60MB.
Also, people such as @g3blv have noticed that the WASM build has increased 50%:
#9834 (comment)
See
Decreased developer productivity due to longer cycle times
As DataFusion gets bigger, I have noticed that recompiling it takes longer and longer for me:
cargo test ...
Describe the solution you'd like
I would like to make building and testing DataFusion "easier" and leaner. This ticket tracks work to improve things in this area.
Describe alternatives you've considered
- CI: Windows flow takes 1.5h #13726
- Building project takes a *long* time (esp compilation time for `datafusion` core crate) #13814
- Datafusion binary size has been getting bigger #13816
- Running tests uses 50.1GB of disk space on Ubuntu #11105
- Contemplate stop CI testing on intel mac #13846
- Improve efficiency of CI checks (so we can add MORE!) #13845
Additional context
No response