[EPIC] Improved performance in H2O.ai benchmarks

### Is your feature request related to a problem or challenge?

The basic aggregate functions like `COUNT` and `SUM` in DataFusion are *very* fast (see [Apache DataFusion is now the fastest single node engine for querying Apache Parquet files](https://datafusion.apache.org/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/)).

However, many of the other aggregate functions are not particularly fast, and this shows up specifically on some of the H2O benchmarks.

We saw this in the results in the [2024 DataFusion SIGMOD paper](https://dl.acm.org/doi/10.1145/3626246.3653368)
![Screenshot 2024-11-24 at 8 34 35 AM](https://github.com/user-attachments/assets/72338cd9-3b1d-4feb-ae65-29b9c53ac3da)

(BTW we have made median faster)

@MrPowers has also [observed similar results on discord (link)](https://discord.com/channels/885562378132000778/1309883046886903870/1309887744595595324):
> DataFusion was added to the h2o benchmarks (which are now maintained by duckdb) and DataFusion performs quite well for most of the "basic" groupby queries.  It performs poorly for some of the advanced questions on the 50GB dataset.  Here are the results: 
> https://duckdblabs.github.io/db-benchmark/

See his version of the benchmarks here
https://github.com/MrPowers/mrpowers-benchmarks

### Testing
- [x] https://github.com/apache/datafusion/issues/7209

### Functions
- [x] https://github.com/apache/datafusion/issues/13549
- [x] #13550

### Other improvements
- [ ] https://github.com/apache/datafusion/issues/13765

### Describe the solution you'd like

DataFusion has two APIs for implementing aggregate functions like `SUM` and `COUNT`:
- Easy (but slow) way: `Accumulator` ([api docs](https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.Accumulator.html))
- Fast (but complicated) way: `GroupsAccumulator` ([api docs](https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.GroupsAccumulator.html))

The basic aggregates are implemented using `GroupsAccumulator`, which is part of the reason for DataFusion's performance on them.

This ticket tracks the effort to improve the performance of the "more advanced" aggregate functions, likely by implementing `GroupsAccumulator` for them.
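To make the difference between the two styles concrete, here is a minimal, dependency-free sketch. The traits `SimpleAccumulator` and `SimpleGroupsAccumulator` and the helper functions are hypothetical stand-ins written for this issue, not the real DataFusion traits: the real APIs operate on Arrow arrays and have more methods (state, merging, filters). The sketch only illustrates why keeping the state of all groups in one flat vector tends to be cheaper than keeping one boxed accumulator per group.

```rust
// Simplified stand-ins for illustration only; these are NOT the real
// DataFusion `Accumulator` / `GroupsAccumulator` traits.

/// Row-oriented style: one accumulator object per group.
trait SimpleAccumulator {
    fn update(&mut self, value: f64);
    fn evaluate(&self) -> f64;
}

#[derive(Default)]
struct SumAccumulator {
    sum: f64,
}

impl SimpleAccumulator for SumAccumulator {
    fn update(&mut self, value: f64) {
        self.sum += value;
    }
    fn evaluate(&self) -> f64 {
        self.sum
    }
}

/// Vectorized style: a single accumulator holds the state of every group.
trait SimpleGroupsAccumulator {
    /// `group_indices[i]` is the group that `values[i]` belongs to.
    fn update_batch(&mut self, values: &[f64], group_indices: &[usize], total_num_groups: usize);
    fn evaluate(&self) -> Vec<f64>;
}

#[derive(Default)]
struct SumGroupsAccumulator {
    sums: Vec<f64>, // one slot per group, stored contiguously
}

impl SimpleGroupsAccumulator for SumGroupsAccumulator {
    fn update_batch(&mut self, values: &[f64], group_indices: &[usize], total_num_groups: usize) {
        // Grow the flat state if new groups appeared in this batch.
        if self.sums.len() < total_num_groups {
            self.sums.resize(total_num_groups, 0.0);
        }
        for (&v, &g) in values.iter().zip(group_indices) {
            self.sums[g] += v;
        }
    }
    fn evaluate(&self) -> Vec<f64> {
        self.sums.clone()
    }
}

/// "Easy" path: allocate a boxed accumulator per group, dispatch per row.
fn sum_per_group_boxed(values: &[f64], group_indices: &[usize], num_groups: usize) -> Vec<f64> {
    let mut accs: Vec<Box<dyn SimpleAccumulator>> = (0..num_groups)
        .map(|_| Box::new(SumAccumulator::default()) as Box<dyn SimpleAccumulator>)
        .collect();
    for (&v, &g) in values.iter().zip(group_indices) {
        accs[g].update(v);
    }
    accs.iter().map(|a| a.evaluate()).collect()
}

/// "Fast" path: one accumulator, one call per batch, flat state.
fn sum_per_group_flat(values: &[f64], group_indices: &[usize], num_groups: usize) -> Vec<f64> {
    let mut acc = SumGroupsAccumulator::default();
    acc.update_batch(values, group_indices, num_groups);
    acc.evaluate()
}

fn main() {
    let values = [1.0, 2.0, 3.0, 4.0, 5.0];
    let group_indices = [0, 1, 0, 2, 1];
    let boxed = sum_per_group_boxed(&values, &group_indices, 3);
    let flat = sum_per_group_flat(&values, &group_indices, 3);
    assert_eq!(boxed, flat);
    println!("{:?}", flat); // prints [4.0, 7.0, 4.0]
}
```

Both paths compute the same result; the difference is in allocation and dispatch. With many distinct groups, the per-group version pays for one heap allocation and one dynamic call per group or row, while the flat version touches a single contiguous vector.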



### Describe alternatives you've considered

For each function listed above, ideally we would:
1. Add a new benchmark in one PR: either a specific one for the H2O benchmarks, or a new query in the ClickBench extended benchmark ([documentation here](https://github.com/apache/datafusion/tree/main/benchmarks/queries/clickbench#extended-queries)).
2. Implement `GroupsAccumulator` for the relevant aggregate function in a second PR (along with tests for correctness), and use the benchmark to verify the performance improvement.


Here is a pretty good example of how @eejbyfeldt did this for `STDDEV`:
- https://github.com/apache/datafusion/pull/12095

### Additional context

_No response_

[EPIC] Improved performance in H2O.ai benchmarks #13548
