
Split clickbench query set into one file per query #16476



Open: wants to merge 1 commit into main


pepijnve (Contributor)

Which issue does this PR close?

None

Rationale for this change

Clickbench query IDs are zero-based, while most editors number lines starting from one. This off-by-one mismatch causes a little friction every time you want to look up the query behind a particular clickbench run.

There is already precedent in the benchmark suite for one file per query rather than a single file with one query per line. With separate files, the query ID can be reflected in the filename, which makes lookup trivial.
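A minimal sketch of the two lookup schemes. The `q{id}.sql` naming below is a hypothetical scheme chosen for illustration, not necessarily the one this PR uses. Likewise, `query_file_name` and `editor_line_for_query` are hypothetical helpers.

```rust
/// Hypothetical naming scheme: query `id` lives in a file named `q{id}.sql`,
/// so the zero-based ID appears directly in the filename.
fn query_file_name(id: usize) -> String {
    format!("q{id}.sql")
}

/// With a single queries.sql holding one query per line, query `id`
/// (zero-based) sits on editor line `id + 1` (one-based).
fn editor_line_for_query(id: usize) -> usize {
    id + 1
}

fn main() {
    for id in [0, 23, 42] {
        println!(
            "query {id}: line {} of queries.sql, or file {}",
            editor_line_for_query(id),
            query_file_name(id)
        );
    }
}
```

With one file per query, the `+ 1` translation disappears: the ID from a benchmark run maps straight to a filename.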

What changes are included in this PR?

  • Add a script to download the upstream queries.sql file from the clickbench repo and split it into one file per query
  • Adapt the clickbench benchmark code to read queries from individual files
  • Adjust parameters in bench.sh
  • Adapt the sql_planner benchmark code to read queries from individual files
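The split-and-read steps above can be sketched roughly as follows. This is not the PR's actual script. It assumes that queries.sql has already been downloaded and holds one query per line, that blank lines carry no query, and that files follow a hypothetical `q{id}.sql` naming. The functions `split_queries` and `read_query` are illustrative names only, and the download step is deliberately left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Splits the contents of a one-query-per-line queries.sql into one file per
/// query, named with the hypothetical scheme `q{id}.sql`. `id` is the
/// zero-based position of the query among the non-empty lines.
fn split_queries(contents: &str, out_dir: &Path) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(out_dir)?;
    let mut written = Vec::new();
    // Blank lines are skipped (an assumption) so they do not shift query IDs.
    for (id, query) in contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .enumerate()
    {
        let path = out_dir.join(format!("q{id}.sql"));
        fs::write(&path, format!("{query}\n"))?;
        written.push(path);
    }
    Ok(written)
}

/// Reads a single query back by its zero-based ID, as a benchmark runner
/// reading individual files might.
fn read_query(dir: &Path, id: usize) -> io::Result<String> {
    let sql = fs::read_to_string(dir.join(format!("q{id}.sql")))?;
    Ok(sql.trim_end().to_string())
}

fn main() -> io::Result<()> {
    // Made-up stand-in for the downloaded upstream queries.sql.
    let contents = "SELECT COUNT(*) FROM hits;\n\nSELECT 1;\n";
    let dir = std::env::temp_dir().join("clickbench_split_sketch");
    let files = split_queries(contents, &dir)?;
    println!("wrote {} files", files.len());
    println!("query 1: {}", read_query(&dir, 1)?);
    Ok(())
}
```

Because the ID is derived from the filename alone, a benchmark that reports "query 23" can be matched to its SQL without counting lines.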

Are these changes tested?

Manually tested

Are there any user-facing changes?

No

github-actions bot added the core (Core DataFusion crate) label on Jun 20, 2025
pepijnve force-pushed the clickbench_split branch 2 times, most recently from 672e22b to 3312d94 on June 20, 2025 12:32