fix(vortex-bench): map gs:// scheme to gcs storage label #8630
`url_scheme_to_storage` only handled `s3` and `file`, so benchmark runs against GCS (`gs://`) failed during setup with "unknown URL scheme: gs" before any query ran. Add a `STORAGE_GCS` constant and a `gs` arm. `make_object_store` already handles `gs://` for the actual reads.

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Merging this PR will not alter performance
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | `chunked_varbinview_into_canonical[(1000, 10)]` | 168.9 µs | 205.7 µs | -17.87% |
| ❌ | Simulation | `slice_empty_vortex` | 339.4 ns | 397.8 ns | -14.66% |
| ⚡ | Simulation | `chunked_varbinview_canonical_into[(100, 100)]` | 259.5 µs | 224.4 µs | +15.64% |
| ⚡ | Simulation | `chunked_varbinview_into_canonical[(100, 100)]` | 306.6 µs | 271.5 µs | +12.95% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing polarsignals:fix/vortex-bench-gcs-storage-scheme (e72acdc) with develop (5d3be01)
Footnotes
1. 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.
Rationale for this change
Running the benchmark harness (`datafusion-bench/query_bench`) against a remote dataset on Google Cloud Storage (`--opt remote-data-dir=gs://…`) fails immediately during benchmark setup with an `unknown URL scheme: gs` error.

`vortex-bench`'s `url_scheme_to_storage` helper, which maps a data-dir URL scheme to a `storage` label used for result reporting, only handled `s3` and `file`, so any `gs://` run bailed before a single query executed. S3 remote runs work because `s3` is handled; GCS was simply never covered.

`make_object_store` already supports `gs://` for the actual reads, so the only gap was this reporting helper.
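As a rough illustration of the gap described above, the helper can be pictured as a `match` on the URL scheme that falls through to an error for anything it does not recognize. The sketch below is not the actual `vortex-bench` code: the `&str` parameter, the `Result<&'static str, String>` return type, the `STORAGE_S3` and `STORAGE_LOCAL` constants, and the `"local"` label for `file` are all assumptions made for the example. Only the `STORAGE_GCS = "gcs"` constant, the `gs` arm, and the error text come from this PR.

```rust
// Hypothetical sketch of a scheme-to-label helper. The signature and every
// name except STORAGE_GCS and url_scheme_to_storage are assumptions.
const STORAGE_S3: &str = "s3"; // assumed label for s3:// runs
const STORAGE_LOCAL: &str = "local"; // assumed placeholder label for file:// runs
const STORAGE_GCS: &str = "gcs"; // constant added by this PR

/// Maps a URL scheme to the storage label used when reporting results.
fn url_scheme_to_storage(scheme: &str) -> Result<&'static str, String> {
    match scheme {
        "s3" => Ok(STORAGE_S3),
        "file" => Ok(STORAGE_LOCAL),
        // Without this arm, "gs" reaches the fallback below and setup fails
        // before any query runs.
        "gs" => Ok(STORAGE_GCS),
        other => Err(format!("unknown URL scheme: {other}")),
    }
}
```

Under these assumptions, `url_scheme_to_storage("gs")` yields the `gcs` label, while an unhandled scheme still produces the "unknown URL scheme" error, so a future backend would fail loudly rather than be mislabeled.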
What changes are included in this PR?
- Add a `STORAGE_GCS = "gcs"` constant.
- Add a `"gs"` arm to `url_scheme_to_storage` returning that label.

Verified by running TPC-H SF1 from a GCS bucket end-to-end (DataFusion + Vortex, 22/22 queries executing against `gs://…`, results tagged `storage=gcs`).

What APIs are changed? Are there any user-facing changes?
None. This only affects the benchmark harness's storage-label reporting; no public API, format, or behavior change outside `vortex-bench`.

🤖 Generated with Claude Code