Refactor DictEncoder to not use ArrayAccessors #7759
Merging this PR will improve performance by 12.63%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 169.8 µs | 205.9 µs | -17.56% |
| ❌ | Simulation | chunked_varbinview_opt_into_canonical[(1000, 10)] | 183.3 µs | 219.1 µs | -16.36% |
| ❌ | Simulation | slice_empty_vortex | 310 ns | 368.3 ns | -15.84% |
| ⚡ | Simulation | encode_varbin[(1000, 2)] | 285 µs | 145.7 µs | +95.63% |
| ⚡ | Simulation | chunked_bool_canonical_into[(1000, 10)] | 26.7 µs | 16.1 µs | +65.65% |
| ⚡ | Simulation | chunked_varbinview_into_canonical[(100, 100)] | 306.8 µs | 270.6 µs | +13.41% |
| ⚡ | Simulation | encode_primitives[u8, (10000, 512)] | 340.2 µs | 302.2 µs | +12.56% |
| ⚡ | Simulation | encode_primitives[u8, (10000, 4)] | 322.1 µs | 287.2 µs | +12.15% |
| ⚡ | Simulation | encode_primitives[u8, (10000, 2)] | 322 µs | 287.2 µs | +12.13% |
| ⚡ | Simulation | encode_primitives[u8, (10000, 32)] | 325 µs | 290.2 µs | +12.01% |
| ⚡ | Simulation | encode_primitives[u8, (10000, 8)] | 322.7 µs | 288.2 µs | +11.98% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] | 273.6 ns | 244.4 ns | +11.93% |
| ⚡ | Simulation | eq_i64_constant | 319.1 µs | 288.2 µs | +10.69% |
Comparing rk/cardinality-estimator (0e10063) with develop (9382028)
Footnotes
- 4 benchmarks were skipped, so the baseline results were used instead.
Benchmarks: Compression
Vortex (geomean): 0.993x ➖
unknown / unknown (0.998x ➖, 4↑ 4↓)

Benchmarks: PolarSignals Profiling
Vortex (geomean): 0.991x ➖
datafusion / vortex-file-compressed (0.991x ➖, 1↑ 0↓)
File Size Changes (1 files changed, +0.0% overall, 1↑ 0↓)

Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.013x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.989x ➖, 0↑ 0↓)
datafusion / parquet (1.000x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.984x ➖, 1↑ 0↓)
duckdb / vortex-compact (1.006x ➖, 0↑ 0↓)
duckdb / parquet (1.016x ➖, 0↑ 0↓)
File Size Changes (1 files changed, +0.0% overall, 1↑ 0↓)

File Sizes: FineWeb NVMe
No file size changes detected.

Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.991x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.994x ➖, 1↑ 2↓)
datafusion / parquet (1.000x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (0.993x ➖, 2↑ 0↓)
duckdb / vortex-compact (0.997x ➖, 0↑ 2↓)
duckdb / parquet (0.997x ➖, 0↑ 0↓)
duckdb / duckdb (1.001x ➖, 2↑ 2↓)
File Size Changes (18 files changed, -0.0% overall, 12↑ 6↓)

Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (0.990x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.996x ➖, 0↑ 0↓)
duckdb / parquet (1.007x ➖, 0↑ 0↓)
File Size Changes (2 files changed, +0.0% overall, 1↑ 1↓)

Benchmarks: FineWeb S3
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.408x ❌, 0↑ 5↓)
datafusion / vortex-compact (1.281x ➖, 0↑ 3↓)
datafusion / parquet (1.384x ❌, 0↑ 8↓)
duckdb / vortex-file-compressed (1.042x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.038x ➖, 0↑ 0↓)
duckdb / parquet (1.096x ➖, 0↑ 0↓)

Benchmarks: Clickbench on NVME
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.044x ➖, 2↑ 7↓)
datafusion / parquet (1.041x ➖, 1↑ 2↓)
duckdb / vortex-file-compressed (1.045x ➖, 0↑ 4↓)
duckdb / parquet (1.007x ➖, 1↑ 1↓)
duckdb / duckdb (1.022x ➖, 1↑ 1↓)
File Size Changes (189 files changed, +1.0% overall, 71↑ 118↓)

File Sizes: Clickbench on NVME
File Size Changes (105 files changed, +0.0% overall, 95↑ 10↓)

Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.087x ➖, 0↑ 7↓)
datafusion / vortex-compact (1.083x ➖, 0↑ 7↓)
datafusion / parquet (1.055x ➖, 0↑ 3↓)
datafusion / arrow (1.107x ❌, 0↑ 11↓)
duckdb / vortex-file-compressed (1.074x ➖, 0↑ 4↓)
duckdb / vortex-compact (1.062x ➖, 0↑ 3↓)
duckdb / parquet (1.047x ➖, 0↑ 4↓)
duckdb / duckdb (1.045x ➖, 0↑ 0↓)
File Size Changes (12 files changed, +0.0% overall, 5↑ 7↓)

File Sizes: TPC-H SF=1 on NVME
No file size changes detected.

Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.998x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.994x ➖, 0↑ 0↓)
datafusion / parquet (0.997x ➖, 0↑ 0↓)
datafusion / arrow (1.001x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.992x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.997x ➖, 0↑ 0↓)
duckdb / parquet (0.995x ➖, 0↑ 0↓)
duckdb / duckdb (0.992x ➖, 0↑ 0↓)
File Size Changes (40 files changed, +0.0% overall, 14↑ 26↓)

File Sizes: TPC-H SF=10 on NVME
File Size Changes (2 files changed, +0.0% overall, 2↑ 0↓)

Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.540x ❌, 0↑ 17↓)
datafusion / vortex-compact (1.435x ❌, 0↑ 14↓)
datafusion / parquet (1.321x ❌, 0↑ 11↓)
duckdb / vortex-file-compressed (1.064x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.046x ➖, 0↑ 0↓)
duckdb / parquet (1.042x ➖, 0↑ 1↓)

Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.281x ➖, 0↑ 10↓)
datafusion / vortex-compact (1.191x ➖, 0↑ 3↓)
datafusion / parquet (1.312x ❌, 0↑ 11↓)
duckdb / vortex-file-compressed (1.136x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.120x ➖, 0↑ 0↓)
duckdb / parquet (1.156x ➖, 0↑ 3↓)

File Sizes: PolarSignals Profiling
No file size changes detected.
fc38e36 to 7fa9309
File Sizes: TPC-DS SF=1 on NVME
File Size Changes (4 files changed, +0.0% overall, 2↑ 2↓)

File Sizes: Statistical and Population Genetics
File Size Changes (2 files changed, +0.0% overall, 1↑ 1↓)

Benchmarks: Random Access
Vortex (geomean): 0.898x ✅
unknown / unknown (0.990x ➖, 12↑ 4↓)
Need to figure out what to do about WASM here: the Cloudflare crate requires 64-bit pointers.

I gave up on trying to use the cardinality estimator for dict compression. I have kept whatever was useful from that PR and may attempt it again in the future.
a936fd4 to 90ca40a
7f744e5 to a7fe491
Replace the exact `HashMap`/`HashSet` previously used to compute distinct-value counts during compression stats generation with Cloudflare's `cardinality-estimator` crate. The estimator gives us a bounded-memory approximation (exact up to ~128 distinct values, then HyperLogLog++) so high-cardinality arrays no longer require an O(n) auxiliary hash table to answer the single question "how many unique values does this have?".

- Integer stats swap the hash map for a `CardinalityEstimator` and track the most frequent value via a Boyer-Moore majority candidate plus a second-pass exact count. Sparse/dict schemes only care about the heavy hitter (>= 90% threshold) or a rough distinct ratio, so this is behaviourally equivalent for the decisions they make.
- Float and string stats likewise drop their hash sets in favor of the estimator.
- The integer and float dictionary encoders now rebuild the exact set of distinct values from the source array at compress time, since they need the values themselves and the stats layer no longer retains them.
- `SequenceScheme`'s fast-path check for "all values are distinct" now tolerates the estimator's small approximation error; the deferred callback still validates sequences exactly.

Signed-off-by: Robert Kruszewski <github@robertk.io>
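To make the heavy-hitter step in the commit message above concrete, here is a minimal sketch of a Boyer-Moore majority candidate followed by a second-pass exact count. It is not the Vortex implementation: the function names `majority_candidate` and `heavy_hitter`, the slice-based input, and the `threshold` parameter are assumptions chosen for illustration. The sketch relies on the fact that any value covering at least 90% of an array also covers more than half of it, so it must be the Boyer-Moore candidate.

```rust
/// Boyer-Moore majority vote: returns the only value that *could* occur in
/// more than half of `values`. The result is just a candidate, so callers
/// must confirm it with an exact count.
/// (Hypothetical helper, not part of the Vortex API.)
fn majority_candidate<T: Copy + PartialEq>(values: &[T]) -> Option<T> {
    let mut candidate = None;
    let mut votes = 0usize;
    for &v in values {
        if votes == 0 {
            // No current candidate survives: adopt this value.
            candidate = Some(v);
            votes = 1;
        } else if candidate == Some(v) {
            votes += 1;
        } else {
            votes -= 1;
        }
    }
    candidate
}

/// Second pass: count the candidate exactly and keep it only if it covers at
/// least `threshold` (for example 0.9) of the array.
/// (Hypothetical helper, not part of the Vortex API.)
fn heavy_hitter<T: Copy + PartialEq>(values: &[T], threshold: f64) -> Option<(T, usize)> {
    let candidate = majority_candidate(values)?;
    let count = values.iter().filter(|&&v| v == candidate).count();
    (count as f64 >= threshold * values.len() as f64).then_some((candidate, count))
}
```

Both passes use constant extra memory, which is the property the commit message is after; the price is a second scan of the data to confirm the candidate.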
1adfb50 to 0e10063
mod tests {
    use std::sync::LazyLock;

    #[expect(unused_imports)]
`ArrayAccessor` is not the fastest API, and we can do better by accessing the value iterators directly.
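As a rough illustration of the idea, the sketch below dictionary-encodes values by walking them directly in a single loop. It is a simplified stand-in, not the `DictEncoder` from this PR: the function name `dict_encode`, the plain slice input (standing in for a value iterator), and the `u32` code type are all assumptions made for the example.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical, simplified dictionary encoder. Returns the distinct values in
/// first-seen order together with one code per input value, where each code is
/// an index into the returned dictionary.
fn dict_encode<T: Copy + Eq + Hash>(values: &[T]) -> (Vec<T>, Vec<u32>) {
    let mut lookup: HashMap<T, u32> = HashMap::new();
    let mut dict: Vec<T> = Vec::new();
    let mut codes: Vec<u32> = Vec::with_capacity(values.len());

    // Iterate the values directly: one hash lookup per element, and a new
    // dictionary entry only the first time a value is seen.
    for &v in values {
        let code = *lookup.entry(v).or_insert_with(|| {
            dict.push(v);
            (dict.len() - 1) as u32
        });
        codes.push(code);
    }
    (dict, codes)
}
```

Decoding is then `codes.iter().map(|&c| dict[c as usize])`, which reproduces the original values in order.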