Decode DateTimePart days with fixed-size chunks#8627
Conversation
| let src = days.as_slice::<D>(); | ||
| let (src_chunks, src_rem) = src.as_chunks::<CHUNK>(); | ||
| let (dst_chunks, _) = values.spare_capacity_mut()[..n].as_chunks_mut::<CHUNK>(); | ||
| for (s_chunk, d_chunk) in src_chunks.iter().zip(dst_chunks.iter_mut()) { |
There was a problem hiding this comment.
Did you see any diff in perf between using seq to manually unroll or letting the compiler do that based on the fixed range?
There was a problem hiding this comment.
Yea, without seq_macro with or without manual for i in 0..CHUNK the code is not vectorized.
|
|
||
| // Seconds/subseconds may be Constant — handle the fast path. | ||
| if let Some(seconds) = parts.seconds.as_constant() { | ||
| let seconds = parts.seconds.as_::<AnyColumnar>(); |
There was a problem hiding this comment.
Do we have a micro-benchmark exercising this change?
There was a problem hiding this comment.
clickbench q6 should improve
Polar Signals Profiling ResultsLatest Run
Powered by Polar Signals Cloud |
Benchmarks: PolarSignals ProfilingVortex (geomean): 1.040x ➖ How to read Verdict and Engines
datafusion / vortex-file-compressed (1.040x ➖, 1↑ 1↓)
No file size changes detected. |
Benchmarks: FineWeb NVMeVerdict: No clear signal (environment too noisy confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.750x ✅, 9↑ 0↓)
datafusion / parquet (0.797x ✅, 8↑ 0↓)
duckdb / vortex-file-compressed (0.829x ✅, 8↑ 0↓)
duckdb / parquet (0.886x ✅, 5↑ 0↓)
File Size Changes (3 files changed, -46.3% overall, 1↑ 2↓)
Totals:
|
Benchmarks: TPC-H SF=1 on NVMEVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.979x ➖, 0↑ 0↓)
datafusion / parquet (0.964x ➖, 1↑ 0↓)
datafusion / arrow (0.947x ➖, 4↑ 0↓)
duckdb / vortex-file-compressed (0.965x ➖, 0↑ 0↓)
duckdb / parquet (0.963x ➖, 3↑ 1↓)
File Size Changes (17 files changed, -44.4% overall, 6↑ 11↓)
Totals:
|
Benchmarks: TPC-DS SF=1 on NVMEVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.948x ➖, 29↑ 0↓)
datafusion / parquet (0.949x ➖, 26↑ 1↓)
duckdb / vortex-file-compressed (0.934x ➖, 33↑ 0↓)
duckdb / parquet (0.976x ➖, 10↑ 0↓)
File Size Changes (31 files changed, -43.5% overall, 5↑ 26↓)
Totals:
|
Merging this PR will not alter performance
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] |
168.9 µs | 205.6 µs | -17.84% |
| ❌ | Simulation | slice_empty_vortex |
339.4 ns | 397.8 ns | -14.66% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] |
273.6 ns | 215.3 ns | +27.1% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] |
333.9 ns | 275.6 ns | +21.17% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[2048] |
427.8 ns | 369.4 ns | +15.79% |
| ⚡ | Simulation | chunked_varbinview_canonical_into[(100, 100)] |
259.5 µs | 224.4 µs | +15.66% |
| ⚡ | Simulation | chunked_varbinview_into_canonical[(100, 100)] |
306.5 µs | 271.3 µs | +12.98% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing myrrc/datetime-constant-fix (1647def) with develop (4a90e13)
Footnotes
-
4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
Benchmarks: Clickbench Sorted on NVMEVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.971x ➖, 2↑ 1↓)
datafusion / parquet (1.001x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (0.965x ➖, 2↑ 0↓)
duckdb / parquet (0.989x ➖, 0↑ 0↓)
File Size Changes (201 files changed, -42.6% overall, 45↑ 156↓)
Totals:
|
Benchmarks: FineWeb S3Verdict: No clear signal (environment too noisy confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.104x ➖, 0↑ 0↓)
datafusion / parquet (1.068x ➖, 1↑ 2↓)
duckdb / vortex-file-compressed (1.013x ➖, 0↑ 1↓)
duckdb / parquet (0.983x ➖, 0↑ 0↓)
|
Benchmarks: Statistical and Population GeneticsVerdict: No clear signal (low confidence) How to read Verdict and Engines
duckdb / vortex-file-compressed (0.992x ➖, 0↑ 0↓)
duckdb / parquet (1.006x ➖, 0↑ 0↓)
File Size Changes (3 files changed, -32.3% overall, 0↑ 3↓)
Totals:
|
| match_each_integer_ptype!(days.ptype(), |D| { | ||
| let src = days.as_slice::<D>(); | ||
| let (src_chunks, src_rem) = src.as_chunks::<CHUNK>(); | ||
| let (dst_chunks, _) = values.spare_capacity_mut()[..n].as_chunks_mut::<CHUNK>(); |
There was a problem hiding this comment.
I don't think you can ignore the remainder here. We never check that n is a multiple of CHUNK
There was a problem hiding this comment.
But I do process it later
| }); | ||
| } | ||
| let tail_start = src_chunks.len() * CHUNK; | ||
| let dst_tail = &mut values.spare_capacity_mut()[tail_start..n]; |
There was a problem hiding this comment.
ah, you don't need this if you do my first comment
Benchmarks: TPC-H SF=10 on NVMEVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.054x ➖, 0↑ 1↓)
datafusion / parquet (1.052x ➖, 0↑ 2↓)
datafusion / arrow (1.076x ➖, 0↑ 6↓)
duckdb / vortex-file-compressed (1.001x ➖, 0↑ 0↓)
duckdb / parquet (1.008x ➖, 0↑ 0↓)
File Size Changes (47 files changed, -44.4% overall, 10↑ 37↓)
Totals:
|
Benchmarks: Clickbench on NVMEVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.085x ➖, 0↑ 11↓)
datafusion / parquet (1.058x ➖, 0↑ 4↓)
duckdb / vortex-file-compressed (1.065x ➖, 0↑ 6↓)
duckdb / parquet (1.022x ➖, 1↑ 0↓)
File Size Changes (201 files changed, -39.1% overall, 47↑ 154↓)
Totals:
|
Benchmarks: TPC-H SF=1 on S3Verdict: No clear signal (environment too noisy confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.001x ➖, 2↑ 5↓)
datafusion / parquet (1.157x ➖, 2↑ 6↓)
duckdb / vortex-file-compressed (0.922x ➖, 0↑ 0↓)
duckdb / parquet (0.956x ➖, 0↑ 0↓)
|
0ax1
left a comment
There was a problem hiding this comment.
In theory makes sense, but I also think we should only land this if there's a measurable perf benefit.
|
I would like to have microbenchmarks so this doesn't randomly regress |
|
I've rechecked and it doesn't improve on x86. Only rational improvement then is use from_trusted_len_iter and decode to AnyColumnar. |
make use of constant seconds and subseconds.
constantly.