bench: bit-packed compare-constant baseline by joseph-isaacs · Pull Request #8012 · vortex-data/vortex

joseph-isaacs · 2026-05-18T17:03:33Z

Summary

Adds a divan benchmark, bitpack_compare, in vortex-fastlanes. It measures Operator::Eq and Operator::Lt against an out-of-range constant on a BitPackedData array, and compares each with an explicit "decompress, then Arrow compare" baseline that materialises the unpacked PrimitiveArray first.
The constant is chosen as 1 << BW, just past the packable range, so a future compare-constant kernel can recognise it and short-circuit. Today both arms decompress; this PR establishes a baseline for that follow-up to land against.
Grid sized for fast runs: len ∈ {1024, 65536}, bit_width ∈ {4, 16}, Eq + Lt.
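To make the out-of-range argument concrete, here is a minimal sketch in plain Rust. It assumes a toy LSB-first packing into `u64` words, which is not the FastLanes layout, and every function name (`pack`, `unpack`, `decompress_then_lt`, `short_circuit_lt`, and the `eq` variants) is hypothetical rather than part of the Vortex API. Because every packed value is below `1 << BW`, `Lt` against that constant is true for all rows and `Eq` is false for all rows, so a kernel could answer without unpacking.

```rust
/// Toy bit-packer: stores each value in `bw` bits, LSB-first, in u64 words.
/// Illustrative only; this is NOT the FastLanes layout.
fn pack(values: &[u32], bw: u32) -> Vec<u64> {
    assert!((1..=32).contains(&bw));
    let total_bits = values.len() * bw as usize;
    let mut words = vec![0u64; (total_bits + 63) / 64];
    for (i, &v) in values.iter().enumerate() {
        assert!((v as u64) < (1u64 << bw), "value does not fit in bw bits");
        let bit = i * bw as usize;
        let (word, off) = (bit / 64, bit % 64);
        words[word] |= (v as u64) << off;
        // Spill the high bits into the next word when the value straddles two words.
        if off + bw as usize > 64 {
            words[word + 1] |= (v as u64) >> (64 - off);
        }
    }
    words
}

/// Inverse of `pack`: materialises all `len` values.
fn unpack(words: &[u64], bw: u32, len: usize) -> Vec<u32> {
    let mask = (1u64 << bw) - 1;
    (0..len)
        .map(|i| {
            let bit = i * bw as usize;
            let (word, off) = (bit / 64, bit % 64);
            let mut v = words[word] >> off;
            if off + bw as usize > 64 {
                v |= words[word + 1] << (64 - off);
            }
            (v & mask) as u32
        })
        .collect()
}

/// Baseline arm: decompress everything, then compare element-wise.
fn decompress_then_lt(words: &[u64], bw: u32, len: usize, c: u64) -> Vec<bool> {
    unpack(words, bw, len).iter().map(|&v| (v as u64) < c).collect()
}

fn decompress_then_eq(words: &[u64], bw: u32, len: usize, c: u64) -> Vec<bool> {
    unpack(words, bw, len).iter().map(|&v| v as u64 == c).collect()
}

/// Hypothetical fast path: every packed value is < 2^bw, so a constant
/// c >= 2^bw makes `v < c` true for all rows without unpacking.
fn short_circuit_lt(words: &[u64], bw: u32, len: usize, c: u64) -> Vec<bool> {
    if c >= (1u64 << bw) {
        return vec![true; len];
    }
    decompress_then_lt(words, bw, len, c)
}

/// Hypothetical fast path: no packed value can equal a constant c >= 2^bw.
fn short_circuit_eq(words: &[u64], bw: u32, len: usize, c: u64) -> Vec<bool> {
    if c >= (1u64 << bw) {
        return vec![false; len];
    }
    decompress_then_eq(words, bw, len, c)
}
```

For example, packing 1024 values with `bw = 4` and comparing against `1 << 4` gives the same result from both arms, but the short-circuit arm never calls `unpack`. In-range constants fall through to the decompressing path in this sketch.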

The follow-up optimization (out-of-range fast path on BitPacked, plus the bitpack_constant analytical encoder) is in #PR2-PLACEHOLDER, stacked on this branch. Splitting the bench out lets the speedup PR show concrete numbers against this measured baseline.

Test plan

cargo check -p vortex-fastlanes --benches
cargo bench -p vortex-fastlanes --bench bitpack_compare records the slow baseline numbers prior to the follow-up landing

🤖 Generated with Claude Code

Add `bitpack_compare` divan bench in vortex-fastlanes that compares a binary `Operator::Eq` / `Operator::Lt` against an out-of-range constant on a `BitPackedData` array with an explicit "decompress, then Arrow compare" baseline that materialises the unpacked `PrimitiveArray` first.

The constant is chosen as `1 << BW`, i.e. just past the packable range, so a future kernel that recognises out-of-range constants can short-circuit it. Today both arms decompress; the benchmark establishes a baseline for that upcoming optimization to land against.

Sized small (`len ∈ {1024, 65536}`, `bit_width ∈ {4, 16}`, Eq + Lt) so it finishes quickly. Run with `cargo bench -p vortex-fastlanes --bench bitpack_compare`.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

codspeed-hq · 2026-05-18T17:10:46Z

Merging this PR will improve performance by 18%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 4 improved benchmarks
✅ 1217 untouched benchmarks
🆕 16 new benchmarks

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | `chunked_varbinview_canonical_into[(1000, 10)]` | 197.9 µs | 162 µs | +22.19% |
| ⚡ | Simulation | `chunked_varbinview_into_canonical[(100, 100)]` | 358.4 µs | 323.5 µs | +10.78% |
| ⚡ | Simulation | `chunked_varbinview_into_canonical[(1000, 10)]` | 211.2 µs | 175.8 µs | +20.11% |
| ⚡ | Simulation | `chunked_varbinview_opt_canonical_into[(1000, 10)]` | 224.8 µs | 188.6 µs | +19.23% |
| 🆕 | Simulation | `baseline_eq[16, 1024]` | N/A | 64.1 µs | N/A |
| 🆕 | Simulation | `baseline_lt[16, 1024]` | N/A | 64.4 µs | N/A |
| 🆕 | Simulation | `baseline_lt[16, 65536]` | N/A | 274.5 µs | N/A |
| 🆕 | Simulation | `baseline_lt[4, 1024]` | N/A | 64.1 µs | N/A |
| 🆕 | Simulation | `baseline_eq[16, 65536]` | N/A | 259.4 µs | N/A |
| 🆕 | Simulation | `baseline_lt[4, 65536]` | N/A | 251.9 µs | N/A |
| 🆕 | Simulation | `fast_eq_out_of_range[16, 65536]` | N/A | 291.1 µs | N/A |
| 🆕 | Simulation | `fast_eq_out_of_range[4, 1024]` | N/A | 67 µs | N/A |
| 🆕 | Simulation | `baseline_eq[4, 1024]` | N/A | 63.1 µs | N/A |
| 🆕 | Simulation | `baseline_eq[4, 65536]` | N/A | 237.9 µs | N/A |
| 🆕 | Simulation | `fast_eq_out_of_range[16, 1024]` | N/A | 67.7 µs | N/A |
| 🆕 | Simulation | `fast_lt_out_of_range[16, 65536]` | N/A | 306.3 µs | N/A |
| 🆕 | Simulation | `fast_lt_out_of_range[4, 65536]` | N/A | 262.1 µs | N/A |
| 🆕 | Simulation | `fast_eq_out_of_range[4, 65536]` | N/A | 246 µs | N/A |
| 🆕 | Simulation | `fast_lt_out_of_range[16, 1024]` | N/A | 67.9 µs | N/A |
| 🆕 | Simulation | `fast_lt_out_of_range[4, 1024]` | N/A | 87.8 µs | N/A |
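As a rough sanity check on the statement that both arms decompress today, the `HEAD` timings of the new benchmarks can be paired up and divided. The sketch below does only that arithmetic on the numbers reported above; the `ROWS` constant and the `fast_over_baseline` helper are illustrative names, and the ratios are single-run figures rather than a statistical claim.

```rust
/// (operator[bit_width, len], baseline µs, fast-arm µs), copied from the
/// CodSpeed table in this comment. The pairing by parameters is an assumption
/// about how the two arms correspond.
const ROWS: [(&str, f64, f64); 8] = [
    ("eq[4, 1024]", 63.1, 67.0),
    ("eq[4, 65536]", 237.9, 246.0),
    ("eq[16, 1024]", 64.1, 67.7),
    ("eq[16, 65536]", 259.4, 291.1),
    ("lt[4, 1024]", 64.1, 87.8),
    ("lt[4, 65536]", 251.9, 262.1),
    ("lt[16, 1024]", 64.4, 67.9),
    ("lt[16, 65536]", 274.5, 306.3),
];

/// Ratio of the fast-arm time to the baseline time for each parameter set.
/// A value near 1.0 means the "fast" arm is not yet faster than the baseline.
fn fast_over_baseline() -> Vec<(&'static str, f64)> {
    ROWS.iter()
        .map(|&(name, baseline_us, fast_us)| (name, fast_us / baseline_us))
        .collect()
}
```

Every ratio comes out at or above 1.0, so none of the `fast_*_out_of_range` arms beats its baseline yet, which is what one would expect before the follow-up fast path lands.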

Comparing claude/bitpack-compare-bench-KGPS3 (5f53991) with develop (faf7e42)

This was referenced May 18, 2026

fastlanes: bit-packed compare-constant fast path + bitpack_constant kernel #8013

Closed

Fast-path comparison and constant encoding for bit-packed arrays #8011

Closed

joseph-isaacs requested a review from robert3005 May 18, 2026 17:05

joseph-isaacs added the changelog/skip Do not list PR in the changelog label May 18, 2026

joseph-isaacs enabled auto-merge (squash) May 18, 2026 17:24

robert3005 approved these changes May 18, 2026

joseph-isaacs merged commit 7b47788 into develop May 18, 2026
66 of 67 checks passed

joseph-isaacs deleted the claude/bitpack-compare-bench-KGPS3 branch May 18, 2026 17:26
