
Conversation

AntoinePrv
Contributor

@AntoinePrv AntoinePrv commented Sep 5, 2025

Rationale for this change

Tests and benchmarks make it easier to iterate and compare improvements.

What changes are included in this PR?

  • New tests
  • New benchmarks
  • Uniform API between the unpackDD_XXX functions, for genericity in tests/benchmarks.
    This also results in a safer API that conveys that the data may not be aligned (sketched below).
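
For context, a uniform signature along those lines might look like the following hedged sketch (the exact function names here are placeholders, since the description abbreviates them as unpackDD_XXX):

```cpp
// Taking const uint8_t* keeps the contract uniform across the unpack variants
// and signals that the packed input bytes may not be aligned to the output
// element size.
int unpack32_scalar(const uint8_t* in, uint32_t* out, int batch_size, int num_bits);
int unpack32_avx2(const uint8_t* in, uint32_t* out, int batch_size, int num_bits);
```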

Are these changes tested?

Yes, very much.

Are there any user-facing changes?

No.


github-actions bot commented Sep 5, 2025

⚠️ GitHub issue #47514 has been automatically assigned in GitHub to PR creator.

Member

@pitrou pitrou left a comment


Thanks a lot for doing this @AntoinePrv, this will be a nice improvement.

```cpp
      num_bits);
int unpack32_avx2(const uint8_t* in, uint32_t* out, int batch_size, int num_bits) {
  return unpack32_specialized<UnpackBits256<DispatchLevel::AVX2>>(
      reinterpret_cast<const uint32_t*>(in), out, batch_size, num_bits);
```
Member


I think we can also change the signature of the generated functions, what do you think? The current signature assumes the SIMD functions will load the input in 32-bit chunks, but they might make different choices.
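
For illustration only, a rough sketch of what such a signature change could look like (the inner function names and the current parameter type are assumptions, not taken from the PR):

```cpp
// Assumed current shape: the dispatcher pre-casts the input to 32-bit words,
// which bakes in the assumption that kernels consume the data as uint32_t.
int unpack32_specialized_current(const uint32_t* in, uint32_t* out, int batch_size,
                                 int num_bits);

// Possible alternative: pass raw bytes and let each generated kernel pick its
// own load width (scalar 32-bit loads, 128/256-bit vector loads, ...).
int unpack32_specialized_alt(const uint8_t* in, uint32_t* out, int batch_size,
                             int num_bits);
```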

Contributor Author


I intended to keep that for a future PR, and keep this one focused on the tests and benchmarks of the "public" functions. There are a few things we can change internally.

Member


Fair enough!

```cpp
auto [num_values, bit_width] = GetParam();
constexpr int32_t kExtraValues = sizeof(Int) * 8;

auto const packed = generateRandomPackedValues(num_values + kExtraValues, bit_width);
```
Member


The random data doesn't need to be generated again in each call to TestRoundtripAlignement. You can just generate it in the fixture's SetUp method.
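
As an illustration of the suggestion, a minimal sketch assuming a value-parameterized GoogleTest fixture (member names and types are hypothetical):

```cpp
class UnpackingRandomRoundTrip
    : public ::testing::TestWithParam<std::tuple<int32_t, int32_t>> {
 protected:
  void SetUp() override {
    // GetParam() is available in SetUp() for value-parameterized tests, so the
    // packed buffer can be generated once per test case here rather than in
    // every call to TestRoundtripAlignement.
    auto [num_values, bit_width] = GetParam();
    packed_ = generateRandomPackedValues(num_values + kExtraValues, bit_width);
  }

  static constexpr int32_t kExtraValues = 32;  // assumed, stands in for sizeof(Int) * 8
  std::vector<uint8_t> packed_;                // element type assumed
};
```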

Contributor Author


We need to know the largest test parameter to know how big of a buffer to generate.

```cpp
};

INSTANTIATE_TEST_SUITE_P(
    MutpliesOf64Values, UnpackingRandomRoundTrip,
```
Member


Why "MutpliesOf64Values"?

Contributor Author

@AntoinePrv AntoinePrv Sep 9, 2025


Because these functions do not handle all possible input sizes.

```cpp
  return out;
}

/// Use BitWriter to pack values into a vector.
```
Member


So BitWriter is used as a reference bit-packing implementation?

Contributor Author


Ah, yes. It is not so simple to write a naive reference implementation by hand, with the alignment handling, bit shifts, and so on.
I thought that on top of that we could add some unpacking tests against known values (manually packed), but the data may not be very readable.
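
For context, a reference packer along those lines might look like the following minimal sketch, assuming the PutValue/Flush interface of arrow::bit_util::BitWriter (the helper name is made up, and the header path may differ across Arrow versions):

```cpp
#include <cstdint>
#include <vector>

#include "arrow/util/bit_stream_utils.h"  // BitWriter (path may differ by version)
#include "arrow/util/bit_util.h"          // BytesForBits

// Pack each value into exactly bit_width bits, relying on BitWriter as the
// reference implementation rather than hand-rolling shifts and alignment.
std::vector<uint8_t> PackWithBitWriter(const std::vector<uint32_t>& values,
                                       int bit_width) {
  const int64_t num_bytes =
      arrow::bit_util::BytesForBits(static_cast<int64_t>(values.size()) * bit_width);
  std::vector<uint8_t> out(static_cast<size_t>(num_bytes), 0);
  arrow::bit_util::BitWriter writer(out.data(), static_cast<int>(out.size()));
  for (uint32_t v : values) {
    writer.PutValue(v, bit_width);
  }
  writer.Flush();  // make sure any partially-filled byte is written out
  return out;
}
```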

Contributor Author


I have added tests for some known values for which the expected unpacking can be extrapolated almost independently of the bit_width (see the sketch after the list):

  • All zeros
  • All ones
  • Alternating ones and zeros
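
A hedged sketch of the all-ones case, written as a test body and assuming an unpack entry point with the (const uint8_t*, uint32_t*, batch_size, num_bits) shape shown earlier (exact names from the PR may differ):

```cpp
// With every input bit set, each unpacked value must equal the largest value
// representable in bit_width bits, regardless of alignment or lane layout.
std::vector<uint8_t> packed(
    arrow::bit_util::BytesForBits(static_cast<int64_t>(num_values) * bit_width), 0xFF);
std::vector<uint32_t> out(num_values);
unpack32(packed.data(), out.data(), num_values, bit_width);
const uint32_t expected = (bit_width >= 32)
                              ? std::numeric_limits<uint32_t>::max()
                              : ((uint32_t{1} << bit_width) - 1);
for (uint32_t v : out) {
  EXPECT_EQ(v, expected);
}
```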

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Sep 8, 2025
@AntoinePrv AntoinePrv force-pushed the unpack-test-benchmark branch 2 times, most recently from dfb2cd0 to 3b39c0f on September 9, 2025 at 13:59
@AntoinePrv AntoinePrv requested a review from pitrou September 9, 2025 14:22
@pitrou
Member

pitrou commented Sep 15, 2025

#47501 was just merged, you'll need to rebase/merge at some point and then fix lint failures.

Member

@pitrou pitrou left a comment


This is looking excellent. There are a couple of relatively minor things to address; then it should be good to go :)

@AntoinePrv AntoinePrv force-pushed the unpack-test-benchmark branch from 3b39c0f to 90a4495 on September 16, 2025 at 08:49
@AntoinePrv AntoinePrv requested a review from pitrou September 16, 2025 09:06
Member

@pitrou pitrou left a comment


LGTM, thanks a lot @AntoinePrv

@pitrou
Member

pitrou commented Sep 16, 2025

@ursabot please benchmark lang=C++

@voltrondatabot

Benchmark runs are scheduled for commit caeec86. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@pitrou
Member

pitrou commented Sep 16, 2025

The arrow-azurefs-test failure/timeout on the ASAN job is certainly unrelated, we can ignore it.

@pitrou pitrou merged commit 931acd8 into apache:main Sep 16, 2025
38 of 39 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Sep 16, 2025
@AntoinePrv AntoinePrv deleted the unpack-test-benchmark branch September 16, 2025 11:11

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit caeec86.

There were 3 benchmark results indicating a performance regression:

The full Conbench report has more details.

@pitrou
Member

pitrou commented Sep 16, 2025

The benchmarks above are interesting, @AntoinePrv. On all 4 machines, the scalar version is faster than the SIMD version for bit_width=20.

Perhaps more insights could be gained by looking at the generated SIMD code. From a quick look at the AVX2 unpack32 functions, I see a lot of scalar shifts (shld) and single-element inserts (vpinsrd). Ideally, we would expect the compiler to generate shuffle instructions and vector shifts?

@pitrou
Member

pitrou commented Sep 16, 2025

That said, I expect the most important specializations to be those with a small bit width, so this might not be an important concern for now.

@AntoinePrv
Contributor Author

Yes, I had noticed similar behavior. Gonna investigate in #47573.


After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 931acd8.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.
