
Conversation

AntoinePrv
Contributor

@AntoinePrv AntoinePrv commented Sep 5, 2025

Rationale for this change

Tests and benchmarks make it easier to iterate and compare improvements.

What changes are included in this PR?

  • New tests
  • New benchmarks
  • Uniform API between the unpackDD_XXX functions, for genericity in tests/benchmarks.
    This also results in a safer API that conveys that the data may not be aligned (sketched below).
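
For context, a uniform signature along those lines might look like the following hedged sketch (the exact function names here are placeholders, since the description abbreviates them as unpackDD_XXX):

```cpp
// Taking const uint8_t* keeps the contract uniform across the unpack variants
// and signals that the packed input bytes may not be aligned to the output
// element size.
int unpack32_scalar(const uint8_t* in, uint32_t* out, int batch_size, int num_bits);
int unpack32_avx2(const uint8_t* in, uint32_t* out, int batch_size, int num_bits);
```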

Are these changes tested?

Yes, very much.

Are there any user-facing changes?

No.


github-actions bot commented Sep 5, 2025

⚠️ GitHub issue #47514 has been automatically assigned in GitHub to PR creator.

Member

@pitrou pitrou left a comment


Thanks a lot for doing this @AntoinePrv, this will be a nice improvement.

```cpp
      num_bits);
int unpack32_avx2(const uint8_t* in, uint32_t* out, int batch_size, int num_bits) {
  return unpack32_specialized<UnpackBits256<DispatchLevel::AVX2>>(
      reinterpret_cast<const uint32_t*>(in), out, batch_size, num_bits);
```
Member


I think we can also change the signature of the generated functions, what do you think? The current signature assumes the SIMD functions will load the input in 32-bit chunks, but they might make different choices.
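
For illustration only, a rough sketch of what such a signature change could look like (the inner function names and the current parameter type are assumptions, not taken from the PR):

```cpp
// Assumed current shape: the dispatcher pre-casts the input to 32-bit words,
// which bakes in the assumption that kernels consume the data as uint32_t.
int unpack32_specialized_current(const uint32_t* in, uint32_t* out, int batch_size,
                                 int num_bits);

// Possible alternative: pass raw bytes and let each generated kernel pick its
// own load width (scalar 32-bit loads, 128/256-bit vector loads, ...).
int unpack32_specialized_alt(const uint8_t* in, uint32_t* out, int batch_size,
                             int num_bits);
```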

Contributor Author


I intended to keep that for a future PR, and keep this one focused on the tests and benchmarks of the "public" functions. There are a few things we can change internally.

Member


Fair enough!

```cpp
auto [num_values, bit_width] = GetParam();
constexpr int32_t kExtraValues = sizeof(Int) * 8;

auto const packed = generateRandomPackedValues(num_values + kExtraValues, bit_width);
```
Member


The random data doesn't need to be generated again in each call to TestRoundtripAlignement. You can just generate it in the fixture's SetUp method.
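
As an illustration of the suggestion, a minimal sketch assuming a value-parameterized GoogleTest fixture (member names and types are hypothetical):

```cpp
class UnpackingRandomRoundTrip
    : public ::testing::TestWithParam<std::tuple<int32_t, int32_t>> {
 protected:
  void SetUp() override {
    // GetParam() is available in SetUp() for value-parameterized tests, so the
    // packed buffer can be generated once per test case here rather than in
    // every call to TestRoundtripAlignement.
    auto [num_values, bit_width] = GetParam();
    packed_ = generateRandomPackedValues(num_values + kExtraValues, bit_width);
  }

  static constexpr int32_t kExtraValues = 32;  // assumed, stands in for sizeof(Int) * 8
  std::vector<uint8_t> packed_;                // element type assumed
};
```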

Contributor Author


We need to know the largest test parameter to know how big of a buffer to generate.

```cpp
};

INSTANTIATE_TEST_SUITE_P(
    MutpliesOf64Values, UnpackingRandomRoundTrip,
```
Member


Why "MutpliesOf64Values"?

Contributor Author

@AntoinePrv AntoinePrv Sep 9, 2025


Because these functions do not handle all possible input sizes.

```cpp
  return out;
}

/// Use BitWriter to pack values into a vector.
```
Member


So BitWriter is used as a reference bit-packing implementation?

Contributor Author


Ah, yes. It is not so simple to write a naive reference implementation by hand, with the alignment handling, bit shifts, and so on.
I thought that on top of that we could add some unpacking tests against known values (manually packed), but the data may not be very readable.
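
For context, a reference packer along those lines might look like the following minimal sketch, assuming the PutValue/Flush interface of arrow::bit_util::BitWriter (the helper name is made up, and the header path may differ across Arrow versions):

```cpp
#include <cstdint>
#include <vector>

#include "arrow/util/bit_stream_utils.h"  // BitWriter (path may differ by version)
#include "arrow/util/bit_util.h"          // BytesForBits

// Pack each value into exactly bit_width bits, relying on BitWriter as the
// reference implementation rather than hand-rolling shifts and alignment.
std::vector<uint8_t> PackWithBitWriter(const std::vector<uint32_t>& values,
                                       int bit_width) {
  const int64_t num_bytes =
      arrow::bit_util::BytesForBits(static_cast<int64_t>(values.size()) * bit_width);
  std::vector<uint8_t> out(static_cast<size_t>(num_bytes), 0);
  arrow::bit_util::BitWriter writer(out.data(), static_cast<int>(out.size()));
  for (uint32_t v : values) {
    writer.PutValue(v, bit_width);
  }
  writer.Flush();  // make sure any partially-filled byte is written out
  return out;
}
```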

Contributor Author


I have added tests for some known values for which the expected unpacking can be extrapolated almost independently of the bit_width (see the sketch after the list):

  • All zeros
  • All ones
  • Alternating ones and zeros
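
A hedged sketch of the all-ones case, written as a test body and assuming an unpack entry point with the (const uint8_t*, uint32_t*, batch_size, num_bits) shape shown earlier (exact names from the PR may differ):

```cpp
// With every input bit set, each unpacked value must equal the largest value
// representable in bit_width bits, regardless of alignment or lane layout.
std::vector<uint8_t> packed(
    arrow::bit_util::BytesForBits(static_cast<int64_t>(num_values) * bit_width), 0xFF);
std::vector<uint32_t> out(num_values);
unpack32(packed.data(), out.data(), num_values, bit_width);
const uint32_t expected = (bit_width >= 32)
                              ? std::numeric_limits<uint32_t>::max()
                              : ((uint32_t{1} << bit_width) - 1);
for (uint32_t v : out) {
  EXPECT_EQ(v, expected);
}
```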

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Sep 8, 2025
@AntoinePrv AntoinePrv force-pushed the unpack-test-benchmark branch 2 times, most recently from dfb2cd0 to 3b39c0f on September 9, 2025 at 13:59
@AntoinePrv AntoinePrv requested a review from pitrou September 9, 2025 14:22
@pitrou
Member

pitrou commented Sep 15, 2025

#47501 was just merged, you'll need to rebase/merge at some point and then fix lint failures.

Member

@pitrou pitrou left a comment


This is looking excellent. There are a couple of relatively minor things to address; then it should be good to go :)

@AntoinePrv AntoinePrv force-pushed the unpack-test-benchmark branch from 3b39c0f to 90a4495 on September 16, 2025 at 08:49
@AntoinePrv AntoinePrv requested a review from pitrou September 16, 2025 09:06
Member

@pitrou pitrou left a comment


LGTM, thanks a lot @AntoinePrv

@pitrou
Member

pitrou commented Sep 16, 2025

@ursabot please benchmark lang=C++

@voltrondatabot

Benchmark runs are scheduled for commit caeec86. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@pitrou
Member

pitrou commented Sep 16, 2025

The arrow-azurefs-test failure/timeout on the ASAN job is certainly unrelated, we can ignore it.

@pitrou pitrou merged commit 931acd8 into apache:main Sep 16, 2025
38 of 39 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Sep 16, 2025
@AntoinePrv AntoinePrv deleted the unpack-test-benchmark branch September 16, 2025 11:11

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit caeec86.

There were 3 benchmark results indicating a performance regression:

The full Conbench report has more details.

@pitrou
Member

pitrou commented Sep 16, 2025

The benchmarks above are interesting, @AntoinePrv. On all 4 machines, the scalar version is faster than the SIMD version for bit_width=20.

Perhaps more insights could be gained by looking at the generated SIMD code. From a quick look at the AVX2 unpack32 functions, I see a lot of scalar shifts (shld) and single-element inserts (vpinsrd). Ideally, we would expect the compiler to generate shuffle instructions and vector shifts?

@pitrou
Member

pitrou commented Sep 16, 2025

That said, I expect the most important specializations to be those with a small bit width, so this might not be an important concern for now.

@AntoinePrv
Contributor Author

Yes, I had noticed similar behavior. Gonna investigate in #47573.


After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 931acd8.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.
