GH-47514: [C++][Parquet] Add unpack tests and benchmarks #47515
d8d1970
to
c77759e
Compare
Thanks a lot for doing this @AntoinePrv , this will be a nice improvement.
num_bits);
int unpack32_avx2(const uint8_t* in, uint32_t* out, int batch_size, int num_bits) {
  return unpack32_specialized<UnpackBits256<DispatchLevel::AVX2>>(
      reinterpret_cast<const uint32_t*>(in), out, batch_size, num_bits);
I think we can also change the signature of the generated functions, what do you think? The current signature assumes the SIMD functions will load the input in 32-bit chunks, but they might make different choices.
I intended to keep that for a future PR, and keep this one focused on the tests and benchmark of the "public" functions. There are a few things we can change internally.
Fair enough!
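To make the signature concern above concrete, here is a minimal sketch. It assumes a hypothetical helper `LoadUnaligned32` that is not part of the Arrow code base. Taking a `const uint8_t*` and copying with `std::memcpy` makes no alignment assumption, whereas casting the pointer to `const uint32_t*` and dereferencing it implies 4-byte alignment and a 32-bit load granularity.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not taken from Arrow: read one 32-bit word from a
// byte pointer without assuming any particular alignment.
inline uint32_t LoadUnaligned32(const uint8_t* in) {
  uint32_t word;
  // memcpy is the portable way to express an unaligned load; optimizing
  // compilers typically lower it to a single load instruction.
  std::memcpy(&word, in, sizeof(word));
  return word;
}
```

With a byte-oriented signature, each kernel stays free to choose its own load width (32-bit, 64-bit, or a full SIMD register) internally.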
cpp/src/arrow/util/bpacking_test.cc
auto [num_values, bit_width] = GetParam();
constexpr int32_t kExtraValues = sizeof(Int) * 8;

auto const packed = generateRandomPackedValues(num_values + kExtraValues, bit_width);
The random data doesn't need to be generated again in each call to TestRoundtripAlignement. You can just generate it in the fixture's SetUp method.
We need to know the largest test parameter to know how big of a buffer to generate.
cpp/src/arrow/util/bpacking_test.cc
};

INSTANTIATE_TEST_SUITE_P(
    MutpliesOf64Values, UnpackingRandomRoundTrip,
Why "MutpliesOf64Values"?
Because these functions do not handle all possible input sizes.
  return out;
}

/// Use BitWriter to pack values into a vector.
So BitWriter is used as a reference bit-packing implementation?
Arf, yes. It is not so simple to write a dumb implementation with alignments, bit shifts...
I thought on top of that we could add some unpacking tests against known values (manually packed) but the data may not be very readable.
I have added tests for some known values for which we can easily extrapolate the unpacking almost independently of the bit_width:
- All zeros
- All ones
- Alternating ones and zeros
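As an illustration of why these patterns are easy to reason about, here is a rough sketch. It is not the test code from this PR: `NaiveUnpack` and `AllOnes` are hypothetical names, and the sketch assumes values are packed least-significant bit first.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical bit-by-bit reference unpacker (not Arrow's implementation).
// Assumes values are packed back to back, least-significant bit first.
std::vector<uint32_t> NaiveUnpack(const std::vector<uint8_t>& packed,
                                  int num_values, int bit_width) {
  std::vector<uint32_t> out(num_values, 0);
  for (int i = 0; i < num_values; ++i) {
    for (int b = 0; b < bit_width; ++b) {
      // Absolute position of this bit in the packed stream.
      const int64_t pos = static_cast<int64_t>(i) * bit_width + b;
      const uint32_t bit = (packed[pos / 8] >> (pos % 8)) & 1u;
      out[i] |= bit << b;
    }
  }
  return out;
}

// Expected value when every packed bit is one: the low `bit_width` bits set.
constexpr uint32_t AllOnes(int bit_width) {
  return bit_width == 32 ? 0xFFFFFFFFu : ((1u << bit_width) - 1u);
}
```

Under that assumption, all-zero input unpacks to zeros for any bit width, all-one input unpacks to `AllOnes(bit_width)`, and input made of `0xAA` bytes (alternating bits) yields a single repeated value for even bit widths and two alternating values for odd ones.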
dfb2cd0
to
3b39c0f
Compare
#47501 was just merged, you'll need to rebase/merge at some point and then fix lint failures.
This is looking excellent. There are a couple, relatively minor, things to address. Then it should be good to go :)
3b39c0f
to
90a4495
Compare
LGTM, thanks a lot @AntoinePrv
@ursabot please benchmark lang=C++
Benchmark runs are scheduled for commit caeec86. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
The |
Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit caeec86. There were 3 benchmark results indicating a performance regression:
The full Conbench report has more details.
The benchmarks above are interesting @AntoinePrv . On all 4 machines, the scalar version is faster than the SIMD version for bit_width=20. Perhaps more insights could be gained by looking at the generated SIMD code. From a quick look at the AVX2 unpack32 functions, I see a lot of scalar shifts ( |
That said, I expect the most important specializations to be those with a small bit width, so this might not be an important concern for now.
Yes, I had noticed similar behavior. Gonna investigate in #47573.
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 931acd8. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change
Tests and benchmarks make it easier to iterate and compare improvements.
What changes are included in this PR?
unpackDD_XXX functions for genericity in tests/benchmarks. Also results in a safer API suggesting that the data may not be aligned.
Are these changes tested?
Yes very much.
Are there any user-facing changes?
No.