GH-48214: [C++][Parquet] Fix Arrow Reader Internal logic to enable Parquet DB support on s390x #48215
base: main
Conversation
Force-pushed from 866f301 to 3f8b600
|
These little corners of the Arrow/Parquet bridge tend to hide the more "surprising" BE behaviors, so it's always nice to see them getting attention. My own s390x work didn't touch this path. Same with the half-float swap: I've noticed that half-floats are one of the places where BE architectures drift quickest if the reader doesn't explicitly re-LE them. I don't see any conflicts with what I've been doing in encode/decode land; just wanted to chime in and confirm the behavior you're targeting here lines up with what I've seen when running the full Parquet → Arrow → Parquet round-trip paths on BE.
#if ARROW_LITTLE_ENDIAN
  ARROW_ASSIGN_OR_RAISE(*out, chunked_array->View(field->type()));
#else
  // Convert little-endian bytes from Parquet to native-endian HalfFloat
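On big-endian hosts, the `#else` branch above has to byte-swap each 16-bit half-float value, since Parquet stores them little-endian. A minimal standalone sketch of what that conversion amounts to (not the actual Arrow code; `ByteSwap16` and `HalfFloatsFromLE` are hypothetical helper names):

```cpp
#include <cstdint>
#include <vector>

// Swap the two bytes of a 16-bit value (LE <-> BE).
inline uint16_t ByteSwap16(uint16_t v) {
  return static_cast<uint16_t>((v >> 8) | (v << 8));
}

// Convert a buffer of half-float values read as little-endian from Parquet
// into the host's native byte order. On an LE host this is the identity.
std::vector<uint16_t> HalfFloatsFromLE(const std::vector<uint16_t>& le_values,
                                       bool host_is_little_endian) {
  if (host_is_little_endian) return le_values;  // nothing to do on LE hosts
  std::vector<uint16_t> out;
  out.reserve(le_values.size());
  for (uint16_t v : le_values) out.push_back(ByteSwap16(v));
  return out;
}
```

In the real reader the swap is done on Arrow buffers rather than `std::vector`, but the per-element operation is the same.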
I would favor a different approach: turn TransferBinary into:
Status TransferBinary(RecordReader* reader, MemoryPool* pool,
                      const std::shared_ptr<Field>& logical_type_field,
                      std::function<Result<std::shared_ptr<Array>>(std::shared_ptr<Array>)> array_process,
                      std::shared_ptr<ChunkedArray>* out);
such that the optional array_process is called for each chunk. If carefully coded, this will help limit memory consumption by disposing of old chunks while creating the new ones.
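The chunk-at-a-time pattern suggested here can be sketched with stand-in types (a hypothetical `Chunk`/`TransferChunks`; the real code would use arrow::Array, Result<>, and RecordReader). The point is that each source chunk is released before the next one is transformed, so at most one transformed chunk and one source chunk are alive at a time:

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

// Stand-ins for arrow::Array and the per-chunk callback.
using Chunk = std::vector<uint16_t>;
using ChunkPtr = std::shared_ptr<Chunk>;
using ChunkProcess = std::function<ChunkPtr(ChunkPtr)>;

// Transform chunks one at a time. Each input chunk is moved into the
// callback, so its refcount can drop to zero (and its memory be freed)
// as soon as the callback produces the replacement chunk.
std::vector<ChunkPtr> TransferChunks(std::vector<ChunkPtr> chunks,
                                     const ChunkProcess& process) {
  std::vector<ChunkPtr> out;
  out.reserve(chunks.size());
  for (auto& chunk : chunks) {
    ChunkPtr input = std::move(chunk);  // drop our reference to the source
    if (process) {
      input = process(std::move(input));  // old chunk freed inside the callback
    }
    out.push_back(std::move(input));
  }
  return out;
}
```

With `process` empty this degenerates to a plain move of the chunks, matching the little-endian fast path where no conversion is needed.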
@pitrou, thanks for your comments. Can I plan to implement this change in the next pass, since the present code change is working? We want to enable Apache Arrow support on s390x as soon as possible, since it is blocking many of the OCP AI tests on IBM Z. Thanks!
Vishwanatha-HD left a comment:
I have addressed all the review comments. Thanks!
…upport on s390x
Force-pushed from 3f8b600 to 8b620bb
Rationale for this change
This PR is intended to enable Parquet DB support on big-endian (s390x) systems by fixing the Arrow reader_internal logic.
What changes are included in this PR?
The fix includes changes to the following file:
cpp/src/parquet/arrow/reader_internal.cc
Are these changes tested?
Yes. The changes are tested on the s390x architecture to make sure things work as expected. The fix is also tested on x86 to make sure no new regression is introduced.
Are there any user-facing changes?
No.
GitHub main Issue link: #48151