Conversation

@Vishwanatha-HD (Contributor) commented Nov 21, 2025

Rationale for this change

This PR enables Parquet DB support on big-endian (s390x) systems by fixing the Arrow reader_internal logic.

What changes are included in this PR?

The fix changes the following file:
cpp/src/parquet/arrow/reader_internal.cc

Are these changes tested?

Yes. The changes were tested on s390x to verify correct behavior, and also on x86 to confirm that no new regressions were introduced.

Are there any user-facing changes?

No.

Main GitHub issue: #48151

@github-actions commented:
⚠️ GitHub issue #48214 has been automatically assigned in GitHub to PR creator.

@kou kou changed the title GH-48214 Fix Arrow Reader Internal logic to enable Parquet DB support… GH-48214: [C++][Parquet] Fix Arrow Reader Internal logic to enable Parquet DB support on s390x Nov 22, 2025
@k8ika0s commented Nov 23, 2025

@Vishwanatha-HD

These little corners of the Arrow/Parquet bridge tend to hide the more “surprising” BE behaviors, so it’s always nice to see them getting attention.

My own s390x work didn’t touch reader_internal.cc, so I’m mostly reading this with the lens of “does this match the patterns I’ve seen on hardware.” The decimal min/max extraction you added looks straightforward, and from what I’ve observed, normalizing those integer-backed stats before they’re handed downstream makes a real difference on BE.
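(As a point of reference, not the PR's actual code: the statistics normalization being described reduces to byte-swapping the integer payload on BE hosts. A minimal sketch, assuming Arrow's arrow/util/endian.h helpers; DecodeInt64DecimalStat is a hypothetical name.)

#include <cstdint>
#include <cstring>

#include "arrow/util/endian.h"

// Parquet plain-encodes INT64-backed decimal min/max statistics as
// little-endian bytes. FromLittleEndian is the identity on LE hosts and a
// byte swap on BE hosts, so the value comes out native-endian either way.
int64_t DecodeInt64DecimalStat(const uint8_t* stat_bytes) {  // hypothetical helper
  int64_t value;
  std::memcpy(&value, stat_bytes, sizeof(value));
  return arrow::bit_util::FromLittleEndian(value);
}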

Same with the half-float swap: I’ve noticed that half-floats are one of the places where BE architectures drift quickest if the reader doesn’t explicitly convert them from little-endian to native order, so calling FromLittleEndian here feels like the safer side of the fence.
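For the half-float case, the pattern is the same, applied per 16-bit payload. A minimal sketch (NormalizeHalfFloats is a hypothetical helper, not the function the PR adds):

#include <cstdint>

#include "arrow/util/endian.h"

// FLOAT16 values arrive from Parquet as little-endian 16-bit payloads; on a
// big-endian host each one must be byte-swapped before Arrow interprets it.
void NormalizeHalfFloats(uint16_t* values, int64_t length) {  // hypothetical helper
  for (int64_t i = 0; i < length; ++i) {
    values[i] = arrow::bit_util::FromLittleEndian(values[i]);
  }
}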

I don’t see any conflicts with what I’ve been doing in encode/decode land — just wanted to chime in and confirm the behavior you’re targeting here lines up with what I’ve seen when running the full Parquet → Arrow → Parquet round-trip paths on BE.

#if ARROW_LITTLE_ENDIAN
ARROW_ASSIGN_OR_RAISE(*out, chunked_array->View(field->type()));
#else
// Convert little-endian bytes from Parquet to native-endian HalfFloat
@pitrou (Member) commented:
I would favor a different approach: turn TransferBinary into:

Status TransferBinary(RecordReader* reader, MemoryPool* pool,
                      const std::shared_ptr<Field>& logical_type_field,
                      std::function<Result<std::shared_ptr<Array>>(std::shared_ptr<Array>)> array_process,
                      std::shared_ptr<ChunkedArray>* out) {

such that the optional array_process is called for each chunk. If carefully coded, this will help limit memory consumption by disposing of old chunks while creating the new ones.
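A minimal sketch of how such a per-chunk callback might be applied (ProcessChunks and the exact types are illustrative assumptions, not the proposed API):

#include <functional>
#include <memory>
#include <utility>
#include <vector>

#include "arrow/array.h"
#include "arrow/chunked_array.h"
#include "arrow/result.h"
#include "arrow/status.h"

using ArrayProcess = std::function<arrow::Result<std::shared_ptr<arrow::Array>>(
    std::shared_ptr<arrow::Array>)>;

// Hypothetical helper showing the per-chunk callback shape described above.
arrow::Status ProcessChunks(std::vector<std::shared_ptr<arrow::Array>> chunks,
                            std::shared_ptr<arrow::DataType> type,
                            const ArrayProcess& array_process,
                            std::shared_ptr<arrow::ChunkedArray>* out) {
  for (auto& chunk : chunks) {
    if (array_process) {
      // Moving the chunk into the callback drops this function's reference,
      // so the original buffers can be released as soon as the converted
      // chunk exists, instead of old and new chunks coexisting in memory.
      ARROW_ASSIGN_OR_RAISE(chunk, array_process(std::move(chunk)));
    }
  }
  *out = std::make_shared<arrow::ChunkedArray>(std::move(chunks), std::move(type));
  return arrow::Status::OK();
}

The key point is that each original chunk becomes releasable immediately after its conversion, which is what bounds peak memory across the whole column read.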

@Vishwanatha-HD (Contributor, Author) replied:
@pitrou, thanks for your comments. Can I plan to implement this change in the next pass, since the present code change is working? We want to enable Apache Arrow support on s390x as soon as possible, since it's blocking many of the OCP AI tests on IBM Z. Thanks!

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Nov 24, 2025
@Vishwanatha-HD (Contributor, Author) left a comment:

I have addressed all the review comments. Thanks!
