Avoid assuming byteswap on non-NumPy arrays in to_buffers #3845

Open
X0708a wants to merge 5 commits into scikit-hep:main from
X0708a:fix-byteswap-nplike-clean
Conversation

@X0708a
Contributor

@X0708a X0708a commented Feb 5, 2026

Fixes #2673

This PR removes the assumption that all array-like buffers implement
.byteswap() in the to_buffers path. The byteorder conversion is now
explicit for NumPy-backed buffers, avoiding unsafe assumptions across
ndarray implementations.

Tests updated accordingly.

@codecov

codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 60.46512% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.63%. Comparing base (9871441) to head (51b1740).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/awkward/_nplikes/array_module.py | 6.66% | 14 Missing ⚠️ |
| src/awkward/_nplikes/virtual.py | 0.00% | 3 Missing ⚠️ |

Additional details and impacted files

| Files with missing lines | Coverage Δ |
|---|---|
| src/awkward/_nplikes/jax.py | 81.18% <100.00%> (+3.28%) ⬆️ |
| src/awkward/_nplikes/numpy.py | 100.00% <100.00%> (ø) |
| src/awkward/_nplikes/numpy_like.py | 100.00% <ø> (ø) |
| src/awkward/_nplikes/typetracer.py | 77.68% <100.00%> (+1.82%) ⬆️ |
| src/awkward/_util.py | 95.60% <100.00%> (+0.04%) ⬆️ |
| src/awkward/_nplikes/virtual.py | 91.21% <0.00%> (-0.90%) ⬇️ |
| src/awkward/_nplikes/array_module.py | 91.23% <6.66%> (-3.70%) ⬇️ |

... and 8 files with indirect coverage changes

@github-actions

github-actions bot commented Feb 5, 2026

The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3845

Collaborator

@ikrommyd ikrommyd left a comment

Thanks for attempting this, but this is not what the original issue is about. The issue asks to abstract byteswap inside the nplikes for all backends. The nplikes are implemented here: https://github.com/scikit-hep/awkward/tree/main/src/awkward/_nplikes. You'd need to understand what the nplikes do before implementing this.

@ikrommyd
Collaborator

ikrommyd commented Feb 5, 2026

That would also require implementing byteswap for cupy and jax, whose arrays do not currently have a .byteswap() method.

Edit:
Actually, this is trickier because jax and cupy do not support big-endian dtypes.
Edit2:
However, we do not want big-endian dtypes. Awkward always operates on native dtypes, and if cupy and jax do not support big-endian systems, that's not our problem. We would still like to byteswap in the cupy case because we would like from/to_buffers to be able to import/export buffers with any byteorder, no matter the backend.
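
The distinction above between swapping bytes and carrying a big-endian dtype can be illustrated with plain NumPy. This is only a sketch of the general idea, not awkward's from_buffers/to_buffers code; the variable names are made up for the example.

```python
import numpy as np

native = np.array([1, 2, 3], dtype=np.int32)

# Export: reverse the bytes of every element. The dtype label stays native,
# only the memory contents change.
raw = native.byteswap().tobytes()

# Import: describe the incoming bytes with an explicitly swapped dtype
# ("S" means "opposite of the current order"), then convert the values to a
# native dtype so downstream code never sees a non-native array.
swapped_dtype = np.dtype(np.int32).newbyteorder("S")
restored = np.frombuffer(raw, dtype=swapped_dtype).astype(np.int32)

print(restored, restored.dtype.isnative)
```

The same round trip is what a backend without non-native dtypes would have to achieve by swapping bytes itself, since it cannot rely on a swapped dtype label.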

@X0708a
Contributor Author

X0708a commented Feb 8, 2026

I reworked the PR to follow the intended design:

Byteorder conversion now goes through nplike.byteswap(...) for nplike‑owned arrays (no assumptions about .byteswap() on buffers).
native_to_byteorder is now a thin wrapper that takes the active nplike.
CuPy implements byteswap by staging through NumPy; NumPy implements it directly. (JAX still raises NotImplementedError.)
from_buffers avoids the old double‑byteswap in the nplike‑owned path and interprets buffers per the declared byteorder.
The byteorder round‑trip tests (flat + jagged) pass locally.

Collaborator

@ikrommyd ikrommyd left a comment

You are changing a million unrelated things here, making it hard to review.
Also, for cupy you go through numpy (very bad if we have to copy from GPU to RAM and back), but for jax you raise an error (inconsistent). It is possible to implement byteswap for both cupy and jax using view and reshape tricks or bit manipulation, depending on the dtype's itemsize, and I think that's the way to go.

Regarding ak._util.native_to_byteorder, I'm still not sure whether it should take the nplike as input or figure it out internally using nplike_of_obj. nplike_of_obj uses memoization, so it's not slow to call it every time, but on the other hand, if you already have the nplike extracted, it may be best to just pass it into the function. The other route would be for the nplike itself to implement native_to_byteorder, so that it's no longer a utility but a method of each nplike. That's also a possible route.

You are also moving things around and changing a lot in the tests. We do not touch tests like that.
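
For reference, the "view and reshape" idea mentioned above might look roughly like the following. It is written against NumPy only, as a stand-in for any backend that offers view, reshape, and reversed slicing; `byteswap_via_view` is a hypothetical name, and the sketch covers real (non-complex) dtypes on arrays with at least one dimension.

```python
import numpy as np

def byteswap_via_view(x):
    """Hypothetical helper: reverse each element's bytes without x.byteswap()."""
    x = np.ascontiguousarray(x)           # the uint8 view needs contiguous memory
    itemsize = x.dtype.itemsize
    if itemsize == 1:
        return x.copy()                   # nothing to swap for 1-byte dtypes
    # Expose the bytes of each element along a new trailing axis.
    as_bytes = x.view(np.uint8).reshape(x.shape + (itemsize,))
    # Reverse that axis, make the result contiguous, and reinterpret it.
    flipped = np.ascontiguousarray(as_bytes[..., ::-1])
    return flipped.view(x.dtype).reshape(x.shape)

a = np.arange(6, dtype=np.int32).reshape(2, 3)
print(byteswap_via_view(a).tobytes() == a.byteswap().tobytes())
```

Comparing raw bytes rather than values avoids spurious mismatches when swapped floats happen to be NaN.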

@X0708a
Contributor Author

X0708a commented Feb 8, 2026

I’ve minimized the diff and removed the unrelated test changes. CuPy byteswap is now GPU‑safe, and JAX byteswap is implemented using view/reshape/flip so it’s consistent. I also updated native_to_byteorder to infer the nplike when not provided, while still accepting an explicit one if available. Please take another look and let me know if you’d rather move native_to_byteorder onto the nplike itself.
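
The "infer the nplike unless one is passed" shape described above could be sketched as follows. Everything here is hypothetical: `NumpyBackend` and `backend_of` are stand-ins invented for the example, and the function is not the actual ak._util.native_to_byteorder.

```python
import sys
import numpy as np

NATIVE = "<" if sys.byteorder == "little" else ">"

class NumpyBackend:
    """Hypothetical stand-in for an nplike object."""

    def byteswap(self, x):
        return x.byteswap()

def backend_of(array):
    """Hypothetical lookup; this sketch only recognizes NumPy arrays."""
    if isinstance(array, np.ndarray):
        return NumpyBackend()
    raise TypeError(f"no backend known for {type(array).__name__}")

def native_to_byteorder(array, byteorder, backend=None):
    if byteorder not in ("<", ">"):
        raise ValueError("byteorder must be '<' or '>'")
    if byteorder == NATIVE:
        return array                      # already in the requested order
    if backend is None:
        backend = backend_of(array)       # infer only when not supplied
    return backend.byteswap(array)

x = np.array([1, 2], dtype=np.int16)
other = ">" if NATIVE == "<" else "<"
print(native_to_byteorder(x, other).tobytes().hex())
```

Callers that already hold the backend pass it in and skip the lookup; everyone else gets the inferred one.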

@ikrommyd
Collaborator

ikrommyd commented Feb 8, 2026

You are still changing dozens of unrelated lines. You have introduced a new function definition that is specific to only one nplike; don't do that. You also directly import cupy in the cupy nplike instead of using the nplike's own functions. We never directly import numpy/cupy/jax in the nplikes.

@ikrommyd
Collaborator

ikrommyd commented Feb 8, 2026

You are also making incorrect changes in from_buffers, assuming that the array implements .byteswap, which is not true.

@X0708a
Contributor Author

X0708a commented Feb 11, 2026

Refactored to abstract byteswap inside nplike layer for all backends, removing assumption that arrays implement .byteswap().
Implementation is backend-agnostic & dispatched via nplike_of_obj.
Added a focused test to assert native-endian dtype metadata after from_buffers(..., byteorder=...).
Happy to adjust if needed.

@X0708a X0708a force-pushed the fix-byteswap-nplike-clean branch from 918bf2c to d4b10c2, February 11, 2026 21:22
@ikrommyd
Collaborator

There are major things missing here. Let's take it step by step. First, there should be a byteswap implementation in the array module that all the nplikes inherit from. This should work for ALL dtypes (complex, datetimes, and everything awkward supports). If a specific nplike, say jax, can't work with this implementation and only numpy and cupy can, the jax nplike should override it with its own implementation. For the typetracer backend, I think it should just error. If we want to byteswap a typetracer, something's probably wrong.
Next, this needs thorough testing. We'd need a test for each nplike showing that its byteswap function behaves as we expect and indeed byteswaps correctly for ALL dtypes.
Finally, the last thing we need is a test that the byteswap implementation works properly in code paths that use native_to_byteorder. For example, we'd need a test for all backends that ak.from_buffers and ak.to_buffers work properly and accept or spit out arrays with the appropriate byteorder.
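
The layering requested above (shared implementation, backend override, typetracer error, byte-level check over many dtypes) might be sketched like this. The class names are hypothetical stand-ins rather than awkward's real nplike classes, NumPy plays the role of every backend, and the choice of NotImplementedError is an assumption.

```python
import numpy as np

class ArrayModuleLike:
    """Hypothetical shared base: generic byteswap built from view/reshape."""

    def byteswap(self, x):
        x = np.ascontiguousarray(x)
        width = x.dtype.itemsize
        if x.dtype.kind == "c":
            width //= 2                   # complex: swap each float component
        if width == 1:
            return x.copy()
        flipped = x.view(np.uint8).reshape(-1, width)[:, ::-1]
        flat = np.ascontiguousarray(flipped).reshape(-1)
        return flat.view(x.dtype).reshape(x.shape)

class NumpyLike(ArrayModuleLike):
    """Hypothetical override: use the native method where it exists."""

    def byteswap(self, x):
        return x.byteswap()

class TypeTracerLike(ArrayModuleLike):
    """Hypothetical override: there is no data, so refuse."""

    def byteswap(self, x):
        raise NotImplementedError("byteswap is not defined for typetracers")

DTYPES = ["int8", "uint16", "int32", "int64", "float32", "float64",
          "complex64", "complex128", "datetime64[ns]", "timedelta64[s]"]

raw = bytes(range(64))                    # 64 bytes divide evenly for every dtype
for name in DTYPES:
    x = np.frombuffer(raw, dtype=name)
    generic = ArrayModuleLike().byteswap(x)
    assert generic.tobytes() == x.byteswap().tobytes(), name
```

Checking against NumPy's own result at the byte level is what keeps the generic path honest for complex and datetime dtypes.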

@X0708a
Contributor Author

X0708a commented Feb 12, 2026

Hi,
Implemented a shared byteswap in ArrayModuleNumpyLike, inherited by all nplikes.
Ensured dtype-correct behavior across all fixed-width dtypes supported by awkward.
Made TypeTracer explicitly raise on byteswap.
I added tests:
Per-backend (cpu, cuda, jax) validation of byteswap against NumPy at the byte level.
Integration tests covering ak.to_buffers / ak.from_buffers byteorder paths.
Explicit test for TypeTracer error behavior.
All related byteorder tests are passing locally. Please let me know if you require any more changes.

@ikrommyd
Collaborator

ikrommyd commented Feb 12, 2026

For the numpy nplike, going through the custom byteswap implementation is expensive. The numpy nplike should implement byteswap using .byteswap. Also, I don't think your byteswap implementation is correct, at least for complex numbers.
Also, we do testing separately. cuda tests go in the tests-cuda folder, and we want to split numpy and jax tests into separate files so that you don't need jax installed to run the numpy tests, for example.
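
One way a generic byteswap can go wrong for complex numbers is by reversing all bytes of an element at once. The NumPy snippet below illustrates that failure mode; it is not taken from the PR's code, and whether the PR had this particular bug is not stated in the thread.

```python
import numpy as np

z = np.array([1 + 2j], dtype=np.complex64)   # 8 bytes: 4 for real, 4 for imag

# Naive approach: reverse all 8 bytes of the element in one go.
flipped = z.view(np.uint8).reshape(-1, 8)[:, ::-1]
naive = np.ascontiguousarray(flipped).reshape(-1).view(np.complex64)

# NumPy swaps the real and imaginary floats separately.
reference = z.byteswap()

print(naive.tobytes() == reference.tobytes())   # False
print(naive.byteswap())                         # [2.+1.j]: the parts traded places
```

The whole-element flip reverses the bytes within each float and also exchanges the two floats, so undoing it with a correct byteswap returns 2+1j instead of 1+2j.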

@X0708a
Contributor Author

X0708a commented Feb 12, 2026

Pushed follow-up updates:
I added a NumPy-specific byteswap fast path in [numpy.py] using x.byteswap(inplace=False).

Kept the generic nplike byteswap path for non-NumPy backends, with correct handling of complex components.

Updated virtual byteswap to dispatch through nplike (no direct array-method assumption).

Split tests by backend:
test_3845_nplike_byteswap_numpy.py
test_3845_nplike_byteswap_jax.py
test_3845_nplike_byteswap_cuda.py

Removed:
test_3845_nplike_byteswap.py

Also:
Pushed a JAX byteswap follow-up for the failing complex64/complex128 cases: switched to a backend-native bitcast/flip/bitcast path in [jax.py] to match NumPy byte semantics.

@X0708a
Contributor Author

X0708a commented Feb 12, 2026

Pushed a JAX follow-up: complex64/complex128 byteswap is explicitly marked unsupported for JAX and those two JAX test cases are skipped. Other JAX byteswap dtypes remain covered, and NumPy/CUDA coverage is unchanged.

@X0708a X0708a requested a review from ikrommyd February 16, 2026 13:33
@ikrommyd
Collaborator

Since we are abstracting byteswap into the nplikes, I think the implementation as a method of the virtual array object can be removed, as we should be using the nplike implementation only.
I also think that for virtual arrays we should not be materializing to call byteswap. We should be creating new virtual arrays.
Finally, I still can't trust that the custom byteswap implementation with reshapes etc. is correct; it needs more testing. I think there are edge cases it can't handle. I think there is also a way to implement it using bitwise manipulations. Performance should be checked across backends to see which approach is best.
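
A bitwise variant, as suggested above, could be sketched like this in NumPy. `byteswap_bitwise` is a hypothetical name, and the sketch handles only real dtypes with 2, 4, or 8 bytes per element; complex and 1-byte dtypes would need separate handling.

```python
import numpy as np

UNSIGNED = {2: np.uint16, 4: np.uint32, 8: np.uint64}

def byteswap_bitwise(x):
    """Hypothetical helper: swap bytes with shifts and masks on unsigned views."""
    x = np.ascontiguousarray(x)
    size = x.dtype.itemsize
    utype = UNSIGNED[size]                # KeyError for unsupported item sizes
    u = x.view(utype)
    out = np.zeros_like(u)
    for i in range(size):
        # Extract byte i (counted from the least significant end) ...
        byte = (u >> utype(8 * i)) & utype(0xFF)
        # ... and place it at the mirrored position.
        out |= byte << utype(8 * (size - 1 - i))
    return out.view(x.dtype)

a = np.arange(5, dtype=np.float32)
print(byteswap_bitwise(a).tobytes() == a.byteswap().tobytes())
```

Explicitly typed shift counts and masks keep every intermediate in the same unsigned dtype, which avoids surprises from integer promotion.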

@X0708a
Contributor Author

X0708a commented Feb 22, 2026

Ran benchmarks for both reshape+view and bitwise across NumPy and JAX for all dtypes.
Results:
NumPy
int64: reshape 0.93 ms vs bitwise 2.61 ms
float64: reshape 0.93 ms vs bitwise 1.69 ms
complex128: reshape 1.87 ms vs bitwise 3.65 ms
int16: reshape 0.46 ms vs bitwise 0.14 ms...

JAX
float64: reshape 0.38 ms vs bitwise 0.43 ms
int32: reshape 0.47 ms vs bitwise 0.63 ms
complex64 / complex128: both strategies fail the round-trip invariant...

NumPy's native .byteswap() is clearly the fastest, so that should stay as the fast path. For the generic implementation, reshape+view generally beats bitwise on 64-bit and complex types; bitwise sometimes wins on 16/32-bit, but not consistently enough to justify the extra complexity.
The tricky part is JAX complex types: both strategies fail the round-trip invariant. It doesn't seem like an implementation bug, more like JAX just handles reinterpretation differently from NumPy at a fundamental level. So I think the cleanest fix is having JAX explicitly raise NotImplementedError for complex dtypes rather than silently giving wrong results.
Proposed plan: NumPy nplike uses .byteswap(), base ArrayModuleNumpyLike uses reshape+view, JAX overrides to error on complex. Does that work or would you prefer adding dtype-based dispatch inside the generic path?
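
A minimal harness for this kind of comparison might look like the following. It is not the benchmark used for the numbers above, and the function names are hypothetical. It times NumPy's native method against a reshape+view variant and checks the round-trip invariant.

```python
import timeit
import numpy as np

def swap_reshape(x):
    """Hypothetical reshape+view byteswap for 1-D contiguous real arrays."""
    width = x.dtype.itemsize
    flipped = x.view(np.uint8).reshape(-1, width)[:, ::-1]
    return np.ascontiguousarray(flipped).reshape(-1).view(x.dtype)

def round_trips(swap, x):
    # Swapping twice must reproduce the original bytes.
    return swap(swap(x)).tobytes() == x.tobytes()

x = np.arange(200_000, dtype=np.int64)
assert round_trips(swap_reshape, x)

candidates = [("native", lambda a: a.byteswap()), ("reshape+view", swap_reshape)]
for name, fn in candidates:
    seconds = min(timeit.repeat(lambda: fn(x), number=5, repeat=3)) / 5
    print(f"{name}: {seconds * 1e3:.2f} ms")
```

The round-trip check alone is a weak test, since a function that returns its input unchanged also passes it; comparing against a trusted reference at the byte level is the stronger check.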

@ikrommyd
Collaborator

It would be good to see the benchmark (a gist maybe) for clarity too. Cupy should be tested too if possible in the benchmark.

For jax, what do you mean they fail the roundtrip? It shouldn't fail if you apply the reversal in the real and imaginary part separately. What's the failure?
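
Swapping the real and imaginary parts separately can be expressed by reinterpreting the complex array as pairs of floats. The sketch below uses NumPy as a stand-in; `byteswap_complex` is a hypothetical name, and whether the equivalent JAX reinterpretation behaves identically is not verified here.

```python
import numpy as np

def byteswap_complex(z):
    """Hypothetical helper: swap each float component of a complex array."""
    z = np.ascontiguousarray(z)
    # Assumes complex64 or complex128 input.
    part = np.float32 if z.dtype == np.complex64 else np.float64
    # View each complex value as (real, imag) floats; the last axis doubles.
    parts = z.view(part)
    # Swap every float on its own, then reinterpret the pairs as complex.
    return parts.byteswap().view(z.dtype)

z = np.array([1 + 2j, 3 - 4j], dtype=np.complex128)
swapped = byteswap_complex(z)
print(swapped.tobytes() == z.byteswap().tobytes())
print(byteswap_complex(swapped).tolist() == z.tolist())
```

Because the two components are never mixed, applying the helper twice returns the original values.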

@X0708a
Contributor Author

X0708a commented Feb 23, 2026

Benchmark script: https://gist.github.com/X0708a/84142e1e7ba0ef4e1b8c7acc05dc3ccd
Covers NumPy, CuPy, and JAX: reshape+view and bitwise strategies across int16 through complex128, tested on contiguous, sliced, and transposed layouts.
JAX complex roundtrip passes: real and imag are swapped independently and recombined via bitcast.
CuPy GPU results (Google Colab, NVIDIA T4, CUDA 12.x):
[Screenshot, 2026-03-02: CuPy GPU benchmark results; image not included]

Development

Successfully merging this pull request may close these issues.

Make byteswap an nplike function

2 participants