chore: Python CUDA bridge: CI and buffer handoff ABI #8618
Merging this PR will not alter performance
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 169 µs | 205.6 µs | -17.79% |
| ❌ | WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 299.5 µs | 350.9 µs | -14.66% |
| ⚡ | Simulation | chunked_varbinview_canonical_into[(100, 100)] | 259.4 µs | 224.3 µs | +15.65% |
| ⚡ | Simulation | chunked_varbinview_into_canonical[(100, 100)] | 306.7 µs | 271.4 µs | +13% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] | 273.6 ns | 244.4 ns | +11.93% |
| ⚡ | Simulation | eq_i64_constant | 319.3 µs | 289.4 µs | +10.32% |
Comparing ad/pycudf3 (f2bb115) with develop (2a15a9f)
Footnotes
1. 4 benchmarks were skipped, so the baseline results were used instead.
ac4165b to 59fb909
Add explicit GPU-runner CI coverage for the Python CUDA bridge through the vortex-data[cuda] optional-extra path.

Extend the private metadata bridge to carry host buffer-export capsules instead of only a buffer count. The base Python package exports repr(C) VortexBufferExport descriptors, and vortex-python-cuda imports them into local BufferHandles before deserializing arrays through its own VortexSession.

Tests now cover primitive, nullable, bool, and struct arrays across the bridge, plus the existing CUDA Arrow Device smoke path.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>