Make GPU CUDA plugin require JAX #8919

tengyifei · 2025-04-01T22:06:26Z

This fixes internal b/419277657.

Some PyTorch/XLA GPU features require JAX. The CI tests install the latest version of JAX ad hoc, which creates a skew with the JAX version pinned by PyTorch/XLA and causes test failures.
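To make the skew concrete, here is a minimal sketch of how a CI step could detect it, using only the standard library. The function name `find_version_skew` and the idea of passing the pins as a dict are assumptions for illustration, not something that exists in this PR.

```python
from importlib import metadata


def find_version_skew(pinned, get_installed=metadata.version):
    """Return {package: (pinned, installed)} for each package whose
    installed version differs from the pinned one.

    `pinned` maps package names to exact version strings.
    `get_installed` is injectable so the check can be tested without
    touching the real environment.
    """
    skew = {}
    for package, wanted in pinned.items():
        try:
            installed = get_installed(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            skew[package] = (wanted, installed)
    return skew
```

An empty result means the environment matches the pins. Any entry means the tests would run against a JAX version that PyTorch/XLA was not pinned to.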

Rather than installing the latest version of JAX only in CI, this PR makes the CUDA plugin depend on the same version of JAX that PyTorch/XLA uses on TPU.

Side note: there appear to be two ways of building PyTorch/XLA for GPU. One is to set XLA_CUDA=1, which builds the PyTorch/XLA C++ .so with CUDA support. The other is to build a "PyTorch/XLA CUDA plugin" similar to the libtpu plugin, which factors CUDA-specific functionality behind a backend .so. That shows up as the "Build PyTorch/XLA CUDA Plugin" job in CI. It appears that our cloud build jobs (ref) build the CUDA plugin but do not upload it to GCS (discussed earlier in #8876).

In any case, our GPU CI infra uses the CUDA plugin path and doesn't bake CUDA support into the PyTorch/XLA native .so. Therefore, to fix GPU CI, I made the CUDA plugin depend on the right versions of JAX.
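As a rough sketch of what "depend on the right versions" can look like, the snippet below derives exact-match requirement strings from a single table of pins, so the plugin and the main package cannot drift apart. The names `PINNED_VERSIONS`, `pinned_requirements`, and `PLUGIN_INSTALL_REQUIRES` are hypothetical, the version numbers are placeholders, and it assumes the plugin declares dependencies through a setuptools-style `install_requires` list.

```python
# Hypothetical placeholder pins. The real values live in the
# PyTorch/XLA build files and are not reproduced here.
PINNED_VERSIONS = {"jax": "0.0.0", "jaxlib": "0.0.0"}


def pinned_requirements(pins):
    """Turn {package: version} into exact-match requirement strings."""
    return [f"{name}=={version}" for name, version in sorted(pins.items())]


# Assumed to be passed as install_requires in the plugin's setup script.
# The JAX CUDA wheels are deliberately left out of the list.
PLUGIN_INSTALL_REQUIRES = pinned_requirements(PINNED_VERSIONS)
print(PLUGIN_INSTALL_REQUIRES)
```

Keeping the pins in one place means a JAX upgrade is a single edit that both the TPU and CUDA plugin paths pick up.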

Some XLA GPU features require JAX. Rather than only installing the latest version of JAX in CI, we'll just make the CUDA plugin depend on a version of JAX that's the same as what's used by PyTorch/XLA on TPU (except for the JAX CUDA wheels).

tengyifei · 2025-05-24T02:30:24Z

This is not easily fixable because JAX 0.6.1 started requiring cuDNN 9.8 (see https://github.com/jax-ml/jax/blob/main/CHANGELOG.md?plain=1#L61), but cuDNN 9.8 requires Debian 12 (#8928).

tengyifei force-pushed the yifeit/cuda-plugin branch from e83d9b5 to 1b6d9ac · April 1, 2025 22:08

tengyifei mentioned this pull request Apr 4, 2025

Include torchax in torch_xla #8895

Merged

Make GPU CUDA plugin require JAX

47d6ea1

Some XLA GPU features require JAX. Rather than only installing the latest version of JAX in CI, we'll just make the CUDA plugin depend on a version of JAX that's the same as what's used by PyTorch/XLA on TPU. (Except the JAX CUDA wheels).

tengyifei force-pushed the yifeit/cuda-plugin branch from 1b6d9ac to 47d6ea1 · May 23, 2025 08:01

actually add jax dependencies to the CUDA plugin

2e67486

tengyifei force-pushed the yifeit/cuda-plugin branch from 4f623f5 to 2e67486 · May 23, 2025 18:43

tengyifei added 2 commits May 23, 2025 19:52

Don't bundle jax wheels into the cuda plugin artifact

c2024cb

Update cudnn to 9.8.0

9a11f5d

tengyifei mentioned this pull request May 25, 2025

Development image, CI image, release docker images are EoL #9246

Open
