
Device support in zarr-python (especially for GPU) #2658

Open
nenb opened this issue Jan 6, 2025 · 1 comment
nenb commented Jan 6, 2025

Problem
I would like to load zarr data directly onto non-CPU devices (especially GPUs). The current approach appears to rely on cupy to load onto cupy-supported devices, e.g. https://github.com/rapidsai/kvikio/blob/branch-25.02/notebooks/zarr.ipynb.

Unfortunately, a number of devices are not supported by cupy; for example, I don't believe my Apple Metal GPU is supported. This means I must load zarr data via the CPU to use these devices, e.g. zarr on disk -> numpy -> torch (which has Metal support).

This is slower, and I don't believe it is necessary based on the zarr specification alone (?).

Background
Multi-device support is a very important requirement in the AI/ML community. I would like to use zarr (and specifically the Python implementation) to run models such as LLMs on multiple devices. The quicker a model can be loaded onto a device (and with lower memory usage, etc.), the better the user and developer experience.

Questions

  1. Is cupy the correct/only way to load directly to GPU with zarr-python?
  2. Is there, or will there be, a way to load directly onto devices such as Metal with zarr-python?
  3. (Related) What is the best way to load a PyTorch neural network onto a GPU with zarr-python? Is it cupy plus something like DLPack for zero-copy exchange? Are there alternatives?
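On question 3, the DLPack exchange I have in mind would look something like the sketch below. It assumes torch is installed; the cupy branch needs a CUDA GPU, and the fallback uses numpy (which also implements the DLPack protocol) to illustrate the same zero-copy handoff on the host.

```python
import torch

try:
    import cupy as cp
    a = cp.arange(8, dtype=cp.float32)  # array in GPU memory via cupy
    t = torch.from_dlpack(a)            # zero-copy: torch tensor shares GPU memory
except ImportError:
    import numpy as np
    a = np.arange(8, dtype=np.float32)  # host-side stand-in; numpy speaks DLPack too
    t = torch.from_dlpack(a)            # zero-copy on the host

t += 1  # in-place update is visible through `a` as well, since memory is shared
```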

Related issues
#1967
#2574

cc @jhamman (as suggested by @TomNicholas)

ziw-liu (Contributor) commented Jan 8, 2025

CuPy/kvikio relies on NVIDIA's GPUDirect Storage (GDS) driver and goes through PCIe. Metal GPUs use unified memory, so a CPU-to-GPU transfer can in theory be almost zero-cost (passing an address). If there is a way to pass ownership of an array from CPU to GPU, nothing needs to change in zarr unless there is a need for GPU-accelerated decompression.

In practice, though, at least torch implements to("mps") by cloning the tensor (a memcpy-like cost), and each ML framework may do things differently. Another reference point is jax, which implements (experimental) serialization to zarr using tensorstore.
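The cloning cost mentioned above can be seen directly from the tensor's storage pointer. A sketch, assuming torch is installed; on a machine without MPS it exercises the no-op branch, where to() returns the same storage without copying.

```python
import torch

x = torch.arange(4, dtype=torch.float32)  # tensor in CPU memory

if torch.backends.mps.is_available():
    y = x.to("mps")  # allocates new storage on the MPS device and copies into it
    same_storage = False
else:
    y = x.to("cpu")  # no-op: already on the target device, storage is reused
    same_storage = y.data_ptr() == x.data_ptr()
```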
