
Conversation

@wdhongtw
Collaborator

@wdhongtw wdhongtw commented Dec 4, 2025

Description

Avoid installing CUDA-related packages.

  • Use PyTorch CPU version so we avoid installing CUDA.

This modification preserves functionality and reduces the image size by about 7.7 GB.

wdhongtw/vllm-tpu   latest   d055fd2151a0   22 minutes ago      11.8GB
wdhongtw/vllm-tpu   base     07dbf76dbed8   About an hour ago   19.5GB

See the official documentation for the recommended way to install the CPU version of PyTorch.

Diffing the pip list output before and after the change gives:

--- ./cuda.txt  2025-12-10 14:22:24.124896422 +0000
+++ ./cpu.txt   2025-12-10 14:24:50.718893457 +0000
@@ -101,15 +100,0 @@
-nvidia-cublas-cu12                 12.8.4.1
-nvidia-cuda-cupti-cu12             12.8.90
-nvidia-cuda-nvrtc-cu12             12.8.93
-nvidia-cuda-runtime-cu12           12.8.90
-nvidia-cudnn-cu12                  9.10.2.21
-nvidia-cufft-cu12                  11.3.3.83
-nvidia-cufile-cu12                 1.13.1.3
-nvidia-curand-cu12                 10.3.9.90
-nvidia-cusolver-cu12               11.7.3.90
-nvidia-cusparse-cu12               12.5.8.93
-nvidia-cusparselt-cu12             0.7.1
-nvidia-nccl-cu12                   2.27.3
-nvidia-nvjitlink-cu12              12.8.93
-nvidia-nvshmem-cu12                3.3.20
-nvidia-nvtx-cu12                   12.8.90
@@ -197 +182 @@
-torch                              2.8.0
+torch                              2.8.0+cpu
@@ -200 +185 @@
-torchvision                        0.23.0
+torchvision                        0.23.0+cpu
@@ -206 +191 @@
-triton                             3.4.0
+triton                             3.5.1

With --extra-index-url, pip can now see both the CUDA version and the CPU version,
and the CPU version has higher priority.

  • PyTorch decided to make the "canonical" torch package on the Linux platform the CUDA-ready one,
    and publishes the CPU-ready build on https://download.pytorch.org/whl/cpu.
    (This is not the case on macOS and Windows.)
  • When using --index-url, pip will install the +cpu version, since the index URL is then
    the only search space for that pip invocation. The +... suffix is called the local version identifier.
  • We use --extra-index-url here so that we can install the +cpu builds of torch and torchvision
    while installing the other packages in the requirements.txt file in one pip invocation.
    torch and torchvision come from https://download.pytorch.org/whl/cpu, and all other packages
    come from PyPI directly.
  • Now pip can see PyTorch 2.8.0 and 2.8.0+cpu at the same time, but because the version with
    a local version tag has higher priority, 2.8.0+cpu is installed.

From PEP 440:

Additionally a local version with a great number of segments will always compare as greater than a local version with fewer segments
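The ordering described above can be sketched with a small version key (a simplified model of PEP 440 ordering for this specific case, not pip's actual resolver; real resolvers also handle pre/post/dev releases and numeric local segments):

```python
def version_key(version: str):
    """Simplified PEP 440-style sort key: release tuple, then local segments.

    Covers only plain releases like "2.8.0" with an optional local version
    identifier like "+cpu". An empty local-segment tuple sorts below any
    non-empty one, so "2.8.0+cpu" outranks "2.8.0", and a local version
    with more segments outranks one with fewer.
    """
    base, _, local = version.partition("+")
    release = tuple(int(part) for part in base.split("."))
    local_segments = tuple(local.split(".")) if local else ()
    return (release, local_segments)

# What pip effectively sees across both indexes for the same release:
candidates = ["2.8.0", "2.8.0+cpu"]
print(max(candidates, key=version_key))  # -> 2.8.0+cpu
```

This also reproduces the segment-count rule quoted from PEP 440: `version_key("1.0+foo.bar") > version_key("1.0+foo")`.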

Tests

Build the image and run benchmarking in the container.

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

Collaborator

@QiliangCui QiliangCui left a comment


avoid installing CUDA will be great!!

Can we do this change after #1245 so that we can have a cleaner base to diff?

@wdhongtw
Collaborator Author

wdhongtw commented Dec 8, 2025

avoid installing CUDA will be great!!

Can we do this change after #1245 so that we can have a cleaner base to diff?

If I remember correctly, GitHub updates the diff in related PRs automatically once some of their commits already exist in the target branch.

Let's just wait for the other PR to be merged first. :D

@wdhongtw
Collaborator Author

avoid installing CUDA will be great!!

Can we do this change after #1245 so that we can have a cleaner base to diff?

Seems my understanding was wrong: since the repo enables linear history, it results in a conflicted state instead.
I rebased again and the diff looks correct now.

@QiliangCui

@QiliangCui QiliangCui added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 12, 2025
@wdhongtw wdhongtw force-pushed the avoid-cuda branch 2 times, most recently from e1d7815 to 48f2560 on December 12, 2025 at 07:47
- Use PyTorch CPU version so we avoid installing CUDA.

Signed-off-by: Weida Hong <[email protected]>
@kyuyeunk
Collaborator

Fixes #921

@wdhongtw
Collaborator Author

wdhongtw commented Dec 13, 2025

No, this change does not fix #921, which requires the vllm-tpu package on PyPI to use the torch==...+cpu version.

It's by design that there is no way to propagate index URL information through the standard Python package metadata *1, so even if we specify a torch==...+cpu dependency in the metadata, users still need to pass --extra-index-url when installing vllm-tpu so that the pip toolchain can find torch==...+cpu.

And for the existing version (before this PR), users can install vllm-tpu with the CPU version of torch via:

uv pip install vllm-tpu --extra-index-url https://download.pytorch.org/whl/cpu --index-strategy unsafe-best-match

--index-strategy unsafe-best-match is required for some reason; I have not dug into why yet. (It is likely because uv's default first-index strategy only considers the first index that provides a given package.)

*1: I may be wrong about this claim; maybe we need another experienced engineer to validate this conclusion.


To solve #921 completely, we probably need to push PyTorch to release something like a torch-cpu package on PyPI directly.
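As a rough illustration of that metadata limitation (a deliberately simplified parser, assuming plain `name==version` requirement strings, not a full PEP 508 implementation):

```python
def parse_requirement(spec: str) -> dict:
    """Parse a plain "name==version" requirement string.

    Standard metadata (Requires-Dist) carries the package name and a
    version constraint, but has no field for an index URL. So even if
    vllm-tpu pins torch==2.8.0+cpu, "where to find that build" cannot
    be expressed here and must come from the installer command line.
    """
    name, _, version = spec.partition("==")
    return {"name": name.strip(), "version": version.strip()}

req = parse_requirement("torch==2.8.0+cpu")
print(req)  # -> {'name': 'torch', 'version': '2.8.0+cpu'}
# Note: no "index_url" key exists; the user supplies it separately,
# e.g. via --extra-index-url https://download.pytorch.org/whl/cpu
```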

@kyuyeunk
Collaborator

Ah got it. Thanks for clarifying it!
