
[Packaging] Shrink wheel ~35 % via nvcc --compress-mode=size #1704


Conversation


@trmanish commented Jul 10, 2025

What this PR does

  • Appends --compress-mode=size to CMAKE_CUDA_FLAGS for nvcc ≥ 12.4.
  • No runtime or API changes.
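The gating logic can be sketched in Python. The `--compress-mode=size` flag and the 12.4 threshold come from the PR; the helper name and version parsing here are illustrative assumptions, not the PR's actual CMake code:

```python
def wants_size_compression(nvcc_version: str) -> bool:
    """Return True when nvcc is new enough for --compress-mode=size.

    `nvcc_version` is a dotted string such as "12.4" or "12.6.85".
    Hypothetical helper; the PR implements this check in CMake.
    """
    major, minor = (int(part) for part in nvcc_version.split(".")[:2])
    return (major, minor) >= (12, 4)

# The PR appends the flag only when the version check passes:
cuda_flags = ["-O3"]
if wants_size_compression("12.6"):
    cuda_flags.append("--compress-mode=size")
```

On older toolchains the check fails and the flag list is untouched, which is what keeps nvcc < 12.4 builds unchanged.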

Impact

  • Wheel shrinks 69 MB → 45 MB (≈ 35 %).

Compatibility

  • Builds with nvcc < 12.4 are unchanged; the flag is gated by a version check.
  • Decompression adds only a few hundred ms to the first `import bitsandbytes`.

@matthewdouglas
Member

Thanks, appreciate the suggestion! I have the same concern mentioned over on PyTorch regarding support for users with older drivers: pytorch/pytorch#157791 (comment)

Mainly it seems that this would require cu124+ users to have the 550+ driver, while currently we should still have compatibility for driver version 525+.

So we will have to weigh that as a consideration.

@trmanish
Author

I believe an earlier comment on the original PR said it wouldn't have that requirement:

pytorch/pytorch#157791 (comment)

But I believe the latest from PyTorch is as follows:

pytorch/pytorch#157791 (comment)

However, my understanding is (please correct me if I'm wrong) that the only variant that would be built with --compress-mode=size is the cu124 wheel, and that wheel already implies a 550-series driver. Users on 525/535 stay on the cu122/cu121 wheels, which this PR leaves untouched.

Options

  • Merge as-is – compression only affects the cu124 wheel, so there is no compatibility regression for existing users.
  • Opt-in flag – guard it behind ENABLE_BNB_CUDA_COMPRESSION=1; default off.
  • Dual wheels – publish both bitsandbytes-cu124.whl and -cu124-slim.whl.
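The opt-in option could be a tiny guard like the sketch below. ENABLE_BNB_CUDA_COMPRESSION is the variable name proposed above; the helper function itself is an illustrative assumption, not code from the PR:

```python
import os

def compression_enabled() -> bool:
    """Opt-in guard: compression is applied only when the user
    explicitly sets ENABLE_BNB_CUDA_COMPRESSION=1; default off.
    Hypothetical sketch of the proposed opt-in behavior."""
    return os.environ.get("ENABLE_BNB_CUDA_COMPRESSION", "0") == "1"

# Off by default; a user opts in by exporting the variable:
os.environ["ENABLE_BNB_CUDA_COMPRESSION"] = "1"
print(compression_enabled())
```

The default-off behavior means existing build pipelines see no change unless they deliberately enable compression.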

@matthewdouglas
Member

Hi,

We appreciate the effort and explanation, but unfortunately we cannot merge this. The assumption that applying this only to cu124+ builds limits the scope is flawed: cu124+ builds can still be used with older driver versions thanks to CUDA's Minor Version Compatibility. This means that the 12.4, 12.6, 12.8, and 12.9 builds can currently run on systems with driver v525.

We have sufficient evidence that this is a valid usage scenario. For example, vLLM received a similar PR and had to revert it for this reason: vllm-project/vllm#20853. See also: pytorch/pytorch#157791 (comment)

The three most recent minor PyTorch releases use cu124+ builds by default, and it's supported on systems with drivers v525+.

Publishing additional wheels adds extra complexity that we do not wish to take on.

With that said, we will use this option when we start producing builds for CUDA 13, which by default will use the "balanced" compression mode; at that point we can guarantee that all users support the "size" mode as well.

Additionally, we will explore further ways to limit our binary sizes:

  • Dropping support for older GPUs. In particular, Maxwell and Pascal could be dropped. Right now we do not support these in the CUDA 12.8/12.9 builds, but we're open to dropping support entirely.
  • Removing binaries for CUDA 12.0, 12.2, 12.3, 12.5, as PyTorch is not typically built with these versions. We would instead take advantage of compatibility and only load one of CUDA 11.8, 12.1, 12.4, 12.6, 12.8, and 12.9.
  • Eventually, we'll follow PyTorch's lead and drop the CUDA 11.8 build as well.
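The "only load one of" plan above amounts to selecting the newest shipped binary that does not exceed the detected CUDA version. This is a hypothetical sketch of that selection, not bitsandbytes' actual loader; the function name and version-tuple representation are assumptions:

```python
def pick_binary(runtime_version: tuple[int, int],
                available: list[tuple[int, int]]) -> tuple[int, int]:
    """Pick the newest shipped CUDA build not exceeding the detected
    CUDA runtime version, relying on minor-version compatibility.
    Hypothetical helper for illustration only."""
    candidates = [v for v in sorted(available) if v <= runtime_version]
    if not candidates:
        raise RuntimeError(f"no compatible binary for CUDA {runtime_version}")
    return candidates[-1]

# The builds the comment proposes to keep shipping:
shipped = [(11, 8), (12, 1), (12, 4), (12, 6), (12, 8), (12, 9)]
print(pick_binary((12, 5), shipped))  # a 12.5 system falls back to the 12.4 build
```

Under this scheme a system on CUDA 12.5 would load the 12.4 binary, so the 12.5-specific build (like 12.0, 12.2, and 12.3) no longer needs to ship.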
