The module 'HunyuanVideoTransformer3DModel' has been loaded in bitsandbytes 8bit and moving it to cpu via .to() is not supported. Module is still on cuda:0 #10653

Open
nitinmukesh opened this issue Jan 26, 2025 · 0 comments
Labels: bug (Something isn't working)

Comments


nitinmukesh commented Jan 26, 2025

Describe the bug

  1. enable_model_cpu_offload works with 4-bit but not with 8-bit quantization. Is this expected behavior or a bug? (A 4-bit sketch for comparison follows this list.)
    The module 'HunyuanVideoTransformer3DModel' has been loaded in bitsandbytes 8bit and moving it to cpu via .to() is not supported. Module is still on cuda:0

  2. device_map="balanced" also fails with int8, with enable_model_cpu_offload commented out (not tested with int4); see the variant after the reproduction script.
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
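
For comparison, a minimal sketch of the 4-bit load that, per the report, does work with enable_model_cpu_offload. The bnb_4bit_quant_type and bnb_4bit_compute_dtype values here are illustrative assumptions, not taken from the failing script:

import torch
from diffusers import BitsAndBytesConfig, HunyuanVideoTransformer3DModel

# 4-bit bitsandbytes load; CPU offload reportedly succeeds on this path.
# The nf4 quant type and bfloat16 compute dtype are assumptions for illustration.
quant_config_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer_4bit = HunyuanVideoTransformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    subfolder="transformer",
    quantization_config=quant_config_4bit,
    torch_dtype=torch.bfloat16,
)

Swapping transformer_4bit in for transformer_8bit in the reproduction script below should then let pipe.enable_model_cpu_offload() proceed, per the report.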

Reproduction

from diffusers import BitsAndBytesConfig, HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video
import torch

# Load the transformer in 8-bit via bitsandbytes.
quant_config = BitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = HunyuanVideoTransformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    # device_map="balanced",  # uncommenting this (and removing the offload call below) triggers item 2
)
pipe.enable_model_cpu_offload()  # item 1: warns that the 8-bit module cannot be moved to cpu
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

prompt = "A cat walks on the grass, realistic style."
video = pipe(prompt=prompt, num_frames=61, num_inference_steps=30).frames[0]
export_to_video(video, "cat.mp4", fps=15)
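
To hit the second failure mode instead, a sketch under the same setup: remove the enable_model_cpu_offload call and let the pipeline be sharded across devices.

# Variant for item 2: comment out pipe.enable_model_cpu_offload() above
# and construct the pipeline with device sharding instead.
pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",  # raises: Expected all tensors to be on the same device ...
)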

Logs

(venv) C:\aiOWN\diffuser_webui>python HunyuanVideo_8bit_4bit.py
import error: No module named 'triton'
Fetching 6 files: 100%|█████████████████████████████████████████████████████| 6/6 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|████████████████████████████████████| 4/4 [00:01<00:00,  3.24it/s]
Loading pipeline components...: 100%|███████████████████████████████| 7/7 [00:04<00:00,  1.70it/s]
The module 'HunyuanVideoTransformer3DModel' has been loaded in `bitsandbytes` 8bit and moving it to cpu via `.to()` is not supported. Module is still on cuda:0.
Traceback (most recent call last):
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

System Info

  • 🤗 Diffusers version: 0.33.0.dev0
  • Platform: Windows-10-10.0.26100-SP0
  • Running on Google Colab?: No
  • Python version: 3.10.11
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.27.1
  • Transformers version: 4.48.1
  • Accelerate version: 1.4.0.dev0
  • PEFT version: not installed
  • Bitsandbytes version: 0.45.0
  • Safetensors version: 0.5.2
  • xFormers version: not installed
  • Accelerator: NVIDIA GeForce RTX 4060 Laptop GPU, 8188 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

nitinmukesh added the bug label on Jan 26, 2025