Broken video output with Wan 2.1 I2V pipeline + quantized transformer #11006
Comments
Replaced
with
This uses ~34 GB of VRAM and takes ~84 sec per step (~42 min in total), but the output quality does not seem to improve.
8-bit.mp4
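As a quick sanity check on the timings quoted above, the two figures together imply a step count. The snippet below only reworks the numbers from the comment; the resulting step count is an inference from them, not a value stated anywhere in the thread.

```python
# Figures quoted in the comment above.
seconds_per_step = 84
total_minutes = 42

# Implied number of denoising steps (inferred, not stated in the thread).
implied_steps = total_minutes * 60 / seconds_per_step
print(implied_steps)  # 30.0
```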
Thanks for the detailed post and reproducible code @rolux. Quantization does not always work well on video models, so that could very well be the cause. I'll run the same code with torch.bfloat16 first to verify that it's not a problem with our implementation. If it isn't, quantization is probably what is causing the poorer quality.
Btw, if you have a decent amount of RAM and want to run the model in low VRAM without quantization, I would recommend trying group offloading: https://huggingface.co/docs/diffusers/main/en/optimization/memory#group-offloading. Combining this with a few other memory optimizations can help you run in 7-10 GB of VRAM.
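A minimal sketch of what that suggestion could look like, assuming a recent Diffusers release that ships group offloading and a pipeline object exposing `transformer`, `vae`, and `text_encoder` (plus, optionally, `image_encoder`). The helper name `enable_group_offloading` and its defaults are hypothetical; it is not the reproduction code from this issue, and it has not been verified against Wan 2.1 I2V.

```python
def enable_group_offloading(pipe, onload="cuda", offload="cpu"):
    """Apply leaf-level group offloading to an already loaded pipeline (sketch)."""
    # Imported lazily so the helper can be defined without torch/diffusers installed.
    import torch
    from diffusers.hooks import apply_group_offloading

    onload_device = torch.device(onload)
    offload_device = torch.device(offload)

    # Diffusers models expose enable_group_offload directly.
    for model in (pipe.transformer, pipe.vae):
        model.enable_group_offload(
            onload_device=onload_device,
            offload_device=offload_device,
            offload_type="leaf_level",
        )

    # The text encoder comes from transformers, so use the standalone hook.
    apply_group_offloading(
        pipe.text_encoder,
        onload_device=onload_device,
        offload_device=offload_device,
        offload_type="leaf_level",
    )

    # Assumption: the image encoder is small enough to keep on the GPU.
    image_encoder = getattr(pipe, "image_encoder", None)
    if image_encoder is not None:
        image_encoder.to(onload_device)
    return pipe
```

The helper would be called once on a pipeline loaded with `from_pretrained` (for example in `torch.bfloat16`), before running inference. Leaf-level offloading trades speed for the lowest memory use; block-level offloading is the faster, more memory-hungry alternative described on the linked page.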
@a-r-r-o-w: Thanks, yes, I had seen this earlier today. Added a comment in #10999. TLDR: Didn't work for me, as of now.
@rolux Did you somehow figure out how to run Wan 2.1 I2V locally with Diffusers?
Using fp32 doesn't fix the issue either, unfortunately.
Describe the bug
Since there is no proper documentation yet, I'm not sure whether this pipeline differs from other video pipelines in some way I'm unaware of, but with the code below, the video results are reproducibly broken.
There is a warning:
Expected types for image_encoder: (<class 'transformers.models.clip.modeling_clip.CLIPVisionModel'>,), got <class 'transformers.models.clip.modeling_clip.CLIPVisionModelWithProjection'>.
which I assume I'm expected to ignore.
Init image:
Result:
test.mp4
Result with different seed:
423258632.0.mp4
Result with different prompt:
423258632.0.mp4
Reproduction
Logs
System Info
Who can help?
No response