-
Notifications
You must be signed in to change notification settings - Fork 6.2k
Open
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
After completing 50 steps of progress, the video memory usage skyrocketed from around 8GB to 26GB, resulting in very slow performance
Reproduction
import torch
import numpy as np
from diffusers import WanImageToVideoPipeline, AutoencoderKLWan, ModularPipeline
from diffusers.utils import export_to_video
model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"
dtype = torch.bfloat16
device = "cuda:2"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanImageToVideoPipeline.from_pretrained(model_id, vae=vae, torch_dtype=dtype)
pipe.enable_model_cpu_offload(device=device)
# use default wan image processor to resize and crop the image
image_processor = ModularPipeline.from_pretrained("YiYiXu/WanImageProcessor", trust_remote_code=True)
image = image_processor(
image="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/wan_i2v_input.JPG",
max_area=1280*704, output="processed_image")
height, width = image.height, image.width
print(f"height: {height}, width: {width}")
num_frames = 121
num_inference_steps = 50
guidance_scale = 5.0
prompt = "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
negative_prompt = "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
output = pipe(
image=image,
prompt=prompt,
negative_prompt=negative_prompt,
height=height,
width=width,
num_frames=num_frames,
guidance_scale=guidance_scale,
num_inference_steps=num_inference_steps,
).frames[0]
export_to_video(output, "yiyi_test_6_ti2v_5b_output.mp4", fps=24)
Logs
System Info
0.35.0 dev
Who can help?
No response
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working