
Different generation with `Diffusers in I2V tasks #92

Open
Kaihui-Cheng opened this issue Jan 9, 2025 · 1 comment

Kaihui-Cheng commented Jan 9, 2025

Hello, I got weird generations when trying the I2V task with diffusers. I'm curious why noise isn't added to the conditioning latents in I2V tasks within the diffusers library, as it is in LTX-Video. However, that might not be the primary cause of the problem.

https://github.com/Lightricks/LTX-Video/blob/caf9f0c4670b462b51c845abff7f5731ed138364/ltx_video/pipelines/pipeline_ltx_video.py#L1027C1-L1035C22
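The linked LTX-Video code blends fresh noise into the image-conditioning latents before denoising. A minimal sketch of that idea (this is not the diffusers API; the function name, the `noise_scale` parameter, and the linear blending rule are illustrative assumptions):

```python
import torch


def add_conditioning_noise(init_latents, noise_scale, generator=None):
    # Illustrative sketch: mix random noise into the image-conditioning
    # latents so the first frame is not treated as perfectly clean.
    # `noise_scale` and the blending rule are assumptions, not the exact
    # LTX-Video implementation.
    noise = torch.randn(init_latents.shape, generator=generator,
                        dtype=init_latents.dtype)
    return init_latents * (1.0 - noise_scale) + noise * noise_scale
```

With `noise_scale=0.0` the conditioning latents pass through unchanged, which is effectively what the diffusers I2V path does here.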

  • The first video below is the result from inference.py (LTX-Video repo); the second is the result generated with diffusers.
  • Prompts: a person
img_to_vid_0_a-person_42_512x512x161_0.mp4
diffusers_512x512_a_person.mp4
  • parameters
import random

import numpy as np
import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image


def seed_everything(seed: int):
    # Seed all RNGs used by the pipeline for reproducibility.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)


def generate_video(args):
    pipe = LTXImageToVideoPipeline.from_pretrained(args.ltx_model_path, torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    image = load_image(args.validation_image)
    prompt = "A person is talking."
    negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
    generator = torch.Generator(
        device="cuda" if torch.cuda.is_available() else "cpu"
    ).manual_seed(42)

    video = pipe(
        image=image,
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=3,
        # stg_scale=1,
        generator=generator,
        callback_on_step_end=None,
        width=512,
        height=512,
        num_frames=49,
        num_inference_steps=50,
        decode_timestep=0.05,
        decode_noise_scale=0.025,
    ).frames[0]
    export_to_video(video, args.output_file, fps=24)
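The script reads `args.ltx_model_path`, `args.validation_image`, and `args.output_file` but omits the entry point. A minimal argparse parser matching those attribute names (the flag names and defaults are assumptions, not taken from the issue) could be:

```python
import argparse


def build_parser():
    # Flag names are assumptions chosen to match the attributes
    # that generate_video(args) reads.
    parser = argparse.ArgumentParser(description="LTX-Video image-to-video generation")
    parser.add_argument("--ltx_model_path", required=True,
                        help="Path or hub id of the LTX-Video diffusers checkpoint")
    parser.add_argument("--validation_image", required=True,
                        help="Conditioning image for the I2V task")
    parser.add_argument("--output_file", default="diffusers_512x512_a_person.mp4",
                        help="Where to write the generated video")
    return parser
```

The script would then end with `generate_video(build_parser().parse_args())` under an `if __name__ == "__main__":` guard.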
Kaihui-Cheng (Author) added:

  • the input image: ref (attached image)

@Kaihui-Cheng Kaihui-Cheng changed the title Different generation with `Diffusers Different generation with `Diffusers in I2V tasks Jan 9, 2025