Describe the bug
So I create the pipe and use it to generate multiple images with the same settings. The first inference takes ~8 minutes; subsequent ones take ~30 minutes. VRAM usage remains the same.
Tested on 8 GB + 8 GB.
P.S. I have used AuraFlow, Sana, Hunyuan, LTX, Cog, and several other pipelines, but didn't encounter this issue with any of them.
Reproduction
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel
from transformers import T5EncoderModel
bfl_repo = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16
quantization_config = DiffusersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer_4bit = FluxTransformer2DModel.from_pretrained(
    bfl_repo,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=dtype,
)
text_encoder_2 = T5EncoderModel.from_pretrained(
    bfl_repo,
    subfolder="text_encoder_2",
    quantization_config=quantization_config,
    torch_dtype=dtype,
)
pipe = FluxPipeline.from_pretrained(
    bfl_repo,
    transformer=None,
    text_encoder_2=None,
    torch_dtype=dtype,
)
pipe.transformer = transformer_4bit
pipe.text_encoder_2 = text_encoder_2
# https://civitai.com/models/1111989/majicflus-beauty
pipe.load_lora_weights(
    "./models/lora/flux_dev/majicbeauty1.safetensors",
    adapter_name="majicbeauty1",
)
pipe.set_adapters("majicbeauty1", adapter_weights=0.8)

pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()
prompt = "Photograph capturing a woman seated in a car, looking straight ahead. Her face is partially obscured, making her expression hard to read, adding an air of mystery. Natural light filters through the car window, casting subtle reflections and shadows on her face and the interior. The colors are muted yet realistic, with a slight grain that evokes a 1970s film quality. The scene feels intimate and contemplative, capturing a quiet, introspective moment, mj"
# First generation: completes in ~8 minutes.
image = pipe(
    prompt=prompt,
    width=1072,
    height=1920,
    max_sequence_length=512,
    num_inference_steps=40,
    guidance_scale=50,
    generator=torch.Generator().manual_seed(1349562290),
).images[0]
image.save("out_majicbeauty5.png")

torch.cuda.empty_cache()

# Second generation with the same pipe: slows to ~30 minutes.
image = pipe(
    prompt=prompt,
    width=1072,
    height=1920,
    max_sequence_length=512,
    num_inference_steps=50,
    guidance_scale=40,
    generator=torch.Generator().manual_seed(1349562290),
).images[0]
image.save("out_majicbeauty6.png")
Logs
Fetching 3 files: 100%|█████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s]
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Downloading shards: 100%|██████████████████████████████████████████| 2/2 [00:00<00:00, 440.05it/s]
Loading checkpoint shards: 100%|████████████████████████████████████| 2/2 [00:27<00:00, 13.90s/it]
Loading pipeline components...:   0%|          | 0/5 [00:00<?, ?it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|███████████████████████████████| 5/5 [00:00<00:00, 5.12it/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (95 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['. the scene feels intimate and contemplative, capturing a quiet, introspective moment, mj']
100%|█████████████████████████████████████████████████████████████| 40/40 [08:10<00:00, 12.25s/it]
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['. the scene feels intimate and contemplative, capturing a quiet, introspective moment, mj']
4%|██▍ | 2/50 [01:52<43:27, 54.32s/it]
With the LoRA-related code removed, the issue persists:
100%|█████████████████████████████████████████████████████████████| 40/40 [06:24<00:00, 9.61s/it]
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['. the scene feels intimate and contemplative, capturing a quiet, introspective moment, mj']
18%|███████████▏ | 9/50 [06:00<27:16, 39.92s/it]
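To double-check the "VRAM usage remains the same" observation, a small sketch (assuming a single CUDA device; `report_memory` is a hypothetical helper, not from the original script) for printing allocator state between generations:

import torch

def report_memory(tag):
    # memory_allocated covers live tensors; memory_reserved is what the
    # caching allocator holds, so reserved >> allocated hints at fragmentation.
    allocated_gib = torch.cuda.memory_allocated() / 2**30
    reserved_gib = torch.cuda.memory_reserved() / 2**30
    print(f"{tag}: allocated={allocated_gib:.2f} GiB, reserved={reserved_gib:.2f} GiB")

report_memory("after first image")
torch.cuda.empty_cache()
report_memory("after empty_cache")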
System Info
Who can help?
@yiyixuxu @DN6