Describe the bug
The QwenImagePipeline.encode_prompts method is not padding correctly: it pads to the longest sequence length in the batch, which produces very short embeddings that are out of distribution for the model's training set.
Padding should remain at 1024 tokens even after the system prompt is dropped, and the attention mask needs to be expanded to match.
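A minimal sketch of the expected behavior, assuming the encoder returns prompt_embeds of shape [batch, seq_len, dim] and prompt_embeds_mask of shape [batch, seq_len]; the helper name and the use of torch.nn.functional.pad are illustrative, not the actual patch:

```python
import torch.nn.functional as F

def pad_to_fixed_length(prompt_embeds, prompt_embeds_mask, max_sequence_length=1024):
    # Hypothetical helper: pad to a fixed 1024-token length instead of the
    # longest sequence in the batch, and grow the attention mask to match.
    pad_len = max_sequence_length - prompt_embeds.shape[1]
    if pad_len > 0:
        # (0, 0) leaves the embedding dim alone, (0, pad_len) pads the sequence dim
        prompt_embeds = F.pad(prompt_embeds, (0, 0, 0, pad_len))
        # padded positions get mask value 0 so they are never attended to
        prompt_embeds_mask = F.pad(prompt_embeds_mask, (0, pad_len))
    return prompt_embeds, prompt_embeds_mask
```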
Reproduction
Execute QwenImagePipeline.encode_prompts() and check the shapes of the returned embeddings and mask, as sketched below.
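A rough reproduction sketch; the checkpoint id, dtype, and exact method name/signature are assumptions based on the report, so verify them against the installed diffusers version:

```python
import torch
from diffusers import QwenImagePipeline

# "Qwen/Qwen-Image" and the encode_prompts call are assumed here; adjust to
# whatever the installed diffusers build actually exposes.
pipe = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt_embeds, prompt_embeds_mask = pipe.encode_prompts(prompt="minecraft")
# Expected: a fixed 1024-token sequence; observed: padding only to the batch's longest prompt.
print(prompt_embeds.shape, prompt_embeds_mask.shape)
```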
Logs
prompt_embeds.shape=torch.Size([1, 1024, 3584]), prompt_embeds_mask.shape=torch.Size([1, 1024])
^ after fixing.
prompt_embeds.shape=torch.Size([1, 5, 3584]) prompt_embeds_mask.shape=torch.Size([1, 5])
^ before.
The prompt was simply "minecraft".
This leads to extremely high loss at training time unless very long prompts are used.
At inference time, it causes patch-embed artifacts because RoPE is not accustomed to these positions.
System Info
Latest git main.
Who can help?
No response