Open
Labels
bug: Something isn't working
Description
Describe the bug
When the prompt is longer than 1024 tokens, it is truncated to 1024. However, the RoPE table is also fixed at a length of 1024, and the image takes up the first 32 positions of it (for an image with a width and height of 1024). The slice taken for txt_freqs starts after those positions, so it ends up with fewer than 1024 entries. As a result, x_rotated * freqs_cis raises an error because the sequence dimensions do not match.
max_len = max(txt_seq_lens)
txt_freqs = self.pos_freqs[max_vid_index : max_vid_index + max_len, ...]
x_out = torch.view_as_real(x_rotated * freqs_cis).flatten(3)
RuntimeError: The size of tensor a (1024) must match the size of tensor b (983) at non-singleton dimension 2
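The length mismatch can be reproduced without the model. The following is a minimal sketch in plain Python, not the library's code: a list stands in for self.pos_freqs, the table length of 1024 and the offset of 32 are taken from the description above, and slice_txt_freqs and check_rope_budget are hypothetical helper names introduced only for illustration. It relies on the fact that slicing past the end of a sequence silently returns a shorter result, which is also how PyTorch tensor slicing behaves.

```python
# Assumption: the RoPE table holds 1024 positions, as stated in the report.
ROPE_TABLE_LEN = 1024

# A plain list stands in for self.pos_freqs (a tensor in the real code).
pos_freqs = list(range(ROPE_TABLE_LEN))


def slice_txt_freqs(max_vid_index, max_len):
    """Mirror the slice from the report; out-of-range ends are clamped silently."""
    return pos_freqs[max_vid_index : max_vid_index + max_len]


def check_rope_budget(max_vid_index, max_len, table_len=ROPE_TABLE_LEN):
    """Hypothetical guard: fail early with a readable message instead of a shape error."""
    available = table_len - max_vid_index
    if max_len > available:
        raise ValueError(
            f"text length {max_len} exceeds the {available} RoPE positions "
            f"left after the image offset {max_vid_index}"
        )
    return available


# Offset 32 (from the description) leaves only 992 positions for 1024 tokens.
txt_freqs = slice_txt_freqs(max_vid_index=32, max_len=1024)
print(len(txt_freqs))  # 992, shorter than the 1024 text tokens

try:
    check_rope_budget(max_vid_index=32, max_len=1024)
except ValueError as err:
    print(err)
```

Under the same assumptions, the 983 in the error message above would correspond to an offset of 41 rather than 32 (1024 - 41 = 983). A check of this kind only makes the failure explicit; whether the right fix is to shorten the text limit or to enlarge the RoPE table is a design decision for the maintainers.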
Reproduction
Logs
System Info
Who can help?
No response