Add SkyReels V2: Infinite-Length Film Generative Model #11518
Conversation
It's about time. Thanks.
Mid-PR questions:
@tolgacangoz Thanks for working on this, really cool work so far!
2 and 3: I think in this case we should have separate implementations of SkyReelsV2 and Wan due to the autoregressive nature of the former. Adding any extra code to Wan might complicate it for readers. Will let @yiyixuxu comment on this, though.
FWIW, I have been successful in using the same T5 encoder as Wan 2.1 for this model just by fiddling with their pipeline:
Then this: I incorporate my bitsandbytes NF4 transformer, their tokenizer, and the Wan-based T5 encoder:
I need to add this function to the pipeline for the T5 encoder to work:
It seems appropriate to me. Only the Diffusion Forcing pipelines differ for the large models. How are the results with your setting?
Hi @yiyixuxu @a-r-r-o-w and SkyReels Team @yjp999 @pftq @Langdx @guibinchen ... This PR will be ready for review for …
Commits (messages truncated by the page):
- …ensure consistency and correct functionality.
- …sV2TimeTextImageEmbedding`.
- …itialization to directly assign the list of SkyReelsV2 components.
- …ys convert query, key, and value to `torch.bfloat16`, simplifying the code and improving clarity.
- …by adding VAE initialization and a detailed prompt for video generation, improving clarity and usability of the documentation.
- …and improve formatting in `pipeline_skyreels_v2_diffusion_forcing.py` to enhance code readability and maintainability.
- …ine` from 5.0 to 6.0 to enhance video generation quality.
- …definition of `SkyReelsV2DiffusionForcingPipeline` to ensure consistency and improve video generation quality.
- …peline` to default to `None`.
- …odel` to ensure correct tensor operations.
- …peat_interleave` for improved efficiency in `SkyReelsV2Transformer3DModel`.
- …with guidance scale and shift parameters for T2V and I2V. Remove the unused `retrieve_latents` function to streamline the code.
- …line` to use `deepcopy` for improved state management during inference steps.
- …initialization across SkyReels test files.
- The `generator` parameter is not used by the scheduler's `step` method within the SkyReelsV2 diffusion forcing pipelines; this change removes the unnecessary argument from the method call for code clarity and consistency.
- …'s dtype in `SkyReelsV2TimeTextImageEmbedding`: replaces manual parameter iteration with the `get_parameter_dtype` helper.
- Adds a check that the `_keep_in_fp32_modules` attribute exists on the model before it is accessed. This prevents a potential `AttributeError`, making the utility function more robust with models that do not define this attribute.
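The guard described above boils down to a `getattr` lookup with a default instead of direct attribute access. A minimal standalone sketch of the idea (`Param`, `SimpleModule`, and the string dtypes are mock stand-ins invented here; the real diffusers helper operates on torch modules):

```python
# Illustrative sketch only: mock classes stand in for torch modules so the
# AttributeError-safe lookup can be shown in isolation.

class Param:
    def __init__(self, name, dtype):
        self.name = name
        self.dtype = dtype

class SimpleModule:
    # Deliberately does NOT define `_keep_in_fp32_modules`.
    def __init__(self, params):
        self._params = params

    def named_parameters(self):
        return list(self._params.items())

def get_parameter_dtype(module):
    """Return the dtype of the first parameter not pinned to fp32.

    `getattr` with a default means modules that never define
    `_keep_in_fp32_modules` do not raise AttributeError.
    """
    keep_in_fp32 = getattr(module, "_keep_in_fp32_modules", None) or []
    for name, param in module.named_parameters():
        if any(flagged in name for flagged in keep_in_fp32):
            continue  # skip parameters pinned to fp32
        return param.dtype
    return "float32"  # fallback when every parameter is pinned

module = SimpleModule({
    "norm.weight": Param("norm.weight", "float32"),
    "proj.weight": Param("proj.weight", "bfloat16"),
})
print(get_parameter_dtype(module))  # → float32 (no AttributeError)
module._keep_in_fp32_modules = ["norm"]
print(get_parameter_dtype(module))  # → bfloat16 (fp32-pinned param skipped)
```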
This will be my 3rd pipeline contribution, yay 🥳!
```diff
@@ -168,6 +168,8 @@ class UniPCMultistepScheduler(SchedulerMixin, ConfigMixin):
         use_beta_sigmas (`bool`, *optional*, defaults to `False`):
             Whether to use beta sigmas for step sizes in the noise schedule during the sampling process. Refer to [Beta
             Sampling is All You Need](https://huggingface.co/papers/2407.12173) for more information.
+        use_flow_sigmas (`bool`, *optional*, defaults to `False`):
```
@tolgacangoz ohh this cannot be the only change in scheduler, no?
ohh it's already in!
Does the output quality match?
The outputs are qualitatively/visibly the same.
thanks!
Thanks for the amazing work here @tolgacangoz! I think the only blocker is having the weights merged into the official repos, yes?
Right.
hi @tolgacangoz, can you send a PR to the official repo for the weights? I think they have created placeholders for all the checkpoints, e.g. https://huggingface.co/Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers
I thought they were supposed to do this by examining/verifying the conversion script, etc., since we are talking about the official repository.
They say to try the 14B models for FLF2V, so this issue is arguably unrelated to this PR, IMO.
@tolgacangoz
If you examine the first comment of this PR, you can see that I wasn't able to produce good results for FLF2V with the DF 1.3B model using the original code. This is their answer: SkyworkAI/SkyReels-V2#93
ohh sounds good, thanks for explaining!
thanks @tolgacangoz
Thanks for merging and for the opportunity to contribute! I'll be monitoring the original repository for updates...
thanks a lot @tolgacangoz, really awesome contribution!
Thank you, @tolgacangoz
Thanks for the opportunity to fix #11374!
Original Work
Original repo: https://github.com/SkyworkAI/SkyReels-V2
Paper: https://huggingface.co/papers/2504.13074
TODOs:
- ✅ `SkyReelsV2Transformer3DModel`: 90% `WanTransformer3DModel`
- ✅ `SkyReelsV2DiffusionForcingPipeline`
- ✅ `SkyReelsV2DiffusionForcingImageToVideoPipeline`: Includes FLF2V.
- ✅ `SkyReelsV2DiffusionForcingVideoToVideoPipeline`: Extends a given video.
- ✅ `SkyReelsV2Pipeline`
- ✅ `SkyReelsV2ImageToVideoPipeline`: Includes FLF2V.
- ✅ `scripts/convert_skyreelsv2_to_diffusers.py`: `tolgacangoz/SkyReels-V2-Diffusers`
- ✅ Did you make sure to update the documentation with your changes? Did you write any new necessary tests?: We will construct these during review.
T2V with Diffusion Forcing (OLD)

| original | `diffusers` integration |
| --- | --- |
| original_0_short.mp4 | diffusers_0_short.mp4 |
| original_37_short.mp4 | diffusers_37_short.mp4 |
| original_0_long.mp4 | diffusers_0_long.mp4 |
| original_37_long.mp4 | diffusers_37_long.mp4 |
I2V with Diffusion Forcing (OLD)

`prompt = "A penguin dances."`

| `diffusers` integration |
| --- |
| i2v-short.mp4 |
FLF2V with Diffusion Forcing (OLD)
Now, Houston, we have a problem.
I have been unable to produce good results with this task; I tried many hyperparameter combinations with the original code.
The first frame's latent (`torch.Size([1, 16, 1, 68, 120])`) is overwritten onto the first of the 25 frame latents of `latents` (`torch.Size([1, 16, 25, 68, 120])`). Then the last frame's latent is concatenated, so `latents` becomes `torch.Size([1, 16, 26, 68, 120])`. After the denoising process, the extra last-frame latent is discarded and the result is decoded by the VAE. I also tried not concatenating the last frame but instead overwriting it onto the last frame of `latents` and not discarding anything at the end, but I still got bad results. Here are some results:
0.mp4, 1.mp4, 2.mp4, 3.mp4, 4.mp4, 5.mp4, 6.mp4, 7.mp4
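For clarity, the FLF2V latent bookkeeping described above can be sketched with plain arrays. The shapes come from the description; numpy is used here purely as a stand-in for torch tensors, and the denoising loop is elided:

```python
import numpy as np

# Shapes taken from the FLF2V description above.
latents = np.zeros((1, 16, 25, 68, 120))  # 25 frame latents
first = np.ones((1, 16, 1, 68, 120))      # first-frame latent
last = np.ones((1, 16, 1, 68, 120))       # last-frame latent

latents[:, :, :1] = first                          # overwrite first frame
latents = np.concatenate([latents, last], axis=2)  # append last frame
assert latents.shape == (1, 16, 26, 68, 120)

# ...denoising would run here; afterwards the extra last-frame latent
# is dropped before handing the result to the VAE decoder:
latents = latents[:, :, :-1]
assert latents.shape == (1, 16, 25, 68, 120)
```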
V2V with Diffusion Forcing (OLD)
This pipeline extends a given video.

| input video | `diffusers` integration |
| --- | --- |
| video1.mp4 | v2v.mp4 |
Firstly, I want to congratulate you on this great work, and thanks for open-sourcing it, SkyReels Team! This PR proposes an integration of your model.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@yiyixuxu @a-r-r-o-w @linoytsaban @yjp999 @Howe2018 @RoseRollZhu @pftq @Langdx @guibinchen @qiudi0127 @nitinmukesh @tin2tin @ukaprch @okaris