refactor: improve handling when only masked_image_latents are provided in Flux.1 Fill dev #12293
base: main
Conversation
thanks! I left a comment,
```diff
@@ -681,7 +681,6 @@ def prepare_latents(
     # latent height and width to be divisible by 2.
     height = 2 * (int(height) // (self.vae_scale_factor * 2))
     width = 2 * (int(width) // (self.vae_scale_factor * 2))
-    shape = (batch_size, num_channels_latents, height, width)
```
Do you have a script showing how to run this with `masked_image_latents`? It seems that if `image` is None, it won't work here.
Why move this code?
My code changes are based on Sayak Paul's gist. Because of the RTX 3090's 24 GB VRAM limit, I preprocess the input image and mask into `masked_image_latents` before feeding them to the transformer.
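The workflow described here can be sketched roughly as below. `DummyVAE`, the downscale factor, and all shapes are hypothetical stand-ins for illustration, not the diffusers API; in the real pipeline the encoding step would go through the pipeline's VAE (`pipe.vae.encode(...)` plus the usual shift/scale normalization).

```python
import torch

# Hypothetical stand-in for the pipeline's VAE encoder. Names and shapes here
# are illustrative only; they are not the diffusers API.
class DummyVAE(torch.nn.Module):
    def __init__(self, latent_channels=16, downscale=8):
        super().__init__()
        self.latent_channels = latent_channels
        self.downscale = downscale

    @torch.no_grad()
    def encode(self, image):
        # Returns random latents with the spatially-downscaled shape a real
        # VAE encoder would produce.
        b, _, h, w = image.shape
        return torch.randn(b, self.latent_channels,
                           h // self.downscale, w // self.downscale)

vae = DummyVAE()
image = torch.randn(1, 3, 1024, 1024)                # RGB input image
mask = (torch.rand(1, 1, 1024, 1024) > 0.5).float()  # 1 = region to fill

# Encode the masked image once, up front...
masked_image_latents = vae.encode(image * (1 - mask))

# ...then drop the VAE before denoising to free VRAM; the pipeline is later
# called with masked_image_latents instead of image / mask_image.
del vae
```

The point of the sketch is the ordering: once `masked_image_latents` exists, the VAE is no longer needed during denoising and can be deleted to stay within the 24 GB budget.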
```diff
@@ -924,7 +926,7 @@ def __call__(
     latent_timestep = timesteps[:1].repeat(batch_size * num_images_per_prompt)

     # 5. Prepare latent variables
-    num_channels_latents = self.vae.config.latent_channels
+    num_channels_latents = self.vae.config.latent_channels if init_image is not None else None
```
Why set it to None here?
When the `latents` parameter passed to `self.prepare_latents` is not None, it returns early:

```python
if latents is not None:
    return latents.to(device=device, dtype=dtype), latent_image_ids
```

so `num_channels_latents` is never used. By the time denoising runs, my VAE has already been deleted, so I cannot read it from `self.vae.config.latent_channels`.
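To illustrate why passing None is safe on that path, here is a hypothetical standalone function modeled loosely on the early-return logic above; it is a sketch under those assumptions, not the pipeline's actual implementation.

```python
import torch

def prepare_latents(latents, num_channels_latents, device, dtype):
    # When precomputed latents are supplied, return early:
    # num_channels_latents is never read, so None is acceptable here.
    if latents is not None:
        return latents.to(device=device, dtype=dtype)
    # Otherwise the channel count is required to sample fresh noise.
    if num_channels_latents is None:
        raise ValueError("num_channels_latents is required when latents is None")
    return torch.randn(1, num_channels_latents, 64, 64,
                       device=device, dtype=dtype)

# Precomputed latents in hand, the VAE-derived channel count is not needed.
precomputed = torch.zeros(1, 16, 64, 64)
out = prepare_latents(precomputed, None, torch.device("cpu"), torch.float32)
```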
No description provided.