
Conversation

@Men1scus (Contributor) commented Sep 5, 2025

No description provided.

@yiyixuxu (Collaborator) left a comment

thanks! I left a comment.

@@ -681,7 +681,6 @@ def prepare_latents(
# latent height and width to be divisible by 2.
height = 2 * (int(height) // (self.vae_scale_factor * 2))
width = 2 * (int(width) // (self.vae_scale_factor * 2))
shape = (batch_size, num_channels_latents, height, width)
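
For context, this rounding guarantees the latent height and width come out even; a quick worked example, assuming vae_scale_factor = 8 (the actual value comes from the loaded VAE config):

vae_scale_factor = 8  # assumption: typical Flux-family value; read it from the pipeline in practice
height, width = 1024, 1000
height = 2 * (int(height) // (vae_scale_factor * 2))  # 2 * (1024 // 16) = 128
width = 2 * (int(width) // (vae_scale_factor * 2))    # 2 * (1000 // 16) = 124, even by construction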
@yiyixuxu (Collaborator)

do you have a script showing how to run this with masked_image_latents? it seems it won't work here if image is None

@yiyixuxu (Collaborator)

why move this code?

@Men1scus (Contributor, Author)

My code changes are based on Sayak Paul's gist. Because of the RTX 3090's 24 GB VRAM limit, I preprocess the input image and mask into masked_image_latents before feeding them to the transformer; a sketch of that flow follows.
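
Not the PR's test script, just a minimal sketch of that flow; assumptions: a Flux-style fill pipeline already loaded as pipe, image / mask / prompt prepared beforehand, the usual diffusers VAE scaling convention, and latent packing omitted for brevity (the gist is the authoritative reference):

import torch

def encode(x):
    # standard diffusers latent scaling; assumes the VAE config defines
    # shift_factor and scaling_factor, as the Flux VAE does
    latent = pipe.vae.encode(x).latent_dist.sample()
    return (latent - pipe.vae.config.shift_factor) * pipe.vae.config.scaling_factor

with torch.no_grad():
    masked_image = image * (1 - mask)  # illustrative masking step
    masked_image_latents = encode(masked_image)
    latents = encode(image)

pipe.vae = None           # drop the VAE before denoising so everything fits in 24 GB
torch.cuda.empty_cache()

out = pipe(
    prompt=prompt,
    image=None,                                 # no pixel input: only pre-encoded latents
    masked_image_latents=masked_image_latents,
    latents=latents,
    output_type="latent",                       # decoding would need the deleted VAE
)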

@@ -924,7 +926,7 @@ def __call__(
        latent_timestep = timesteps[:1].repeat(batch_size * num_images_per_prompt)

        # 5. Prepare latent variables
-       num_channels_latents = self.vae.config.latent_channels
+       num_channels_latents = self.vae.config.latent_channels if init_image is not None else None
@yiyixuxu (Collaborator)

why set it to None here?

@Men1scus (Contributor, Author) commented Sep 13, 2025

When the latents argument to self.prepare_latents is not None, the method returns early:

if latents is not None:
    return latents.to(device=device, dtype=dtype), latent_image_ids

so num_channels_latents is never used on that path. This matters because by the time denoising runs my VAE has already been deleted, so I cannot read the channel count from self.vae.config.latent_channels.
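
In other words, a condensed, standalone sketch of that argument (not the actual pipeline method; the 16-channel shape is just an illustrative number):

import torch

def prepare_latents(batch_size, num_channels_latents, height, width, latents=None):
    if latents is not None:
        # early return: num_channels_latents is never read on this path, so None
        # is safe once the caller has pre-encoded the latents and dropped the VAE
        return latents
    shape = (batch_size, num_channels_latents, height, width)
    return torch.randn(shape)

precomputed = torch.randn(1, 16, 128, 128)
out = prepare_latents(1, None, 128, 128, latents=precomputed)  # works without a VAE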
