
Conversation

@Men1scus (Contributor) commented Sep 5, 2025

No description provided.

@yiyixuxu (Collaborator) left a comment

thanks! I left a comment.

@@ -681,7 +681,6 @@ def prepare_latents(
# latent height and width to be divisible by 2.
height = 2 * (int(height) // (self.vae_scale_factor * 2))
width = 2 * (int(width) // (self.vae_scale_factor * 2))
shape = (batch_size, num_channels_latents, height, width)
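
For context, this rounding guarantees the latent height and width come out even; a quick worked example, assuming vae_scale_factor = 8 (the actual value comes from the loaded VAE config):

vae_scale_factor = 8  # assumption: typical Flux-family value; read it from the pipeline in practice
height, width = 1024, 1000
height = 2 * (int(height) // (vae_scale_factor * 2))  # 2 * (1024 // 16) = 128
width = 2 * (int(width) // (vae_scale_factor * 2))    # 2 * (1000 // 16) = 124, even by construction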
@yiyixuxu (Collaborator)

do you have a script showing how to run this with masked_image_latents? it seems it won't work here if image is None

@yiyixuxu (Collaborator)

why move this code?

@Men1scus (Contributor, Author)

My code changes are based on Sayak Paul's gist. Because of the RTX 3090's 24 GB VRAM limit, I preprocess the input image and mask into masked_image_latents before feeding them to the transformer; a sketch of that flow follows.
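
Not the PR's test script, just a minimal sketch of that flow; assumptions: a Flux-style fill pipeline already loaded as pipe, image / mask / prompt prepared beforehand, the usual diffusers VAE scaling convention, and latent packing omitted for brevity (the gist is the authoritative reference):

import torch

def encode(x):
    # standard diffusers latent scaling; assumes the VAE config defines
    # shift_factor and scaling_factor, as the Flux VAE does
    latent = pipe.vae.encode(x).latent_dist.sample()
    return (latent - pipe.vae.config.shift_factor) * pipe.vae.config.scaling_factor

with torch.no_grad():
    masked_image = image * (1 - mask)  # illustrative masking step
    masked_image_latents = encode(masked_image)
    latents = encode(image)

pipe.vae = None           # drop the VAE before denoising so everything fits in 24 GB
torch.cuda.empty_cache()

out = pipe(
    prompt=prompt,
    image=None,                                 # no pixel input: only pre-encoded latents
    masked_image_latents=masked_image_latents,
    latents=latents,
    output_type="latent",                       # decoding would need the deleted VAE
)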

@@ -924,7 +926,7 @@ def __call__(
        latent_timestep = timesteps[:1].repeat(batch_size * num_images_per_prompt)

        # 5. Prepare latent variables
-       num_channels_latents = self.vae.config.latent_channels
+       num_channels_latents = self.vae.config.latent_channels if init_image is not None else None
@yiyixuxu (Collaborator)

why set it to None here?

@Men1scus (Contributor, Author) commented Sep 13, 2025

When the latents argument to self.prepare_latents is not None, the method returns early:

if latents is not None:
    return latents.to(device=device, dtype=dtype), latent_image_ids

so num_channels_latents is never used on that path. This matters because by the time denoising runs my VAE has already been deleted, so I cannot read the channel count from self.vae.config.latent_channels.
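
In other words, a condensed, standalone sketch of that argument (not the actual pipeline method; the 16-channel shape is just an illustrative number):

import torch

def prepare_latents(batch_size, num_channels_latents, height, width, latents=None):
    if latents is not None:
        # early return: num_channels_latents is never read on this path, so None
        # is safe once the caller has pre-encoded the latents and dropped the VAE
        return latents
    shape = (batch_size, num_channels_latents, height, width)
    return torch.randn(shape)

precomputed = torch.randn(1, 16, 128, 128)
out = prepare_latents(1, None, 128, 128, latents=precomputed)  # works without a VAE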
