[modular] add Modular flux for text-to-image #11995


Open · wants to merge 9 commits into main
Conversation

@sayakpaul (Member) commented Jul 26, 2025

What does this PR do?

I plan to add the other tasks in a follow-up! I hope that's okay. Code to test this PR:

import torch
from diffusers.modular_pipelines import SequentialPipelineBlocks
from diffusers.modular_pipelines.flux.modular_blocks import TEXT2IMAGE_BLOCKS
from diffusers.utils.logging import set_verbosity_debug

set_verbosity_debug()

model_id = "black-forest-labs/FLUX.1-dev"

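# Assemble the Flux text-to-image block preset into a single sequential block.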
blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)

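# Build a pipeline from the blocks, then load each Flux component into it.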
pipeline = blocks.init_pipeline()
pipeline.load_components(["text_encoder"], repo=model_id, subfolder="text_encoder", torch_dtype=torch.bfloat16)
pipeline.load_components(["tokenizer"], repo=model_id, subfolder="tokenizer")
pipeline.load_components(["text_encoder_2"], repo=model_id, subfolder="text_encoder_2", torch_dtype=torch.bfloat16)
pipeline.load_components(["tokenizer_2"], repo=model_id, subfolder="tokenizer_2")
pipeline.load_components(["scheduler"], repo=model_id, subfolder="scheduler")
pipeline.load_components(["transformer"], repo=model_id, subfolder="transformer", torch_dtype=torch.bfloat16)
pipeline.load_components(["vae"], repo=model_id, subfolder="vae", torch_dtype=torch.bfloat16)
pipeline.to("cuda")


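# Run the assembled pipeline end to end on a text prompt.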
prompt = "A cat and a dog baking a cake together in a kitchen. The cat is carefully measuring flour, while the dog is stirring the batter with a wooden spoon. The kitchen is cozy, with sunlight streaming through the window."
output = pipeline(
    prompt=prompt, num_inference_steps=28, guidance_scale=3.5, generator=torch.manual_seed(0)
)
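# The decoded images live in the output state under the "images" key.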
output.get_intermediate("images")[0].save("modular_flux.png")

Output:

[generated image: modular_flux.png]
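As an aside, the assembled block structure can be inspected before running anything (a minimal sketch; the exact repr contents depend on what the modular classes define):

print(blocks)    # repr of the assembled sequential blocks
print(pipeline)  # repr of the pipeline and the components it expects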

Also, I have decided not to implement any guidance in this PR, as the original Flux pipeline doesn't have any guidance. LMK if that is okay.

@@ -11,12 +11,14 @@
 @dataclass
 class FluxPipelineOutput(BaseOutput):
     """
-    Output class for Stable Diffusion pipelines.
+    Output class for Flux image generation pipelines.
@sayakpaul (Member, Author) commented:
Hope this change is okay.
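For reference, the class this docstring belongs to looks roughly like this (a sketch based on diffusers' Flux pipeline_output.py; field details are approximate and may drift from the actual file):

from dataclasses import dataclass
from typing import List, Union

import numpy as np
import PIL.Image

from diffusers.utils import BaseOutput


@dataclass
class FluxPipelineOutput(BaseOutput):
    """
    Output class for Flux image generation pipelines.

    Args:
        images (`List[PIL.Image.Image]` or `np.ndarray`):
            List of denoised PIL images of length `batch_size` or a numpy array of shape
            `(batch_size, height, width, num_channels)`.
    """

    # The only field: decoded images, either PIL images or a numpy array.
    images: Union[List[PIL.Image.Image], np.ndarray]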

    return mu


def _pack_latents(latents, batch_size, num_channels_latents, height, width):
@sayakpaul (Member, Author) commented:

Didn't use "Copied from ..." here because make fix-copies enforces a weird indentation for this, which then fails the repo consistency check.

So, say you have the following as a standalone function in a module:

# Copied from diffusers.pipelines.flux.pipeline_flux.FluxPipeline._pack_latents
def _pack_latents(latents, batch_size, num_channels_latents, height, width):
    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
    latents = latents.permute(0, 2, 4, 1, 3, 5)
    latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)

    return latents

The moment you run make fix-copies after this, you will have the following diff:

+# Copied from diffusers.pipelines.flux.pipeline_flux.FluxPipeline._pack_latents
 def _pack_latents(latents, batch_size, num_channels_latents, height, width):
-    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
+        latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
+        latents = latents.permute(0, 2, 4, 1, 3, 5)
+        latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)
+
+        return latents
     latents = latents.permute(0, 2, 4, 1, 3, 5)
     latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)

One can notice the messed-up indentation. We should fix this in a separate PR. Cc: @DN6
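For a quick sanity check of what _pack_latents computes, here is a minimal standalone sketch (the 16-channel, 64x64 latent shape below is illustrative, not prescriptive):

import torch

def _pack_latents(latents, batch_size, num_channels_latents, height, width):
    # Split the latent grid into 2x2 patches and fold each patch into the channel dim:
    # (B, C, H, W) -> (B, H/2 * W/2, C * 4)
    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
    latents = latents.permute(0, 2, 4, 1, 3, 5)
    latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)
    return latents

latents = torch.randn(1, 16, 64, 64)
packed = _pack_latents(latents, batch_size=1, num_channels_latents=16, height=64, width=64)
print(packed.shape)  # torch.Size([1, 1024, 64])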

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
