[modular] add Modular flux for text-to-image #11995


Open · wants to merge 9 commits into main
Conversation

@sayakpaul (Member) commented Jul 26, 2025

What does this PR do?

I plan to add the other tasks in a follow-up! I hope that's okay. Code to test this PR:

import torch
from diffusers.modular_pipelines import SequentialPipelineBlocks
from diffusers.modular_pipelines.flux.modular_blocks import TEXT2IMAGE_BLOCKS
from diffusers.utils.logging import set_verbosity_debug

set_verbosity_debug()

model_id = "black-forest-labs/FLUX.1-dev"

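# Assemble the Flux text-to-image block preset into a single sequential block.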
blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)

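# Build a pipeline from the blocks, then load each Flux component into it.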
pipeline = blocks.init_pipeline()
pipeline.load_components(["text_encoder"], repo=model_id, subfolder="text_encoder", torch_dtype=torch.bfloat16)
pipeline.load_components(["tokenizer"], repo=model_id, subfolder="tokenizer")
pipeline.load_components(["text_encoder_2"], repo=model_id, subfolder="text_encoder_2", torch_dtype=torch.bfloat16)
pipeline.load_components(["tokenizer_2"], repo=model_id, subfolder="tokenizer_2")
pipeline.load_components(["scheduler"], repo=model_id, subfolder="scheduler")
pipeline.load_components(["transformer"], repo=model_id, subfolder="transformer", torch_dtype=torch.bfloat16)
pipeline.load_components(["vae"], repo=model_id, subfolder="vae", torch_dtype=torch.bfloat16)
pipeline.to("cuda")


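# Run the assembled pipeline end to end on a text prompt.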
prompt = "A cat and a dog baking a cake together in a kitchen. The cat is carefully measuring flour, while the dog is stirring the batter with a wooden spoon. The kitchen is cozy, with sunlight streaming through the window."
output = pipeline(
    prompt=prompt, num_inference_steps=28, guidance_scale=3.5, generator=torch.manual_seed(0)
)
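# The decoded images live in the output state under the "images" key.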
output.get_intermediate("images")[0].save("modular_flux.png")

Output:

[generated image: modular_flux.png]
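As an aside, the assembled block structure can be inspected before running anything (a minimal sketch; the exact repr contents depend on what the modular classes define):

print(blocks)    # repr of the assembled sequential blocks
print(pipeline)  # repr of the pipeline and the components it expects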

Also, I have decided not to implement any guidance in this PR, as the original Flux pipeline doesn't have any guidance. LMK if that is okay.

@@ -11,12 +11,14 @@
 @dataclass
 class FluxPipelineOutput(BaseOutput):
     """
-    Output class for Stable Diffusion pipelines.
+    Output class for Flux image generation pipelines.
@sayakpaul (Member, Author) commented:
Hope this change is okay.
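For reference, the class this docstring belongs to looks roughly like this (a sketch based on diffusers' Flux pipeline_output.py; field details are approximate and may drift from the actual file):

from dataclasses import dataclass
from typing import List, Union

import numpy as np
import PIL.Image

from diffusers.utils import BaseOutput


@dataclass
class FluxPipelineOutput(BaseOutput):
    """
    Output class for Flux image generation pipelines.

    Args:
        images (`List[PIL.Image.Image]` or `np.ndarray`):
            List of denoised PIL images of length `batch_size` or a numpy array of shape
            `(batch_size, height, width, num_channels)`.
    """

    # The only field: decoded images, either PIL images or a numpy array.
    images: Union[List[PIL.Image.Image], np.ndarray]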

    return mu


def _pack_latents(latents, batch_size, num_channels_latents, height, width):
@sayakpaul (Member, Author) commented:

Didn't use "Copied from ..." here because make fix-copies enforces a weird indentation for this, which then fails the repo consistency check.

So, say you have the following as a standalone function in a module:

# Copied from diffusers.pipelines.flux.pipeline_flux.FluxPipeline._pack_latents
def _pack_latents(latents, batch_size, num_channels_latents, height, width):
    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
    latents = latents.permute(0, 2, 4, 1, 3, 5)
    latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)

    return latents

The moment you run make fix-copies after this, you will have the following diff:

+# Copied from diffusers.pipelines.flux.pipeline_flux.FluxPipeline._pack_latents
 def _pack_latents(latents, batch_size, num_channels_latents, height, width):
-    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
+        latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
+        latents = latents.permute(0, 2, 4, 1, 3, 5)
+        latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)
+
+        return latents
     latents = latents.permute(0, 2, 4, 1, 3, 5)
     latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)

One can notice the messed-up indentation. We should fix this in a separate PR. Cc: @DN6
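For a quick sanity check of what _pack_latents computes, here is a minimal standalone sketch (the 16-channel, 64x64 latent shape below is illustrative, not prescriptive):

import torch

def _pack_latents(latents, batch_size, num_channels_latents, height, width):
    # Split the latent grid into 2x2 patches and fold each patch into the channel dim:
    # (B, C, H, W) -> (B, H/2 * W/2, C * 4)
    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
    latents = latents.permute(0, 2, 4, 1, 3, 5)
    latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)
    return latents

latents = torch.randn(1, 16, 64, 64)
packed = _pack_latents(latents, batch_size=1, num_channels_latents=16, height=64, width=64)
print(packed.shape)  # torch.Size([1, 1024, 64])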

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
