
ControlNet union pipeline fails on multi-model #10656

Open
vladmandic opened this issue Jan 26, 2025 · 15 comments
Labels
bug Something isn't working

Comments

@vladmandic
Contributor

Describe the bug

All controlnet pipelines typically define the `controlnet` argument as below (example from StableDiffusionXLControlNetPipeline):

controlnet: Union[ControlNetModel, List[ControlNetModel], Tuple[ControlNetModel], MultiControlNetModel],

However, the StableDiffusionXLControlNetUnionPipeline defines it simply as:

controlnet: ControlNetUnionModel

This defeats one of the main advantages of a union controlnet: being able to perform multiple guidances with the same model.
For reference, ControlNetUnion was added via PR #10131.
Any changes to the txt2img pipeline should also be mirrored in the img2img and inpaint pipelines.
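For comparison, the other controlnet pipelines accept a list of models and wrap it into a multi-model container at init time. A minimal, torch-free sketch of that normalization pattern (class and function names here are illustrative, not the exact diffusers source):

```python
# Sketch of the normalization other controlnet pipelines perform at __init__:
# a list/tuple of models is wrapped into a single multi-model container so the
# rest of the pipeline only ever sees one controlnet object.
class MultiControlNetModel:
    """Stand-in container holding several controlnet models."""
    def __init__(self, nets):
        self.nets = list(nets)

def normalize_controlnet(controlnet):
    """Accept a single model or a list/tuple; always return one object."""
    if isinstance(controlnet, (list, tuple)):
        return MultiControlNetModel(controlnet)
    return controlnet

single = object()
assert normalize_controlnet(single) is single  # single model passes through
wrapped = normalize_controlnet([object(), object()])
print(len(wrapped.nets))  # 2
```

The union pipeline skips this step entirely, which is why it rejects anything but a bare ControlNetUnionModel.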

Reproduction

control1 = ControlNetUnionModel.from_single_file(...)
control2 = ControlNetUnionModel.from_single_file(...)
pipe = StableDiffusionXLControlNetUnionPipeline.from_single_file(..., control=[control1, control2])

Logs

│    256 │   │   if not isinstance(controlnet, ControlNetUnionModel):                                                                                                                                                                                                                                                                                                                                                         │
│ ❱  257 │   │   │   raise ValueError("Expected `controlnet` to be of type `ControlNetUnionModel`.")                                                                                                                                                                                                                                                                                                                          │
│    258                                                                                                                                                                                                                                                                                                                                                                                                                      │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Expected `controlnet` to be of type `ControlNetUnionModel`.

System Info

diffusers==0.33.0.dev0

Who can help?

@hlky @yiyixuxu @sayakpaul @DN6

@vladmandic vladmandic added the bug Something isn't working label Jan 26, 2025
@hlky
Collaborator

hlky commented Jan 26, 2025

The advantage of controlnet union is that we don't need to use multiple controlnet models, no? We just use multiple control conditionings with the same controlnet union model. What is the use case here?

@vladmandic
Contributor Author

Let's say I want to run openpose and canny. How?
I'm guessing you're saying to specify control_mode as a list instead of an int, and everything just works if I send two control_images as an input list?

The issue is that I need to match the number of all params: control_image, controlnet_conditioning_scale, start, end, etc.
And having StableDiffusionXLControlNetUnionPipeline behave totally differently from every other controlnet pipeline in every other base model makes for a massive if/then piece of code, when all controlnet pipelines should supposedly behave in a uniform way.

@hlky
Collaborator

hlky commented Jan 26, 2025

cc @yiyixuxu

@vladmandic
Contributor Author

OK, using a single ControlNetUnionModel in StableDiffusionXLControlNetUnionPipeline
and passing two inputs as a list for each arg FAILS.

'control_mode': [3, 0], # canny and openpose
'controlnet_conditioning_scale': [0.5, 0.8], # different strength for each conditioning model
'control_guidance_start': [0.1, 0.2], # explicit start
'control_guidance_end': [0.8, 0.9], # explicit stop
'control_image': [<PIL.Image.Image image mode=RGB size=768x768 at 0x773051C06DE0>, <PIL.Image.Image image mode=RGB size=768x768 at 0x773051F37710>], # preprocessed inputs with canny and openpose processors

TypeError: For single controlnet: controlnet_conditioning_scale must be type float.

@hlky so the intended behavior, as you noted, seems to be broken?

@hlky
Collaborator

hlky commented Jan 27, 2025

@vladmandic Apologies for the oversight, controlnet_conditioning_scale with multiple inputs was not tested. PR to fix that issue: #10666

@yiyixuxu
Collaborator

fixed in #10666 :)

@vladmandic
Contributor Author

Can we hold off on closing the issue until it's confirmed, either internally or externally?
This needs to work end-to-end for all args; it now fails with:

float(i / len(timesteps) < control_guidance_start or (i + 1) / len(timesteps) > control_guidance_end)
TypeError: '<' not supported between instances of 'float' and 'list'

@yiyixuxu
Collaborator

oh sorry, reopen it

@yiyixuxu yiyixuxu reopened this Jan 27, 2025
@hlky
Collaborator

hlky commented Jan 27, 2025

We can only apply a single scale here

if guess_mode and not self.config.global_pool_conditions:
    scales = torch.logspace(-1, 0, len(down_block_res_samples) + 1, device=sample.device)  # 0.1 to 1.0
    scales = scales * conditioning_scale
    down_block_res_samples = [sample * scale for sample, scale in zip(down_block_res_samples, scales)]
    mid_block_res_sample = mid_block_res_sample * scales[-1]  # last one
else:
    down_block_res_samples = [sample * conditioning_scale for sample in down_block_res_samples]
    mid_block_res_sample = mid_block_res_sample * conditioning_scale

In MultiControlNet we wrap the forward to handle a list of scales:

def forward(
    self,
    sample: torch.Tensor,
    timestep: Union[torch.Tensor, float, int],
    encoder_hidden_states: torch.Tensor,
    controlnet_cond: List[torch.tensor],
    conditioning_scale: List[float],
    class_labels: Optional[torch.Tensor] = None,
    timestep_cond: Optional[torch.Tensor] = None,
    attention_mask: Optional[torch.Tensor] = None,
    added_cond_kwargs: Optional[Dict[str, torch.Tensor]] = None,
    cross_attention_kwargs: Optional[Dict[str, Any]] = None,
    guess_mode: bool = False,
    return_dict: bool = True,
) -> Union[ControlNetOutput, Tuple]:
    for i, (image, scale, controlnet) in enumerate(zip(controlnet_cond, conditioning_scale, self.nets)):
        down_samples, mid_sample = controlnet(
            sample=sample,
            timestep=timestep,
            encoder_hidden_states=encoder_hidden_states,
            controlnet_cond=image,
            conditioning_scale=scale,
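The wrapping pattern above (one net per condition, each with its own scale, residuals summed) can be illustrated with a runnable, torch-free sketch. The class names TinyControlNet and MultiControlNet are stand-ins, not diffusers classes, and the arithmetic is only a toy model of the residual math:

```python
# Toy model of the MultiControlNet dispatch pattern: each wrapped net receives
# its own conditioning input and scale, and the per-net residuals are summed.
from typing import List, Tuple

class TinyControlNet:
    """Stand-in for a single controlnet: returns scaled toy residuals."""
    def forward(self, sample: float, cond: float, scale: float) -> Tuple[List[float], float]:
        base = sample * cond * scale
        down = [base, base * 0.5]   # two fake down-block residuals
        mid = base * 0.25           # one fake mid-block residual
        return down, mid

class MultiControlNet:
    """Wraps several nets; takes per-net cond inputs and scales, sums outputs."""
    def __init__(self, nets: List[TinyControlNet]):
        self.nets = nets

    def forward(self, sample: float, conds: List[float], scales: List[float]):
        total_down, total_mid = None, 0.0
        for net, cond, scale in zip(self.nets, conds, scales):
            down, mid = net.forward(sample, cond, scale)
            if total_down is None:
                total_down, total_mid = down, mid
            else:
                total_down = [a + b for a, b in zip(total_down, down)]
                total_mid += mid
        return total_down, total_mid

multi = MultiControlNet([TinyControlNet(), TinyControlNet()])
down, mid = multi.forward(1.0, conds=[1.0, 2.0], scales=[0.5, 0.8])
print(down, mid)  # each net contributed residuals with its own scale applied
```

This is exactly the per-condition independence that a single ControlNetUnionModel cannot offer internally, since its fuser sums all conditions before any scale is applied.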

The point of ControlNet Union is to reduce the inference cost of multiple controlnets, but that seems to create a limitation: we can't apply a scale per control type. As an experiment, we could try applying the scale at another point in the code:

for cond, control_idx in zip(controlnet_cond, control_type_idx):
    condition = self.controlnet_cond_embedding(cond)
    feat_seq = torch.mean(condition, dim=(2, 3))
    feat_seq = feat_seq + self.task_embedding[control_idx]
    inputs.append(feat_seq.unsqueeze(1))
    condition_list.append(condition)

condition = sample
feat_seq = torch.mean(condition, dim=(2, 3))
inputs.append(feat_seq.unsqueeze(1))
condition_list.append(condition)

x = torch.cat(inputs, dim=1)
for layer in self.transformer_layes:
    x = layer(x)

controlnet_cond_fuser = sample * 0.0
for idx, condition in enumerate(condition_list[:-1]):
    alpha = self.spatial_ch_projs(x[:, idx])
    alpha = alpha.unsqueeze(-1).unsqueeze(-1)
    controlnet_cond_fuser += condition + alpha

sample = sample + controlnet_cond_fuser

@vladmandic
Contributor Author

The example you've posted is about conditioning_scale.
What about control_guidance_start and control_guidance_end?
If I can control those two, I can get close to having a per-condition scale.

Anyhow, my primary goal here is a uniform interface.
Secondary is adding functionality over time.
I'm OK with scale being a single value due to model limitations, but I'm not OK with the pipeline throwing random runtime errors, forcing me to guess what the model expects and then add special if/then code just to deal with special cases.
At the very least, allow processing using the first value in the list and log a warning; that would be normal behavior.

@hlky
Collaborator

hlky commented Jan 28, 2025

control_guidance_start, control_guidance_end, and controlnet_conditioning_scale are linked; they all control the final cond_scale value.

image = pipe(
    prompt,
    control_image=[controlnet_img, controlnet_img],
    control_mode=[3, 3],
    controlnet_conditioning_scale=[0.5, 0.5],
    control_guidance_start=[0.1, 0.2],
    control_guidance_end=[0.8, 0.9],
    height=1024,
    width=1024,
).images[0]

# align format for control guidance
if not isinstance(control_guidance_start, list) and isinstance(control_guidance_end, list):
    control_guidance_start = len(control_guidance_end) * [control_guidance_start]
elif not isinstance(control_guidance_end, list) and isinstance(control_guidance_start, list):
    control_guidance_end = len(control_guidance_start) * [control_guidance_end]

# control_guidance_start  control_guidance_end
#   [0.1, 0.2]              [0.8, 0.9]

controlnet_keep = []
for i in range(len(timesteps)):
    controlnet_keep.append(
        1.0
        - float(i / len(timesteps) < control_guidance_start or (i + 1) / len(timesteps) > control_guidance_end)
    )

It should instead be (with `controlnet_keep.append(keeps)`, not `keeps[0]`):

controlnet_keep = []
for i in range(len(timesteps)):
    keeps = [
        1.0 - float(i / len(timesteps) < s or (i + 1) / len(timesteps) > e)
        for s, e in zip(control_guidance_start, control_guidance_end)
    ]
    controlnet_keep.append(keeps[0] if isinstance(controlnet, ControlNetModel) else keeps)

[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
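The keep mask above can be reproduced standalone; assuming 50 timesteps (to match the 50-entry list printed here), the per-pair computation is:

```python
# Standalone reproduction of the per-pair controlnet_keep computation.
# num_timesteps = 50 is an assumption chosen to match the 50-entry output.
num_timesteps = 50
control_guidance_start = [0.1, 0.2]
control_guidance_end = [0.8, 0.9]

controlnet_keep = []
for i in range(num_timesteps):
    keeps = [
        1.0 - float(i / num_timesteps < s or (i + 1) / num_timesteps > e)
        for s, e in zip(control_guidance_start, control_guidance_end)
    ]
    controlnet_keep.append(keeps)

print(controlnet_keep[0])   # [0.0, 0.0]: before both start thresholds
print(controlnet_keep[20])  # [1.0, 1.0]: both conditions active mid-run
print(controlnet_keep[42])  # [0.0, 1.0]: first ended at 0.8, second still active
```

Each condition therefore switches on and off independently, which is what the per-element cond_scale multiplication below relies on.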

if isinstance(controlnet_keep[i], list):
    cond_scale = [c * s for c, s in zip(controlnet_conditioning_scale, controlnet_keep[i])]

[0.0, 0.0]

As above, we can experiment with applying the scale in a different place to get the expected effect. For now the limitation is a single value for control_guidance_start, control_guidance_end, and controlnet_conditioning_scale. If you'd like, we can handle that by taking the first value from the list and logging a warning, as you suggested.
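The take-first-and-warn fallback being discussed could look roughly like this (an illustrative sketch only; the helper name `coerce_scalar` is hypothetical, not a diffusers API):

```python
# Hypothetical fallback: when a pipeline only supports a scalar for an
# argument, accept a list anyway, use its first element, and warn instead of
# raising a TypeError.
import warnings

def coerce_scalar(value, name: str):
    """Return `value`, or its first element with a warning if it is a sequence."""
    if isinstance(value, (list, tuple)):
        if len(value) > 1:
            warnings.warn(
                f"`{name}` received {len(value)} values but this pipeline supports "
                f"only one; using the first ({value[0]})."
            )
        return value[0]
    return value

scale = coerce_scalar([0.5, 0.8], "controlnet_conditioning_scale")
print(scale)  # 0.5
```

A caller passing uniform list-style arguments across all controlnet pipelines would then degrade gracefully instead of crashing.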

@vladmandic
Contributor Author

vladmandic commented Jan 28, 2025

If you'd like we can handle that by taking the first value from the list and logging a warning as you suggested.

Yes, anything to reduce runtime errors; the library should be able to handle its own limitations (and there are always some) without crashing.

But if we cannot support multiple scale/start/end values nicely inside a single ControlNetUnion, perhaps we really should also allow multiple models to run inside the pipeline, just like any other controlnet?

@asomoza
Member

asomoza commented Jan 28, 2025

Hi, AFAIK controlnet union doesn't allow (here or in any other UI/library) controlling start, end, or scale per condition image; they always apply to the whole controlnet. The only advantage is that you can use multiple condition images with one controlnet, nothing more.

We should make it usable alongside other controlnets if people want; or, for example, to use a different start, end, or scale, people should pass another instance, which would solve that problem.

Of course, being able to control the start, end, and scale of each condition image would be ideal, but that is a feature request, not an issue.

@vladmandic
Contributor Author

@asomoza that's pretty much what I wrote?

  • allow the controlnet union pipeline to use a multi-controlnet as input, not just a single controlnet
  • an additional ask is that the pipeline has uniform input args with all other pipelines. For example, if it's normal that a pipeline takes list[scale] as input, it should log a warning instead of crashing with a runtime error.

@asomoza
Member

asomoza commented Jan 28, 2025

Yeah, I was separating the part about controlling each control type from the issue, for future reference and to make it clear.

Your list is what this issue should be about and what should be fixed. Just take into consideration that this is a very special controlnet that differs a lot from the others; that's why we're having these issues, and your feedback helps us a lot with it. I almost never use it with other controlnets, so it also passed under my radar.
