Adding AutoencoderKL model returns option request #10614

Open
2 tasks done
lin-tianyu opened this issue Jan 21, 2025 · 0 comments
Model/Pipeline/Scheduler description

Environment

Using diffusers==0.32.2 and PyTorch 2.5.1

Context

I am developing an AutoencoderKL fine-tuning script and have finished the single-GPU training part, but the current interface of the AutoencoderKL model makes distributed training almost impossible.

Specifically, the fine-tuning objective combines a reconstruction loss with a KL loss, and computing the KL loss requires the following calls:

posterior = model.encode(pixel_values).latent_dist
latents = posterior.sample()
reconstructed = model.decode(latents).sample

kl_loss = posterior.kl().mean()  # KL divergence of the posterior against the unit Gaussian

Two impractical workarounds for distributed training

However, in distributed settings the AutoencoderKL model is wrapped (e.g. by DistributedDataParallel), which makes the following two implementations impractical. Both of them work, but they are extremely SLOW.

Accessing encode and decode through the .module attribute

posterior = model.module.encode(pixel_values).latent_dist  # call `.module` of the wrapped model, but this takes time
latents = posterior.sample()
reconstructed = model.module.decode(latents).sample        # call `.module` of the wrapped model, but this takes time

Unwrapping the model before the training loop

unwrapped_model = accelerator.unwrap_model(model)
for i, batch in enumerate(train_dataloader):
    # do something
    posterior = unwrapped_model.encode(pixel_values).latent_dist   # use the unwrapped model to access
    latents = posterior.sample()    
    reconstructed = unwrapped_model.decode(latents).sample    # use the unwrapped model to access
    # do the rest

Request

As a result, I would like to request an option for the AutoencoderKL forward pass to return intermediate values, such as the posterior, that can be used for fine-tuning. Any other way to make this possible would also work.
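One possible shape for such an option (a sketch, not diffusers' actual API) is a thin nn.Module wrapper whose forward() runs encode + decode and also returns the posterior; DDP/accelerate then wrap this module, so the ordinary forward() call triggers their gradient hooks and no .module indirection is needed. The ToyVAE and ToyDiagonalGaussian classes below are stand-ins, purely for illustration; in the real script the wrapped model would be diffusers' AutoencoderKL.

```python
import torch
import torch.nn as nn
from types import SimpleNamespace


class VAEWithPosterior(nn.Module):
    """Hypothetical wrapper around any model exposing AutoencoderKL-style
    encode()/decode() return objects (.latent_dist and .sample)."""

    def __init__(self, vae):
        super().__init__()
        self.vae = vae

    def forward(self, pixel_values):
        posterior = self.vae.encode(pixel_values).latent_dist
        latents = posterior.sample()
        reconstructed = self.vae.decode(latents).sample
        return reconstructed, posterior


# Minimal stand-in VAE for demonstration only.
class ToyDiagonalGaussian:
    def __init__(self, mean, logvar):
        self.mean, self.logvar = mean, logvar

    def sample(self):
        return self.mean + torch.exp(0.5 * self.logvar) * torch.randn_like(self.mean)

    def kl(self):
        # KL(N(mean, exp(logvar)) || N(0, 1)), summed over latent dims
        return 0.5 * torch.sum(self.mean**2 + self.logvar.exp() - 1.0 - self.logvar, dim=1)


class ToyVAE(nn.Module):
    def __init__(self, dim=4, latent_dim=2):
        super().__init__()
        self.enc = nn.Linear(dim, 2 * latent_dim)  # predicts mean and logvar
        self.dec = nn.Linear(latent_dim, dim)

    def encode(self, x):
        mean, logvar = self.enc(x).chunk(2, dim=1)
        return SimpleNamespace(latent_dist=ToyDiagonalGaussian(mean, logvar))

    def decode(self, z):
        return SimpleNamespace(sample=self.dec(z))
```

In the training script one would then do something like `model = accelerator.prepare(VAEWithPosterior(vae))` and call `reconstructed, posterior = model(pixel_values)` directly, computing `kl_loss = posterior.kl().mean()` without touching `.module`.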

Thanks!

Open source status

  • The model implementation is available.
  • The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

The fine-tuning code is NOT available yet but will be available in the near future.

The implementation of the core problem is provided above.
