Conversation


stockeh commented Apr 18, 2025

Initial attempt at the training loss defined in the paper (assumes constant decrement in $\eta(t)$ for mapping function).
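For reference, a minimal sketch of what that assumption means here (`eta` / `eta_inv` stand in for whatever monotone schedule and its inverse are used, and the names are mine, not the paper's exact parameterization):

```python
import torch

def r_mapping(s, t, eta, eta_inv, eps: float = 0.01):
    """Map t to r by a constant decrement in eta(t): eta(r) = eta(t) - eps.

    eta / eta_inv are an assumed monotone schedule and its inverse;
    r is clamped so it never drops below s.
    """
    return torch.maximum(s, eta_inv(eta(t) - eps))

# e.g. with the identity schedule this reduces to r = max(s, t - eps)
# r = r_mapping(s, t, eta=lambda x: x, eta_inv=lambda x: x)
```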

@XinYu-Andy

> Initial attempt at the training loss defined in the paper (assumes constant decrement in $\eta(t)$ for mapping function).

Hi, thanks for your effort! I am curious why you use EMA weights for y_r. Have you run experiments and found it works better?


stockeh commented Apr 18, 2025

> I am curious why you use EMA weights for y_r.

@XinYu-Andy thank you! Using EMA weights for y_r initially made more sense to me. But I just tested using the current (non-EMA) model weights instead, and the loss is decreasing considerably better so far, though still just as noisy (with a global batch size of 800).

I updated the PR with this change for now!
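For context, this is roughly the change, sketched with made-up names and an assumed model signature (not the paper's or this PR's exact code):

```python
import torch

@torch.no_grad()
def compute_y_r(model, ema_model, x_r, r, s, use_ema: bool = False):
    """Detached target prediction at time r.

    use_ema=True  -> evaluate the EMA copy of the weights (original attempt).
    use_ema=False -> evaluate the current online weights (what the PR does now).
    """
    net = ema_model if use_ema else model
    return net(x_r, r, s)
```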

@XinYu-Andy

> > I am curious why you use EMA weights for y_r.
>
> @XinYu-Andy thank you! Using EMA weights for y_r initially made more sense to me. But I just tested using the current (non-EMA) model weights instead, and the loss is decreasing considerably better so far, though still just as noisy (with a global batch size of 800).
>
> I updated the PR with this change for now!

Are you doing experiments on CIFAR-10? I ran experiments for a few weeks but still wasn't able to reproduce the results reported in the paper...


stockeh commented Apr 18, 2025

> Are you doing experiments on CIFAR-10?

Yes, with the DDPM++ UNet, using all the same reported hyperparameters. I just started today and haven't experimented extensively yet, but I'd like to see a stable loss before going any further.

@XinYu-Andy

> > Are you doing experiments on CIFAR-10?
>
> Yes, with the DDPM++ UNet, using all the same reported hyperparameters. I just started today and haven't experimented extensively yet, but I'd like to see a stable loss before going any further.

Sounds good!👍


stockeh commented Apr 22, 2025

The parameter ordering for DDIM was notationally different from that in Algorithm 1. This has been fixed, but now the loss starts very, very small (log-log scale loss curve attached).
[Screenshot: training loss curve, log-log scale]
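For anyone tracing the same issue, the fix only concerns which argument goes where; a hedged sketch of a standard deterministic DDIM step with keyword-only arguments (the alpha/sigma names are assumptions about the noise schedule) makes the ordering explicit:

```python
import torch

def ddim_step(*, x_t: torch.Tensor, x_pred: torch.Tensor,
              alpha_s: torch.Tensor, sigma_s: torch.Tensor,
              alpha_t: torch.Tensor, sigma_t: torch.Tensor) -> torch.Tensor:
    """Deterministic DDIM update from time t to time s.

    x_pred is the model's estimate of the clean sample; the implied noise is
    recovered from x_t and then re-applied at the target time s.
    """
    eps_hat = (x_t - alpha_t * x_pred) / sigma_t
    return alpha_s * x_pred + sigma_s * eps_hat
```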


stockeh commented May 22, 2025

@jiamings @alexzhou907 @karanganesan @manskx can we get an update on the training code for IMM?

I've had 5-10 different people/labs message me personally, asking about this and stating that the work is not reproducible. This PR is an effort toward reproducing the work, but the research community and I must be missing something.

@alexzhou907 (Contributor)

Hi @stockeh. The plan is to release the code around ICML time, or slightly sooner than that. We'd appreciate your patience. For reproduction, there are many more implementation details in the appendix of the latest version that you can check out. Some important ones: keep precision in TF32 or FP16; we discourage BF16 due to the closeness between r and t. Another typo I made in the initial version of the paper: the kernel weighting should be 1 / |c_out| instead of 1 / c_out.
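To make those notes concrete, a minimal sketch (the PyTorch TF32 flags are standard; the `kernel_weight` helper is only an illustration of the corrected weighting, not the official code):

```python
import torch

# Keep matmuls in TF32 rather than BF16: BF16's short mantissa can make
# r and t numerically indistinguishable when they are very close.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

def kernel_weight(c_out: torch.Tensor) -> torch.Tensor:
    """Kernel weighting 1 / |c_out|, not 1 / c_out (per the corrected paper)."""
    return 1.0 / c_out.abs()
```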
