how to choose the noise parameters in sampling? #1

Open
Xiaohui9607 opened this issue May 1, 2024 · 1 comment

Comments

@Xiaohui9607

Hi Campbell, I noticed that the notebook demos and the sampling code have a hyperparameter `noise`. In sampling.py it is set to 0.0, while it is 1.0 in the uniform demo and 10.0 in the mask demo. Is there a principle for choosing this parameter? Thanks!

@andrew-cr
Owner

The noise parameter controls how much 'mixing' happens during sampling, e.g. with the masking process, how much it flips back and forth between mask and unmask. In the masking case, if we integrate with time step dt and have D dimensions, then noise * dt * D is the average number of dimensions that get set back to mask in each integration step. We don't want this to be too large a proportion of all dimensions, otherwise the process could become degenerate, with everything getting set back to mask all the time. So as a rough rule of thumb, we wouldn't want more than, say, 10% of the dimensions to be switched back in each integration step: noise * dt * D < 0.1 D => noise < 0.1/dt, so for dt = 0.001 we would have noise < 100. This also shows that the larger you make dt, the smaller you will want to set noise (noise = 0 is likely the easiest to simulate with the largest dt).
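A minimal Python sketch of this rule of thumb, for illustration only. It assumes a first-order (Euler) step in which each currently unmasked dimension independently jumps back to mask with probability noise * dt, and it shows only the remasking half of a step. `MASK`, `remask_step` and the dimension count are hypothetical names, not the API of the repo's sampling.py.

```python
import random

MASK = -1  # hypothetical mask token id, for illustration only


def max_noise(dt: float, max_fraction: float = 0.1) -> float:
    """Rough upper bound from noise * dt * D < max_fraction * D, i.e. noise < max_fraction / dt."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return max_fraction / dt


def expected_remasked(noise: float, dt: float, num_dims: int) -> float:
    """Average number of dimensions sent back to mask in one step (noise * dt * D)."""
    return noise * dt * num_dims


def remask_step(tokens: list[int], noise: float, dt: float, rng: random.Random) -> list[int]:
    """Sketch of the remasking part of one Euler step only: each unmasked
    dimension independently jumps back to MASK with probability noise * dt."""
    p = min(noise * dt, 1.0)  # clip so the per-step jump probability stays valid
    return [MASK if tok != MASK and rng.random() < p else tok for tok in tokens]


if __name__ == "__main__":
    dt, D = 0.001, 10_000  # D is a hypothetical dimension count
    print("rough noise upper bound:", max_noise(dt))
    rng = random.Random(0)
    tokens = [1] * D  # fully unmasked sequence, e.g. near the end of sampling
    for noise in (0.0, 1.0, 10.0, 100.0):
        n_masked = remask_step(tokens, noise, dt, rng).count(MASK)
        print(f"noise={noise}: expected {expected_remasked(noise, dt, D):.0f}, observed {n_masked}")
```

With every dimension unmasked, the observed counts should land close to noise * dt * D, and noise = 100 at dt = 0.001 already remasks roughly 10% of the dimensions per step.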

Beyond this very rough upper bound, noise should ultimately be set to whatever gives the best sampling performance on the task you are interested in. In theory, any value of noise achieves the desired marginals and yields a sample from the data distribution. In practice, however, our denoising distribution is only approximate, and we also introduce discretization error during simulation. So we are forced to choose empirically the noise that works best in this approximate, real-world setting.
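Since the choice is ultimately empirical, a simple sweep is one way to make it. The sketch below is hypothetical: `score_fn` stands in for whatever runs your sampler with a given noise and returns a task metric (higher is better), and candidates at or above the rough 0.1/dt bound are skipped before anything is evaluated.

```python
from typing import Callable, Sequence


def choose_noise(
    candidates: Sequence[float],
    dt: float,
    score_fn: Callable[[float], float],
    max_fraction: float = 0.1,
) -> tuple[float, dict[float, float]]:
    """Evaluate each candidate noise below the rough bound max_fraction / dt and
    return the best one (highest score) together with all scores.
    `score_fn` is a hypothetical user-supplied callable: it should run the
    sampler with the given noise and return a task metric."""
    bound = max_fraction / dt
    allowed = [n for n in candidates if 0.0 <= n < bound]
    if not allowed:
        raise ValueError(f"no candidate below the rough bound {bound:g}")
    scores = {n: score_fn(n) for n in allowed}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy stand-in metric for illustration only; replace with real sampling + evaluation.
    def toy_score(noise: float) -> float:
        return -(noise - 8.0) ** 2

    best, scores = choose_noise([0.0, 1.0, 10.0, 100.0, 1000.0], dt=0.001, score_fn=toy_score)
    print("scores:", scores)
    print("best noise:", best)
```

Filtering by the bound first avoids spending sampling runs on values that are likely to be degenerate at the chosen dt.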

One final thing to note: the noise hyperparameter can also depend on the time variable, e.g. more noise during the final part of the simulation and less noise near the beginning. In my previous work (https://arxiv.org/pdf/2205.14987, top of page 39), we found it most beneficial to introduce noise near the end of sampling, though I haven't systematically investigated this for discrete flow models.
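A hypothetical time-dependent schedule in this spirit: zero noise for the first part of sampling, then a linear ramp up to `noise_max` at the end. It assumes t runs from 0 at the start of sampling to 1 at the end; `ramp_noise`, `noise_trace`, `t_start` and `noise_max` are illustrative names rather than anything in the repo, and the linear ramp is just one arbitrary choice of shape.

```python
def ramp_noise(t: float, noise_max: float, t_start: float = 0.5) -> float:
    """Hypothetical schedule: zero noise up to t_start, then a linear ramp to
    noise_max at t = 1. Assumes t runs from 0 (start of sampling) to 1 (end)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if not 0.0 <= t_start < 1.0:
        raise ValueError("t_start must lie in [0, 1)")
    if t <= t_start:
        return 0.0
    return noise_max * (t - t_start) / (1.0 - t_start)


def noise_trace(dt: float, noise_max: float, t_start: float = 0.5) -> list[float]:
    """Noise value used at each Euler step t = 0, dt, 2*dt, ... (strictly below 1)."""
    num_steps = round(1.0 / dt)
    return [ramp_noise(k * dt, noise_max, t_start) for k in range(num_steps)]


if __name__ == "__main__":
    dt, noise_max = 0.001, 50.0  # noise_max kept below the rough 0.1 / dt = 100 bound
    trace = noise_trace(dt, noise_max)
    first_on = next(k for k, n in enumerate(trace) if n > 0.0)
    print(f"noise switches on at t = {first_on * dt:.3f}, reaches {max(trace):.2f} by the last step")
```

Because the noise is largest at the end, the rough bound noise < 0.1/dt is the constraint that `noise_max` would need to respect.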
