Bug inside value_guided_sampling.py #10636
There are two more issues we found in the value_guided_sampling code:
**1. How the value-function scaling is computed**

In value_guided_sampling the posterior variance is computed as (line 103):

```python
posterior_variance = self.scheduler._get_variance(i)
model_std = torch.exp(0.5 * posterior_variance)
grad = model_std * grad
```

But in the original repository the gradient is scaled by the variance, not the std. (See this commit.) The change to fix this:

```python
if self.scheduler.variance_type == "fixed_small_log":  # _get_variance returns the std
    posterior_std = self.scheduler._get_variance(i)
    posterior_log_std = torch.log(posterior_std)
    posterior_var = torch.exp(posterior_log_std * 2)
elif self.scheduler.variance_type == "fixed_small":  # _get_variance returns the variance
    posterior_var = self.scheduler._get_variance(i)
else:
    raise NotImplementedError
```

**2. Value used for trajectory ordering**

In the code, at each value-guidance step the value of the current, still partially denoised trajectory is computed to guide sampling, and that same intermediate value is also what ends up being used to order the output trajectories. To fix this, instead of returning `y` from the guidance loop, recompute the values of the fully denoised trajectories at timestep 0:

```python
# run the diffusion process
x = self.run_diffusion(x, conditions, n_guide_steps, scale)
# create batch of 0th timestep for value estimation
timesteps = torch.full((batch_size,), 0, device=self.unet.device, dtype=torch.long)
y = self.value_function(x.permute(0, 2, 1), timesteps).sample
```

Note: the first issue can be mitigated by adjusting the guidance `scale` hyperparameter (see the sketch below).
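To make the size of the discrepancy concrete, here is a small standalone sketch with a made-up std value (an assumption, not the scheduler's actual output): scaling the gradient by the std instead of the variance inflates the update by a factor of 1/std, which is also why tuning `scale` can partially compensate for the first issue.

```python
import torch

# Standalone illustration with a made-up std value (an assumption, not the
# scheduler's actual output). Under "fixed_small_log", _get_variance returns
# the std; exp(2 * log(std)) recovers the variance, as in the fix above.
std = torch.tensor(0.1)
posterior_var = torch.exp(2 * torch.log(std))  # == std ** 2 == 0.01

grad = torch.randn(4, 32, 14)          # fake (batch, horizon, transition) gradient
scaled_by_std = std * grad             # current (buggy) scaling
scaled_by_var = posterior_var * grad   # scaling as in jannerm/diffuser

# the ratio is 1 / std: here the update is 10x larger than intended
print(scaled_by_std.norm() / scaled_by_var.norm())
```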
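For context, here is a standalone sketch of the ordering step that the recomputed t=0 values feed into; the shapes and surrounding code are assumptions for illustration, not the verbatim pipeline source.

```python
import torch

# Standalone sketch of the trajectory-ordering step; shapes are assumptions.
batch_size, horizon, transition_dim, action_dim = 64, 32, 14, 3
x = torch.randn(batch_size, horizon, transition_dim)  # fully denoised trajectories
y = torch.randn(batch_size, 1)                        # t=0 value estimates

sorted_idx = y.argsort(0, descending=True).squeeze()  # highest value first
sorted_values = x[sorted_idx]
actions = sorted_values[:, :, :action_dim]            # actions precede observations
best_first_action = actions[0, 0].numpy()             # the action the planner executes
```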
Tagging others who were involved in finding these bugs: @FaisalAhmed0 @daniellawson9999
Describe the bug
There's a bug in `diffusers/src/diffusers/experimental/rl/value_guided_sampling.py`, lines 57 to 67 (at commit 37c9697).
The means and stds should be computed across each of the individual dimensions of the `observations` and `actions` spaces, as is done in the original jannerm/diffuser code. This is also made clear by comparing the final video from the reinforcement_learning_with_diffusers.ipynb Colab notebook, shared here for reference (first video), with a rollout video produced by jannerm/diffuser (second video).

buggy.mp4
jannerm_rollout.mp4
Proposed fix:
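A minimal sketch of the fix, assuming the statistics are built from a D4RL-style dataset dict as in the pipeline's `__init__` (the helper name and the filtering are illustrative, not the actual diffusers code): pass `axis=0` so one mean and std is kept per feature dimension.

```python
import numpy as np

# Illustrative helper (hypothetical name); the real pipeline builds
# self.means / self.stds in __init__ from env.get_dataset().
def compute_stats(data):
    means, stds = {}, {}
    for key, value in data.items():
        arr = np.asarray(value)
        if arr.ndim < 2 or arr.dtype.kind != "f":
            continue  # skip non-float or non-2D entries such as flags
        # axis=0 averages over timesteps only, keeping one statistic per
        # dimension of the observations/actions space, as jannerm/diffuser does
        means[key] = arr.mean(axis=0)
        stds[key] = arr.std(axis=0)
    return means, stds
```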
Reproduction
Run the Google Colab notebook reinforcement_learning_with_diffusers.ipynb.
Logs
System Info
NVIDIA TITAN RTX, 24576 MiB
Who can help?
@yiyixuxu @DN6