Bug inside value_guided_sampling.py #10636

Open

rdesc opened this issue Jan 23, 2025 · 3 comments
Labels
bug Something isn't working

Comments

rdesc commented Jan 23, 2025

Describe the bug

There's a bug here:

for key in self.data.keys():
    try:
        self.means[key] = self.data[key].mean()
    except:  # noqa: E722
        pass
self.stds = {}
for key in self.data.keys():
    try:
        self.stds[key] = self.data[key].std()
    except:  # noqa: E722
        pass

The means and stds should be computed across each individual dimension of the observation and action spaces, as is done in the original jannerm/diffuser code. This is also made clear by the final video in the reinforcement_learning_with_diffusers.ipynb Colab notebook, shared here for reference alongside a rollout video from jannerm/diffuser (second video).

Videos: buggy.mp4 (rollout from the notebook, with the bug), jannerm_rollout.mp4 (jannerm/diffuser rollout)

Proposed fix:

        for key in self.data.keys():
            try:
                if key in ['observations', 'actions']:
                    self.means[key] = self.data[key].mean(axis=0)
                else:
                    self.means[key] = self.data[key].mean()
            except:  # noqa: E722
                pass
        self.stds = {}
        for key in self.data.keys():
            try:
                if key in ['observations', 'actions']:
                    self.stds[key] = self.data[key].std(axis=0)
                else:
                    self.stds[key] = self.data[key].std()
            except:  # noqa: E722
                pass
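
For intuition, here is a minimal sketch of the difference between pooled and per-dimension statistics (the (N, obs_dim) array shape is an assumption, matching D4RL-style datasets where each row is one transition):

    import numpy as np

    # hypothetical batch: 1000 transitions with 11-dimensional observations,
    # where the dimensions live on very different scales
    obs = np.random.randn(1000, 11) * np.linspace(0.1, 50.0, 11)

    print(obs.mean())        # scalar: one mean pooled over all dimensions
    print(obs.mean(axis=0))  # shape (11,): a separate mean per dimension
    print(obs.std(axis=0))   # shape (11,): a separate std per dimension

Normalizing with a single pooled scalar lets the large-scale dimensions dominate the model's inputs, which is presumably what degrades the rollout above.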

Reproduction

Run the Google Colab notebook reinforcement_learning_with_diffusers.ipynb

Logs

System Info

  • 🤗 Diffusers version: 0.31.0
  • Platform: Linux-6.8.0-51-generic-x86_64-with-glibc2.17
  • Running on Google Colab?: No
  • Python version: 3.8.20
  • PyTorch version (GPU?): 2.4.1+cu121 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.7.2 (cpu)
  • Jax version: 0.4.13
  • JaxLib version: 0.4.13
  • Huggingface_hub version: 0.26.2
  • Transformers version: not installed
  • Accelerate version: 1.0.1
  • PEFT version: not installed
  • Bitsandbytes version: not installed
  • Safetensors version: 0.4.5
  • xFormers version: not installed
  • Accelerator: NVIDIA GeForce RTX 2080 Ti, 11264 MiB
    NVIDIA TITAN RTX, 24576 MiB
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help?

@yiyixuxu @DN6

rdesc added the bug label on Jan 23, 2025

rdesc commented Jan 23, 2025

@ozgraslan

ozgraslan commented

We noticed two more issues in the value_guided_sampling code:

  • How value function scaling is computed.
  • Value used for trajectory ordering.

How value function scaling is computed

In value_guided_sampling the posterior_variance is computed as (line 103):

                posterior_variance = self.scheduler._get_variance(i)
                # treats the returned value as a log-variance and converts it to a std
                model_std = torch.exp(0.5 * posterior_variance)
                grad = model_std * grad

But in the original repository, the variance is used instead of the std (see this commit).
Moreover, depending on the loaded pipeline's variance_type, the _get_variance function does not necessarily return the variance.
For example, if variance_type="fixed_small_log", it returns the posterior std (this is the case with bglick13/hopper-medium-v2-value-function-hor32).

The change to fix this:

            if self.scheduler.variance_type == "fixed_small_log":
                # _get_variance returns the posterior std for this variance type
                posterior_std = self.scheduler._get_variance(i)
                posterior_var = posterior_std**2
            elif self.scheduler.variance_type == "fixed_small":
                # _get_variance returns the variance directly
                posterior_var = self.scheduler._get_variance(i)
            else:
                raise NotImplementedError
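
With posterior_var recovered consistently, the gradient scaling quoted above would then use the variance rather than the std, matching the linked commit. A sketch of the replacement for the model_std lines:

                # replaces: model_std = torch.exp(0.5 * posterior_variance)
                #           grad = model_std * grad
                grad = posterior_var * grad  # scale the guidance gradient by the posterior variance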

Value used for trajectory ordering

In the code, at each value-guidance step, the value of the current (partially denoised) trajectory is computed and used to guide sampling.
However, since a new sample is computed after the value-guidance step, the returned value is not the value of the fully denoised trajectory.

To fix this, instead of returning y from the run_diffusion function, we can compute it afterwards:

        # run the diffusion process
        x = self.run_diffusion(x, conditions, n_guide_steps, scale)

        # create batch of 0th timestep for value estimation
        timesteps = torch.full((batch_size,), 0, device=self.unet.device, dtype=torch.long)
        y = self.value_function(x.permute(0, 2, 1), timesteps).sample
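
For context, this recomputed y then drives the trajectory ordering. A sketch of the surrounding pipeline logic (the argsort/action-slicing lines mirror my reading of the existing ValueGuidedRLPipeline code, so treat the exact names as assumptions):

        # rank the fully denoised trajectories by their predicted values
        sorted_idx = y.argsort(0, descending=True).squeeze()
        sorted_values = x[sorted_idx]
        actions = sorted_values[:, :, : self.action_dim]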

Note: The first issue can be mitigated by adjusting the scale parameter, and the second issue is unlikely to affect performance, as the returned y is computed using the nearly fully denoised trajectory.


rdesc commented Jan 23, 2025

Tagging others who were involved in finding these bugs: @FaisalAhmed0 @daniellawson9999
