Describe the bug
There's a bug here: diffusers/src/diffusers/experimental/rl/value_guided_sampling.py, lines 57 to 67 (commit 37c9697).
The means and stds should be computed across each of the individual dimensions of the observations and actions spaces, as is done in the original jannerm/diffuser code. This is also made clear by the final video in the reinforcement_learning_with_diffusers.ipynb Colab notebook, attached here for reference, when compared to a rollout video produced by jannerm/diffuser (second video). A short numpy illustration follows the videos.
buggy.mp4
jannerm_rollout.mp4
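To illustrate the difference, here is a minimal numpy sketch (the array shape and per-dimension scales are made up for illustration): a scalar mean/std normalizes every observation dimension with the same statistics, whereas axis=0 keeps one mean/std per dimension.

```python
import numpy as np

# hypothetical batch of observations: 1000 timesteps, 11-dimensional state,
# with dimensions on very different scales
observations = np.random.randn(1000, 11) * np.array([0.1, 1.0, 5.0] + [1.0] * 8)

scalar_mean = observations.mean()          # single float, mixes all dimensions
scalar_std = observations.std()            # single float
per_dim_mean = observations.mean(axis=0)   # shape (11,), one mean per dimension
per_dim_std = observations.std(axis=0)     # shape (11,), one std per dimension

# Scalar statistics leave the dimensions on very different scales after normalization,
# while per-dimension statistics put every dimension on a comparable scale.
print(((observations - scalar_mean) / scalar_std).std(axis=0))    # far from 1 for most dims
print(((observations - per_dim_mean) / per_dim_std).std(axis=0))  # ~1 everywhere
```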
Proposed fix:
self.means = {}
for key in self.data.keys():
    try:
        if key in ['observations', 'actions']:
            # per-dimension statistics, as in the original jannerm/diffuser code
            self.means[key] = self.data[key].mean(axis=0)
        else:
            self.means[key] = self.data[key].mean()
    except:  # noqa: E722
        # non-numeric dataset entries are skipped, as in the current code
        pass
self.stds = {}
for key in self.data.keys():
    try:
        if key in ['observations', 'actions']:
            self.stds[key] = self.data[key].std(axis=0)
        else:
            self.stds[key] = self.data[key].std()
    except:  # noqa: E722
        pass
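The rest of the pipeline should not need changes, since the normalization is applied elementwise and per-dimension statistics broadcast naturally. A minimal sketch of that elementwise step (the helper names below are illustrative, not necessarily the exact ones in value_guided_sampling.py):

```python
import numpy as np

def normalize(x, means, stds, key):
    # broadcasts whether means[key]/stds[key] are scalars or per-dimension vectors
    return (x - means[key]) / stds[key]

def de_normalize(x, means, stds, key):
    return x * stds[key] + means[key]

# example: (batch, obs_dim) observations with per-dimension statistics of shape (obs_dim,)
obs = np.random.randn(4, 11)
means = {"observations": obs.mean(axis=0)}
stds = {"observations": obs.std(axis=0)}
normed = normalize(obs, means, stds, "observations")
restored = de_normalize(normed, means, stds, "observations")
assert np.allclose(restored, obs)
```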
Reproduction
Run the Google Colab notebook reinforcement_learning_with_diffusers.ipynb.
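Alternatively, the same code path can be exercised locally roughly as follows (adapted from the diffusers reinforcement-learning example; the hub model id and call arguments are assumptions and should be double-checked against the notebook):

```python
import d4rl  # noqa: F401  (registers the hopper-medium-v2 environment)
import gym
from diffusers.experimental import ValueGuidedRLPipeline

env = gym.make("hopper-medium-v2")

# hub id and keyword arguments assumed from the diffusers RL example; verify before use
pipeline = ValueGuidedRLPipeline.from_pretrained(
    "bglick13/hopper-medium-v2-value-function-hor32",
    env=env,
)

obs = env.reset()
# with the current scalar statistics, the planned actions lead to the degraded rollout
# shown in buggy.mp4; with per-dimension statistics, behavior matches jannerm/diffuser
actions = pipeline(obs, planning_horizon=32)
```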
Logs
System Info
- 🤗 Diffusers version: 0.31.0
- Platform: Linux-6.8.0-51-generic-x86_64-with-glibc2.17
- Running on Google Colab?: No
- Python version: 3.8.20
- PyTorch version (GPU?): 2.4.1+cu121 (True)
- Flax version (CPU?/GPU?/TPU?): 0.7.2 (cpu)
- Jax version: 0.4.13
- JaxLib version: 0.4.13
- Huggingface_hub version: 0.26.2
- Transformers version: not installed
- Accelerate version: 1.0.1
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.4.5
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 2080 Ti, 11264 MiB; NVIDIA TITAN RTX, 24576 MiB
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no