Bug inside value_guided_sampling.py #10636
There are two more issues we found in the value_guided_sampling code:
**1. How the value-function scaling is computed**

In value_guided_sampling the posterior variance is computed as (line 103):

```python
posterior_variance = self.scheduler._get_variance(i)
model_std = torch.exp(0.5 * posterior_variance)
grad = model_std * grad
```

But in the original repository the gradient is scaled by the variance, not the std. (See this commit.) The change to fix this:

```python
if self.scheduler.variance_type == "fixed_small_log":  # _get_variance returns the std
    posterior_std = self.scheduler._get_variance(i)
    posterior_log_std = torch.log(posterior_std)
    posterior_var = torch.exp(posterior_log_std * 2)
elif self.scheduler.variance_type == "fixed_small":  # _get_variance returns the variance
    posterior_var = self.scheduler._get_variance(i)
else:
    raise NotImplementedError
```

**2. Value used for trajectory ordering**

In the code, at each value-guidance step the value of the current, still partially denoised trajectory is computed to guide sampling, and that same intermediate value is also what ends up being used to order the output trajectories. To fix this, instead of returning `y` from the guidance loop, recompute the values of the fully denoised trajectories at timestep 0:

```python
# run the diffusion process
x = self.run_diffusion(x, conditions, n_guide_steps, scale)
# create batch of 0th timestep for value estimation
timesteps = torch.full((batch_size,), 0, device=self.unet.device, dtype=torch.long)
y = self.value_function(x.permute(0, 2, 1), timesteps).sample
```

Note: the first issue can be mitigated by adjusting the guidance `scale` hyperparameter (see the sketch below).
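To make the size of the discrepancy concrete, here is a small standalone sketch with a made-up std value (an assumption, not the scheduler's actual output): scaling the gradient by the std instead of the variance inflates the update by a factor of 1/std, which is also why tuning `scale` can partially compensate for the first issue.

```python
import torch

# Standalone illustration with a made-up std value (an assumption, not the
# scheduler's actual output). Under "fixed_small_log", _get_variance returns
# the std; exp(2 * log(std)) recovers the variance, as in the fix above.
std = torch.tensor(0.1)
posterior_var = torch.exp(2 * torch.log(std))  # == std ** 2 == 0.01

grad = torch.randn(4, 32, 14)          # fake (batch, horizon, transition) gradient
scaled_by_std = std * grad             # current (buggy) scaling
scaled_by_var = posterior_var * grad   # scaling as in jannerm/diffuser

# the ratio is 1 / std: here the update is 10x larger than intended
print(scaled_by_std.norm() / scaled_by_var.norm())
```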
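For context, here is a standalone sketch of the ordering step that the recomputed t=0 values feed into; the shapes and surrounding code are assumptions for illustration, not the verbatim pipeline source.

```python
import torch

# Standalone sketch of the trajectory-ordering step; shapes are assumptions.
batch_size, horizon, transition_dim, action_dim = 64, 32, 14, 3
x = torch.randn(batch_size, horizon, transition_dim)  # fully denoised trajectories
y = torch.randn(batch_size, 1)                        # t=0 value estimates

sorted_idx = y.argsort(0, descending=True).squeeze()  # highest value first
sorted_values = x[sorted_idx]
actions = sorted_values[:, :, :action_dim]            # actions precede observations
best_first_action = actions[0, 0].numpy()             # the action the planner executes
```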
Tagging others who were involved in finding these bugs: @FaisalAhmed0 @daniellawson9999
Describe the bug
There's a bug in `diffusers/src/diffusers/experimental/rl/value_guided_sampling.py`, lines 57 to 67 (at commit 37c9697).
The means and stds should be computed across each of the individual dimensions of the `observations` and `actions` spaces, as is done in the original jannerm/diffuser code. This is also made clear by comparing the final video from the reinforcement_learning_with_diffusers.ipynb Colab notebook, shared here for reference (first video), with a rollout video produced by jannerm/diffuser (second video).

buggy.mp4
jannerm_rollout.mp4
Proposed fix:
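A minimal sketch of the fix, assuming the statistics are built from a D4RL-style dataset dict as in the pipeline's `__init__` (the helper name and the filtering are illustrative, not the actual diffusers code): pass `axis=0` so one mean and std is kept per feature dimension.

```python
import numpy as np

# Illustrative helper (hypothetical name); the real pipeline builds
# self.means / self.stds in __init__ from env.get_dataset().
def compute_stats(data):
    means, stds = {}, {}
    for key, value in data.items():
        arr = np.asarray(value)
        if arr.ndim < 2 or arr.dtype.kind != "f":
            continue  # skip non-float or non-2D entries such as flags
        # axis=0 averages over timesteps only, keeping one statistic per
        # dimension of the observations/actions space, as jannerm/diffuser does
        means[key] = arr.mean(axis=0)
        stds[key] = arr.std(axis=0)
    return means, stds
```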
Reproduction
Run the Google Colab notebook reinforcement_learning_with_diffusers.ipynb.
Logs
System Info
NVIDIA TITAN RTX, 24576 MiB
Who can help?
@yiyixuxu @DN6