Environment: hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0
EasyR1 version: v0.3.2
Welcome to contribute new data points!
| Size | Algorithm | Bits | LR | KL | Test Accuracy |
|---|---|---|---|---|---|
| 7B | GRPO | AMP | 1e-6 | 1e-2 | 0.75 -> 0.77 (+0.02) |
| Size | Algorithm | Bits | LR | KL | Test Accuracy |
|---|---|---|---|---|---|
| 7B | GRPO | AMP | 1e-6 | 1e-2 | 0.37 -> 0.48 (+0.11) |
| 7B | GRPO | BF16 | 1e-6 | 1e-2 | 0.37 -> 0.48 (+0.11) |
| 7B | DAPO | AMP | 1e-6 | 1e-2 | 0.37 -> 0.50 (+0.13) |
| 7B | GSPO | AMP | 1e-6 | 0 | 0.37 -> 0.48 (+0.11) |
| 7B | CISPO | AMP | 1e-6 | 1e-2 | 0.37 -> 0.50 (+0.13) |
| 3B | GRPO | AMP | 1e-6 | 1e-2 | 0.24 -> 0.38 (+0.14) |
| 32B | GRPO | BF16 | 1e-6 | 1e-2 | 0.50 -> 0.56 (+0.06) |
| Size | Algorithm | Bits | LR | KL | Test Accuracy |
|---|---|---|---|---|---|
| 30B-A3B | GRPO | BF16 | 1e-6 | 1e-2 | 0.55 -> 0.78 (+0.23) |
| Size | Algorithm | Bits | LR | KL | Test Accuracy |
|---|---|---|---|---|---|
| 30B-A3B | GRPO | BF16 | 1e-6 | 1e-2 | 0.49 -> 0.77 (+0.28) |
Note
The hyper-parameters not listed are all the same as the default values.
| Size | GPU Type | Bits | Batch Size | vLLM TP | Peak Mem | Peak VRAM | Throughput | Sec per step | Actor MFU |
|---|---|---|---|---|---|---|---|---|---|
| 3B | 8 * H100 80GB | AMP | 1 / 2 | 2 | 120GB | 54GB | 1800 (+600) | 120s | 8.1% |
| 7B | 8 * H100 80GB | AMP | 1 / 2 | 2 | 120GB | 68GB | 1600 (+400) | 145s | 16.0% |
| 7B | 8 * H100 80GB | AMP | 4 / 8 | 2 | 200GB | 72GB | 2000 (+600) | 120s | 23.2% |
| 7B | 8 * L20 48GB | AMP | 1 / 2 | 2 | 120GB | 42GB | 410 (+0) | 580s | 26.5% |
| 7B | 8 * H100 80GB | BF16 | 1 / 2 | 2 | 120GB | 58GB | 1600 (+320) | 145s | 16.0% |
| 32B | 8 * H100 80GB | BF16 | 1 / 2 | 8 | 260GB | 72GB | 620 (+260) | 530s | 25.8% |
| Size | GPU Type | Bits | Batch Size | vLLM TP | Peak Mem | Peak VRAM | Throughput | Sec per step | Actor MFU |
|---|---|---|---|---|---|---|---|---|---|
| 30B-A3B | 8 * H800 80GB | BF16 | 1 / 2 | 8 | 170GB | 50GB | 80 | 4600s | 1.8% |
| Size | GPU Type | Bits | Batch Size | vLLM TP | Peak Mem | Peak VRAM | Throughput | Sec per step | Actor MFU |
|---|---|---|---|---|---|---|---|---|---|
| 30B-A3B | 8 * H800 80GB | BF16 | 1 / 2 | 8 | 210GB | 50GB | 65 | 8000s | 1.4% |
- Batch Size: micro_batch_size_per_device_for_update / micro_batch_size_per_device_for_experience
- vLLM TP: rollout.tensor_parallel_size
- Peak Mem: Peak CPU memory usage
- Peak VRAM: Peak GPU memory usage
- Throughput: Number of tokens per second per GPU by one training step (including the improvement compared to the previous version)
- Sec per step: Average time per step in seconds
Note
The hyper-parameters not listed are all the same as the default values.