Baselines

Environment: hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0

EasyR1 version: v0.3.2

Contributions of new data points are welcome!

Algorithm Baselines

Qwen2.5-Instruct on Math12k

Size	Algorithm	Bits	LR	KL	Test Accuracy
7B	GRPO	AMP	1e-6	1e-2	0.75 -> 0.77 (+0.02)

Qwen2.5-VL-Instruct on Geometry3k

Size	Algorithm	Bits	LR	KL	Test Accuracy
7B	GRPO	AMP	1e-6	1e-2	0.37 -> 0.48 (+0.11)
7B	GRPO	BF16	1e-6	1e-2	0.37 -> 0.48 (+0.11)
7B	DAPO	AMP	1e-6	1e-2	0.37 -> 0.50 (+0.13)
7B	GSPO	AMP	1e-6	0	0.37 -> 0.48 (+0.11)
7B	CISPO	AMP	1e-6	1e-2	0.37 -> 0.50 (+0.13)
3B	GRPO	AMP	1e-6	1e-2	0.24 -> 0.38 (+0.14)
32B	GRPO	BF16	1e-6	1e-2	0.50 -> 0.56 (+0.06)

Qwen3-VL-Instruct on Geometry3k

Size	Algorithm	Bits	LR	KL	Test Accuracy
30B-A3B	GRPO	BF16	1e-6	1e-2	0.55 -> 0.78 (+0.23)

Qwen3-VL-Thinking on Geometry3k

Size	Algorithm	Bits	LR	KL	Test Accuracy
30B-A3B	GRPO	BF16	1e-6	1e-2	0.49 -> 0.77 (+0.28)

Note

All hyperparameters not listed use their default values.
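The Test Accuracy cells in the tables above follow the pattern `before -> after (+gain)`. When contributing a new data point, it can help to check that the reported gain matches the two accuracies. The following is a minimal sketch of such a check; the helper names `parse_accuracy` and `gain_is_consistent` and the rounding tolerance are illustrative assumptions, not part of EasyR1.

```python
import re

# Matches cells such as "0.37 -> 0.48 (+0.11)".
CELL = re.compile(r"^\s*([\d.]+)\s*->\s*([\d.]+)\s*\(([+-][\d.]+)\)\s*$")


def parse_accuracy(cell: str) -> tuple[float, float, float]:
    """Return (before, after, reported_gain) from a Test Accuracy cell."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized accuracy cell: {cell!r}")
    before, after, gain = (float(group) for group in match.groups())
    return before, after, gain


def gain_is_consistent(cell: str, tol: float = 0.005) -> bool:
    """Check that after - before equals the reported gain, up to rounding.

    The tolerance is an assumption: values in the tables are rounded
    to two decimals.
    """
    before, after, gain = parse_accuracy(cell)
    return abs((after - before) - gain) <= tol


# A few cells copied from the tables above.
cells = [
    "0.75 -> 0.77 (+0.02)",
    "0.37 -> 0.48 (+0.11)",
    "0.24 -> 0.38 (+0.14)",
    "0.49 -> 0.77 (+0.28)",
]

for cell in cells:
    status = "ok" if gain_is_consistent(cell) else "MISMATCH"
    print(f"{cell}: {status}")
```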

Performance Baselines

Qwen2.5-VL-Instruct on Geometry3k

Size	GPU Type	Bits	Batch Size	vLLM TP	Peak Mem	Peak VRAM	Throughput	Sec per step	Actor MFU
3B	8 * H100 80GB	AMP	1 / 2	2	120GB	54GB	1800 (+600)	120s	8.1%
7B	8 * H100 80GB	AMP	1 / 2	2	120GB	68GB	1600 (+400)	145s	16.0%
7B	8 * H100 80GB	AMP	4 / 8	2	200GB	72GB	2000 (+600)	120s	23.2%
7B	8 * L20 48GB	AMP	1 / 2	2	120GB	42GB	410 (+0)	580s	26.5%
7B	8 * H100 80GB	BF16	1 / 2	2	120GB	58GB	1600 (+320)	145s	16.0%
32B	8 * H100 80GB	BF16	1 / 2	8	260GB	72GB	620 (+260)	530s	25.8%

Qwen3-VL-Instruct on Geometry3k

Size	GPU Type	Bits	Batch Size	vLLM TP	Peak Mem	Peak VRAM	Throughput	Sec per step	Actor MFU
30B-A3B	8 * H800 80GB	BF16	1 / 2	8	170GB	50GB	80	4600s	1.8%

Qwen3-VL-Thinking on Geometry3k

Size	GPU Type	Bits	Batch Size	vLLM TP	Peak Mem	Peak VRAM	Throughput	Sec per step	Actor MFU
30B-A3B	8 * H800 80GB	BF16	1 / 2	8	210GB	50GB	65	8000s	1.4%

Batch Size: micro_batch_size_per_device_for_update / micro_batch_size_per_device_for_experience
vLLM TP: rollout.tensor_parallel_size
Peak Mem: Peak CPU memory usage
Peak VRAM: Peak GPU memory usage
Throughput: Tokens processed per second per GPU during one training step (the value in parentheses is the improvement over the previous version)
Sec per step: Average time per step in seconds
Actor MFU: Model FLOPs utilization (MFU) of the actor model
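Because Throughput is reported per GPU and Sec per step is reported per training step, the two columns can be combined into a rough estimate of the tokens handled in one step. The sketch below assumes that the throughput is averaged over the full wall-clock time of the step and that all listed GPUs participate. Neither assumption is stated in this document, so treat the results as order-of-magnitude figures only. The function name `tokens_per_step` is hypothetical.

```python
def tokens_per_step(throughput_per_gpu: float, num_gpus: int, sec_per_step: float) -> float:
    """Estimate tokens per training step.

    Assumes throughput is averaged over the whole step and over all GPUs.
    """
    return throughput_per_gpu * num_gpus * sec_per_step


# (label, throughput, number of GPUs, seconds per step), copied from the tables above.
rows = [
    ("Qwen2.5-VL 3B, H100, AMP, 1 / 2", 1800, 8, 120),
    ("Qwen2.5-VL 7B, H100, AMP, 4 / 8", 2000, 8, 120),
    ("Qwen2.5-VL 7B, L20, AMP, 1 / 2", 410, 8, 580),
    ("Qwen3-VL-Instruct 30B-A3B, H800, BF16", 80, 8, 4600),
]

for label, throughput, gpus, seconds in rows:
    estimate = tokens_per_step(throughput, gpus, seconds)
    print(f"{label}: ~{estimate / 1e6:.2f}M tokens per step")
```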

Note

All hyperparameters not listed use their default values.
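Since only the listed values differ from the defaults, reproducing a row comes down to overriding a few settings. The sketch below turns the Batch Size and vLLM TP cells of a row into a dictionary keyed by the option names given in the legend. The helper `row_to_overrides` is hypothetical. The keys are copied from the legend as written, and the full configuration paths (any enclosing sections) are not given here and may need a prefix in an actual config file.

```python
def row_to_overrides(batch_size_cell: str, vllm_tp: int) -> dict[str, int]:
    """Map a performance-table row to the non-default settings it lists.

    batch_size_cell has the form "update / experience", e.g. "4 / 8".
    """
    update, experience = (int(part) for part in batch_size_cell.split("/"))
    return {
        # Key names as written in the legend; full config paths are not shown there.
        "micro_batch_size_per_device_for_update": update,
        "micro_batch_size_per_device_for_experience": experience,
        "rollout.tensor_parallel_size": vllm_tp,
    }


# Example: the 7B row with Batch Size "4 / 8" and vLLM TP 2.
overrides = row_to_overrides("4 / 8", 2)
for key, value in overrides.items():
    print(f"{key}={value}")
```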