
⚡ vLLM for fast generation in GRPO #2600

Merged · 72 commits into main · Jan 29, 2025
Conversation

@qgallouedec (Member) commented Jan 21, 2025

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer
import random

def random_reward(completions, **kwargs):
    return [random.random() for _ in completions]


def main():
    # Load the dataset
    dataset = load_dataset("trl-lib/ultrafeedback-prompt", split="train[:5%]")

    training_args = GRPOConfig(
        output_dir="Qwen2-0.5B-GRPO",
        logging_steps=2,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=16,
        max_prompt_length=64,
        max_completion_length=32,
        num_generations=4,
        num_train_epochs=1,
        use_vllm=True,         # generate completions with vLLM instead of model.generate
        vllm_device="cuda:2",  # GPU reserved for the vLLM generation engine
    )
    trainer = GRPOTrainer(
        model="Qwen/Qwen2-0.5B-Instruct",
        reward_funcs=random_reward,
        args=training_args,
        train_dataset=dataset,
    )
    trainer.train()

if __name__ == "__main__":
    main()

Run with:

accelerate launch --num_processes 2 train_grpo.py

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@lewtun (Member) left a comment

Looks great with just some minor nits, and a question again about where we really need to load the model in float32 (double the VRAM otherwise).
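For illustration only: a minimal sketch of how the dtype could be overridden so the policy is not materialized in float32, assuming GRPOConfig forwards model_init_kwargs to from_pretrained (treat the exact kwargs as an assumption to verify):

import torch
from trl import GRPOConfig

training_args = GRPOConfig(
    output_dir="Qwen2-0.5B-GRPO",
    use_vllm=True,
    # Assumed knob: pass torch_dtype through to AutoModelForCausalLM.from_pretrained
    # so the policy weights are loaded in bf16 (roughly half the fp32 footprint).
    model_init_kwargs={"torch_dtype": torch.bfloat16},
)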

@DreamGenX

Could we consider a more flexible solution that is not tied to vLLM? For most models, there are faster inference engines, like SGLang.

And if you want to stick to one inference engine, SGLang already has an API to update model weights: https://docs.sglang.ai/backend/native_api.html#Update-Weights-From-Disk
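For illustration, a rough sketch of what refreshing weights through that endpoint might look like (server URL, endpoint name, and checkpoint path are assumptions; check the linked docs for the exact request shape):

import requests

# Assumed SGLang native API: ask a running server to reload weights from a
# checkpoint directory that the trainer has just written to disk.
resp = requests.post(
    "http://localhost:30000/update_weights_from_disk",
    json={"model_path": "/checkpoints/step-100"},
)
resp.raise_for_status()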

@sfc-gh-zhyao

@qgallouedec does it work for multi-node setups (say 2 nodes with 8 GPUs)?

@qgallouedec (Member, Author)

@qgallouedec does it work for multi-node setups (say 2 nodes with 8 GPUs)?

No idea. Have you tried?

@kashif (Collaborator) commented Jan 28, 2025

@sfc-gh-zhyao this will not work for the multi-node case...

@lewtun (Member) commented Jan 29, 2025

@qgallouedec I am getting this error with ZeRO-3 on a node:

[rank2]: IndexError: pop from an empty deque
[rank0]: Traceback (most recent call last):
[rank0]:   File "/fsx/lewis/git/hf/trl/scratch/grpo_demo.py", line 55, in <module>
[rank0]:     main()
[rank0]:   File "/fsx/lewis/git/hf/trl/scratch/grpo_demo.py", line 52, in main
[rank0]:     trainer.train()
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/transformers/trainer.py", line 2171, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/transformers/trainer.py", line 2531, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/transformers/trainer.py", line 3675, in training_step
[rank0]:     loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/lewis/git/hf/trl/trl/trainer/grpo_trainer.py", line 442, in compute_loss
[rank0]:     per_token_logps = get_per_token_logps(model, prompt_completion_ids)
[rank0]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/lewis/git/hf/trl/trl/trainer/grpo_trainer.py", line 431, in get_per_token_logps
[rank0]:     logits = model(input_ids).logits  # (B, L, V)
[rank0]:              ^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank0]:     ret_val = func(*args, **kwargs)
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1914, in forward
[rank0]:     loss = self.module(*inputs, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:            ^^^^^^^
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1779, in inner
[rank0]:     args_result = hook(self, args)
[rank0]:                   ^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank0]:     ret_val = func(*args, **kwargs)
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 241, in _start_of_forward_hook
[rank0]:     self.get_param_coordinator().reset_step()
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
[rank0]:     return fn(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 235, in reset_step
[rank0]:     self.construct_parameter_trace_from_module_trace()
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 219, in construct_parameter_trace_from_module_trace
[rank0]:     self.record_parameters(sub_module)
[rank0]:   File "/fsx/lewis/miniconda3/envs/trl/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 211, in record_parameters
[rank0]:     step_id = self.__step_id_module_fetched_for[sub_module.id].popleft()
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: IndexError: pop from an empty deque

Gist to repro: https://gist.github.com/lewtun/d3c1ac9dbe96514b8fd6fafcc657f1bc

I'm also trying to isolate the cause (maybe gradient checkpointing?)

Update: error persists even when gradient checkpointing is disabled.

@lewtun (Member) commented Jan 29, 2025

Ah, maybe it's the deepspeed version. I am currently using deepspeed==0.16.3. Will roll back to 0.15.4 to check.
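A quick way to pin the version for that check (illustrative command):

pip install "deepspeed==0.15.4"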

@lewtun (Member) commented Jan 29, 2025

OK regarding ZeRO-3, the following script from @qgallouedec works:

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")


# Dummy reward function: the closer the completion is to 20 characters, the higher the reward
def reward_len(completions, **kwargs):
    return [-abs(20 - len(completion)) for completion in completions]


training_args = GRPOConfig(
    output_dir="Qwen2.5-0.5B-GRPO",
    logging_steps=2,
    use_vllm=True,
    vllm_gpu_memory_utilization=0.7,
    max_prompt_length=128,
    bf16=True,
    gradient_accumulation_steps=4,
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()

Run with:

accelerate launch --config_file examples/accelerate_configs/multi_gpu.yaml --num_processes 7 grpo.py

Once the tests pass, let's go!

@qgallouedec merged commit ed14ed9 into main on Jan 29, 2025 (14 checks passed)
@qgallouedec deleted the grpo_vllm branch on January 29, 2025 at 12:01
@yiyepiaoling0715

Can this solve the OOM problem for models larger than 14B?
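For context, a hedged sketch that combines the memory-related options already mentioned in this thread (values are illustrative, not a tested recipe for 14B+ models; very large models typically also need ZeRO-3 sharding via the accelerate/deepspeed config):

from trl import GRPOConfig

training_args = GRPOConfig(
    output_dir="large-model-GRPO",
    bf16=True,                         # train in bfloat16 instead of float32
    gradient_checkpointing=True,       # trade compute for activation memory
    gradient_accumulation_steps=16,    # keep per-device batches small
    use_vllm=True,
    vllm_gpu_memory_utilization=0.7,   # leave headroom on the generation GPU
)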

@LukasNel

Still running into the mentioned issue!

@qgallouedec (Member, Author)

Still running into the mentioned issue!

Which one?

@cjfcsjt commented Jan 30, 2025

[rank0]: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)

Running the given scripts results in the above error. How can I fix this?

vllm 0.6.6.post1
deepspeed 0.15.4
torch 2.5.1+cu121

@qgallouedec @lewtun

@qgallouedec (Member, Author)

Please provide the full traceback

@kashif (Collaborator) commented Jan 30, 2025

@cjfcsjt this error is due to an older vLLM version; please upgrade vLLM to 0.7.0 and try again.
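For example (illustrative command):

pip install -U "vllm>=0.7.0"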

@cjfcsjt commented Jan 30, 2025

@kashif Fixed. Thank you! May I ask how the updates in the new vllm version resolved this issue?

@valayDave

One more quick question: it seems this patch is not compatible with multi-node setups. Can it be ported to support multi-node too? (I tested it myself, and --num_processes only accounts for the single-node case.)

@NickyDark1

training_args = GRPOConfig(
    ...
    vllm_device="cuda:1",
    ...
)

I understand that this way I can point vLLM at one GPU, but how could I direct it to more GPUs, for example "cuda:2", "cuda:4", "cuda:5", or whichever other GPUs I want?

@thetushargoyal

hey @qgallouedec

accelerate launch --multi_gpu --num_processes 1 train_grpo.py

With use_vllm=True this doesn't work, since --multi_gpu only takes effect when --num_processes > 1.

I was trying to run this on a 2xA100 setup.
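For what it's worth, on 2 GPUs with use_vllm=True the pattern used elsewhere in this thread is to reserve one GPU for vLLM (via vllm_device) and launch training on the remaining one, i.e. without --multi_gpu (a sketch, not a confirmed answer):

accelerate launch --num_processes 1 train_grpo.py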
