Follow-up on our earlier discussion.
This year seems to be all about AI labs flexing their Agents trained with reinforcement learning. There have been a lot of requests in trl and open-r1, especially since smolagents is Hugging Face's Agents library. We'll need to make smolagents work well with RL and simplify the process of training these Agents.
To get started with training agents using GRPO (the RL method behind DeepSeek R1), we'll first need to generate Agent responses in batches to maximize GPU utilization. Then, we can integrate that into TRL's GRPO trainer.
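For context, here is roughly what the TRL side looks like today with plain text completions and a toy reward function (the model name and reward are placeholders, and this assumes the quickstart-style GRPOTrainer API); the missing piece for us is having the completions come from full Agent runs instead of a single generate call:

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward: prefer completions close to 50 characters (placeholder only).
def reward_len(completions, **kwargs):
    return [-abs(50 - len(completion)) for completion in completions]

# GRPOTrainer expects a dataset with a "prompt" column.
dataset = Dataset.from_dict({"prompt": ["Write a haiku about GPUs.", "Summarize what GRPO does."]})

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",      # example model, not a recommendation
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="grpo-agent-test", per_device_train_batch_size=2),
    train_dataset=dataset,
)
trainer.train()
```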
We could maybe modify Model (smolagents/src/smolagents/models.py, line 240 in 93c433c) and TransformersModel (smolagents/src/smolagents/models.py, line 415 in 93c433c) to handle batch generation, along with MultiStepAgent (smolagents/src/smolagents/agents.py, line 125 in 93c433c) for processing those agent calls sequentially.
Or we could create separate classes for each.
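As a rough sketch of what the batched path could look like on the transformers side (the class and generate_batch method here are hypothetical, not the current smolagents API; it just assumes a standard tokenizer/model pair with left padding):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class BatchedTransformersModel:
    """Hypothetical batched variant of TransformersModel (names are illustrative)."""

    def __init__(self, model_id: str, device: str = "cuda"):
        # Left padding so prompt tokens line up at the end and completions can be sliced off.
        self.tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        self.model = AutoModelForCausalLM.from_pretrained(
            model_id, torch_dtype=torch.bfloat16
        ).to(device)

    def generate_batch(self, list_of_messages, max_new_tokens=512):
        # One chat-formatted prompt per agent in the batch.
        prompts = [
            self.tokenizer.apply_chat_template(m, tokenize=False, add_generation_prompt=True)
            for m in list_of_messages
        ]
        inputs = self.tokenizer(prompts, return_tensors="pt", padding=True).to(self.model.device)
        with torch.no_grad():
            out = self.model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True)
        # Keep only the newly generated tokens, then decode per sample.
        completions = out[:, inputs["input_ids"].shape[1]:]
        return self.tokenizer.batch_decode(completions, skip_special_tokens=True)
```

One way MultiStepAgent could use this is to advance N agent instances in lockstep: collect the pending model call from each agent at a given step, run them through a single generate_batch call, then hand each completion back to its agent before stepping again.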
And when the vLLM backend is ready, we can do the same thing with vLLM for even better efficiency.
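vLLM already takes a list of prompts natively and handles continuous batching internally, so the batched path should stay almost identical once that backend lands. A minimal sketch, assuming an in-process vllm.LLM (the model name is just an example and this doesn't reflect any existing smolagents integration):

```python
from vllm import LLM, SamplingParams

# Load the model once; vLLM batches and schedules requests internally.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)

def generate_batch(prompts):
    # vLLM accepts a list of prompts and returns one RequestOutput per prompt.
    outputs = llm.generate(prompts, sampling_params)
    return [out.outputs[0].text for out in outputs]
```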