
Add LoRA support for AsyncGRPO #5610

Open
jonahsamost wants to merge 3 commits into huggingface:main from jonahsamost:jonah_lora_4_20

Conversation


@jonahsamost jonahsamost commented Apr 21, 2026

What does this PR do?

AsyncGRPO seems to support only full fine-tuning with NCCL weight sync to vLLM. This PR adds LoRA support. HTTP reload was chosen over NCCL because LoRA parameter names don't match vLLM's internal names, and fixing that would require vLLM-side changes. The PR also includes a fix to unfreeze LoRA parameters after model loading, since AutoModelForCausalLM.from_pretrained freezes them by default. Tested with Gemma 4 on GSM8K.
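The unfreezing fix can be sketched as follows. This is an illustrative reduction, not the PR's code: the `_Param` stand-in and the helper name are hypothetical, and the `"lora_"` substring check assumes PEFT's `lora_A`/`lora_B` naming convention.

```python
# Sketch: after from_pretrained, parameters arrive frozen; re-enable
# gradients only for LoRA adapter weights. Illustrative, not the PR's code.

def unfreeze_lora_params(named_params):
    """Set requires_grad=True only for LoRA parameters.

    named_params: iterable of (name, param) pairs, where param has a
    mutable `requires_grad` attribute (as torch's named_parameters() yields).
    Returns the names of the unfrozen parameters.
    """
    unfrozen = []
    for name, param in named_params:
        if "lora_" in name:  # PEFT names adapter weights lora_A / lora_B
            param.requires_grad = True
            unfrozen.append(name)
        else:
            param.requires_grad = False
    return unfrozen


class _Param:
    # Minimal stand-in for a tensor with a requires_grad flag
    def __init__(self):
        self.requires_grad = False


params = {
    "model.layers.0.self_attn.q_proj.weight": _Param(),
    "model.layers.0.self_attn.q_proj.lora_A.default.weight": _Param(),
    "model.layers.0.self_attn.q_proj.lora_B.default.weight": _Param(),
}
names = unfreeze_lora_params(params.items())
```

With a real `torch.nn.Module`, the same loop would run over `model.named_parameters()`.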

I added a few config fields (use_lora, lora_adapter_path, lora_name) to AsyncGRPOConfig.
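A minimal sketch of those fields and the validation one would expect around them. The field names match the PR description; the defaults, the standalone dataclass, and the `__post_init__` check are assumptions for illustration, not the actual AsyncGRPOConfig.

```python
from dataclasses import dataclass
from typing import Optional

# Sketch of the new LoRA-related config fields; defaults are assumptions.
@dataclass
class AsyncGRPOLoRAConfig:
    use_lora: bool = False
    lora_adapter_path: Optional[str] = None  # where the adapter is saved for HTTP reload
    lora_name: str = "lora_adapter"          # model name vLLM serves LoRA requests under

    def __post_init__(self):
        # Hypothetical validation: an adapter path is required in LoRA mode,
        # since the weight-sync path saves the adapter to disk.
        if self.use_lora and not self.lora_adapter_path:
            raise ValueError("use_lora=True requires lora_adapter_path to be set")
```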

The Gemma 4 schema is taken from the transformers library (transformers/tests/utils/test_chat_parsing_utils.py), and the gemma4.jinja file comes from calling tokenizer.save_pretrained on that same Gemma 4 model. It might make sense to pull these files out and submit them as a separate PR.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

  • No AI usage: the PR was written entirely by a human.
  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
  • AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.


Note

Medium Risk
Changes weight sync and parameter-freezing behavior in the async training pipeline and adds an HTTP-based vLLM adapter reload path, which could impact training stability and rollout correctness if misconfigured.

Overview
Adds LoRA mode to AsyncGRPOTrainer, including new config flags (use_lora, lora_adapter_path, lora_name) and validation to ensure an adapter path is provided.

When enabled, training now unfreezes only LoRA parameters and switches weight synchronization from NCCL streaming to a save-to-disk + HTTP hot-reload flow: the trainer saves the adapter with save_pretrained(), pauses/resumes vLLM around the write, and instructs the rollout worker to reload via /v1/load_lora_adapter while generation requests target the configured LoRA model name.
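The reload step above can be sketched as building a request against vLLM's dynamic LoRA-loading route. The `/v1/load_lora_adapter` endpoint and its `lora_name`/`lora_path` body fields come from vLLM's OpenAI-compatible server; the helper function and URL wiring here are illustrative, not the PR's code.

```python
import json

# Sketch of the save-then-reload flow: after save_pretrained() writes the
# adapter and vLLM is paused, the trainer would POST this request, then
# resume generation targeting `lora_name`. Helper is hypothetical.

def build_reload_request(base_url: str, lora_name: str, adapter_path: str):
    """Return (url, json_body) for telling vLLM to (re)load a LoRA adapter."""
    url = f"{base_url}/v1/load_lora_adapter"
    body = json.dumps({"lora_name": lora_name, "lora_path": adapter_path})
    return url, body


url, body = build_reload_request("http://localhost:8000", "my_lora", "/tmp/adapter")
# An actual POST (e.g. via requests.post(url, data=body)) is omitted here.
```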

Separately, adds Gemma 4 chat support by introducing trl/chat_templates/gemma4.jinja plus a gemma4_schema hook in add_response_schema() for response/tool-call parsing.

Reviewed by Cursor Bugbot for commit 9c5daf1. Bugbot is set up for automated code reviews on this repo.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit e158804.

Comment thread trl/experimental/async_grpo/async_grpo_trainer.py
Comment thread trl/experimental/async_grpo/async_grpo_trainer.py
@jonahsamost
Author

@qgallouedec I'm not sure if this is something you guys were interested in merging; let me know.

@qgallouedec
Member

qgallouedec commented Apr 24, 2026

Hey! Thanks for the PR, yes, it's definitely something we want. We'll review it at some point; please keep this open.

