Add LoRA support for AsyncGRPO#5610
Open
jonahsamost wants to merge 3 commits intohuggingface:mainfrom
Open
Conversation
added 2 commits
April 20, 2026 13:16
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit e158804. Configure here.
Author
|
@qgallouedec I'm not sure if this is something you guys were interested in merging, let me know |
Member
|
Hey! thanks for the pr, yes it's definitely something we want. We will review at some point, please keep this opened |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

What does this PR do?
AsyncGRPO seems to only support full fine-tuning and NCCL weight sync to vLLM. This PR adds LoRA support (it was tested with Gemma 4). HTTP reload was chosen over NCCL because LoRA parameter names don't match vLLM's internal names and fixing that would require vLLM-side changes. It also includes a fix to unfreeze LoRA parameters after model loading since
AutoModelForCausalLM.from_pretrainedfreezes them on load by default. I tested with Gemma4 and GSM8k.I added a few config fields (
use_lora, lora_adapter_path, lora_name) into theAsyncGRPOConfig.The Gemma4 schema is taken from transformers library
transformers/tests/utils/test_chat_parsing_utils.pyand the gemma4.jinja file comes fromtokenizer.save_pretrainedon that same Gemma4 model. But it might make sense to just remove them from the PR and make that a separate PR.Fixes # (issue)
Before submitting
AI writing disclosure
We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
Note
Medium Risk
Changes weight sync and parameter-freezing behavior in the async training pipeline and adds an HTTP-based vLLM adapter reload path, which could impact training stability and rollout correctness if misconfigured.
Overview
Adds LoRA mode to
AsyncGRPOTrainer, including new config flags (use_lora,lora_adapter_path,lora_name) and validation to ensure an adapter path is provided.When enabled, training now unfreezes only LoRA parameters and switches weight synchronization from NCCL streaming to a save-to-disk + HTTP hot-reload flow: the trainer saves the adapter with
save_pretrained(), pauses/resumes vLLM around the write, and instructs the rollout worker to reload via/v1/load_lora_adapterwhile generation requests target the configured LoRAmodelname.Separately, adds Gemma 4 chat support by introducing
trl/chat_templates/gemma4.jinjaplus agemma4_schemahook inadd_response_schema()for response/tool-call parsing.Reviewed by Cursor Bugbot for commit 9c5daf1. Bugbot is set up for automated code reviews on this repo. Configure here.