Hi there,
New to TRL so would really appreciate any help!
I was hoping to use the GRPO Trainer for an RL project, but I also want to use a neural model as part of my reward function. Due to memory constraints, I'd like to reuse the same base model for both roles: inside the reward function I would attach a previously finetuned, frozen "Reward LoRA" adapter, while the GRPO Trainer itself runs with PEFT enabled so that training only updates a separate "RL LoRA" adapter. In other words, GRPO would train base model + "RL LoRA" (the only learnable params), while the reward function runs inference with base model + "Reward LoRA" (already trained, so fixed; no learning there). Is this possible?

PEFT does seem to offer some flexibility for swapping adapters, but my main concerns are memory and how to get access to the base model being finetuned inside the GRPO Trainer so I can pair it with my "Reward LoRA" for inference in the reward function.
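Here is a rough sketch of what I have in mind. The model path, adapter path, dataset, and the `score_completions` helper are placeholders, and I haven't verified that GRPOTrainer is happy with a model that already carries a second (frozen) adapter, so please treat this as an illustration of the intent rather than working code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from trl import GRPOConfig, GRPOTrainer

base = AutoModelForCausalLM.from_pretrained("my-base-model")  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained("my-base-model")

# Trainable "RL LoRA" adapter that GRPO should update.
rl_lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
model = get_peft_model(base, rl_lora, adapter_name="rl")

# Previously finetuned, frozen "Reward LoRA" on the same base weights.
model.load_adapter("path/to/reward-lora", adapter_name="reward")  # placeholder path

def reward_fn(completions, **kwargs):
    # Temporarily switch to the frozen reward adapter for scoring,
    # then switch back so GRPO keeps training the "rl" adapter.
    model.set_adapter("reward")
    with torch.no_grad():
        scores = score_completions(model, tokenizer, completions)  # hypothetical helper
    model.set_adapter("rl")
    return scores

trainer = GRPOTrainer(
    model=model,  # or maybe pass the raw base model plus peft_config=rl_lora instead?
    reward_funcs=reward_fn,
    args=GRPOConfig(output_dir="grpo-out"),
    train_dataset=my_dataset,  # placeholder dataset
)
trainer.train()
```

The part I'm least sure about is the adapter switching inside `reward_fn`: whether it's safe to call `set_adapter` on the model the trainer is actively training, and whether I should be passing an already-wrapped PeftModel to GRPOTrainer at all versus letting it do the wrapping via `peft_config`.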
Would really appreciate any guidance! Thanks!