
GRPO (Group Relative Policy Optimization)

A PyTorch implementation of Group Relative Policy Optimization for training language models with reward functions.

Overview

This repository implements GRPO, a policy optimization algorithm that trains language models by sampling a group of responses per prompt and normalizing their rewards within the group to estimate advantages, instead of learning a separate value function. The implementation includes:

  • The core GRPO algorithm
  • A policy model wrapper for language models
  • Multiple reward functions
  • Training utilities

Installation

  1. Create a virtual environment:

     python -m venv .venv
     source .venv/bin/activate

  2. Install dependencies:

     pip install -r requirements.txt

Components

GRPO Algorithm

The core GRPO implementation (grpo.py) provides:

  • Group-based advantage estimation
  • KL-divergence constrained policy updates
  • Clipped policy gradient optimization
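The function names below are illustrative rather than grpo.py's actual API; they are a minimal sketch of the first and third bullets, assuming rewards arrive as a `(num_groups, group_size)` tensor with one row per prompt and one sampled completion per column:

```python
import torch


def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Normalize rewards within each group of sampled completions.

    rewards: (num_groups, group_size) -- one row per prompt,
    one column per completion sampled for that prompt.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)


def clipped_surrogate_loss(logp_new: torch.Tensor, logp_old: torch.Tensor,
                           advantages: torch.Tensor,
                           clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped objective on the probability ratio (to be minimized).

    The KL-divergence constraint is typically implemented by adding a KL
    penalty against a frozen reference policy on top of this loss.
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```

Because advantages are normalized within each group, every prompt contributes a zero-mean signal regardless of its absolute reward scale.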

Policy Model

The policy model (policy.py) wraps Hugging Face transformers models and provides:

  • XML-formatted response generation
  • Special token handling
  • Response formatting utilities

Training

See example.py for an end-to-end training example.
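As a toy outline of the shape of one GRPO update, using a stand-in linear "policy" whose scalar output we treat as a log-probability (purely illustrative; the names and simplifications here are not example.py's API):

```python
import torch
from torch import nn


def toy_grpo_step(policy: nn.Module, states: torch.Tensor,
                  rewards: torch.Tensor, optimizer,
                  clip_eps: float = 0.2) -> float:
    """One GRPO update on a toy policy; purely illustrative."""
    with torch.no_grad():
        old_logp = policy(states).squeeze(-1)   # frozen "old policy" log-probs
    # Group-relative advantage: normalize rewards within the sampled group.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-4)
    new_logp = policy(states).squeeze(-1)
    ratio = torch.exp(new_logp - old_logp)      # equals 1 on the first step
    loss = -torch.min(ratio * adv,
                      torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the real setting, `states` would be tokenized prompt/completion pairs, the log-probabilities would come from the language model over generated tokens, and `rewards` would be produced by the reward functions; example.py wires those pieces together.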
