GRPO refactoring #2530

mydatascience · 2025-10-21T18:53:11Z

Description

Refactors GRPO and adds unified functionality that makes it easy to add new models.

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

.github/workflows/RunTests.yml

kyle-meggs · 2025-10-23T15:28:50Z

src/MaxText/examples/GRPO_README.md

@@ -0,0 +1,226 @@
+# GRPO Demo - Unified Training Interface
+
+This directory contains a unified interface for running GRPO (Group Relative Policy Optimization) training demos across different model sizes and configurations.


Should we be using the word "demo"?

Don't we anticipate users to use these scripts directly?

Yeah, let's call it `grpo_runner.py`, which calls into `grpo_tunix_trainer.py`.

kyle-meggs · 2025-10-23T15:29:23Z

src/MaxText/examples/GRPO_README.md

+- `grpo_llama3_1_8b_demo_pw.py` - Pathways-based 8B model  
+- `grpo_llama3_1_70b_demo_pw.py` - Pathways-based 70B model
+
+These have been consolidated into a single **unified CLI script** (`grpo_demo.py`) that works with the new **grpo.yml** configuration file.


Again, should this be "demo"?

To me, "demo" indicates it may not be suitable for production workloads.

Signed-off-by: Vladimir Suvorov <[email protected]>

A9isha · 2025-10-24T21:03:43Z

src/MaxText/examples/README.md

 - **`grpo_llama3_demo.ipynb`** → GRPO training on math dataset
+- **`grpo_demo.py`** → Unified CLI for GRPO training (any model)
+
+#### GRPO Usage


Let's call it "GRPO Python script usage".

A9isha · 2025-10-24T21:04:46Z

src/MaxText/examples/README.md


 ### GRPO Training

 - **`grpo_llama3_demo.ipynb`** → GRPO training on math dataset


Since you are using `####` for the Python script, maybe put `#### GRPO colab usage` here. Also, can we call it `grpo_llama3_1_8b_demo.ipynb`?

Yes put there

A9isha · 2025-10-24T22:47:56Z

src/MaxText/experimental/rl/grpo_tunix_trainer.py

+    if num_vms >= 2:
+      # Multi-VM single host setup
+      num_devices = len(devices)
+      num_trainer_devices = int(num_devices * 0.5)  # 50% for training


This is not correct.

For Pathways we use the following, and split the mesh between trainer and inference if multiple hosts are present:

TRAINER_DEVICES_FRACTION = 0.5
SAMPLER_DEVICES_FRACTION = 0.5

If not using Pathways, or if there is only one host:

trainer_devices = devices
sampler_devices = devices

Changed; the fractions are now parameters in grpo.yml.
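For readers following this thread, the rule described above can be sketched roughly as follows. This is only an illustration of the reviewer's description, not the code that landed in `grpo_tunix_trainer.py`. The function name `split_devices` and the arguments `use_pathways` and `num_hosts` are hypothetical, and the actual parameter names in grpo.yml are not shown in this thread. Only the two fraction constants come from the comment above.

```python
from typing import List, Sequence, Tuple, TypeVar

Device = TypeVar("Device")

# Defaults taken from the review comment; in the PR they became config values.
TRAINER_DEVICES_FRACTION = 0.5
SAMPLER_DEVICES_FRACTION = 0.5


def split_devices(
    devices: Sequence[Device],
    use_pathways: bool,   # hypothetical flag name
    num_hosts: int,       # hypothetical argument name
    trainer_fraction: float = TRAINER_DEVICES_FRACTION,
    sampler_fraction: float = SAMPLER_DEVICES_FRACTION,
) -> Tuple[List[Device], List[Device]]:
    """Return (trainer_devices, sampler_devices) following the rule in the review."""
    all_devices = list(devices)

    # Without Pathways, or with a single host, trainer and sampler share all devices.
    if not use_pathways or num_hosts < 2:
        return all_devices, all_devices

    # With Pathways on multiple hosts, carve out disjoint slices by fraction.
    num_trainer = int(len(all_devices) * trainer_fraction)
    num_sampler = int(len(all_devices) * sampler_fraction)
    if num_trainer < 1 or num_sampler < 1:
        raise ValueError("Each role needs at least one device.")
    if num_trainer + num_sampler > len(all_devices):
        raise ValueError("Trainer and sampler fractions exceed the available devices.")

    trainer_devices = all_devices[:num_trainer]
    sampler_devices = all_devices[num_trainer:num_trainer + num_sampler]
    return trainer_devices, sampler_devices
```

With real hardware the `devices` argument would presumably be the list returned by `jax.devices()`; plain integers work just as well for checking the slicing logic.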

Signed-off-by: Vladimir Suvorov <[email protected]>

mydatascience requested review from A9isha, NuojCheng, RissyRan, SurbhiJainUSC, aireenmei, bvandermoon, gagika, gobbleturk, hengtaoguo, jacoguzo, jiangjy1982, khatwanimohit, parambole, richjames0, shralex, suexu1025 and vipannalla as code owners October 21, 2025 18:53

github-advanced-security bot found potential problems Oct 21, 2025

.github/workflows/RunTests.yml (fixed)

kyle-meggs reviewed Oct 23, 2025

mydatascience added 4 commits October 24, 2025 19:29

grpo refactoring

417c1ed

Signed-off-by: Vladimir Suvorov <[email protected]>

Fix

2028cd7

Signed-off-by: Vladimir Suvorov <[email protected]>

Fix

00b040a

Signed-off-by: Vladimir Suvorov <[email protected]>

grpo refactor

e72906d

Signed-off-by: Vladimir Suvorov <[email protected]>

mydatascience force-pushed the universal_grpo branch from 149a3dc to e72906d Compare October 24, 2025 17:15

mydatascience requested a review from xuefgu as a code owner October 24, 2025 17:15

grpo refactor

466215e

Signed-off-by: Vladimir Suvorov <[email protected]>

A9isha reviewed Oct 24, 2025

mydatascience added 7 commits October 25, 2025 06:53

Fix

bcf1698

Signed-off-by: Vladimir Suvorov <[email protected]>

Fix naming

27706eb

Signed-off-by: Vladimir Suvorov <[email protected]>

simplification of nb

bb8c9a6

Signed-off-by: Vladimir Suvorov <[email protected]>

simplification of nb

26982e1

Signed-off-by: Vladimir Suvorov <[email protected]>

fix

c7a482b

Signed-off-by: Vladimir Suvorov <[email protected]>

fix

9622d38

Signed-off-by: Vladimir Suvorov <[email protected]>

Fix

8453278

Signed-off-by: Vladimir Suvorov <[email protected]>

3 participants
