
@mydatascience
Collaborator

Description

Refactoring of GRPO. Adds new unified functionality that makes it easy to add models.

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@@ -0,0 +1,226 @@
# GRPO Demo - Unified Training Interface

This directory contains a unified interface for running GRPO (Group Relative Policy Optimization) training demos across different model sizes and configurations.
Collaborator

Should we be using the word "demo"?

Don't we expect users to run these scripts directly?

Collaborator

Yeah, let's call it grpo_runner.py, which calls into grpo_tunix_trainer.py.

Collaborator Author

Renamed

- `grpo_llama3_1_8b_demo_pw.py` - Pathways-based 8B model
- `grpo_llama3_1_70b_demo_pw.py` - Pathways-based 70B model

These have been consolidated into a single **unified CLI script** (`grpo_demo.py`) that works with the new **grpo.yml** configuration file.
Collaborator

Again, should this be "demo"?

To me, "demo" indicates it may not be suitable for production workloads.

Signed-off-by: Vladimir Suvorov <[email protected]>
- **`grpo_llama3_demo.ipynb`** → GRPO training on math dataset
- **`grpo_demo.py`** → Unified CLI for GRPO training (any model)

#### GRPO Usage
Collaborator

Let's call it "GRPO python script usage".

Collaborator Author

Done


### GRPO Training

- **`grpo_llama3_demo.ipynb`** → GRPO training on math dataset
Collaborator

Since you are using #### for the Python script, maybe put "#### GRPO colab usage" here. Also, can we call it grpo_llama3_1_8b_demo.ipynb?

Collaborator Author

Yes, put it there.

if num_vms >= 2:
    # Multi-VM single host setup
    num_devices = len(devices)
    num_trainer_devices = int(num_devices * 0.5)  # 50% for training
Collaborator

This is not correct.

For Pathways, we use the following and split the mesh between trainer and inference if multiple hosts are present:

TRAINER_DEVICES_FRACTION = 0.5
SAMPLER_DEVICES_FRACTION = 0.5

If not using Pathways, or if there is only one host:

      trainer_devices = devices
      sampler_devices = devices
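The reviewer's rule can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the PR's implementation: `split_devices`, `num_hosts`, and `use_pathways` are hypothetical names, the fractions default to the reviewer's 0.5/0.5, and plain integers stand in for device objects (in a JAX program the list would typically come from `jax.devices()`). Trainer and sampler get disjoint slices only for multi-host Pathways runs; otherwise both share every device.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical defaults mirroring the fractions suggested in the review.
TRAINER_DEVICES_FRACTION = 0.5
SAMPLER_DEVICES_FRACTION = 0.5


def split_devices(
    devices: Sequence[Any],
    num_hosts: int,
    use_pathways: bool,
    trainer_fraction: float = TRAINER_DEVICES_FRACTION,
    sampler_fraction: float = SAMPLER_DEVICES_FRACTION,
) -> Tuple[List[Any], List[Any]]:
    """Return (trainer_devices, sampler_devices). Hypothetical helper."""
    devices = list(devices)

    # Without Pathways, or on a single host, trainer and sampler share all devices.
    if not use_pathways or num_hosts <= 1:
        return devices, devices

    # Multi-host Pathways: carve the device list into two disjoint slices.
    if trainer_fraction + sampler_fraction > 1.0:
        raise ValueError("trainer_fraction + sampler_fraction must not exceed 1.0")
    num_trainer = int(len(devices) * trainer_fraction)
    num_sampler = int(len(devices) * sampler_fraction)
    if num_trainer == 0 or num_sampler == 0:
        raise ValueError("each role needs at least one device")

    trainer_devices = devices[:num_trainer]
    # The sampler slice starts right after the trainer slice, so the sets never overlap.
    sampler_devices = devices[num_trainer:num_trainer + num_sampler]
    return trainer_devices, sampler_devices
```

For example, `split_devices(list(range(8)), num_hosts=2, use_pathways=True)` returns `([0, 1, 2, 3], [4, 5, 6, 7])`, while the same call with `num_hosts=1` returns the full list for both roles. Keeping the fractions as arguments makes it straightforward to drive them from a config file.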

Collaborator Author

Changed, and exposed them as parameters in grpo.yml.
