GRPO implementation #17

hank0316 · 2025-01-21T05:17:48Z

Hi DeepSeek Team,

Thank you for your brilliant work! I’m currently trying to experiment with GRPO, but I couldn’t find its implementation. Do you have any plans to release the training code?

meigel · 2025-01-25T11:44:39Z

https://huggingface.co/docs/trl/main/en/grpo_trainer
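For readers who want a feel for the core idea before opening the linked trainer: the DeepSeekMath paper describes GRPO's outcome-supervised advantage as each sampled completion's reward normalized against the other completions drawn for the same prompt. The snippet below is only a minimal sketch of that normalization step. It is not the DeepSeek training code and not TRL's implementation. The function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group of completions for a single prompt.

    Returns (r_i - mean) / (std + eps) for every reward in the group.
    `eps` is an assumed guard against a zero standard deviation when all
    rewards in the group are identical.
    """
    mean = fmean(rewards)
    # Population standard deviation: an assumption; an implementation may
    # use the sample standard deviation instead.
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's mean get a positive advantage and those below get a negative one, so no separate value model is needed to provide a baseline.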

github-actions · 2025-02-25T00:55:06Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you believe this issue is still relevant, please leave a comment to keep it open. Thank you for your contributions!

syyuan2021 · 2025-03-12T18:38:23Z

Is the GRPO trainer on Hugging Face the same as the one in the DeepSeekMath paper?

github-actions bot added the stale label Feb 25, 2025

github-actions bot removed the stale label Mar 19, 2025

hank0316 closed this as completed Mar 20, 2025
