GRPO implementation #17

Closed
hank0316 opened this issue Jan 21, 2025 · 3 comments

Comments

@hank0316

Hi DeepSeek Team,

Thank you for your brilliant work! I’m currently trying to experiment with GRPO, but I couldn’t find its implementation. Do you have any plans to release the training code?

@meigel

meigel commented Jan 25, 2025

https://huggingface.co/docs/trl/main/en/grpo_trainer
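
For orientation, here is a minimal sketch of the group-relative advantage step that gives GRPO its name: each completion's reward is normalized against the mean and standard deviation of the other completions sampled for the same prompt. This is not the DeepSeek training code or the TRL implementation. The function name, the `eps` term, and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. `eps` (an assumption) avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the convention is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group mean get a positive advantage and those below get a negative one, so no separate value model is needed to form a baseline.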

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you believe this issue is still relevant, please leave a comment to keep it open. Thank you for your contributions!

github-actions bot added the stale label Feb 25, 2025

@syyuan2021

Is the GRPO trainer on Hugging Face the same as the one in the DeepSeekMath paper?

github-actions bot removed the stale label Mar 19, 2025
3 participants