You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
- [PPO and GRPO](https://yugeten.github.io/posts/2025/01/ppogrpo/)
46
46
- [GRPO](https://x.com/nrehiew_/status/1885079616248832090): "GRPO gets rid of the Value Model and NOT the Reward Model. This is the main insight since you save memory. The main change between PPO and GRPO is the way the advantage is calculated. PPO uses the Value Model to compute the advantage while GRPO computes the advantage by normalizing against the rollouts in each group."
0 commit comments