Fix spurious KL gradients for zero-std reward groups in GRPOTrainer #5640

Open
robrui wants to merge 2 commits into huggingface:main from robrui:fix/grpo-zero-std-kl-masking

Commits

Commits on Apr 24, 2026