Fix spurious KL gradients for zero-std reward groups in GRPOTrainer #5640
Open
robrui wants to merge 2 commits into huggingface:main from