Add grouped W&B metric logging with per-correctness/category breakdowns #1

Open
matthewyryang wants to merge 2 commits into LLM360:main from matthewyryang:metrics

@matthewyryang

  • New top-level W&B sections: policy_shift/, train_inference_mismatch/, optimization/, reward/, response_stats/ — additive to existing train/ and rollout/ keys
  • Add log_probs, old_log_probs, train_rollout_logprob_diff (signed), ref_kl metrics to reported_loss in policy_loss_function
  • Emit grouped duplicates in log_train_step and log_rollout_data
  • rollout.py: per-correctness (correct/incorrect) and per-category splits of reward and response_stats metrics; all_correct/incorrect_group_frac for GRPO; auto-detects category from sample.metadata or args.log_problem_category
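
The rollout-side splits in the last bullet can be sketched as below. This is a minimal illustration, not the code in this PR: the sample fields (`reward`, `response_length`, `correct`, `category`), the helper name `split_rollout_metrics`, and the exact metric key names are all assumptions, and the shorthand `all_correct/incorrect_group_frac` is read here as two separate fractions over GRPO groups.

```python
from collections import defaultdict
from statistics import mean


def split_rollout_metrics(samples, group_size):
    """Compute reward/ and response_stats/ metrics overall, per correctness,
    and per category, plus GRPO group-level fractions.

    `samples` is a list of dicts with hypothetical keys: reward,
    response_length, correct (bool), and optionally category.
    Consecutive runs of `group_size` samples are assumed to share a prompt.
    """
    if not samples:
        return {}

    # Bucket every sample under "all", its correctness, and its category.
    buckets = defaultdict(list)
    for s in samples:
        buckets["all"].append(s)
        buckets["correct" if s["correct"] else "incorrect"].append(s)
        if s.get("category") is not None:
            buckets[f"category/{s['category']}"].append(s)

    metrics = {}
    for name, items in buckets.items():
        suffix = "" if name == "all" else f"/{name}"
        metrics[f"reward/mean{suffix}"] = mean(x["reward"] for x in items)
        metrics[f"response_stats/length_mean{suffix}"] = mean(
            x["response_length"] for x in items
        )

    # Fraction of groups where every response is correct (or every one is
    # incorrect). Such groups have identical rewards within the group.
    groups = [samples[i:i + group_size] for i in range(0, len(samples), group_size)]
    n = len(groups)
    metrics["reward/all_correct_group_frac"] = (
        sum(all(s["correct"] for s in g) for g in groups) / n
    )
    metrics["reward/all_incorrect_group_frac"] = (
        sum(not any(s["correct"] for s in g) for g in groups) / n
    )
    return metrics
```

The resulting dict is flat, so it could be passed to `wandb.log` as-is; the slash-separated prefixes are what W&B uses to group charts into sections.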

Matthew Yang and others added 2 commits April 10, 2026 01:30
- New top-level W&B sections: policy_shift/, train_inference_mismatch/,
  optimization/, reward/, response_stats/ — additive to existing train/ and rollout/ keys
- Add log_probs, old_log_probs, train_rollout_logprob_diff (signed), ref_kl
  metrics to reported_loss in policy_loss_function
- Emit grouped duplicates in log_train_step and log_rollout_data
- rollout.py: per-correctness (correct/incorrect) and per-category splits of
  reward and response_stats metrics; all_correct/incorrect_group_frac for GRPO;
  auto-detects category from sample.metadata or args.log_problem_category

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
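
For the training-side changes, a rough sketch of a signed train/rollout log-prob difference and of emitting grouped duplicates next to the existing `train/` keys might look like this. The helper names, the sign convention (train minus rollout), and the mapping from metric name to section are assumptions made for illustration; the PR text does not specify them.

```python
def signed_logprob_diff(train_log_probs, rollout_log_probs):
    """Mean per-token difference, train minus rollout (assumed convention).

    Keeping the sign shows the direction of the mismatch, which an
    absolute difference would hide.
    """
    diffs = [t - r for t, r in zip(train_log_probs, rollout_log_probs)]
    return sum(diffs) / len(diffs)


# Hypothetical routing of metric names to the new top-level sections.
SECTION_FOR = {
    "log_probs": "policy_shift",
    "old_log_probs": "policy_shift",
    "ref_kl": "policy_shift",
    "train_rollout_logprob_diff": "train_inference_mismatch",
    "grad_norm": "optimization",
}


def with_grouped_duplicates(reported, prefix="train"):
    """Keep the existing `<prefix>/<name>` keys and add `<section>/<name>` copies."""
    out = {f"{prefix}/{name}": value for name, value in reported.items()}
    for name, value in reported.items():
        section = SECTION_FOR.get(name)
        if section is not None:
            out[f"{section}/{name}"] = value
    return out
```

Because the grouped keys are copies rather than renames, dashboards built on the existing `train/` and `rollout/` keys would keep working.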