Summary
Three of the core sampling-training loop arguments all use "rollout" but mean different things, which makes them hard to reason about (especially in the constraint equation that ties them together). This issue proposes clearer alternatives.
Current names vs. proposed
| Current | Proposed | Rationale |
| --- | --- | --- |
| `--rollout-batch-size` | `--prompts-per-cycle` | This is the number of prompts per sampling cycle, not a training batch size |
| `--num-steps-per-rollout` | `--train-steps-per-cycle` | Number of optimizer steps run on one cycle's data (default 1 for on-policy) |
| `--num-rollout` | `--num-cycles` | Total iterations of the sample-then-train loop |
| `--n-samples-per-prompt` | `--responses-per-prompt` | "Samples" is overloaded in ML; these are generated completions |
| `--global-batch-size` | `--train-batch-size` | Clarifies this is the number of samples consumed per optimizer step |
Why this matters
The constraint equation becomes self-documenting:
```
# Before
rollout-batch-size × n-samples-per-prompt = global-batch-size × num-steps-per-rollout

# After
prompts-per-cycle × responses-per-prompt = train-batch-size × train-steps-per-cycle
```
The "after" version reads as plain English: "total responses generated per cycle = total responses consumed per cycle."
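As a minimal sketch of how this constraint could be validated up front (function and argument names follow the proposed flags; this is illustrative, not the project's actual trainer code):

```python
def check_cycle_config(prompts_per_cycle: int,
                       responses_per_prompt: int,
                       train_batch_size: int,
                       train_steps_per_cycle: int) -> None:
    """Validate the per-cycle constraint: responses generated == responses consumed."""
    generated = prompts_per_cycle * responses_per_prompt
    consumed = train_batch_size * train_steps_per_cycle
    if generated != consumed:
        raise ValueError(
            f"prompts-per-cycle × responses-per-prompt ({generated}) must equal "
            f"train-batch-size × train-steps-per-cycle ({consumed})"
        )

# 128 prompts × 8 responses = 1024 generated; 256 batch × 4 steps = 1024 consumed
check_cycle_config(128, 8, 256, 4)
```

Failing fast with an error message phrased in the new names keeps the constraint readable in logs as well as in docs.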
Suggestion for backwards compatibility
The old names could be kept as deprecated aliases so existing scripts don't break.
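One way to wire up such aliases, assuming the project uses argparse (the flag names are from this proposal; the actual parser setup may differ):

```python
import argparse
import warnings


class DeprecatedAlias(argparse.Action):
    """Store the value under the new destination and warn about the old flag."""

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated; use --{self.dest.replace('_', '-')}",
            DeprecationWarning,
        )
        setattr(namespace, self.dest, values)


parser = argparse.ArgumentParser()
# New name is the canonical destination; the old flag writes to the same dest.
parser.add_argument("--prompts-per-cycle", type=int, dest="prompts_per_cycle")
parser.add_argument("--rollout-batch-size", type=int, dest="prompts_per_cycle",
                    action=DeprecatedAlias, help=argparse.SUPPRESS)

args = parser.parse_args(["--rollout-batch-size", "64"])
print(args.prompts_per_cycle)  # 64, with a DeprecationWarning emitted
```

Hiding the old flag with `help=argparse.SUPPRESS` keeps `--help` output clean while existing scripts continue to work.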