
Proposal: clarify naming of rollout/training loop CLI arguments #877

@DavidBellamy

Summary

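The three core arguments of the sample-then-train loop all contain "rollout" but refer to different quantities. That makes them hard to reason about, especially in the constraint equation that ties them together. This issue proposes clearer names for them, and also renames two related arguments.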

Current names vs. proposed

| Current | Proposed | Rationale |
|---|---|---|
| `--rollout-batch-size` | `--prompts-per-cycle` | Number of prompts per sampling cycle, not a training batch size |
| `--num-steps-per-rollout` | `--train-steps-per-cycle` | Number of optimizer steps run on one cycle's data (default 1 for on-policy) |
| `--num-rollout` | `--num-cycles` | Total iterations of the sample-then-train loop |
| `--n-samples-per-prompt` | `--responses-per-prompt` | "Samples" is overloaded in ML; these are generated completions |
| `--global-batch-size` | `--train-batch-size` | Clarifies that this is the number of responses consumed per optimizer step |
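For illustration, here is a schematic simulation of how the proposed names relate to each other. It is not code from the project: `LoopConfig`, `generate` and `run_loop` are hypothetical, and the sketch assumes each cycle's responses are consumed once, in order, by `train-steps-per-cycle` optimizer steps.

```python
from dataclasses import dataclass


@dataclass
class LoopConfig:
    num_cycles: int             # proposed --num-cycles (was --num-rollout)
    prompts_per_cycle: int      # proposed --prompts-per-cycle (was --rollout-batch-size)
    responses_per_prompt: int   # proposed --responses-per-prompt (was --n-samples-per-prompt)
    train_batch_size: int       # proposed --train-batch-size (was --global-batch-size)
    train_steps_per_cycle: int  # proposed --train-steps-per-cycle (was --num-steps-per-rollout)


def generate(prompts, responses_per_prompt):
    # Hypothetical stand-in for the sampler: one string per generated response.
    return [f"{p}/r{i}" for p in prompts for i in range(responses_per_prompt)]


def run_loop(cfg: LoopConfig) -> int:
    """Simulate the loop and return the total number of optimizer steps taken."""
    total_steps = 0
    for cycle in range(cfg.num_cycles):
        # Sampling phase: prompts_per_cycle prompts, responses_per_prompt each.
        prompts = [f"c{cycle}-p{j}" for j in range(cfg.prompts_per_cycle)]
        responses = generate(prompts, cfg.responses_per_prompt)
        # Training phase: train_steps_per_cycle optimizer steps, each on
        # train_batch_size responses taken in order from this cycle's data.
        for step in range(cfg.train_steps_per_cycle):
            start = step * cfg.train_batch_size
            batch = responses[start:start + cfg.train_batch_size]
            if len(batch) < cfg.train_batch_size:
                raise ValueError("this cycle generated too few responses to fill the batch")
            total_steps += 1  # a real trainer would update the model on `batch` here
    return total_steps
```

With 32 prompts per cycle and 8 responses per prompt, each cycle yields 256 responses, which exactly fill 4 optimizer steps of 64 responses; three cycles therefore take 12 optimizer steps.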

Why this matters

The constraint equation becomes self-documenting:

# Before
rollout-batch-size × n-samples-per-prompt = global-batch-size × num-steps-per-rollout

# After
prompts-per-cycle × responses-per-prompt = train-batch-size × train-steps-per-cycle

The "after" version reads as plain English: "total responses generated per cycle = total responses consumed per cycle."
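As a minimal sketch, assuming a Python launcher, the constraint can be checked directly with the proposed names. `check_cycle_constraint` is a hypothetical helper, not an existing function in the project:

```python
def check_cycle_constraint(
    prompts_per_cycle: int,
    responses_per_prompt: int,
    train_batch_size: int,
    train_steps_per_cycle: int,
) -> int:
    """Return the number of responses per cycle, or raise if generated != consumed."""
    generated = prompts_per_cycle * responses_per_prompt
    consumed = train_batch_size * train_steps_per_cycle
    if generated != consumed:
        raise ValueError(
            f"responses generated per cycle ({prompts_per_cycle} x {responses_per_prompt} = {generated}) "
            f"!= responses consumed per cycle ({train_batch_size} x {train_steps_per_cycle} = {consumed})"
        )
    return generated
```

With the on-policy default of one train step per cycle, the check reduces to train-batch-size = prompts-per-cycle × responses-per-prompt, e.g. 32 × 8 = 256.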

Suggestion for backwards compatibility

The old names could be kept as deprecated aliases so existing scripts don't break.
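A minimal sketch of such aliases, assuming the CLI is built on Python's standard `argparse`: each argument is registered under both spellings, with the proposed name first so that it determines the attribute name, and a small custom action emits a `DeprecationWarning` when an old spelling is used. `OLD_TO_NEW`, `_DeprecatedAliasAction` and `build_parser` are illustrative names, and all five arguments are assumed to be integers.

```python
import argparse
import warnings

# Deprecated spelling -> proposed spelling (from the table in this issue).
OLD_TO_NEW = {
    "--rollout-batch-size": "--prompts-per-cycle",
    "--num-steps-per-rollout": "--train-steps-per-cycle",
    "--num-rollout": "--num-cycles",
    "--n-samples-per-prompt": "--responses-per-prompt",
    "--global-batch-size": "--train-batch-size",
}


class _DeprecatedAliasAction(argparse.Action):
    """Store the value like the default 'store' action, warning if an old name was used."""

    def __call__(self, parser, namespace, values, option_string=None):
        if option_string in OLD_TO_NEW:
            warnings.warn(
                f"{option_string} is deprecated; use {OLD_TO_NEW[option_string]} instead",
                DeprecationWarning,
            )
        setattr(namespace, self.dest, values)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    for old, new in OLD_TO_NEW.items():
        # The new name comes first, so argparse derives dest from it
        # (e.g. --prompts-per-cycle -> prompts_per_cycle).
        parser.add_argument(new, old, type=int, action=_DeprecatedAliasAction)
    return parser
```

For example, `build_parser().parse_args(["--rollout-batch-size", "32"])` sets `prompts_per_cycle` to 32 and warns that `--rollout-batch-size` is deprecated. Python shows `DeprecationWarning` by default only when it is triggered from code in `__main__`, so a project may prefer `FutureWarning` or a logged message if users should always see the notice.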
