Summary
Three of the core sampling-training loop arguments all use "rollout" but mean different things, which makes them hard to reason about (especially in the constraint equation that ties them together). This issue proposes clearer alternatives.
Current names vs. proposed
| Current | Proposed | Rationale |
| --- | --- | --- |
| `--rollout-batch-size` | `--prompts-per-cycle` | This is the number of prompts per sampling cycle, not a training batch size |
| `--num-steps-per-rollout` | `--train-steps-per-cycle` | Number of optimizer steps run on one cycle's data (default 1 for on-policy) |
| `--num-rollout` | `--num-cycles` | Total iterations of the sample-then-train loop |
| `--n-samples-per-prompt` | `--responses-per-prompt` | "Samples" is overloaded in ML; these are generated completions |
| `--global-batch-size` | `--train-batch-size` | Clarifies this is the number of samples consumed per optimizer step |
Why this matters
The constraint equation becomes self-documenting:
```
# Before
rollout-batch-size × n-samples-per-prompt = global-batch-size × num-steps-per-rollout

# After
prompts-per-cycle × responses-per-prompt = train-batch-size × train-steps-per-cycle
```
The "after" version reads as plain English: "total responses generated per cycle = total responses consumed per cycle."
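As a minimal sketch of how this constraint could be validated up front (function and argument names follow the proposed flags; this is illustrative, not the project's actual trainer code):

```python
def check_cycle_config(prompts_per_cycle: int,
                       responses_per_prompt: int,
                       train_batch_size: int,
                       train_steps_per_cycle: int) -> None:
    """Validate the per-cycle constraint: responses generated == responses consumed."""
    generated = prompts_per_cycle * responses_per_prompt
    consumed = train_batch_size * train_steps_per_cycle
    if generated != consumed:
        raise ValueError(
            f"prompts-per-cycle × responses-per-prompt ({generated}) must equal "
            f"train-batch-size × train-steps-per-cycle ({consumed})"
        )

# 128 prompts × 8 responses = 1024 generated; 256 batch × 4 steps = 1024 consumed
check_cycle_config(128, 8, 256, 4)
```

Failing fast with an error message phrased in the new names keeps the constraint readable in logs as well as in docs.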
Suggestion for backwards compatibility
The old names could be kept as deprecated aliases so existing scripts don't break.
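One way to wire up such aliases, assuming the project uses argparse (the flag names are from this proposal; the actual parser setup may differ):

```python
import argparse
import warnings


class DeprecatedAlias(argparse.Action):
    """Store the value under the new destination and warn about the old flag."""

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated; use --{self.dest.replace('_', '-')}",
            DeprecationWarning,
        )
        setattr(namespace, self.dest, values)


parser = argparse.ArgumentParser()
# New name is the canonical destination; the old flag writes to the same dest.
parser.add_argument("--prompts-per-cycle", type=int, dest="prompts_per_cycle")
parser.add_argument("--rollout-batch-size", type=int, dest="prompts_per_cycle",
                    action=DeprecatedAlias, help=argparse.SUPPRESS)

args = parser.parse_args(["--rollout-batch-size", "64"])
print(args.prompts_per_cycle)  # 64, with a DeprecationWarning emitted
```

Hiding the old flag with `help=argparse.SUPPRESS` keeps `--help` output clean while existing scripts continue to work.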