
docs(pipeline_parallel): clarify seq_length behavior with variable_seq_lengths under PP#4471

Draft
edenfunf wants to merge 1 commit into NVIDIA:main from edenfunf:fix/2064-variable-seq-lengths-pp-docstring

Conversation

@edenfunf

Summary

Fixes #2064.

The shared docstring of get_forward_backward_func said "This is ignored if variable_seq_lengths in the config is True" for the seq_length argument. That is true for two of the three schedules but not for the third:

| Schedule | Code path | `variable_seq_lengths=True` behavior |
| --- | --- | --- |
| `forward_backward_no_pipelining` (pp=1) | signature marks `seq_length: int,  # unused` (schedules.py:598) | unused, regardless of `variable_seq_lengths` |
| `forward_backward_pipelining_without_interleaving` (pp>1, vp=None) | calls `get_tensor_shapes(seq_length, ...)`, which short-circuits to `[()]` in variable mode (schedules.py:2035-2038) | ignored; shapes are exchanged dynamically |
| `forward_backward_pipelining_with_interleaving` (pp>1, vp>1) | unconditionally builds `tensor_shape = [seq_length, micro_batch_size, hidden_size]` (schedules.py:1084) | still used to size P2P buffers; acts as the per-step max sequence length |
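The divergence in the table can be sketched as follows. This is a simplified illustration of the control flow described above, not the actual Megatron-LM code; the real functions take a config object and handle encoder/decoder shapes as well.

```python
# Simplified sketch of the two code paths (illustrative, not Megatron-LM source).

def get_tensor_shapes(seq_length, micro_batch_size, hidden_size,
                      variable_seq_lengths):
    # Non-interleaved path: short-circuits in variable mode, so seq_length
    # never sizes a buffer; shapes are exchanged dynamically over P2P instead.
    if variable_seq_lengths:
        return [()]
    return [(seq_length, micro_batch_size, hidden_size)]


def interleaved_tensor_shape(seq_length, micro_batch_size, hidden_size,
                             variable_seq_lengths):
    # Interleaved path: no branch on variable_seq_lengths at all --
    # seq_length always sizes the P2P activation buffer.
    return (seq_length, micro_batch_size, hidden_size)


# Same inputs, variable mode on: one path ignores seq_length, one does not.
assert get_tensor_shapes(4096, 2, 1024, variable_seq_lengths=True) == [()]
assert interleaved_tensor_shape(4096, 2, 1024, variable_seq_lengths=True) \
    == (4096, 2, 1024)
```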

I confirmed by grepping every reference to variable_seq_lengths between the start of the interleaved schedule and the start of get_tensor_shapes: there are zero hits in code (only the new docstring text). The interleaved schedule does not branch on variable_seq_lengths at all.

A user running PP>1 + virtual pipeline who reads the existing docstring and assumes "variable mode means I can stop passing a real seq_length" can hit either shape errors (if the value they pass is too small) or wasted P2P bandwidth/memory (if too large).
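The safe pattern under the interleaved schedule is therefore to keep passing the per-step maximum sequence length even in variable mode. A minimal sketch, using a hypothetical helper (`choose_pp_seq_length` is not part of Megatron-LM, just an illustration of the sizing rule):

```python
# Hypothetical helper: under interleaved PP with variable_seq_lengths=True,
# the seq_length passed to the schedule must upper-bound every microbatch in
# the step. Too small risks shape errors; too large wastes P2P buffer space.

def choose_pp_seq_length(microbatch_seq_lens):
    """Return the seq_length to pass for this step: the max over microbatches."""
    if not microbatch_seq_lens:
        raise ValueError("need at least one microbatch")
    return max(microbatch_seq_lens)


# The tightest correct buffer size is the longest microbatch in the step.
assert choose_pp_seq_length([1024, 2048, 1536]) == 2048
```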

Changes

Pure documentation. No code changes; no test changes.

  • megatron/core/pipeline_parallel/schedules.py:
    • Replace the single "this is ignored if variable_seq_lengths" sentence in get_forward_backward_func with a per-schedule breakdown.
    • Add a one-paragraph note to forward_backward_pipelining_with_interleaving explicitly stating seq_length is always used to size the P2P buffer (and acts as the per-step max in variable mode).
    • Add a one-paragraph note to forward_backward_pipelining_without_interleaving stating seq_length is ignored in variable mode (shapes exchanged dynamically).
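One possible rendering of the per-schedule breakdown described above (illustrative only; the exact wording lives in the PR diff, and this stub stands in for the real function):

```python
# Sketch of the proposed docstring shape (not the actual diff text).

def get_forward_backward_func():
    """Retrieve the forward-backward schedule (stub for illustration).

    seq_length (int, required): Sequence length of the current batch.
        - forward_backward_no_pipelining: unused.
        - forward_backward_pipelining_without_interleaving: ignored when
          config.variable_seq_lengths is True (shapes exchanged dynamically).
        - forward_backward_pipelining_with_interleaving: always used to size
          P2P buffers; in variable mode it acts as the per-step maximum
          sequence length.
    """
```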

Test plan

  • uv run isort --check-only megatron/core/pipeline_parallel/schedules.py — clean.
  • Manually walked the three schedule functions and verified each docstring claim against the actual control flow (see table above).
  • Sphinx / autodoc rebuild — defer to CI to confirm RST renders.

The intent is to be a docs-only contribution that closes #2064 without altering any runtime behavior.

…q_lengths under PP (NVIDIA#2064)

The shared docstring of get_forward_backward_func said seq_length is ignored whenever
config.variable_seq_lengths=True. That holds for pp_size=1 and for the non-interleaved
schedule (which routes through get_tensor_shapes and short-circuits to [()] in variable
mode), but the interleaved schedule unconditionally builds the P2P activation buffer as
[seq_length, micro_batch_size, hidden_size] without consulting variable_seq_lengths.
Users running PP>1 with virtual pipeline can therefore hit shape errors or unnecessary
memory/bandwidth use if they assume seq_length is unused.

Spell out the per-schedule behavior in the central docstring and mirror the relevant
note onto each pipelined schedule's own docstring. Pure documentation; no code changes.

copy-pr-bot Bot commented Apr 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.



Development

Successfully merging this pull request may close these issues.

[Docs-only] Clarify seq_length behavior when variable_seq_lengths=True under pipeline parallelism (PP>1)

2 participants