
docs(pipeline_parallel): clarify seq_length behavior with variable_seq_lengths under PP#4471

Draft
edenfunf wants to merge 1 commit into NVIDIA:main from edenfunf:fix/2064-variable-seq-lengths-pp-docstring

Conversation

@edenfunf

Summary

Fixes #2064.

The shared docstring of get_forward_backward_func said "This is ignored if variable_seq_lengths in the config is True" for the seq_length argument. That is true for two of the three schedules but not for the third:

| Schedule | Code path | `variable_seq_lengths=True` behavior |
| --- | --- | --- |
| `forward_backward_no_pipelining` (pp=1) | signature marks `seq_length: int,  # unused` (schedules.py:598) | unused, regardless of `variable_seq_lengths` |
| `forward_backward_pipelining_without_interleaving` (pp>1, vp=None) | calls `get_tensor_shapes(seq_length, ...)`, which short-circuits to `[()]` in variable mode (schedules.py:2035-2038) | ignored; shapes are exchanged dynamically |
| `forward_backward_pipelining_with_interleaving` (pp>1, vp>1) | unconditionally builds `tensor_shape = [seq_length, micro_batch_size, hidden_size]` (schedules.py:1084) | still used to size P2P buffers; acts as the per-step max sequence length |
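The divergence in the table can be sketched as follows. This is a simplified illustration of the control flow described above, not the actual Megatron-LM code; the real functions take a config object and handle encoder/decoder shapes as well.

```python
# Simplified sketch of the two code paths (illustrative, not Megatron-LM source).

def get_tensor_shapes(seq_length, micro_batch_size, hidden_size,
                      variable_seq_lengths):
    # Non-interleaved path: short-circuits in variable mode, so seq_length
    # never sizes a buffer; shapes are exchanged dynamically over P2P instead.
    if variable_seq_lengths:
        return [()]
    return [(seq_length, micro_batch_size, hidden_size)]


def interleaved_tensor_shape(seq_length, micro_batch_size, hidden_size,
                             variable_seq_lengths):
    # Interleaved path: no branch on variable_seq_lengths at all --
    # seq_length always sizes the P2P activation buffer.
    return (seq_length, micro_batch_size, hidden_size)


# Same inputs, variable mode on: one path ignores seq_length, one does not.
assert get_tensor_shapes(4096, 2, 1024, variable_seq_lengths=True) == [()]
assert interleaved_tensor_shape(4096, 2, 1024, variable_seq_lengths=True) \
    == (4096, 2, 1024)
```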

I confirmed by grepping every reference to variable_seq_lengths between the start of the interleaved schedule and the start of get_tensor_shapes: there are zero hits in code (only the new docstring text). The interleaved schedule does not branch on variable_seq_lengths at all.

A user running PP>1 + virtual pipeline who reads the existing docstring and assumes "variable mode means I can stop passing a real seq_length" can hit either shape errors (if the value they pass is too small) or wasted P2P bandwidth/memory (if too large).
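The safe pattern under the interleaved schedule is therefore to keep passing the per-step maximum sequence length even in variable mode. A minimal sketch, using a hypothetical helper (`choose_pp_seq_length` is not part of Megatron-LM, just an illustration of the sizing rule):

```python
# Hypothetical helper: under interleaved PP with variable_seq_lengths=True,
# the seq_length passed to the schedule must upper-bound every microbatch in
# the step. Too small risks shape errors; too large wastes P2P buffer space.

def choose_pp_seq_length(microbatch_seq_lens):
    """Return the seq_length to pass for this step: the max over microbatches."""
    if not microbatch_seq_lens:
        raise ValueError("need at least one microbatch")
    return max(microbatch_seq_lens)


# The tightest correct buffer size is the longest microbatch in the step.
assert choose_pp_seq_length([1024, 2048, 1536]) == 2048
```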

Changes

Pure documentation. No code changes; no test changes.

  • megatron/core/pipeline_parallel/schedules.py:
    • Replace the single "this is ignored if variable_seq_lengths" sentence in get_forward_backward_func with a per-schedule breakdown.
    • Add a one-paragraph note to forward_backward_pipelining_with_interleaving explicitly stating seq_length is always used to size the P2P buffer (and acts as the per-step max in variable mode).
    • Add a one-paragraph note to forward_backward_pipelining_without_interleaving stating seq_length is ignored in variable mode (shapes exchanged dynamically).
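One possible rendering of the per-schedule breakdown described above (illustrative only; the exact wording lives in the PR diff, and this stub stands in for the real function):

```python
# Sketch of the proposed docstring shape (not the actual diff text).

def get_forward_backward_func():
    """Retrieve the forward-backward schedule (stub for illustration).

    seq_length (int, required): Sequence length of the current batch.
        - forward_backward_no_pipelining: unused.
        - forward_backward_pipelining_without_interleaving: ignored when
          config.variable_seq_lengths is True (shapes exchanged dynamically).
        - forward_backward_pipelining_with_interleaving: always used to size
          P2P buffers; in variable mode it acts as the per-step maximum
          sequence length.
    """
```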

Test plan

  • uv run isort --check-only megatron/core/pipeline_parallel/schedules.py — clean.
  • Manually walked the three schedule functions and verified each docstring claim against the actual control flow (see table above).
  • Sphinx / autodoc rebuild — defer to CI to confirm RST renders.

The intent is to be a docs-only contribution that closes #2064 without altering any runtime behavior.

…q_lengths under PP (NVIDIA#2064)

The shared docstring of get_forward_backward_func said seq_length is ignored whenever
config.variable_seq_lengths=True. That holds for pp_size=1 and for the non-interleaved
schedule (which routes through get_tensor_shapes and short-circuits to [()] in variable
mode), but the interleaved schedule unconditionally builds the P2P activation buffer as
[seq_length, micro_batch_size, hidden_size] without consulting variable_seq_lengths.
Users running PP>1 with virtual pipeline can therefore hit shape errors or unnecessary
memory/bandwidth use if they assume seq_length is unused.

Spell out the per-schedule behavior in the central docstring and mirror the relevant
note onto each pipelined schedule's own docstring. Pure documentation; no code changes.

copy-pr-bot Bot commented Apr 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.



Development

Successfully merging this pull request may close these issues.

[Docs-only] Clarify seq_length behavior when variable_seq_lengths=True under pipeline parallelism (PP>1)

2 participants