docs(pipeline_parallel): clarify seq_length behavior with variable_seq_lengths under PP #4471
Draft
edenfunf wants to merge 1 commit into NVIDIA:main from
Conversation
…q_lengths under PP (NVIDIA#2064)

The shared docstring of get_forward_backward_func said seq_length is ignored whenever config.variable_seq_lengths=True. That holds for pp_size=1 and for the non-interleaved schedule (which routes through get_tensor_shapes and short-circuits to [()] in variable mode), but the interleaved schedule unconditionally builds the P2P activation buffer as [seq_length, micro_batch_size, hidden_size] without consulting variable_seq_lengths. Users running PP>1 with virtual pipeline can therefore hit shape errors or unnecessary memory/bandwidth use if they assume seq_length is unused.

Spell out the per-schedule behavior in the central docstring and mirror the relevant note onto each pipelined schedule's own docstring. Pure documentation; no code changes.
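The failure mode is easiest to see with a toy buffer. The sketch below is a minimal, self-contained illustration in plain PyTorch, not Megatron-LM code; the helper name and concrete sizes are hypothetical, and only the [seq_length, micro_batch_size, hidden_size] buffer layout is taken from the description above.

```python
# Minimal illustration of the failure mode described above. Plain PyTorch,
# not Megatron-LM code; the helper name and sizes are hypothetical. Only the
# [seq_length, micro_batch_size, hidden_size] layout comes from the commit
# message.
import torch


def interleaved_style_recv_buffer(seq_length, micro_batch_size, hidden_size):
    # The interleaved schedule sizes its P2P buffer from the seq_length
    # argument, independent of config.variable_seq_lengths.
    return torch.empty(seq_length, micro_batch_size, hidden_size)


# A user assumes seq_length is ignored in variable mode and passes a small value...
buffer = interleaved_style_recv_buffer(seq_length=512, micro_batch_size=2, hidden_size=8)
# ...but a microbatch with 1024 tokens arrives from the previous stage.
activation = torch.randn(1024, 2, 8)

try:
    buffer.copy_(activation)  # stands in for the recv into the undersized buffer
except RuntimeError as err:
    print(f"shape mismatch: {err}")

# The opposite mistake (a seq_length far larger than any microbatch) does not
# fail, but every P2P transfer then moves an oversized buffer.
```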
Summary
Fixes #2064.
The shared docstring of `get_forward_backward_func` said "This is ignored if variable_seq_lengths in the config is True" for the `seq_length` argument. That is true for two of the three schedules but not for the third:

| Schedule | `variable_seq_lengths=True` behavior |
| --- | --- |
| `forward_backward_no_pipelining` (pp=1) | `seq_length: int,  # unused` (schedules.py:598); `variable_seq_lengths` is never consulted |
| `forward_backward_pipelining_without_interleaving` (pp>1, vp=None) | calls `get_tensor_shapes(seq_length, ...)`, which short-circuits to `[()]` in variable mode (schedules.py:2035-2038) |
| `forward_backward_pipelining_with_interleaving` (pp>1, vp>1) | `tensor_shape = [seq_length, micro_batch_size, hidden_size]` (schedules.py:1084) |

I confirmed by grepping every reference to `variable_seq_lengths` between the start of the interleaved schedule and the start of `get_tensor_shapes`: there are zero hits in code (only the new docstring text). The interleaved schedule does not branch on `variable_seq_lengths` at all.

A user running PP>1 + virtual pipeline who reads the existing docstring and assumes "variable mode means I can stop passing a real `seq_length`" can hit either shape errors (if the value they pass is too small) or wasted P2P bandwidth/memory (if too large).
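For readers who want the difference in code form, the following is a simplified paraphrase of the two shape-selection paths summarized in the table above. It is not the actual megatron/core/pipeline_parallel/schedules.py source; the function names, signatures, and the Config stand-in are illustrative only.

```python
# Simplified paraphrase of the difference described above; NOT the actual
# schedules.py source. Names, signatures, and Config are illustrative.
from dataclasses import dataclass


@dataclass
class Config:
    variable_seq_lengths: bool = False
    hidden_size: int = 4096


def shapes_without_interleaving(seq_length, micro_batch_size, config):
    # Non-interleaved schedule: routes through a get_tensor_shapes-style helper
    # that short-circuits in variable mode, so the static seq_length is unused
    # and the real shape is exchanged over P2P at runtime.
    if config.variable_seq_lengths:
        return [()]
    return [(seq_length, micro_batch_size, config.hidden_size)]


def shape_with_interleaving(seq_length, micro_batch_size, config):
    # Interleaved schedule: the buffer shape is built from seq_length
    # unconditionally; config.variable_seq_lengths is never consulted.
    return [seq_length, micro_batch_size, config.hidden_size]


cfg = Config(variable_seq_lengths=True)
print(shapes_without_interleaving(4096, 1, cfg))  # [()] -> seq_length ignored
print(shape_with_interleaving(4096, 1, cfg))      # [4096, 1, 4096] -> seq_length used
```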
Changes

Pure documentation. No code changes; no test changes.
megatron/core/pipeline_parallel/schedules.py:
- Rewrote the `seq_length` note in the shared docstring of `get_forward_backward_func` with a per-schedule breakdown.
- Added a note to `forward_backward_pipelining_with_interleaving` explicitly stating that `seq_length` is always used to size the P2P buffer (and acts as the per-step max in variable mode).
- Added a note to `forward_backward_pipelining_without_interleaving` stating that `seq_length` is ignored in variable mode (shapes are exchanged dynamically). A hypothetical sketch of this kind of wording follows the list.
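To make the bullets above concrete, here is a hypothetical example of the kind of per-schedule note being described. The wording is not the PR's actual docstring text, and the signatures are heavily trimmed stand-ins, not the real function signatures.

```python
# Hypothetical illustration only: trimmed signatures, and wording that is not
# the PR's actual docstring text.
def forward_backward_pipelining_with_interleaving(seq_length, micro_batch_size, config, **kwargs):
    """Interleaved 1F1B pipeline schedule (signature trimmed for illustration).

    seq_length: always used to size the P2P activation buffers as
        [seq_length, micro_batch_size, hidden_size], even when
        config.variable_seq_lengths is True; in variable mode it should be the
        maximum sequence length any microbatch will have.
    """
    raise NotImplementedError("illustrative stub, not the real schedule")


def forward_backward_pipelining_without_interleaving(seq_length, micro_batch_size, config, **kwargs):
    """Non-interleaved 1F1B pipeline schedule (signature trimmed for illustration).

    seq_length: ignored when config.variable_seq_lengths is True; tensor shapes
        are exchanged dynamically between pipeline stages instead.
    """
    raise NotImplementedError("illustrative stub, not the real schedule")
```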
Test plan

`uv run isort --check-only megatron/core/pipeline_parallel/schedules.py` — clean.

This is intended as a docs-only contribution that closes #2064 without altering any runtime behavior.