Add pipeline-parallel mHC compatibility (stacks on #4483)#4528
Closed

Connor-XY wants to merge 10 commits into NVIDIA:dsv4 from

Conversation
Summary
Wires the n-stream tensor shape through the non-interleaved pipeline schedule's `get_tensor_shapes` and adds compatibility tests across all PP rank positions and flexible-VPP layouts.
Files this PR adds or modifies (its only content):
- `megatron/core/pipeline_parallel/schedules.py`: optional `pp_group` / `is_recv` parameters on `get_tensor_shapes`; per-stage shape selection (first stage recv=C / send=n*C, last stage recv=n*C / send=C, intermediate send/recv=n*C); VPP+mHC backstop guard with an error message aligned to the config-time check; removal of an unreachable `n*C hidden_dim` branch in the interleaved schedule.
- `tests/unit_tests/pipeline_parallel/test_pp_mhc_compatibility.py` (new): shape-correctness tests for all PP rank positions, dummy layer-count invariance under flexible-VPP layouts, and config-time guard tests.
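The per-stage shape selection described above can be sketched as follows. This is an illustrative helper, not the actual `get_tensor_shapes` code in this PR; the function and parameter names are hypothetical.

```python
def select_hidden_dim(hidden_size: int, num_streams: int,
                      is_first_stage: bool, is_last_stage: bool,
                      is_recv: bool) -> int:
    """Illustrative per-stage hidden-dim selection for an n-stream (mHC) pipeline.

    The first PP stage receives C (from the embedding) and sends n*C;
    the last stage receives n*C and sends C; intermediate stages
    send and receive n*C.
    """
    if is_first_stage:
        return hidden_size if is_recv else num_streams * hidden_size
    if is_last_stage:
        return num_streams * hidden_size if is_recv else hidden_size
    # Intermediate stage: n*C in both directions.
    return num_streams * hidden_size
```

For example, with C=1024 and n=4 streams, the first stage would receive 1024-wide activations and send 4096-wide ones, while intermediate stages would use 4096 in both directions.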
Stacking note
This branch is built on top of #4483 (`yxu1/mhc-transformer-core-code-dsv4`) so that `config.enable_hyper_connections` and `config.num_residual_streams` (added in #4483) resolve. Cannot merge before #4483; the "Files changed" tab shows #4483's content as ancestry, so please review only the two files listed above.

Origin
Carved out of PR #4469 at commit e3d0102ad. Includes the PP-specific strict-review fixes from passes 3–11 of `/claude strict-review`: `is_first_stage` / `is_last_stage` named locals for readable boundary semantics.

Validation
`python3 -m compileall` on the touched files. GPU pytest of the recompute / shape suite happens via the strict-review CI.

🤖 Generated by Claude Opus 4.7 (1M context).
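The VPP+mHC backstop guard mentioned in the file list might look roughly like the sketch below. The config field names follow #4483 as described above, but the function itself and its error message are hypothetical, not the code in this PR.

```python
def check_vpp_mhc_compatibility(enable_hyper_connections: bool,
                                virtual_pipeline_model_parallel_size) -> None:
    """Backstop guard: re-raise the config-time VPP+mHC error at schedule time.

    Sketch of a check mirroring the validation in
    TransformerConfig.__post_init__ (from #4483); message text is illustrative.
    """
    vpp = virtual_pipeline_model_parallel_size
    if enable_hyper_connections and vpp is not None and vpp > 1:
        raise ValueError(
            "Hyper-connections (mHC) are not supported with virtual "
            "pipeline parallelism (VPP); disable one of the two."
        )
```

Duplicating the check in the schedule keeps the failure mode identical even if a caller constructs the schedule without going through config validation.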