
Add pipeline-parallel mHC compatibility (stacks on #4483) #4528

Closed
Connor-XY wants to merge 10 commits into NVIDIA:dsv4 from Connor-XY:yxu1/mhc-pp-stacked-dsv4

Conversation

@Connor-XY

Summary

Wires the n-stream tensor shape through the non-interleaved pipeline
schedule's get_tensor_shapes and adds compatibility tests across all
PP rank positions and flexible-VPP layouts.

The only files this PR adds or modifies:

  • megatron/core/pipeline_parallel/schedules.py: optional pp_group / is_recv
    parameters on get_tensor_shapes; per-stage shape selection (first
    stage recv=C / send=n*C, last stage recv=n*C / send=C, intermediate
    send/recv=n*C); VPP+mHC backstop guard with aligned error message;
    removal of an unreachable n*C hidden_dim branch in the interleaved
    schedule.
  • tests/unit_tests/pipeline_parallel/test_pp_mhc_compatibility.py (new):
    shape correctness tests for all PP rank positions, dummy layer-count
    invariance under flexible-VPP layouts, and config-time guard tests.
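The per-stage shape selection above can be sketched as follows. This is a minimal illustration, not the actual Megatron-LM code: the parameter list (seq_len, micro_batch_size, hidden, num_streams, the boolean stage flags) is assumed for clarity, and only the get_tensor_shapes name comes from the PR.

```python
def get_tensor_shapes(seq_len, micro_batch_size, hidden, num_streams,
                      is_first_stage, is_last_stage, is_recv):
    """Return the activation shape a PP stage sends or receives under mHC.

    Intermediate stages carry all residual streams (n*C). The two model
    boundaries use a single stream (C): the first stage receives C from
    the embedding, and the last stage sends C to the output head.
    """
    if is_first_stage and is_recv:
        width = hidden                 # first stage recv = C (from embedding)
    elif is_last_stage and not is_recv:
        width = hidden                 # last stage send = C (to output head)
    else:
        width = num_streams * hidden   # everything else sends/recvs n*C
    return [(seq_len, micro_batch_size, width)]

# Example: 4 residual streams, hidden size 1024
assert get_tensor_shapes(2048, 1, 1024, 4, True, False, True) == [(2048, 1, 1024)]
assert get_tensor_shapes(2048, 1, 1024, 4, False, False, True) == [(2048, 1, 4096)]
```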

Stacking note

This branch is built on top of #4483 (yxu1/mhc-transformer-core-code-dsv4) so that config.enable_hyper_connections and config.num_residual_streams (added in #4483) resolve. It cannot merge before #4483, and the "Files changed" tab shows #4483's content as ancestry; please review only the two files listed above.

Origin

Carved out of PR #4469 at commit e3d0102ad. Includes the PP-specific strict-review fixes from passes 3–11 of /claude strict-review:

  • VPP+mHC config-time + schedule-time guards with aligned messages
  • dead-branch cleanup once the VPP guard is in place
  • is_first_stage / is_last_stage named locals for readable boundary semantics
  • inline comment explaining the asymmetric send/recv dimensions
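The paired config-time and schedule-time guards can be sketched like this. The attribute names (enable_hyper_connections, virtual_pipeline_model_parallel_size) follow the PR description, but the surrounding function names and the exact message text are assumptions for illustration:

```python
import types

# Shared message so the config-time check and the schedule-time backstop
# fail with identical, aligned wording (hypothetical text).
_VPP_MHC_ERROR = (
    "enable_hyper_connections is not supported with interleaved (virtual) "
    "pipeline parallelism; unset virtual_pipeline_model_parallel_size."
)

def _check_vpp_mhc(config):
    if (config.enable_hyper_connections
            and config.virtual_pipeline_model_parallel_size is not None):
        raise ValueError(_VPP_MHC_ERROR)

def validate_config(config):
    # Config-time guard (analogous to TransformerConfig.__post_init__).
    _check_vpp_mhc(config)

def schedule_backstop(config):
    # Schedule-time backstop: re-validates in case a config bypassed
    # __post_init__, raising the same aligned message.
    _check_vpp_mhc(config)

# Usage: VPP + mHC together is rejected; either alone passes.
bad = types.SimpleNamespace(enable_hyper_connections=True,
                            virtual_pipeline_model_parallel_size=2)
ok = types.SimpleNamespace(enable_hyper_connections=True,
                           virtual_pipeline_model_parallel_size=None)
validate_config(ok)  # no error
```

Keeping both checks behind one shared helper is what makes the "aligned error message" property trivial to maintain: there is exactly one string to update.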

Validation

Ran python3 -m compileall on the touched files; the GPU pytest run of the recompute / shape suite happens via the strict-review CI.

🤖 Generated by Claude Opus 4.7 (1M context).

Connor-XY and others added 10 commits April 27, 2026 09:15
Wires the n-stream tensor shape through the non-interleaved pipeline
schedule's `get_tensor_shapes` and adds compatibility tests across
all PP rank positions and flexible-VPP layouts.

Implementation notes:
- `get_tensor_shapes`: optional `pp_group` / `is_recv` parameters select
  the per-stage send/recv dimension. First PP stage receives `C` (from
  embedding) and sends `n*C`; last stage receives `n*C` and sends `C`;
  intermediate stages send/recv `n*C`. Switch is named via
  `is_first_stage` / `is_last_stage` for clarity.
- VPP + mHC is blocked at config-validation time
  (`TransformerConfig.__post_init__` from NVIDIA#4483) and re-validated here
  as a backstop with an aligned error message.
- `forward_backward_pipelining_with_interleaving`: dead `n*C hidden_dim`
  branch removed (the VPP+mHC ValueError above blocks every code path
  that would enter it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@copy-pr-bot

copy-pr-bot Bot commented Apr 29, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
