
Add mHC functional CI coverage (stacks on #4483) #4530

Closed
Connor-XY wants to merge 10 commits into NVIDIA:dsv4 from Connor-XY:yxu1/mhc-functional-ci-stacked-dsv4



@Connor-XY

Summary

Adds an end-to-end mHC functional test recipe on H100 with tensor parallelism TP=2 and pipeline parallelism PP=2.
Carved out of PR #4469 so the CI / recipe addition can be reviewed by @NVIDIA/ci separately from the source changes.

This PR adds or modifies only the following files:

  • tests/functional_tests/test_cases/gpt/gpt3_mcore_te_tp2_pp2_mhc/model_config.yaml (new): mHC test config (enable_hyper_connections=True, num_residual_streams=4).
  • tests/functional_tests/test_cases/gpt/gpt3_mcore_te_tp2_pp2_mhc/golden_values_dev_dgx_h100.json (new): golden values for parity comparison on dev_dgx_h100.
  • tests/test_utils/recipes/h100/gpt.yaml: wires the new test case into the H100 GPT recipe.
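As a rough illustration of what a golden-values parity check does, here is a minimal Python sketch. It assumes a golden-values JSON layout of the form `{metric: {"values": {step: value}}}`; the helper `compare_to_golden` and its relative tolerance are hypothetical and are not taken from the actual functional-test harness in `tests/test_utils`.

```python
from __future__ import annotations

import json
import math


def compare_to_golden(golden: dict, actual: dict, rel_tol: float = 0.01) -> list[str]:
    """Hypothetical parity check: return mismatch messages (empty list means parity holds).

    Assumes both dicts map metric name -> {"values": {step: value}}.
    """
    mismatches = []
    for metric, series in golden.items():
        golden_values = series["values"]
        # A metric missing from the current run counts as every step missing.
        actual_values = actual.get(metric, {}).get("values", {})
        for step, expected in golden_values.items():
            if step not in actual_values:
                mismatches.append(f"{metric}: step {step} missing")
                continue
            got = actual_values[step]
            # Compare with a relative tolerance rather than exact equality.
            if not math.isclose(got, expected, rel_tol=rel_tol):
                mismatches.append(f"{metric}: step {step} expected {expected}, got {got}")
    return mismatches


if __name__ == "__main__":
    # Toy numbers for illustration only; not values from golden_values_dev_dgx_h100.json.
    golden = json.loads('{"lm loss": {"values": {"1": 10.0, "2": 9.5}}}')
    actual = {"lm loss": {"values": {"1": 10.01, "2": 9.2}}}
    print(compare_to_golden(golden, actual))
```

A relative tolerance is used here because small numeric drift across kernels or hardware is expected; the real harness may use different metrics, tolerances, or comparison logic.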

Stacking note

This branch is built on top of #4483 so that the recipe's test infrastructure can resolve enable_hyper_connections from TransformerConfig. In practice, the recipe also exercises code from #4527 (fused kernels), #4528 (PP), and #4529 (HybridModel) once those merge: model_config.yaml does not reference them directly, but the integration test will only pass against the full mHC stack. This PR cannot merge until #4483 (and ideally the sibling PRs) has landed.

Because this branch stacks on #4483, "Files changed" also shows #4483's content as ancestry; please review only the three files listed above.

Origin

Carved out of PR #4469 at commit e3d0102ad. The recipe and golden values were stable across the strict-review loop on the parent PR.

🤖 Generated by Claude Opus 4.7 (1M context).

Connor-XY and others added 10 commits April 27, 2026 09:15
Adds a functional test recipe exercising mHC end-to-end on H100 with
TP=2 / PP=2: golden-values comparison against a dev_dgx_h100 baseline,
model config, and a wired-up entry in `tests/test_utils/recipes/h100/gpt.yaml`.

Stacks on NVIDIA#4483 (transformer mHC reference impl). The recipe sets
`enable_hyper_connections=True` and exercises the n-stream P2P shape
path added in NVIDIA#4528 alongside the transformer / hybrid integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@copy-pr-bot

copy-pr-bot Bot commented Apr 29, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
