
Add Hybrid Transformer block fusion#4463

Open
janEbert wants to merge 10 commits into NVIDIA:main from janEbert:hybrid-transformer-fusion

Conversation

@janEbert
Contributor

@janEbert janEbert commented Apr 24, 2026

Add a fusion operator, [...], to --hybrid-layer-pattern, which packs consecutive sequence-mixer + channel-mixer operations into a single TransformerLayer. This way, we have stronger expectations about the operations inside a layer and can leverage the existing optimizations in TransformerLayer.
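For illustration, here is a minimal sketch of how such a bracket operator could be parsed. The pattern symbols (`M` = Mamba mixer, `*` = attention, `-` = MLP) and the function name are assumptions for this example, not this PR's actual implementation:

```python
def parse_hybrid_pattern(pattern: str) -> list[str]:
    """Split a hybrid layer pattern into per-layer groups.

    Unbracketed symbols become single-op layers; a bracketed group such
    as "[M-]" becomes one fused TransformerLayer containing a sequence
    mixer followed by a channel mixer.
    """
    layers = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "[":
            end = pattern.index("]", i)
            layers.append(pattern[i + 1:end])  # fused group, e.g. "M-"
            i = end + 1
        else:
            layers.append(pattern[i])  # single, unfused op
            i += 1
    return layers

# "[M-][*-]": two fused layers (Mamba+MLP, attention+MLP);
# "M-*-": the same ops as four separate, unfused layers.
assert parse_hybrid_pattern("[M-][*-]") == ["M-", "*-"]
assert parse_hybrid_pattern("M-*-") == ["M", "-", "*", "-"]
```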

For checkpointing, we save the model as if it were unfused, giving us a canonical format and backward compatibility. A fused model can then be loaded from an unfused checkpoint. The state-dict transformations are applied only when saving or loading.
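A minimal sketch of the save-time key mapping, assuming fused parameters live under `mixer.`/`mlp.` sub-prefixes and that every entry in the pattern is a fused pair; the key layout here is hypothetical, and the PR applies the real mapping in `generate_state_dict`:

```python
import re

def fused_to_canonical(fused_sd: dict) -> dict:
    """Map a fused state dict onto the canonical (unfused) key layout.

    Fused layer i is written out as two unfused layers: its sequence
    mixer at index 2*i and its channel mixer at index 2*i + 1.
    """
    out = {}
    for key, value in fused_sd.items():
        m = re.match(r"layers\.(\d+)\.(mixer|mlp)\.(.*)", key)
        if m is None:
            out[key] = value  # embeddings, final norm, etc.
            continue
        i, part, rest = int(m.group(1)), m.group(2), m.group(3)
        j = 2 * i if part == "mixer" else 2 * i + 1
        out[f"layers.{j}.{part}.{rest}"] = value
    return out
```

Loading applies the inverse mapping, so existing unfused checkpoints remain loadable into a fused model.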

@janEbert janEbert requested review from a team as code owners April 24, 2026 19:44
@svcnvidia-nemo-ci svcnvidia-nemo-ci marked this pull request as draft April 24, 2026 19:44
@github-actions
Contributor

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

  1. Add the oncall reviewer (optional reviewer)
  2. Add required review teams based on your changes

See the contribution guide for more details.

@copy-pr-bot

copy-pr-bot Bot commented Apr 24, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@svcnvidia-nemo-ci svcnvidia-nemo-ci added this to the Core 0.16 milestone Apr 24, 2026
@janEbert janEbert changed the title Hybrid transformer fusion Add Hybrid Transformer block fusion Apr 27, 2026
janEbert and others added 3 commits April 27, 2026 19:34
Should yield the best forward and backward compatibility. Unfused
checkpoints can be loaded into a fused model.
Do not modify the `sharded_state_dict` method; instead, use
`generate_state_dict` so that the transformation is applied only when
loading/saving.
@janEbert janEbert force-pushed the hybrid-transformer-fusion branch from 626f895 to b7e2b05 Compare April 27, 2026 17:34
@janEbert janEbert force-pushed the hybrid-transformer-fusion branch from 62292ad to c5ca092 Compare April 27, 2026 17:37
@janEbert janEbert marked this pull request as ready for review April 27, 2026 17:37
@janEbert janEbert requested review from a team as code owners April 27, 2026 17:37
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team April 27, 2026 17:37
Member

@Phlip79 Phlip79 left a comment


  • Should we reject MTP usage within brackets?
  • Can you add a working forward/backward test? TestFusedLayerValidation only covers the failure path; one possible shape is sketched below.
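A hypothetical sketch of such a test, assuming a `build_hybrid_model` helper and the bracket pattern from the PR description; neither is taken from the repository's actual test fixtures:

```python
import torch

def test_fused_layer_forward_backward(build_hybrid_model):
    # `build_hybrid_model` is a hypothetical fixture, not an existing one.
    model = build_hybrid_model(hybrid_layer_pattern="[M-][*-]")
    x = torch.randn(2, 16, model.config.hidden_size, requires_grad=True)

    # Forward: fusion must not change the module's interface.
    out = model(x)
    assert out.shape == x.shape

    # Backward: gradients must flow through the fused block.
    out.sum().backward()
    assert x.grad is not None and torch.isfinite(x.grad).all()
```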

janEbert and others added 2 commits April 28, 2026 13:53
- Disallowed in the same way as the pipe symbol.
- Mentioned and handled more explicitly.
@janEbert janEbert force-pushed the hybrid-transformer-fusion branch from 3fefa13 to ed1aeee Compare April 28, 2026 16:06
@janEbert
Contributor Author

Great points! Addressed both.

