Conversation

@yiliu30 yiliu30 (Contributor) commented Aug 22, 2025

No description provided.

Signed-off-by: yiliu30 <[email protected]>

pytorch-bot bot commented Aug 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2846

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures, 1 Cancelled Job

As of commit c6fdc2c with merge base df7bf37 (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 22, 2025
@@ -30,7 +30,7 @@ def __init__(
     def forward(self, x: Tensor) -> Tensor:
         batch_size = x.shape[0]
         x = x.view(-1, self.hidden_dim)  # x: [T, D]
-        scores = self.router(x)  # [T, E]
+        scores = self.router(x)[0]  # [T, E]
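The indexing change suggests the swapped-in router returns a tuple rather than a bare tensor. A minimal sketch of why `[0]` is needed; the tuple contents here are an assumption for illustration, not taken from the actual Llama4Router:

```python
import torch
from torch import nn


# Hypothetical router returning (scores, logits); a plain nn.Linear
# would return a Tensor directly, so indexing [0] only makes sense
# once the original tuple-returning router is restored.
class TupleRouter(nn.Module):
    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.linear = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor):
        logits = self.linear(x)
        return torch.sigmoid(logits), logits


router = TupleRouter(8, 4)
x = torch.randn(3, 8)
scores = router(x)[0]  # [T, E], matching the patched call site
```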

Contributor:

router here is an nn.Linear, not Llama4Router actually, see L21?

Contributor Author:

This router was replaced with the original router:

# Reuse the original router and repack the fused expert weights
router = module.router
up_proj = module.experts.gate_up_proj
w1, w3 = up_proj.permute(0, 2, 1).chunk(2, dim=1)
w2 = module.experts.down_proj.permute(0, 2, 1)
new_mod.router = router
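The repacking above can be checked in isolation with dummy tensors. The layouts here are assumptions about the fused expert weights (gate_up_proj as [E, D, 2H], down_proj as [E, H, D]); the sizes are hypothetical:

```python
import torch

# Assumed layout: E experts, hidden dim D, intermediate dim H
E, D, H = 4, 8, 16
gate_up_proj = torch.randn(E, D, 2 * H)  # fused gate+up expert weights
down_proj = torch.randn(E, H, D)

# Permute to [E, 2H, D], then split the fused dim into gate (w1) and up (w3)
w1, w3 = gate_up_proj.permute(0, 2, 1).chunk(2, dim=1)
w2 = down_proj.permute(0, 2, 1)  # [E, D, H]
```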

Contributor:

so this module does not run by itself? seems quite confusing

Contributor Author:

The MOEFeedForwardAOQuantizable was used, but it seems that its router shouldn't be quantized in order to preserve accuracy. Could you confirm that? @HDCharles
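Skipping the router during quantization can be sketched with a toy module and a hand-rolled name filter. The filter-by-name approach mirrors the filter_fn idiom in torchao's quantize_, but the module structure and names below are illustrative assumptions, not the PR's actual code:

```python
from torch import nn


class ToyMoE(nn.Module):
    """Toy stand-in for an MoE feed-forward block with a router."""

    def __init__(self, dim: int = 8, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))


def quantizable(module: nn.Module, fqn: str) -> bool:
    # Leave the router in high precision to preserve accuracy;
    # quantize only the expert linears.
    return isinstance(module, nn.Linear) and "router" not in fqn


model = ToyMoE()
targets = [name for name, m in model.named_modules() if quantizable(m, name)]
# targets contains the expert linears but not the router
```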

@andrewor14 andrewor14 requested a review from liangel-02 August 25, 2025 19:01
@liangel-02 liangel-02 added the topic: improvement Use this tag if this PR is an improvement (doesn't fit into any of the other categories) label Aug 27, 2025
@yiliu30
Copy link
Contributor Author

yiliu30 commented Sep 3, 2025

Hi @liangel-02 @andrewor14 Looks like the failed CI checks are not related to this PR. Could you help retrigger them?

3 participants