
try fit in new transformers moe #110

Open
ETOgaosion wants to merge 16 commits into ISEEKYAN:main from
ETOgaosion:fix/transformers_moe

Conversation

Collaborator

ETOgaosion commented Mar 30, 2026

transformers > 5.0 changes the format of all MoE weights: instead of per-expert linear layers, each MoE block now holds one large fused expert gate_up tensor and one down tensor covering all experts:

https://github.com/huggingface/transformers/blob/aad13b87ed59f2afcfaebc985f403301887a35fc/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py#L820-L821

        self.gate_up_proj = nn.Parameter(torch.empty(self.num_experts, 2 * self.intermediate_dim, self.hidden_dim))
        self.down_proj = nn.Parameter(torch.empty(self.num_experts, self.hidden_dim, self.intermediate_dim))
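A minimal sketch of how this fused layout relates to the legacy per-expert format (the dimensions here are hypothetical, chosen for illustration): each expert's legacy 2-D weight is a slice along the first axis of the fused 3-D parameter.

```python
import torch
from torch import nn

# Hypothetical dimensions for illustration only.
num_experts, intermediate_dim, hidden_dim = 4, 16, 8

# Fused layout (transformers > 5.0): one 3-D tensor covering all experts.
gate_up_proj = nn.Parameter(torch.empty(num_experts, 2 * intermediate_dim, hidden_dim))
down_proj = nn.Parameter(torch.empty(num_experts, hidden_dim, intermediate_dim))

# Slicing the first axis recovers one expert's legacy 2-D weights.
expert0_gate_up = gate_up_proj[0]  # shape: (2 * intermediate_dim, hidden_dim)
expert0_down = down_proj[0]        # shape: (hidden_dim, intermediate_dim)
```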

Collaborator Author

ETOgaosion commented Mar 30, 2026

Now we support: loading legacy model checkpoints saved with per-expert keys such as experts.x.gate_up_proj (detected from the actual state-dict keys; the new API is also supported).

And saving models through the new state_dict() API with the fused experts.gate_up_proj key (selected according to transformers_version in the transformers config; the legacy API is also supported).
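To illustrate the key mapping this involves, here is a hedged sketch (the helper name and key pattern below are my own for illustration, not taken from this PR) that groups legacy per-expert state-dict keys under the fused key they would stack into:

```python
import re

def group_legacy_expert_keys(keys):
    """Hypothetical helper: map legacy per-expert keys like
    'model.layers.0.mlp.experts.3.gate_up_proj' to the fused key
    'model.layers.0.mlp.experts.gate_up_proj' used by the new format."""
    fused = {}
    for key in keys:
        m = re.match(r"(.*\.experts)\.(\d+)\.(gate_up_proj|down_proj)$", key)
        if m:
            prefix, idx, name = m.groups()
            # All experts of one projection collapse onto a single fused key.
            fused.setdefault(f"{prefix}.{name}", []).append((int(idx), key))
        else:
            # Non-expert keys pass through unchanged.
            fused.setdefault(key, []).append((None, key))
    return fused

legacy = [
    "model.layers.0.mlp.experts.0.gate_up_proj",
    "model.layers.0.mlp.experts.1.gate_up_proj",
    "model.layers.0.mlp.experts.0.down_proj",
    "model.layers.0.mlp.gate.weight",
]
grouped = group_legacy_expert_keys(legacy)
```

The expert indices collected per fused key give the stacking order for building the 3-D tensor (e.g. with torch.stack); the reverse direction, splitting a fused tensor back into per-expert keys, is the analogous slice-and-rename.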

