feat(model): add bailing v2.6 model #46713
- _init_weights: replace the in-place module.slope.copy_(...) with the init.copy_ primitive to satisfy the modeling-structure linter (TRF012), matching the pattern used by Bamba for buffer re-init. Edited the modular source and regenerated the modeling file.
- Add the contribution date stamp to the bailing_hybrid model card so the repository-consistency add_dates check passes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
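The PR does not say what the `module.slope` buffer holds. Some decayed linear-attention designs derive per-head decay rates from ALiBi-style geometric slopes, so the sketch below computes those, purely as an assumption about this buffer. `alibi_slopes` is a hypothetical helper, not code from the PR.

```python
import math
from typing import List


def alibi_slopes(num_heads: int) -> List[float]:
    """ALiBi-style geometric per-head slopes (hypothetical stand-in for `module.slope`)."""

    def power_of_2_slopes(n: int) -> List[float]:
        # First slope is 2^(-8/n); every following head multiplies by the same ratio.
        start = 2 ** (-(2 ** -(math.log2(n) - 3)))
        return [start * start**i for i in range(n)]

    if math.log2(num_heads).is_integer():
        return power_of_2_slopes(num_heads)
    # Non-power-of-2 head counts: use the closest lower power of 2, then fill
    # the remaining heads with every other slope from the next power of 2.
    closest = 2 ** math.floor(math.log2(num_heads))
    extra = alibi_slopes(2 * closest)[0::2][: num_heads - closest]
    return power_of_2_slopes(closest) + extra
```

For 8 heads this yields 1/2, 1/4, ..., 1/256. Recomputing such values inside `_init_weights` (rather than copying them in place) is what keeps the buffer correct after re-initialization.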
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=46713&sha=6ffaff
Update the auto_docstring checkpoint, from_pretrained examples, and model card to the 2.6-generation checkpoint inclusionAI/Ling-2.6-flash-base, and drop the inaccurate trillion-parameter description (the referenced model is the flash variant).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… HF checkpoint

Align the in-library naming with InclusionAI's published modeling file (modeling_bailing_moe_v2_5.py): rename the module/dir to bailing2_5_moe, all classes from BailingHybrid* to BailingMoeV2_5* (e.g. BailingMoeV2_5ForCausalLM, BailingMoeV2_5Model, BailingMoeV2_5Config), and set config.model_type = "bailing2_5_moe", following the qwen3_moe layout. Updates the auto mappings, conversion mapping, toctree, model card, tests, and check_config_attributes accordingly; the modeling file is regenerated from the modular source.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cc @vasqu I think, but let me know if you want me to assign someone else!
Nope, I'll take it; first review probably tomorrow or the day after 🤗
The dir rename makes docs/source/en/model_doc/bailing2_5_moe.md a new path on main, so the add_dates consistency check computes today's date; update the stamp to match and pass CI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, bailing2_5_moe
vasqu left a comment
Left some initial comments, but there are many moving parts regarding the linear attention refactors, so I'm asking you to be a bit patient 🙏
```
@@ -0,0 +1,72 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
```
Suggested change:

```diff
-<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+<!--Copyright 2026 The HuggingFace Team. All rights reserved.
```

For the other files as well, please.
```python
model = AutoModelForCausalLM.from_pretrained(
    "inclusionAI/Ling-2.6-flash-base",
    device_map="auto",
    dtype=torch.bfloat16,
```
Suggested change:

```diff
-    dtype=torch.bfloat16,
```

Shouldn't be needed; we use `auto` as the default.
Just to be sure: the tokenizers backend is used for this model, so we don't need an entry in tokenization auto?
|
|
```python
@auto_docstring(checkpoint="inclusionAI/Ling-2.6-flash-base")
@strict
class BailingMoeV2_5Config(PreTrainedConfig):
```
Imo we can move this to modular and inherit from something like deepseek v2/v3?
```python
@strict
class BailingMoeV2_5Config(PreTrainedConfig):
    r"""
    layer_group_size (`int`, *optional*, defaults to 8):
```
This should be set via `layer_types` instead; we could handle this in the post init.
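A minimal sketch of the suggestion: derive `layer_types` from `layer_group_size` in `__post_init__`, using the same `(i + 1) % layer_group_size == 0` rule as the PR's conversion code. A plain dataclass stands in for `PreTrainedConfig`; the class name and the `"full_attention"`/`"linear_attention"` labels are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HybridLayerConfigSketch:
    """Hypothetical, trimmed-down config; not the real BailingMoeV2_5Config."""

    num_hidden_layers: int = 16
    layer_group_size: int = 8
    layer_types: Optional[List[str]] = None

    def __post_init__(self):
        # Respect explicitly passed layer_types; otherwise derive them once here.
        # Same rule as the PR's conversion code: every layer_group_size-th layer
        # is full attention, and layer_group_size <= 0 means none are.
        if self.layer_types is None:
            self.layer_types = [
                "full_attention"
                if self.layer_group_size > 0 and (i + 1) % self.layer_group_size == 0
                else "linear_attention"
                for i in range(self.num_hidden_layers)
            ]
        # Downstream code can then read only layer_types.
        if len(self.layer_types) != self.num_hidden_layers:
            raise ValueError("layer_types must have one entry per hidden layer")
```

With the defaults, layers 7 and 15 (0-indexed) come out as full attention, and modeling code no longer needs to know about `layer_group_size` at all.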
```python
self.rotary_emb = BailingMoeV2_5RotaryEmbedding(config=config)
self.rotary_emb_linear = BailingMoeV2_5LinearRotaryEmbedding(config=config)
```
Yeah, that's why we should use a single class instead, tbh, configured per layer type.
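One way to read this suggestion is a single rotary helper that keeps one set of inverse frequencies per layer type, instead of two separate classes. The sketch below assumes hypothetical per-type settings (`rope_theta` and rotary `dim`); the PR does not state the actual values used by either layer type, and `PerLayerTypeRotarySketch` is not a real transformers class.

```python
from typing import Dict, List


class PerLayerTypeRotarySketch:
    """Hypothetical single rotary helper keyed by layer type (not the PR's code)."""

    def __init__(self, rope_params: Dict[str, Dict[str, float]]):
        # rope_params maps a layer type to {"rope_theta": ..., "dim": ...}.
        self.inv_freq: Dict[str, List[float]] = {}
        for layer_type, params in rope_params.items():
            theta, dim = params["rope_theta"], int(params["dim"])
            # Standard RoPE inverse frequencies: theta ** (-2i / dim) for i in [0, dim / 2).
            self.inv_freq[layer_type] = [theta ** (-(2 * i) / dim) for i in range(dim // 2)]

    def angles(self, position: int, layer_type: str) -> List[float]:
        # Rotation angle of each frequency pair at the given position.
        return [position * f for f in self.inv_freq[layer_type]]
```

Each decoder layer would look up its entry by `config.layer_types[layer_idx]`, so adding a layer type means adding a config entry rather than a new class.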
```python
    past_key_values=past_key_values,
)


def _update_linear_attn_mask(self, attention_mask, past_key_values):
```
Masking will also change, sorry, many moving parts: #46738
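For context on what such a helper decides: in a recurrent linear-attention layer, padded positions must not leak into the running state, while masking can be skipped when there is no padding or when decoding from an existing cached state. The sketch below illustrates that decision, in the spirit of the mask helpers used by hybrid models such as Bamba, together with a generic single-head decayed recurrence (an assumption, not this model's kernel). It is not the PR's `_update_linear_attn_mask`, and both helper names are hypothetical.

```python
import math
from typing import List, Optional


def linear_attn_mask(attention_mask: Optional[List[int]], has_cached_state: bool) -> Optional[List[int]]:
    """Return the padding mask, or None when the linear-attention layer can skip masking."""
    # Decoding from an existing state, or no padding at all: nothing needs zeroing.
    if has_cached_state or attention_mask is None or all(m == 1 for m in attention_mask):
        return None
    return attention_mask


def decayed_linear_attention(q, k, v, slope: float, mask: Optional[List[int]] = None):
    """Single-head recurrence: S_t = exp(-slope) * S_{t-1} + k_t^T v_t, o_t = q_t S_t."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # d_k x d_v running state
    decay = math.exp(-slope)
    outputs = []
    for t in range(len(q)):
        # Padded positions (mask == 0) add nothing to the state.
        keep = 1.0 if mask is None else float(mask[t])
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = decay * state[i][j] + keep * k[t][i] * v[t][j]
        outputs.append([sum(q[t][i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs
```

With a left-padded input such as `mask=[0, 1]`, the padded first token leaves the state at zero, so the real token's output is unaffected by whatever the padding embedding contained.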
```python
if layer_group_size > 0:
    full_attn_layers = [i for i in range(num_hidden_layers) if (i + 1) % layer_group_size == 0]
    self_attn_renames = [
        WeightRenaming(rf"layers\.{i}\.attention\.", f"layers.{i}.self_attn.") for i in full_attn_layers
```
Hmm, imo I also don't mind having the same naming internally. This is very awkward, so I would like to avoid it.
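The conversion code above renames `attention.` to `self_attn.` only on full-attention layers, so the same checkpoint prefix maps to different module names depending on the layer index. A plain-`re` sketch of that mapping makes the per-layer dependence explicit; it does not use the library's `WeightRenaming` machinery, and the key names in the usage are hypothetical examples.

```python
import re
from typing import List


def rename_full_attention_keys(keys: List[str], num_hidden_layers: int, layer_group_size: int) -> List[str]:
    """Rename `layers.{i}.attention.` to `layers.{i}.self_attn.` on full-attention layers only."""
    full_attn_layers = []
    if layer_group_size > 0:
        full_attn_layers = [i for i in range(num_hidden_layers) if (i + 1) % layer_group_size == 0]
    # Escaped dots keep `layers.7.` from matching inside `layers.17.`.
    patterns = [(re.compile(rf"layers\.{i}\.attention\."), f"layers.{i}.self_attn.") for i in full_attn_layers]
    renamed = []
    for key in keys:
        for pattern, replacement in patterns:
            key = pattern.sub(replacement, key)
        renamed.append(key)
    return renamed
```

Keeping the checkpoint's `attention.` naming for every layer internally, as suggested, would remove the need for this index-dependent table entirely.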
| ) | ||
|
|
||
|
|
||
| class BailingMoeV2_5ModelTester: |
Please use the causal LM tester; qwen next should be a good pointer.
```python
"BambaConfig": ["attn_layer_indices"],
# layer_group_size builds `layer_types` in __post_init__ (and drives weight conversion); scoring_func/topk_method
# describe the router behavior the model hardcodes (sigmoid + noaux_tc), kept for checkpoint config compatibility.
"BailingMoeV2_5Config": ["layer_group_size", "scoring_func", "topk_method"],
```
Imo we can ignore these in transformers if they are not used at all. We still save them via kwargs, but for a transformers-only model we likely don't need them.
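The reviewer's point is that attributes the modeling code never reads need not be declared on the config: extra keys from a checkpoint's config are still carried along as kwargs and written back on save. A minimal stand-in for that behaviour, using a hypothetical `ConfigSketch` rather than `PreTrainedConfig`:

```python
import json


class ConfigSketch:
    """Hypothetical config: declared fields plus pass-through extra kwargs."""

    def __init__(self, hidden_size: int = 1024, **kwargs):
        self.hidden_size = hidden_size
        # Undeclared keys (e.g. `scoring_func`, `topk_method`) are kept as plain attributes.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def to_json_string(self) -> str:
        # Everything set on the instance is serialized, declared or not.
        return json.dumps(vars(self), sort_keys=True)
```

Round-tripping a config with `scoring_func="sigmoid"` keeps the key, even though no code path ever reads it.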
What does this PR do?
This PR adds `BailingHybrid` (bailing v2.6 by InclusionAI), a hybrid linear-attention Mixture-of-Experts model. The architecture combines:

- Interleaved linear-attention and full-attention layers, controlled by `layer_group_size` (every `layer_group_size`-th layer is full attention).
- Linear attention kernels from `flash-linear-attention`, with a pure-PyTorch fallback when `fla` is not installed.
- Mixture-of-Experts layers with sigmoid, auxiliary-loss-free (`noaux_tc`) routing.

The model is implemented with the modular mechanism (`modular_bailing_hybrid.py`), inheriting from `deepseek_v3`, `bamba`, `llama`, and `mixtral`, so `modeling_bailing_hybrid.py` is generated and stays in sync with those parents.

It exposes `BailingHybridModel`, `BailingHybridForCausalLM`, `BailingHybridForSequenceClassification`, and `BailingHybridForTokenClassification`, registered in the auto classes, plus a checkpoint conversion mapping in `conversion_mapping.py` for loading the original `BailingMoeV2_5` checkpoints.

Fixes # (issue)
Code Agent Policy
The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.
PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.
This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read
CONTRIBUTING.md.

Before submitting
Pull Request checks?
to it if that's the case.
Tests run