feat(model): add bailing v2.6 model by kemuxiaozi000 · Pull Request #46713 · huggingface/transformers

kemuxiaozi000 · 2026-06-17T09:00:56Z

What does this PR do?

This PR adds BailingHybrid (bailing v2.6 by InclusionAI), a hybrid linear-attention Mixture-of-Experts model. The
architecture combines:

Hybrid attention — a 1:7 layer ratio of full Multi-head Latent Attention (MLA) to Lightning Linear Attention, giving near-linear complexity. The pattern is driven by
layer_group_size (every layer_group_size-th layer is full attention).
Multi-head Latent Attention (MLA) — DeepSeek-V3-style, with compressed KV cache via LoRA projections.
Lightning Linear Attention — based on SimpleGLA from flash-linear-attention, with a pure-PyTorch fallback when fla is not installed.
Mixture of Experts — 256 routed experts (8 active/token) with shared experts and group-limited greedy (noaux_tc) routing.
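
The routing bullet above can be illustrated with a small pure-Python sketch. It assumes the DeepSeek-V3-style reading of noaux_tc: sigmoid scores, a per-expert bias used only for selection, groups ranked by the sum of their two best biased scores, and weights normalized over the chosen experts. The function name, the parameter names (`n_group`, `topk_group`, `top_k`), and the toy sizes are illustrative assumptions, not taken from this PR's code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def group_limited_topk(logits, bias, n_group, topk_group, top_k):
    """Route one token: pick top_k experts, restricted to the best topk_group groups.

    Hypothetical helper for illustration only; it is not the PR's router.
    """
    n_experts = len(logits)
    group_size = n_experts // n_group
    scores = [sigmoid(x) for x in logits]           # sigmoid scoring
    biased = [s + b for s, b in zip(scores, bias)]  # bias influences selection only

    # Rank each group by the sum of its two best biased scores (assumed rule).
    group_scores = []
    for g in range(n_group):
        members = sorted(biased[g * group_size:(g + 1) * group_size], reverse=True)
        group_scores.append(sum(members[:2]))
    kept = set(sorted(range(n_group), key=lambda g: group_scores[g], reverse=True)[:topk_group])

    # Only experts inside the kept groups are eligible for the final top-k.
    candidates = [e for e in range(n_experts) if e // group_size in kept]
    chosen = sorted(candidates, key=lambda e: biased[e], reverse=True)[:top_k]

    # Mixing weights come from the unbiased scores, normalized to sum to 1.
    total = sum(scores[e] for e in chosen)
    return {e: scores[e] / total for e in chosen}


# Toy example: 8 experts in 4 groups of 2, keep 2 groups, activate 2 experts.
logits = [2.0, 1.0, 0.0, 0.0, 3.0, -5.0, 0.5, 0.5]
weights = group_limited_topk(logits, bias=[0.0] * 8, n_group=4, topk_group=2, top_k=2)
print(weights)
```

In this toy input, expert 4 has the highest individual score, but its group ranks third, so it is never eligible. That is the "group-limited" part: selection is greedy within the surviving groups rather than over all experts.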

The model is implemented with the modular mechanism (modular_bailing_hybrid.py), inheriting from deepseek_v3, bamba, llama, and mixtral, so
modeling_bailing_hybrid.py is generated and stays in sync with those parents.

It exposes BailingHybridModel, BailingHybridForCausalLM, BailingHybridForSequenceClassification, and BailingHybridForTokenClassification, registered in the auto classes,
plus a checkpoint conversion mapping in conversion_mapping.py for loading the original BailingMoeV2_5 checkpoints.

Fixes # (issue)

Code Agent Policy

The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.

PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.

This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read
CONTRIBUTING.md.

I confirm that this is not a pure code agent PR.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline and the
Pull Request checks?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes according to the guidelines?
Did you write any new necessary tests?

Tests run

# Model tests
pytest tests/models/bailing_hybrid/test_modeling_bailing_hybrid.py

# Repo consistency
python utils/check_modular_conversion.py --files src/transformers/models/bailing_hybrid/modular_bailing_hybrid.py
python utils/check_auto.py
python utils/check_config_attributes.py
make style

▎ Note: AI assistance was used to help port and regenerate the modular/auto code; all changed lines were reviewed by me, and I can defend the change end-to-end.

Who can review?

text models: @ArthurZucker @Cyrilvallez

A few things to flag before you submit, given the strict policy in `CONTRIBUTING.md`:

1. **`Fixes # (issue)`** — you need to fill in the real issue number, or remove the line. The policy requires the model addition to be coordinated via an issue first. If there's
no issue/approval yet, open one before the PR.
2. **The "discussed/approved" checkbox** — left unchecked since I don't have a coordination link; add it once you have one.
3. **"Not a pure code agent PR"** — I checked it on the assumption that *you* reviewed every line and will run the tests yourself. If you haven't yet, run the model tests in a
torch-equipped environment first (the local `.venv` has no PyTorch, so I couldn't execute them — only static checks passed).
4. The doc example in `bailing_hybrid.md` shows a sample generation output that should be verified against the real checkpoint before claiming it.

Want me to write this into a file (e.g. `PR_DESCRIPTION.md`), or adjust the title/reviewers?

- _init_weights: replace in-place module.slope.copy_(...) with the init.copy_ primitive to satisfy the modeling-structure linter (TRF012), matching the pattern used by Bamba for buffer re-init. Edited the modular source and regenerated the modeling file.
- Add the contribution date stamp to the bailing_hybrid model card so the repository-consistency add_dates check passes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-06-17T10:06:10Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=46713&sha=6ffaff

Update the auto_docstring checkpoint, from_pretrained examples, and model card to the 2.6-generation checkpoint inclusionAI/Ling-2.6-flash-base, and drop the inaccurate trillion-parameter description (the referenced model is the flash variant).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… HF checkpoint

Align the in-library naming with InclusionAI's published modeling file (modeling_bailing_moe_v2_5.py): rename the module/dir to bailing2_5_moe, all classes from BailingHybrid* to BailingMoeV2_5* (e.g. BailingMoeV2_5ForCausalLM, BailingMoeV2_5Model, BailingMoeV2_5Config), and set config.model_type = "bailing2_5_moe", following the qwen3_moe layout. Updates the auto mappings, conversion mapping, toctree, model card, tests, and check_config_attributes accordingly; the modeling file is regenerated from the modular source.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Rocketknight1 · 2026-06-22T13:24:06Z

cc @vasqu I think but let me know if you want me to assign someone else!

vasqu · 2026-06-22T17:16:54Z

Nope will take it; first review probably tomorrow or the day after 🤗

The dir rename makes docs/source/en/model_doc/bailing2_5_moe.md a new path on main, so the add_dates consistency check computes today's date; update the stamp to match and pass CI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-06-23T03:20:03Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, bailing2_5_moe

vasqu

Did some initial comments but there are many moving parts re linear attention refactors so asking to be a bit patient 🙏

vasqu · 2026-06-24T14:15:13Z

@@ -0,0 +1,72 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.


Suggested change

-<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+<!--Copyright 2026 The HuggingFace Team. All rights reserved.

for others as well please

vasqu · 2026-06-24T14:16:12Z

+model = AutoModelForCausalLM.from_pretrained(
+    "inclusionAI/Ling-2.6-flash-base",
+    device_map="auto",
+    dtype=torch.bfloat16,


Suggested change

-    dtype=torch.bfloat16,

Shouldn't be needed, we use auto as the default.

vasqu · 2026-06-24T14:18:50Z

Just to be sure: The tokenizers backend is used for this model so we don't need an entry to tokenization auto?

vasqu · 2026-06-24T14:21:02Z

+
+@auto_docstring(checkpoint="inclusionAI/Ling-2.6-flash-base")
+@strict
+class BailingMoeV2_5Config(PreTrainedConfig):


Imo we can move this to modular and inherit from something like deepseek v2/3?

vasqu · 2026-06-24T14:22:55Z

+@strict
+class BailingMoeV2_5Config(PreTrainedConfig):
+    r"""
+    layer_group_size (`int`, *optional*, defaults to 8):


this should be set via layer types instead, we could handle this in the post init
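
One way the suggestion could look, as a minimal sketch rather than the library's config class: `layer_types` is the source of truth, and the legacy `layer_group_size` is only consulted in `__post_init__` when no explicit list is given. The class name `ToyHybridConfig` and the string labels `"full_attention"` / `"linear_attention"` are assumptions for illustration; the `(i + 1) % layer_group_size == 0` rule is the one used in this PR's conversion code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyHybridConfig:
    """Hypothetical stand-in for the real config; plain dataclass, no transformers imports."""

    num_hidden_layers: int = 16
    layer_types: Optional[List[str]] = None
    layer_group_size: Optional[int] = None  # legacy checkpoint field

    def __post_init__(self):
        if self.layer_types is None:
            # Fall back to the legacy field; 8 matches the documented default.
            group = self.layer_group_size or 8
            self.layer_types = [
                "full_attention" if (i + 1) % group == 0 else "linear_attention"
                for i in range(self.num_hidden_layers)
            ]
        if len(self.layer_types) != self.num_hidden_layers:
            raise ValueError("layer_types must have one entry per hidden layer")


# Legacy-style config: the pattern is derived from layer_group_size.
legacy = ToyHybridConfig(num_hidden_layers=16, layer_group_size=8)

# Explicit layer_types always win over the legacy field.
explicit = ToyHybridConfig(
    num_hidden_layers=2,
    layer_types=["linear_attention", "full_attention"],
    layer_group_size=8,
)
```

With this shape, the modeling code only ever reads `layer_types`, so the legacy field would not need to be a first-class config attribute.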

vasqu · 2026-06-24T15:15:14Z

+        self.rotary_emb = BailingMoeV2_5RotaryEmbedding(config=config)
+        self.rotary_emb_linear = BailingMoeV2_5LinearRotaryEmbedding(config=config)


Yea that's why we should do one class instead tbh per layer type

vasqu · 2026-06-24T15:16:27Z

+            past_key_values=past_key_values,
+        )
+
+    def _update_linear_attn_mask(self, attention_mask, past_key_values):


masking will also change, sorry many moving things #46738

vasqu · 2026-06-24T15:18:02Z

+        if layer_group_size > 0:
+            full_attn_layers = [i for i in range(num_hidden_layers) if (i + 1) % layer_group_size == 0]
+            self_attn_renames = [
+                WeightRenaming(rf"layers\.{i}\.attention\.", f"layers.{i}.self_attn.") for i in full_attn_layers


Hmm, imo I also don't mind to have the same naming internally. this is very awkward so would like to avoid this
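
For readers following the thread, the per-layer renaming in the diff above amounts to roughly the following, sketched here with the standard `re` module instead of the library's `WeightRenaming` objects. The helper name and the example checkpoint keys are hypothetical; only the layer-selection rule and the two regex strings come from the quoted diff.

```python
import re


def rename_full_attention_keys(keys, num_hidden_layers, layer_group_size):
    """Rename `attention.` to `self_attn.` for full-attention layers only (illustrative helper)."""
    full_attn_layers = [i for i in range(num_hidden_layers) if (i + 1) % layer_group_size == 0]
    renamed = []
    for key in keys:
        for i in full_attn_layers:
            # One pattern per full-attention layer, mirroring the list built in the diff.
            key = re.sub(rf"layers\.{i}\.attention\.", f"layers.{i}.self_attn.", key)
        renamed.append(key)
    return renamed


# Hypothetical checkpoint keys: layer 6 is linear attention, layer 7 is full attention.
keys = [
    "model.layers.6.attention.dense.weight",
    "model.layers.7.attention.dense.weight",
]
result = rename_full_attention_keys(keys, num_hidden_layers=16, layer_group_size=8)
print(result)
```

The sketch makes the awkwardness visible: the number of rename rules grows with the number of full-attention layers, which is what using the same submodule name for every layer type would avoid.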

vasqu · 2026-06-24T15:18:23Z

+    )
+
+
+class BailingMoeV2_5ModelTester:


Please use the causal lm tester, qwen next should be a good pointer

vasqu · 2026-06-24T15:19:09Z

    "BambaConfig": ["attn_layer_indices"],
+    # layer_group_size builds `layer_types` in __post_init__ (and drives weight conversion); scoring_func/topk_method
+    # describe the router behavior the model hardcodes (sigmoid + noaux_tc), kept for checkpoint config compatibility.
+    "BailingMoeV2_5Config": ["layer_group_size", "scoring_func", "topk_method"],


Imo we can ignore these in transformers if not used at all - we still save those via kwargs, but for a sole transformers model we likely don't need it then

kemuxiaozi000 and others added 2 commits June 17, 2026 16:46

feat(model): add bailing v3 model

51057ee

kemuxiaozi000 changed the title ~~feat(model): add bailing v3 model~~ feat(model): add bailing v2.6 model Jun 22, 2026

vasqu reviewed Jun 24, 2026

