
Conversation

@mickqian (Collaborator) commented on Sep 25, 2025

Motivation

Solves #11343.

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist (Contributor) commented
Summary of Changes

Hello @mickqian, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the system's capabilities by integrating the advanced Qwen3-Omni multimodal model, focusing on its 'thinker-only' functionality. The changes encompass the addition of new configuration structures for various model sub-components, updates to core multimodal processing utilities, and adaptations to rotary embedding calculations to support the model's diverse input types. Furthermore, existing Qwen3-VL model code has been refactored for improved modularity, and new test cases have been introduced to ensure robust operation of the integrated models.

Highlights

  • Qwen3-Omni Model Integration: Added comprehensive support for the Qwen3OmniMoeForConditionalGeneration model, including its various sub-components like Audio Encoder, Vision Encoder, and Talker configurations.
  • New Configuration Definitions: Introduced a new configuration file qwen3_omni.py that defines detailed configurations for the Qwen3-Omni model's multimodal architecture.
  • Multimodal Rotary Embedding Enhancements: Updated the rotary embedding logic in rotary_embedding.py to handle the multimodal (M-RoPE) position-ID calculations required by the Qwen3-Omni model; a toy sketch of this position-ID scheme follows this list.
  • Improved Multimodal Utility Handling: Modified mm_utils.py to support deepstack embeddings on a per-modality basis and to correctly process lists of image and audio data.
  • Qwen3-VL Refactoring and Alignment: Refactored existing Qwen3-VL model components (qwen3_vl.py, qwen3_vl_moe.py) to be more modular, allowing for easier integration and consistent behavior with the new Omni model structure.
  • Expanded Test Coverage: Added new test classes (TestQwen3OmniMoeServer, TestQwen3VLMoeServer) and a mixed-modality test mixin (OmniOpenAITestMixin) to validate the functionality of the newly integrated and refactored models.
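
For intuition, the following toy sketch shows the 3-axis (temporal, height, width) M-RoPE position-ID scheme that the rotary-embedding highlight above refers to, for a simple text-plus-image sequence. It is only an illustration of the general scheme, not the get_rope_index_qwen3_omni implementation added in this PR (which also interleaves video and audio segments and per-sample offsets); the function name toy_mrope_positions is made up.

# Toy illustration of 3-axis M-RoPE position ids (not the PR's implementation).
# Text tokens use the same incrementing index on all three axes; image patches
# keep a constant temporal index while height/width enumerate the patch grid.
import torch


def toy_mrope_positions(n_text_before, grid_h, grid_w, n_text_after):
    """Return a (3, seq_len) tensor of [temporal, height, width] position ids."""
    rows = []

    # Leading text: identical positions on every axis.
    t = torch.arange(n_text_before)
    rows.append(torch.stack([t, t, t]))

    # Image patches: offset all axes to start right after the last text position.
    offset = n_text_before
    hh, ww = torch.meshgrid(torch.arange(grid_h), torch.arange(grid_w), indexing="ij")
    temporal = torch.zeros(grid_h * grid_w, dtype=torch.long)  # still image: one "frame"
    rows.append(torch.stack([temporal, hh.flatten(), ww.flatten()]) + offset)

    # Trailing text resumes after the largest position used so far.
    offset = rows[-1].max().item() + 1
    t2 = torch.arange(n_text_after) + offset
    rows.append(torch.stack([t2, t2, t2]))

    return torch.cat(rows, dim=1)


if __name__ == "__main__":
    print(toy_mrope_positions(n_text_before=3, grid_h=2, grid_w=2, n_text_after=2))
    # tensor([[0, 1, 2, 3, 3, 3, 3, 5, 6],
    #         [0, 1, 2, 3, 3, 4, 4, 5, 6],
    #         [0, 1, 2, 3, 4, 3, 4, 5, 6]])
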
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature               Command               Description
Code Review           /gemini review        Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary       Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist   Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                  /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds support for the qwen3-omni model, a multimodal mixture-of-experts model. The changes include new configuration files, a detailed model implementation, and updates to the rotary embedding logic to handle image, video, and audio inputs. The PR also includes some beneficial refactoring of the existing qwen3-vl model code to improve extensibility and code reuse. My review focuses on cleaning up leftover development artifacts like debug prints and TODOs, pointing out a potential bug in the new rotary embedding logic, and suggesting improvements for documentation clarity and code style.

@Qing-zy commented on Oct 10, 2025

Will the talker be included in the future?

@mickqian (Collaborator, Author) replied:

> Will the talker be included in the future?

Yes

@Qing-zy commented on Oct 10, 2025

> Will the talker be included in the future?
>
> Yes

Could you share the planned timeline or support schedule for the talker?

@JustinTong0323 (Collaborator) commented:

/gemini review

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds support for the qwen3-omni model, which is a significant undertaking involving new model configurations, model implementations, and updates to multimodal processing logic. The changes are extensive and well-structured, particularly the refactoring in qwen3_vl.py to accommodate the new omni-modal model. My review focuses on improving code clarity, removing dead code, and suggesting minor performance optimizations. Overall, this is a solid contribution.

Comment on lines +451 to +473
        if code_predictor_config is None:
            code_predictor_config = {}
            self.code_predictor_config = Qwen3OmniMoeTalkerCodePredictorConfig()
            logger.info(
                "code_predictor_config is None. Initializing code_predictor_config model with default values"
            )
        elif isinstance(code_predictor_config, Qwen3OmniMoeTalkerCodePredictorConfig):
            self.code_predictor_config = code_predictor_config
        else:
            self.code_predictor_config = Qwen3OmniMoeTalkerCodePredictorConfig(
                **code_predictor_config
            )

        if text_config is None:
            text_config = {}
            self.text_config = Qwen3OmniMoeTalkerTextConfig()
            logger.info(
                "talker text_config is None. Initializing talker text model with default values"
            )
        elif isinstance(text_config, Qwen3OmniMoeTalkerTextConfig):
            self.text_config = text_config
        else:
            self.text_config = Qwen3OmniMoeTalkerTextConfig(**text_config)
gemini-code-assist bot (Contributor):

Severity: medium

The initialization logic for code_predictor_config and text_config can be simplified. The lines ..._config = {} are redundant when the config is None, as self.<sub_config> is immediately initialized with a default config object. This makes the code slightly cleaner.

        if code_predictor_config is None:
            self.code_predictor_config = Qwen3OmniMoeTalkerCodePredictorConfig()
            logger.info(
                "code_predictor_config is None. Initializing code_predictor_config model with default values"
            )
        elif isinstance(code_predictor_config, Qwen3OmniMoeTalkerCodePredictorConfig):
            self.code_predictor_config = code_predictor_config
        else:
            self.code_predictor_config = Qwen3OmniMoeTalkerCodePredictorConfig(
                **code_predictor_config
            )

        if text_config is None:
            self.text_config = Qwen3OmniMoeTalkerTextConfig()
            logger.info(
                "talker text_config is None. Initializing talker text model with default values"
            )
        elif isinstance(text_config, Qwen3OmniMoeTalkerTextConfig):
            self.text_config = text_config
        else:
            self.text_config = Qwen3OmniMoeTalkerTextConfig(**text_config)

Comment on lines +1357 to +1396
ed_vision_start = (
    input_tokens.index(vision_start_token_id, st)
    if (
        (
            image_token_id in input_tokens
            or video_token_id in input_tokens
        )
        and (remain_videos > 0 or remain_images > 0)
    )
    else len(input_tokens) + 1
)
ed_audio_start = (
    input_tokens.index(audio_start_token_id, st)
    if (audio_token_id in input_tokens and remain_audios > 0)
    else len(input_tokens) + 1
)
gemini-code-assist bot (Contributor):

Severity: medium

Calling list.index() inside a loop can be inefficient, as it performs a linear scan from the start index st in each iteration. For better performance, you could find all occurrences of the special tokens (vision_start_token_id, audio_start_token_id) once before the loop and then iterate through those found indices.
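
A minimal, self-contained sketch of that idea follows. The token ids and the toy sequence are made up for illustration; the real loop in get_rope_index_qwen3_omni also tracks the remaining image/video/audio counts and jumps st past each whole multimodal segment, which this sketch skips.

# Sketch of the reviewer's suggestion, on toy data: find every occurrence of a
# segment-start token once, then walk the precomputed list with a cursor
# instead of re-scanning with list.index() on each loop iteration.
# Token ids below are made up for illustration.
VISION_START, AUDIO_START, TEXT = 1001, 1002, 0

input_tokens = [TEXT, VISION_START, TEXT, TEXT, AUDIO_START, TEXT, VISION_START, TEXT]
NOT_FOUND = len(input_tokens) + 1


def token_positions(tokens, token_id):
    """One linear pass instead of repeated .index() scans."""
    return [i for i, tok in enumerate(tokens) if tok == token_id]


vision_starts = token_positions(input_tokens, VISION_START)
audio_starts = token_positions(input_tokens, AUDIO_START)

vision_cursor = audio_cursor = 0
st = 0
while st < len(input_tokens):
    # Advance each cursor past positions already consumed; st only grows,
    # so each cursor moves forward at most len(input_tokens) times in total.
    while vision_cursor < len(vision_starts) and vision_starts[vision_cursor] < st:
        vision_cursor += 1
    while audio_cursor < len(audio_starts) and audio_starts[audio_cursor] < st:
        audio_cursor += 1

    ed_vision_start = vision_starts[vision_cursor] if vision_cursor < len(vision_starts) else NOT_FOUND
    ed_audio_start = audio_starts[audio_cursor] if audio_cursor < len(audio_starts) else NOT_FOUND

    nxt = min(ed_vision_start, ed_audio_start)
    if nxt == NOT_FOUND:
        break
    print("next segment starts at", nxt)
    st = nxt + 1  # in the real code, st would jump past the whole segment
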

A Collaborator commented:

These changes may conflict with #11062. Shall we first merge that?

Author (Collaborator) replied:

Yes, we should merge that one first

@@ -1269,6 +1287,304 @@ def get_rope_index(
        mrope_position_deltas = max_position_ids + 1 - s
        return position_ids, mrope_position_deltas

    @staticmethod
    def get_rope_index_qwen3_omni(
A Collaborator commented:

This change could make the file extremely long. Could we put these helper functions in an isolated mrope .py file? Ref: https://docs.sglang.ai/developer_guide/contribution_guide.html#general-code-style

Author (Collaborator) replied:

Yes, in another PR

zhyncs requested a review from JustinTong0323 on October 14, 2025 at 08:12
zhyncs self-assigned this on Oct 14, 2025
@JustinTong0323 (Collaborator) commented:

Test MMMU

(sglang) ➜  sglang git:(main) ✗ python benchmark/mmmu/bench_sglang.py --concurrency 512
Benchmark time: 125.9694957640022
answers saved to: ./answer_sglang.json
Evaluating...
answers saved to: ./answer_sglang.json
{'Accounting': {'acc': 0.3, 'num': 30},
 'Agriculture': {'acc': 0.6, 'num': 30},
 'Architecture_and_Engineering': {'acc': 0.267, 'num': 30},
 'Art': {'acc': 0.6, 'num': 30},
 'Art_Theory': {'acc': 0.833, 'num': 30},
 'Basic_Medical_Science': {'acc': 0.633, 'num': 30},
 'Biology': {'acc': 0.467, 'num': 30},
 'Chemistry': {'acc': 0.433, 'num': 30},
 'Clinical_Medicine': {'acc': 0.633, 'num': 30},
 'Computer_Science': {'acc': 0.567, 'num': 30},
 'Design': {'acc': 0.833, 'num': 30},
 'Diagnostics_and_Laboratory_Medicine': {'acc': 0.3, 'num': 30},
 'Economics': {'acc': 0.633, 'num': 30},
 'Electronics': {'acc': 0.367, 'num': 30},
 'Energy_and_Power': {'acc': 0.4, 'num': 30},
 'Finance': {'acc': 0.267, 'num': 30},
 'Geography': {'acc': 0.633, 'num': 30},
 'History': {'acc': 0.667, 'num': 30},
 'Literature': {'acc': 0.9, 'num': 30},
 'Manage': {'acc': 0.5, 'num': 30},
 'Marketing': {'acc': 0.433, 'num': 30},
 'Materials': {'acc': 0.433, 'num': 30},
 'Math': {'acc': 0.333, 'num': 30},
 'Mechanical_Engineering': {'acc': 0.433, 'num': 30},
 'Music': {'acc': 0.267, 'num': 30},
 'Overall': {'acc': 0.531, 'num': 900},
 'Overall-Art and Design': {'acc': 0.633, 'num': 120},
 'Overall-Business': {'acc': 0.427, 'num': 150},
 'Overall-Health and Medicine': {'acc': 0.573, 'num': 150},
 'Overall-Humanities and Social Science': {'acc': 0.725, 'num': 120},
 'Overall-Science': {'acc': 0.487, 'num': 150},
 'Overall-Tech and Engineering': {'acc': 0.438, 'num': 210},
 'Pharmacy': {'acc': 0.667, 'num': 30},
 'Physics': {'acc': 0.567, 'num': 30},
 'Psychology': {'acc': 0.633, 'num': 30},
 'Public_Health': {'acc': 0.633, 'num': 30},
 'Sociology': {'acc': 0.7, 'num': 30}}
eval out saved to ./val_sglang.json
Overall accuracy: 0.531

@JustinTong0323 (Collaborator) commented:

Local e2e test passed.

@zhyncs (Member) commented on Oct 15, 2025

@JustinTong0323 could you fix the CI failures? Thanks!

@JustinTong0323 (Collaborator) replied:

> @JustinTong0323 could you fix the CI failures? Thanks!

The CI failures seem unrelated; main may have been broken by #11561.

@zhyncs (Member) commented on Oct 15, 2025

[2025-10-15 06:58:11] Received sigquit from a child process. It usually means the child failed.
[2025-10-15 06:58:11 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/managers/scheduler.py", line 3040, in run_scheduler_process
    scheduler = Scheduler(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/managers/scheduler.py", line 418, in __init__
    self.tp_worker = TpModelWorker(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/managers/tp_worker.py", line 95, in __init__
    self.model_runner = ModelRunner(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/model_executor/model_runner.py", line 322, in __init__
    self.initialize(min_per_gpu_memory)
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/model_executor/model_runner.py", line 389, in initialize
    self.load_model()
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/model_executor/model_runner.py", line 852, in load_model
    self.model = get_model(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/model_loader/__init__.py", line 28, in get_model
    return loader.load_model(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/model_loader/loader.py", line 569, in load_model
    model = _initialize_model(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/model_loader/loader.py", line 247, in _initialize_model
    return model_class(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/models/qwen3_next.py", line 899, in __init__
    self.model = Qwen3NextModel(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/models/qwen3_next.py", line 833, in __init__
    self.layers = make_layers(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/utils/common.py", line 515, in make_layers
    + get_offloader().wrap_modules(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/utils/offloader.py", line 36, in wrap_modules
    return list(all_modules_generator)
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/utils/common.py", line 517, in <genexpr>
    layer_fn(idx=idx, prefix=add_prefix(idx, prefix))
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/models/qwen3_next.py", line 824, in get_layer
    layer_class = ALL_DECODER_LAYER_TYPES[config.layers_block_type[idx]]
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 207, in __getattribute__
    return super().__getattribute__(key)
AttributeError: 'Qwen3NextConfig' object has no attribute 'layers_block_type'

@JustinTong0323 (Collaborator) commented on Oct 16, 2025

MMMU accuracy:

  • 30B Omni: 0.527
  • 4B VL (dense): 0.472
  • 30B VL (MoE): 0.488

@zhyncs (Member) commented on Oct 16, 2025

The AMD failure is not related to this PR.

Ref: #11488

@zhyncs (Member) commented on Oct 16, 2025

All NVIDIA PR Tests passed :)

zhyncs merged commit 86b04d2 into main on Oct 16, 2025
196 of 222 checks passed
zhyncs deleted the qwen3-omni branch on October 16, 2025 at 20:20
yuan-luo mentioned this pull request on Oct 24, 2025
leavelet pushed a commit to pkucnc/sglang that referenced this pull request Oct 27, 2025
