
Conversation

@mickqian (Collaborator) commented on Sep 25, 2025

Motivation

Solves #11343.

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist (Contributor) commented
Summary of Changes

Hello @mickqian, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the system's capabilities by integrating the advanced Qwen3-Omni multimodal model, focusing on its 'thinker-only' functionality. The changes encompass the addition of new configuration structures for various model sub-components, updates to core multimodal processing utilities, and adaptations to rotary embedding calculations to support the model's diverse input types. Furthermore, existing Qwen3-VL model code has been refactored for improved modularity, and new test cases have been introduced to ensure robust operation of the integrated models.

Highlights

  • Qwen3-Omni Model Integration: Added comprehensive support for the Qwen3OmniMoeForConditionalGeneration model, including its various sub-components like Audio Encoder, Vision Encoder, and Talker configurations.
  • New Configuration Definitions: Introduced a new configuration file qwen3_omni.py that defines detailed configurations for the Qwen3-Omni model's multimodal architecture.
  • Multimodal Rotary Embedding Enhancements: Updated the rotary embedding logic in rotary_embedding.py to handle the multimodal (M-RoPE) position-ID calculations required by the Qwen3-Omni model; a toy sketch of this position-ID scheme follows this list.
  • Improved Multimodal Utility Handling: Modified mm_utils.py to support deepstack embeddings on a per-modality basis and to correctly process lists of image and audio data.
  • Qwen3-VL Refactoring and Alignment: Refactored existing Qwen3-VL model components (qwen3_vl.py, qwen3_vl_moe.py) to be more modular, allowing for easier integration and consistent behavior with the new Omni model structure.
  • Expanded Test Coverage: Added new test classes (TestQwen3OmniMoeServer, TestQwen3VLMoeServer) and a mixed-modality test mixin (OmniOpenAITestMixin) to validate the functionality of the newly integrated and refactored models.
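
For intuition, the following toy sketch shows the 3-axis (temporal, height, width) M-RoPE position-ID scheme that the rotary-embedding highlight above refers to, for a simple text-plus-image sequence. It is only an illustration of the general scheme, not the get_rope_index_qwen3_omni implementation added in this PR (which also interleaves video and audio segments and per-sample offsets); the function name toy_mrope_positions is made up.

# Toy illustration of 3-axis M-RoPE position ids (not the PR's implementation).
# Text tokens use the same incrementing index on all three axes; image patches
# keep a constant temporal index while height/width enumerate the patch grid.
import torch


def toy_mrope_positions(n_text_before, grid_h, grid_w, n_text_after):
    """Return a (3, seq_len) tensor of [temporal, height, width] position ids."""
    rows = []

    # Leading text: identical positions on every axis.
    t = torch.arange(n_text_before)
    rows.append(torch.stack([t, t, t]))

    # Image patches: offset all axes to start right after the last text position.
    offset = n_text_before
    hh, ww = torch.meshgrid(torch.arange(grid_h), torch.arange(grid_w), indexing="ij")
    temporal = torch.zeros(grid_h * grid_w, dtype=torch.long)  # still image: one "frame"
    rows.append(torch.stack([temporal, hh.flatten(), ww.flatten()]) + offset)

    # Trailing text resumes after the largest position used so far.
    offset = rows[-1].max().item() + 1
    t2 = torch.arange(n_text_after) + offset
    rows.append(torch.stack([t2, t2, t2]))

    return torch.cat(rows, dim=1)


if __name__ == "__main__":
    print(toy_mrope_positions(n_text_before=3, grid_h=2, grid_w=2, n_text_after=2))
    # tensor([[0, 1, 2, 3, 3, 3, 3, 5, 6],
    #         [0, 1, 2, 3, 3, 4, 4, 5, 6],
    #         [0, 1, 2, 3, 4, 3, 4, 5, 6]])
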
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature               Command               Description
Code Review           /gemini review        Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary       Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist   Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                  /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds support for the qwen3-omni model, a multimodal mixture-of-experts model. The changes include new configuration files, a detailed model implementation, and updates to the rotary embedding logic to handle image, video, and audio inputs. The PR also includes some beneficial refactoring of the existing qwen3-vl model code to improve extensibility and code reuse. My review focuses on cleaning up leftover development artifacts like debug prints and TODOs, pointing out a potential bug in the new rotary embedding logic, and suggesting improvements for documentation clarity and code style.

@Qing-zy commented on Oct 10, 2025

Will the talker be included in the future?

@mickqian (Collaborator, Author) replied:

> Will the talker be included in the future?

Yes

@Qing-zy commented on Oct 10, 2025

> Will the talker be included in the future?
>
> Yes

Could you share the planned timeline or support schedule for the talker?

@JustinTong0323 (Collaborator) commented:

/gemini review

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds support for the qwen3-omni model, which is a significant undertaking involving new model configurations, model implementations, and updates to multimodal processing logic. The changes are extensive and well-structured, particularly the refactoring in qwen3_vl.py to accommodate the new omni-modal model. My review focuses on improving code clarity, removing dead code, and suggesting minor performance optimizations. Overall, this is a solid contribution.

Comment on lines +451 to +473
        if code_predictor_config is None:
            code_predictor_config = {}
            self.code_predictor_config = Qwen3OmniMoeTalkerCodePredictorConfig()
            logger.info(
                "code_predictor_config is None. Initializing code_predictor_config model with default values"
            )
        elif isinstance(code_predictor_config, Qwen3OmniMoeTalkerCodePredictorConfig):
            self.code_predictor_config = code_predictor_config
        else:
            self.code_predictor_config = Qwen3OmniMoeTalkerCodePredictorConfig(
                **code_predictor_config
            )

        if text_config is None:
            text_config = {}
            self.text_config = Qwen3OmniMoeTalkerTextConfig()
            logger.info(
                "talker text_config is None. Initializing talker text model with default values"
            )
        elif isinstance(text_config, Qwen3OmniMoeTalkerTextConfig):
            self.text_config = text_config
        else:
            self.text_config = Qwen3OmniMoeTalkerTextConfig(**text_config)
gemini-code-assist bot (Contributor):

Severity: medium

The initialization logic for code_predictor_config and text_config can be simplified. The lines ..._config = {} are redundant when the config is None, as self.<sub_config> is immediately initialized with a default config object. This makes the code slightly cleaner.

        if code_predictor_config is None:
            self.code_predictor_config = Qwen3OmniMoeTalkerCodePredictorConfig()
            logger.info(
                "code_predictor_config is None. Initializing code_predictor_config model with default values"
            )
        elif isinstance(code_predictor_config, Qwen3OmniMoeTalkerCodePredictorConfig):
            self.code_predictor_config = code_predictor_config
        else:
            self.code_predictor_config = Qwen3OmniMoeTalkerCodePredictorConfig(
                **code_predictor_config
            )

        if text_config is None:
            self.text_config = Qwen3OmniMoeTalkerTextConfig()
            logger.info(
                "talker text_config is None. Initializing talker text model with default values"
            )
        elif isinstance(text_config, Qwen3OmniMoeTalkerTextConfig):
            self.text_config = text_config
        else:
            self.text_config = Qwen3OmniMoeTalkerTextConfig(**text_config)

Comment on lines +1357 to +1396
ed_vision_start = (
    input_tokens.index(vision_start_token_id, st)
    if (
        (
            image_token_id in input_tokens
            or video_token_id in input_tokens
        )
        and (remain_videos > 0 or remain_images > 0)
    )
    else len(input_tokens) + 1
)
ed_audio_start = (
    input_tokens.index(audio_start_token_id, st)
    if (audio_token_id in input_tokens and remain_audios > 0)
    else len(input_tokens) + 1
)
gemini-code-assist bot (Contributor):

Severity: medium

Calling list.index() inside a loop can be inefficient, as it performs a linear scan from the start index st in each iteration. For better performance, you could find all occurrences of the special tokens (vision_start_token_id, audio_start_token_id) once before the loop and then iterate through those found indices.
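
A minimal, self-contained sketch of that idea follows. The token ids and the toy sequence are made up for illustration; the real loop in get_rope_index_qwen3_omni also tracks the remaining image/video/audio counts and jumps st past each whole multimodal segment, which this sketch skips.

# Sketch of the reviewer's suggestion, on toy data: find every occurrence of a
# segment-start token once, then walk the precomputed list with a cursor
# instead of re-scanning with list.index() on each loop iteration.
# Token ids below are made up for illustration.
VISION_START, AUDIO_START, TEXT = 1001, 1002, 0

input_tokens = [TEXT, VISION_START, TEXT, TEXT, AUDIO_START, TEXT, VISION_START, TEXT]
NOT_FOUND = len(input_tokens) + 1


def token_positions(tokens, token_id):
    """One linear pass instead of repeated .index() scans."""
    return [i for i, tok in enumerate(tokens) if tok == token_id]


vision_starts = token_positions(input_tokens, VISION_START)
audio_starts = token_positions(input_tokens, AUDIO_START)

vision_cursor = audio_cursor = 0
st = 0
while st < len(input_tokens):
    # Advance each cursor past positions already consumed; st only grows,
    # so each cursor moves forward at most len(input_tokens) times in total.
    while vision_cursor < len(vision_starts) and vision_starts[vision_cursor] < st:
        vision_cursor += 1
    while audio_cursor < len(audio_starts) and audio_starts[audio_cursor] < st:
        audio_cursor += 1

    ed_vision_start = vision_starts[vision_cursor] if vision_cursor < len(vision_starts) else NOT_FOUND
    ed_audio_start = audio_starts[audio_cursor] if audio_cursor < len(audio_starts) else NOT_FOUND

    nxt = min(ed_vision_start, ed_audio_start)
    if nxt == NOT_FOUND:
        break
    print("next segment starts at", nxt)
    st = nxt + 1  # in the real code, st would jump past the whole segment
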

A Collaborator commented:

These changes may conflict with #11062. Shall we first merge that?

Author (Collaborator) replied:

Yes, we should merge that one first

@@ -1269,6 +1287,304 @@ def get_rope_index(
        mrope_position_deltas = max_position_ids + 1 - s
        return position_ids, mrope_position_deltas

    @staticmethod
    def get_rope_index_qwen3_omni(
A Collaborator commented:

This change could make the file extremely long. Could we put these helper functions in an isolated mrope .py file? Ref: https://docs.sglang.ai/developer_guide/contribution_guide.html#general-code-style

Author (Collaborator) replied:

Yes, in another PR

zhyncs requested a review from JustinTong0323 on October 14, 2025 at 08:12
zhyncs self-assigned this on Oct 14, 2025
@JustinTong0323 (Collaborator) commented:

Test MMMU

(sglang) ➜  sglang git:(main) ✗ python benchmark/mmmu/bench_sglang.py --concurrency 512
Benchmark time: 125.9694957640022
answers saved to: ./answer_sglang.json
Evaluating...
answers saved to: ./answer_sglang.json
{'Accounting': {'acc': 0.3, 'num': 30},
 'Agriculture': {'acc': 0.6, 'num': 30},
 'Architecture_and_Engineering': {'acc': 0.267, 'num': 30},
 'Art': {'acc': 0.6, 'num': 30},
 'Art_Theory': {'acc': 0.833, 'num': 30},
 'Basic_Medical_Science': {'acc': 0.633, 'num': 30},
 'Biology': {'acc': 0.467, 'num': 30},
 'Chemistry': {'acc': 0.433, 'num': 30},
 'Clinical_Medicine': {'acc': 0.633, 'num': 30},
 'Computer_Science': {'acc': 0.567, 'num': 30},
 'Design': {'acc': 0.833, 'num': 30},
 'Diagnostics_and_Laboratory_Medicine': {'acc': 0.3, 'num': 30},
 'Economics': {'acc': 0.633, 'num': 30},
 'Electronics': {'acc': 0.367, 'num': 30},
 'Energy_and_Power': {'acc': 0.4, 'num': 30},
 'Finance': {'acc': 0.267, 'num': 30},
 'Geography': {'acc': 0.633, 'num': 30},
 'History': {'acc': 0.667, 'num': 30},
 'Literature': {'acc': 0.9, 'num': 30},
 'Manage': {'acc': 0.5, 'num': 30},
 'Marketing': {'acc': 0.433, 'num': 30},
 'Materials': {'acc': 0.433, 'num': 30},
 'Math': {'acc': 0.333, 'num': 30},
 'Mechanical_Engineering': {'acc': 0.433, 'num': 30},
 'Music': {'acc': 0.267, 'num': 30},
 'Overall': {'acc': 0.531, 'num': 900},
 'Overall-Art and Design': {'acc': 0.633, 'num': 120},
 'Overall-Business': {'acc': 0.427, 'num': 150},
 'Overall-Health and Medicine': {'acc': 0.573, 'num': 150},
 'Overall-Humanities and Social Science': {'acc': 0.725, 'num': 120},
 'Overall-Science': {'acc': 0.487, 'num': 150},
 'Overall-Tech and Engineering': {'acc': 0.438, 'num': 210},
 'Pharmacy': {'acc': 0.667, 'num': 30},
 'Physics': {'acc': 0.567, 'num': 30},
 'Psychology': {'acc': 0.633, 'num': 30},
 'Public_Health': {'acc': 0.633, 'num': 30},
 'Sociology': {'acc': 0.7, 'num': 30}}
eval out saved to ./val_sglang.json
Overall accuracy: 0.531

@JustinTong0323 (Collaborator) commented:

Local e2e test passed.

@zhyncs (Member) commented on Oct 15, 2025

@JustinTong0323 could you fix the CI failures? Thanks!

@JustinTong0323 (Collaborator) replied:

> @JustinTong0323 could you fix the CI failures? Thanks!

The CI failures seem unrelated; main may have been broken by #11561.

@zhyncs (Member) commented on Oct 15, 2025

[2025-10-15 06:58:11] Received sigquit from a child process. It usually means the child failed.
[2025-10-15 06:58:11 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/managers/scheduler.py", line 3040, in run_scheduler_process
    scheduler = Scheduler(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/managers/scheduler.py", line 418, in __init__
    self.tp_worker = TpModelWorker(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/managers/tp_worker.py", line 95, in __init__
    self.model_runner = ModelRunner(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/model_executor/model_runner.py", line 322, in __init__
    self.initialize(min_per_gpu_memory)
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/model_executor/model_runner.py", line 389, in initialize
    self.load_model()
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/model_executor/model_runner.py", line 852, in load_model
    self.model = get_model(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/model_loader/__init__.py", line 28, in get_model
    return loader.load_model(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/model_loader/loader.py", line 569, in load_model
    model = _initialize_model(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/model_loader/loader.py", line 247, in _initialize_model
    return model_class(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/models/qwen3_next.py", line 899, in __init__
    self.model = Qwen3NextModel(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/models/qwen3_next.py", line 833, in __init__
    self.layers = make_layers(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/utils/common.py", line 515, in make_layers
    + get_offloader().wrap_modules(
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/utils/offloader.py", line 36, in wrap_modules
    return list(all_modules_generator)
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/utils/common.py", line 517, in <genexpr>
    layer_fn(idx=idx, prefix=add_prefix(idx, prefix))
  File "/public_sglang_ci/runner-l3a-gpu-4567/_work/sglang/sglang/python/sglang/srt/models/qwen3_next.py", line 824, in get_layer
    layer_class = ALL_DECODER_LAYER_TYPES[config.layers_block_type[idx]]
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 207, in __getattribute__
    return super().__getattribute__(key)
AttributeError: 'Qwen3NextConfig' object has no attribute 'layers_block_type'

@JustinTong0323 (Collaborator) commented on Oct 16, 2025

MMMU accuracy:

  • 30B Omni: 0.527
  • 4B VL (dense): 0.472
  • 30B VL (MoE): 0.488

@zhyncs (Member) commented on Oct 16, 2025

The AMD failure is not related to this PR.

Ref: #11488

@zhyncs (Member) commented on Oct 16, 2025

All NVIDIA PR Tests passed :)

zhyncs merged commit 86b04d2 into main on Oct 16, 2025
196 of 222 checks passed
zhyncs deleted the qwen3-omni branch on October 16, 2025 at 20:20
yuan-luo mentioned this pull request on Oct 24, 2025
leavelet pushed a commit to pkucnc/sglang that referenced this pull request Oct 27, 2025
