model: qwen3-omni (thinker-only) #10911
Conversation
Summary of Changes

Hello @mickqian, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly expands the system's capabilities by integrating the advanced Qwen3-Omni multimodal model, focusing on its 'thinker-only' functionality. The changes encompass the addition of new configuration structures for various model sub-components, updates to core multimodal processing utilities, and adaptations to rotary embedding calculations to support the model's diverse input types. Furthermore, existing Qwen3-VL model code has been refactored for improved modularity, and new test cases have been introduced to ensure robust operation of the integrated models.
Code Review
This pull request adds support for the qwen3-omni model, a multimodal mixture-of-experts model. The changes include new configuration files, a detailed model implementation, and updates to the rotary embedding logic to handle image, video, and audio inputs. The PR also includes some beneficial refactoring of the existing qwen3-vl model code to improve extensibility and code reuse. My review focuses on cleaning up leftover development artifacts like debug prints and TODOs, pointing out a potential bug in the new rotary embedding logic, and suggesting improvements for documentation clarity and code style.
Will the talker be included in the future?

Yes

Could you share the planned timeline or support schedule for the talker?
/gemini review
Code Review
This pull request adds support for the qwen3-omni model, which is a significant undertaking involving new model configurations, model implementations, and updates to multimodal processing logic. The changes are extensive and well-structured, particularly the refactoring in qwen3_vl.py to accommodate the new omni-modal model. My review focuses on improving code clarity, removing dead code, and suggesting minor performance optimizations. Overall, this is a solid contribution.
Reviewed hunk (talker sub-config initialization):

```python
if code_predictor_config is None:
    code_predictor_config = {}
    self.code_predictor_config = Qwen3OmniMoeTalkerCodePredictorConfig()
    logger.info(
        "code_predictor_config is None. Initializing code_predictor_config model with default values"
    )
elif isinstance(code_predictor_config, Qwen3OmniMoeTalkerCodePredictorConfig):
    self.code_predictor_config = code_predictor_config
else:
    self.code_predictor_config = Qwen3OmniMoeTalkerCodePredictorConfig(
        **code_predictor_config
    )

if text_config is None:
    text_config = {}
    self.text_config = Qwen3OmniMoeTalkerTextConfig()
    logger.info(
        "talker text_config is None. Initializing talker text model with default values"
    )
elif isinstance(text_config, Qwen3OmniMoeTalkerTextConfig):
    self.text_config = text_config
else:
    self.text_config = Qwen3OmniMoeTalkerTextConfig(**text_config)
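The None/dict/object dispatch above is a common pattern for nested model configs. As a side note, a small helper could collapse the duplication across sub-configs; the sketch below is hypothetical (`normalize_sub_config` and `DummyConfig` are not part of the PR), just an illustration of the pattern under review:

```python
def normalize_sub_config(value, config_cls, log=None):
    """Normalize a sub-config that may arrive as None, a dict of
    kwargs, or an already-built config object."""
    if value is None:
        if log:
            log("%s is None, using default values" % config_cls.__name__)
        return config_cls()
    if isinstance(value, config_cls):
        return value  # already a config object, use it as-is
    return config_cls(**value)  # dict of kwargs


class DummyConfig:
    """Stand-in for a real sub-config class, for illustration only."""

    def __init__(self, hidden_size=8):
        self.hidden_size = hidden_size


assert normalize_sub_config(None, DummyConfig).hidden_size == 8
assert normalize_sub_config({"hidden_size": 16}, DummyConfig).hidden_size == 16
cfg = DummyConfig(32)
assert normalize_sub_config(cfg, DummyConfig) is cfg
```

Each `self.code_predictor_config = ...` / `self.text_config = ...` branch would then reduce to a single call.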
The initialization logic for `code_predictor_config` and `text_config` can be simplified. The `..._config = {}` lines are redundant when the config is None, as `self.<sub_config>` is immediately initialized with a default config object. This makes the code slightly cleaner.
```python
if code_predictor_config is None:
    self.code_predictor_config = Qwen3OmniMoeTalkerCodePredictorConfig()
    logger.info(
        "code_predictor_config is None. Initializing code_predictor_config model with default values"
    )
elif isinstance(code_predictor_config, Qwen3OmniMoeTalkerCodePredictorConfig):
    self.code_predictor_config = code_predictor_config
else:
    self.code_predictor_config = Qwen3OmniMoeTalkerCodePredictorConfig(
        **code_predictor_config
    )

if text_config is None:
    self.text_config = Qwen3OmniMoeTalkerTextConfig()
    logger.info(
        "talker text_config is None. Initializing talker text model with default values"
    )
elif isinstance(text_config, Qwen3OmniMoeTalkerTextConfig):
    self.text_config = text_config
else:
    self.text_config = Qwen3OmniMoeTalkerTextConfig(**text_config)
```

Reviewed hunk (scanning for the next vision/audio start token):

```python
ed_vision_start = (
    input_tokens.index(vision_start_token_id, st)
    if (
        (
            image_token_id in input_tokens
            or video_token_id in input_tokens
        )
        and (remain_videos > 0 or remain_images > 0)
    )
    else len(input_tokens) + 1
)
ed_audio_start = (
    input_tokens.index(audio_start_token_id, st)
    if (audio_token_id in input_tokens and remain_audios > 0)
    else len(input_tokens) + 1
)
```
Calling list.index() inside a loop can be inefficient, as it performs a linear scan from the start index st in each iteration. For better performance, you could find all occurrences of the special tokens (vision_start_token_id, audio_start_token_id) once before the loop and then iterate through those found indices.
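A minimal sketch of the suggested optimization, assuming the surrounding loop advances a start index `st` (helper names here are hypothetical): collect all occurrences of a token in one pass, then answer each "next occurrence at or after `st`" query with a binary search instead of rescanning with `list.index()`.

```python
from bisect import bisect_left


def find_all(tokens, target):
    """Collect every index of `target` in one linear pass."""
    return [i for i, t in enumerate(tokens) if t == target]


def next_occurrence(occurrences, st, sentinel):
    """First occurrence at position >= st, or `sentinel` if none remain.

    `occurrences` is sorted (it came from a left-to-right scan), so a
    binary search replaces the linear scan done by list.index(x, st).
    """
    j = bisect_left(occurrences, st)
    return occurrences[j] if j < len(occurrences) else sentinel


# Toy example: token id 7 plays the role of vision_start_token_id.
input_tokens = [7, 1, 7, 2, 7, 3]
vision_starts = find_all(input_tokens, 7)  # [0, 2, 4]
sentinel = len(input_tokens) + 1           # "not found" value used in the PR

assert next_occurrence(vision_starts, 1, sentinel) == 2
assert next_occurrence(vision_starts, 5, sentinel) == sentinel
```

This turns the per-iteration cost from O(n) into O(log k) for k occurrences, at the price of one O(n) precomputation pass per token id.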
These changes may conflict with #11062. Shall we first merge that?
Yes, we should merge that one first
```diff
@@ -1269,6 +1287,304 @@ def get_rope_index(
         mrope_position_deltas = max_position_ids + 1 - s
         return position_ids, mrope_position_deltas

+    @staticmethod
+    def get_rope_index_qwen3_omni(
```
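For context, a heavily simplified, hypothetical sketch of the kind of computation an M-RoPE index helper performs (the real code also handles video frames, audio spans, spatial merging, and batching, none of which are modeled here): text tokens advance all three position components (temporal, height, width) in lockstep, while image placeholder tokens spread the height/width components over the patch grid.

```python
def simple_rope_index(input_tokens, image_token_id, grid_h, grid_w):
    """Toy 3-component (temporal, height, width) position ids.

    Assumes each image occupies exactly grid_h * grid_w consecutive
    placeholder tokens; this is an illustration, not the PR's helper.
    """
    t_pos, h_pos, w_pos = [], [], []
    pos = 0  # running base position
    i = 0
    while i < len(input_tokens):
        if input_tokens[i] == image_token_id:
            # Vision block: temporal id is constant, h/w follow the grid.
            for h in range(grid_h):
                for w in range(grid_w):
                    t_pos.append(pos)
                    h_pos.append(pos + h)
                    w_pos.append(pos + w)
            i += grid_h * grid_w
            pos += max(grid_h, grid_w)
        else:
            # Text token: all three components advance together.
            t_pos.append(pos)
            h_pos.append(pos)
            w_pos.append(pos)
            pos += 1
            i += 1
    return t_pos, h_pos, w_pos


# One text token, a 2x2 image (token id 9), one text token.
t, h, w = simple_rope_index([1, 9, 9, 9, 9, 2], 9, 2, 2)
assert t == [0, 1, 1, 1, 1, 3]
assert h == [0, 1, 1, 2, 2, 3]
assert w == [0, 1, 2, 1, 2, 3]
```

The Qwen3-Omni variant additionally interleaves audio positions, which is why the new helper is large enough to motivate the file-split suggestion below.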
This change could make the file extremely long. Could we move these helper functions into an isolated mrope .py file? ref: https://docs.sglang.ai/developer_guide/contribution_guide.html#general-code-style
Yes, in another PR
Test MMMU
@JustinTong0323 may you fix the CI failures? Thanks!

The CI failures seem unrelated; main may have been broken by #11561.
MMMU Acc:
The AMD failure is not related to this PR (ref #11488).

All NVIDIA PR tests passed :)
Co-authored-by: Xinyuan Tong <[email protected]>

Motivation
Solve #11343
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist