
Conversation

@lifuhuang
Collaborator

@lifuhuang lifuhuang commented May 21, 2025

Motivation

Support Phi4-MM model with text + vision.

Modifications

This change introduces basic text + image support.

It's worth noting that the current MMMU run (without LoRA) is lower than advertised, because Phi4MM relies on LoRA for full image understanding capabilities. However, LoRA support requires refactoring/generalizing the existing SGL LoRA handling, which will be addressed in a separate PR: #6585

Example: degraded image understanding without LoRA (MMMU is only 38). For comparison, in our local branch (#6585) with LoRA, MMMU improves to ~50.

TODO in this PR:

  • add unit tests
  • clean up styling issues

TODO in follow-up PR (ordered by priority):

  1. Precomputed feature support
  2. LoRA support (required for full multi-image understanding)
  3. SGLang LoRA compatibility with CUDA Graph and Radix Attention
  4. Refactor SGL MM processor logic to support the original variable image tokens (e.g., <image_1>)
  5. Performance optimization
  6. Audio support
  7. Pipeline parallelism support

Tracked in #6544
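To illustrate the MM-processor refactor mentioned in the TODO list: Phi4MM prompts use numbered image tokens (<image_1>, <image_2>, …), while a generic processor typically expects a single repeated placeholder. The sketch below is a hypothetical illustration, not the actual SGLang implementation; the placeholder string and function name are assumptions.

```python
import re

# Hypothetical sketch: normalize Phi4MM-style numbered image tokens
# (<image_1>, <image_2>, ...) into a single canonical placeholder,
# while recording the original indices so image features can be
# matched back to the right image. Not the actual SGLang code.

IMAGE_TOKEN_RE = re.compile(r"<image_(\d+)>")

def normalize_image_tokens(prompt: str, placeholder: str = "<|image|>"):
    """Return (rewritten prompt, image indices in order of appearance)."""
    indices = [int(m.group(1)) for m in IMAGE_TOKEN_RE.finditer(prompt)]
    normalized = IMAGE_TOKEN_RE.sub(placeholder, prompt)
    return normalized, indices

prompt = "Compare <image_1> with <image_2>. Which is brighter?"
normalized, order = normalize_image_tokens(prompt)
# normalized == "Compare <|image|> with <|image|>. Which is brighter?"
# order == [1, 2]
```

The recorded indices matter because users may reference images out of order (e.g., <image_2> before <image_1>), so features must be reordered to match the prompt rather than the upload order.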

Checklist

@lifuhuang lifuhuang mentioned this pull request May 23, 2025
@zhaochenyang20
Collaborator

@mickqian @yizhang2077

@mickqian
Collaborator

Better to be merged after #4969 , due to some change to the omni model processing and testing

@lifuhuang
Collaborator Author

Better to be merged after #4969 , due to some change to the omni model processing and testing

Hi @mickqian, thank you so much for reviewing my PR :)

Can you share more details about the concerns you have, so that I can test them locally? JFYI, I was able to merge your branch mickqian:qwen2.5-omni locally without conflicts and got a green TestOpenAIVisionServer run for phi4mm.

@mickqian
Collaborator

mickqian commented May 23, 2025

Can you share more details

It's mostly that, for omni models, there's a new TestOpenaiOmniServer. And yes, you can cherry-pick it.

I just noticed audio input is not supported this time.

@lifuhuang lifuhuang requested a review from mickqian May 24, 2025 00:36
@zhyncs zhyncs merged commit 022012a into sgl-project:main May 25, 2025
1 of 19 checks passed
@lifuhuang lifuhuang self-assigned this May 25, 2025
Layssy pushed a commit to Layssy/sglang-iaas that referenced this pull request Jun 9, 2025
xwu-intel pushed a commit to xwu-intel/sglang that referenced this pull request Jun 17, 2025
@lifuhuang lifuhuang mentioned this pull request Jun 23, 2025
@lifuhuang lifuhuang added new-model Multi-modal multi-modal language model labels Jul 14, 2025