
Conversation

@dcampora (Contributor) commented Dec 5, 2025

Support Mistral Large 3 NVFP4.

Depends on #14466.

• GSM8K test results:

```bash
SGLANG_ENABLE_JIT_DEEPGEMM=0 \
python3 -m sglang.launch_server \
--model mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4 \
--kv-cache-dtype fp8_e4m3 \
--tensor-parallel-size 8 \
--disable-radix-cache \
--stream-interval 20 \
--mem-fraction-static 0.9 \
--attention-backend trtllm_mla \
--model-loader-extra-config '{"enable_multithread_load": true}' \
--max-running-requests 1024 \
--cuda-graph-max-bs 1024 \
--chat-template mistral
```

```bash
lm_eval \
--model local-chat-completions \
--model_args model=mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4,\
base_url=http://127.0.0.1:30000/v1/chat/completions,\
num_concurrent=128,timeout=999999,max_gen_toks=8192 \
--tasks gsm8k \
--batch_size 128 \
--apply_chat_template \
--num_fewshot 8
```

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     8|exact_match|↑  |0.9249|±  |0.0073|
|     |       |strict-match    |     8|exact_match|↑  |0.7104|±  |0.0125|
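
For a quick sanity check before running lm_eval, a minimal request against the OpenAI-compatible endpoint can be used (same base URL as in the lm_eval command above; the prompt and generation settings are only illustrative):

```python
# Minimal smoke test against the launched server, using the same endpoint as the
# lm_eval command above. The prompt and generation settings are illustrative.
import requests

resp = requests.post(
    "http://127.0.0.1:30000/v1/chat/completions",
    json={
        "model": "mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4",
        "messages": [{"role": "user", "content": "What is 12 * 7?"}],
        "max_tokens": 64,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```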

Checklist

@gemini-code-assist (Contributor):

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions bot added the quant (LLM Quantization), deepseek, and blackwell (SM100/SM120) labels on Dec 5, 2025
@dcampora changed the title from "Mistral Large 3 Eagle and NVFP4 support" to "Mistral Large 3 NVFP4 support" on Dec 5, 2025
@JustinTong0323 (Collaborator):

/tag-and-rerun-ci

@github-actions bot added the run-ci label on Dec 5, 2025

Contributor:

This is a matmul op; why is it put under the attention layer?


```python
from .compressed_tensors_scheme import CompressedTensorsScheme
from .compressed_tensors_w4a4_nvfp4 import CompressedTensorsW4A4Fp4
from .compressed_tensors_w4a16_nvfp4 import CompressedTensorsW4A16Fp4
```

Contributor:

Have we tested the w4a16 code path? If not, it would be better to do it in another PR. We may need it on Hopper or earlier architectures, and we don't have w4a16 MoE support for now.

Contributor:

Same comment; this can be done in a follow-up PR.


```python
if is_activation_quantization_format(self.quant_format):
    if self._is_fp4a4_nvfp4(weight_quant, input_quant):
        if cutlass_fp4_supported():
```

Contributor:

w4a4 supports both FlashInfer and CUTLASS, right? I think we should do something similar to the method below and check the device capability.
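
A minimal sketch of the kind of capability check suggested here, assuming the NVFP4 GEMM path requires Blackwell-class GPUs (SM100/SM120, per this PR's labels); the helper name is illustrative, not the actual implementation:

```python
# Illustrative sketch only: the helper name and the SM100 threshold are assumptions
# based on this PR's Blackwell (SM100/SM120) labels, not the actual implementation.
import torch


def _nvfp4_gemm_supported() -> bool:
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    # Both the CUTLASS and FlashInfer NVFP4 GEMM backends need Blackwell (SM100+).
    return (major, minor) >= (10, 0)
```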

Contributor:

This seems to be only used by w4a16.

```python
)


def swizzle_blockscale(scale: torch.Tensor):
```

Contributor:

Add a comment to clarify that this method is NVFP4-specific.
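
A sketch of what such a clarifying note could look like; the wording is an assumption, not the PR's actual docstring:

```python
# Illustrative wording only; the actual swizzle logic is elided.
import torch


def swizzle_blockscale(scale: torch.Tensor):
    """Swizzle NVFP4 block scales into the layout expected by the FP4 GEMM kernels.

    NOTE: This helper is NVFP4-specific; it should not be reused for other
    quantization formats.
    """
    ...
```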
