Add Mistral Large 3 to nightly CI tests #14459

alisonshao · 2025-12-05T00:00:46Z

Summary

Add nightly performance test for mistralai/Mistral-Large-3-675B-Instruct-2512
Test configuration:
- TP=8
- --attention-backend trtllm_mla
- --model-loader-extra-config '{"enable_multithread_load": true}'
- --chat-template mistral
- SGLANG_ENABLE_JIT_DEEPGEMM=0
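
As a rough sketch, the configuration above could translate into a server launch along these lines. The `--model-path` and `--tp` flag spellings and the helper names `build_launch_command` and `build_env` are assumptions for illustration; the remaining flags and the environment variable are copied from the list above. This is not the code added by the PR.

```python
import json
import os
import sys

MODEL = "mistralai/Mistral-Large-3-675B-Instruct-2512"


def build_launch_command(model: str = MODEL) -> list[str]:
    """Assemble an argv list for the server launch (hypothetical helper)."""
    return [
        sys.executable, "-m", "sglang.launch_server",
        "--model-path", model,  # assumed flag spelling
        "--tp", "8",            # TP=8; assumed flag spelling
        "--attention-backend", "trtllm_mla",
        "--model-loader-extra-config",
        json.dumps({"enable_multithread_load": True}),
        "--chat-template", "mistral",
    ]


def build_env(base=None) -> dict:
    """Copy the environment and disable JIT DeepGEMM, as in the test configuration."""
    env = dict(os.environ if base is None else base)
    env["SGLANG_ENABLE_JIT_DEEPGEMM"] = "0"
    return env
```

The command and environment would then be handed to something like `subprocess.Popen(build_launch_command(), env=build_env())` on a machine that has the model and the required GPUs.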

Test plan

Verify nightly CI workflow runs successfully on 8-gpu-h200 runner
Check performance traces are published correctly

alisonshao · 2025-12-05T00:12:26Z

testing at: https://github.com/sgl-project/sglang/actions/runs/19956632530/job/57227329802

testing (with accuracy test) at: https://github.com/sgl-project/sglang/actions/runs/19961215892/job/57241902940

- Move Mistral test to run immediately after dependency installation
- Add `if: always()` to DeepSeek v3.1 since it's no longer first
- Remove duplicate Mistral test from end of workflow

This allows faster debugging of the Mistral test and ensures other tests continue running regardless of Mistral test outcome.

Multiple TP ranks calling find_local_hf_snapshot_dir simultaneously would all detect corrupted/missing files and race to delete the same cache directory. One rank would succeed, others would fail with "No such file or directory" errors. Fix by wrapping the validation and cleanup logic in a file lock (using suffix "-validation" to avoid conflict with download lock). After acquiring the lock, re-check if the snapshot dir still exists since another process may have already cleaned it up.

ispobock · 2025-12-05T10:08:19Z

test/nightly/test_mistral_large3_perf.py

+        if "SGLANG_ENABLE_JIT_DEEPGEMM" in os.environ:
+            del os.environ["SGLANG_ENABLE_JIT_DEEPGEMM"]
+
+    def test_bench_one_batch(self):


Can we also add an accuracy test for it?

Yes, I can do that. What score threshold should we expect? How do we determine this value?

Just added a test_accuracy_mgsm method using the same eval framework as other text models. Set the initial threshold to 0.90 (placeholder). Will calibrate the threshold after observing actual model performance.

testing at: https://github.com/sgl-project/sglang/actions/runs/19961215892/job/57241902940

MGSM accuracy for mistralai/Mistral-Large-3-675B-Instruct-2512: 0.972
https://github.com/sgl-project/sglang/actions/runs/19961215892/job/57241902940#step:5:2471
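
A minimal sketch of how the pieces discussed in this thread might fit together in one test class: the environment variable is set for the class and removed afterwards (as in the diff above), and the MGSM score is compared against the placeholder threshold. `run_mgsm_eval` is a hypothetical stand-in that simply returns the score reported in the CI log; the real test launches a server and runs the eval framework. Restoring a pre-existing value in `tearDownClass` is an addition for illustration and is not shown in the diff.

```python
import os
import unittest

ENV_VAR = "SGLANG_ENABLE_JIT_DEEPGEMM"
MGSM_THRESHOLD = 0.90  # initial placeholder threshold from the thread


def run_mgsm_eval() -> float:
    """Hypothetical stand-in for the real eval; returns the score reported in CI."""
    return 0.972


class TestMistralLarge3Sketch(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Remember any pre-existing value so it can be put back afterwards.
        cls._saved = os.environ.get(ENV_VAR)
        os.environ[ENV_VAR] = "0"

    @classmethod
    def tearDownClass(cls):
        # Drop the variable so later tests are unaffected, then restore
        # the previous value if there was one (illustrative addition).
        os.environ.pop(ENV_VAR, None)
        if cls._saved is not None:
            os.environ[ENV_VAR] = cls._saved

    def test_accuracy_mgsm(self):
        score = run_mgsm_eval()
        self.assertGreaterEqual(score, MGSM_THRESHOLD)
```

With an observed score of 0.972, the 0.90 placeholder leaves a wide margin, which is why the thread mentions calibrating it later.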

ispobock · 2025-12-05T10:09:44Z

python/sglang/srt/model_loader/weight_utils.py

+    # validating and cleaning up the same model cache simultaneously.
+    # This prevents race conditions where multiple ranks detect corruption
+    # and try to delete the same files at the same time.
+    with get_lock(model_name_or_path, cache_dir, suffix="-validation"):


Why do we need this change for the Mistral Large 3 model?

It's a general bug fix discovered while debugging the Mistral test. When model weights are corrupted or missing, all 8 TP ranks detect the issue simultaneously and race to delete the cache directory. One rank succeeds; the others fail with "No such file or directory" errors. This was causing CI failures.
The fix adds a file lock so only one rank performs validation and cleanup at a time. Other models just happened to have complete caches, so the bug wasn't triggered.
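
The lock-then-re-check pattern described here can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `weight_utils.py`: `fcntl.flock` (POSIX only) stands in for the `get_lock` helper from the diff, and `validation_lock`, `is_complete`, `validate_or_cleanup`, and `simulate_ranks` are hypothetical names. The `OK` marker file is a placeholder for the real weight-file validation.

```python
import fcntl
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def validation_lock(lock_path: str):
    """Hold an exclusive advisory lock on lock_path (POSIX only)."""
    with open(lock_path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def is_complete(snapshot_dir: str) -> bool:
    """Placeholder check; the real code inspects the downloaded weight files."""
    return os.path.exists(os.path.join(snapshot_dir, "OK"))


def validate_or_cleanup(snapshot_dir: str, lock_path: str):
    """Return snapshot_dir if usable; otherwise remove it and return None."""
    with validation_lock(lock_path):
        # Re-check under the lock: another rank may already have cleaned up.
        if not os.path.isdir(snapshot_dir):
            return None
        if is_complete(snapshot_dir):
            return snapshot_dir
        shutil.rmtree(snapshot_dir, ignore_errors=True)
        return None


def simulate_ranks(num_ranks: int = 8) -> list:
    """Sequential stand-in for several ranks hitting an incomplete snapshot."""
    with tempfile.TemporaryDirectory() as root:
        snapshot = os.path.join(root, "snapshot")
        os.mkdir(snapshot)  # no "OK" marker, so it counts as corrupted
        lock_path = os.path.join(root, "model-validation.lock")
        return [validate_or_cleanup(snapshot, lock_path) for _ in range(num_ranks)]
```

The first caller removes the directory; every later caller sees that it is gone and returns without raising, which is the behavior the re-check is meant to guarantee.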

.github/workflows/nightly-test-nvidia.yml

alisonshao requested review from CatherineSue, Fridge003, JustinTong0323, Kangyan-Zhou, ispobock, merrymercy and slin1237 as code owners December 5, 2025 00:00

github-actions bot added the documentation Improvements or additions to documentation label Dec 5, 2025

Add Mistral Large 3 to nightly CI tests

0233520

Add nightly performance test for mistralai/Mistral-Large-3-675B-Instruct-2512 model with:
- TP=8
- trtllm_mla attention backend
- Multithread loading enabled
- Mistral chat template
- SGLANG_ENABLE_JIT_DEEPGEMM=0

alisonshao force-pushed the feature/mistral-large3-ci branch from 0bc2af2 to 0233520 Compare December 5, 2025 00:02

alisonshao mentioned this pull request Dec 5, 2025

Add Mistral Large 3 basic test to PR CI #14460

Merged

3 tasks

alisonshao added 6 commits December 4, 2025 16:17

Add CI register for nightly test

a42e2bf

Fix: Move Mistral-Large-3 test to B200 runner (trtllm_mla requires Blackwell)

c5a05ab

Add pre-download step for Mistral-Large-3 model

5c770dd

Download the model before running the test to avoid server launch timeout issues. The 120-minute timeout allows for full download on first run. Subsequent runs will use the cached model and skip quickly.

Reduce pre-download timeout to 60 minutes

0248bae

ispobock reviewed Dec 5, 2025

alisonshao added 2 commits December 5, 2025 03:12

Add MGSM accuracy test for Mistral Large 3

a12f4c2

Added test_accuracy_mgsm method using the same eval framework as other text models. Set initial threshold to 0.90 (placeholder). Will calibrate the threshold after observing actual model performance.

Remove pre-download step (model is now cached)

6e2195c

ispobock approved these changes Dec 5, 2025

ispobock merged commit 6628098 into main Dec 5, 2025
51 of 58 checks passed

ispobock deleted the feature/mistral-large3-ci branch December 5, 2025 15:16

alisonshao mentioned this pull request Dec 6, 2025

[CI] Add Mistral Large 3 Eagle nightly performance test #14525

Open
