@alisonshao

Summary

  • Add nightly performance test for mistralai/Mistral-Large-3-675B-Instruct-2512
  • Test configuration:
    • TP=8
    • --attention-backend trtllm_mla
    • --model-loader-extra-config '{"enable_multithread_load": true}'
    • --chat-template mistral
    • SGLANG_ENABLE_JIT_DEEPGEMM=0
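The configuration above could be assembled into a server launch command roughly as follows. This is a minimal sketch, not the code in the PR: the helper names `build_launch_command` and `build_env` are hypothetical, and the `--model-path` and `--tp-size` spellings are assumed from common SGLang usage rather than taken from this thread.

```python
import json
import os

MODEL = "mistralai/Mistral-Large-3-675B-Instruct-2512"


def build_launch_command(model=MODEL, tp=8):
    """Build the argv list for the server (hypothetical helper, not from the PR)."""
    # Serialize the loader config so the JSON matches the flag value in the summary.
    loader_cfg = json.dumps({"enable_multithread_load": True})
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model,          # assumed flag spelling
        "--tp-size", str(tp),           # assumed flag spelling for TP=8
        "--attention-backend", "trtllm_mla",
        "--model-loader-extra-config", loader_cfg,
        "--chat-template", "mistral",
    ]


def build_env(base=None):
    """Return a copy of the environment with JIT DeepGEMM disabled."""
    env = dict(os.environ if base is None else base)
    env["SGLANG_ENABLE_JIT_DEEPGEMM"] = "0"
    return env
```

Passing the result to `subprocess.Popen(build_launch_command(), env=build_env())` would start the server without mutating the test process's own environment.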

Test plan

  • Verify nightly CI workflow runs successfully on 8-gpu-h200 runner
  • Check performance traces are published correctly


@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 5, 2025
Add nightly performance test for mistralai/Mistral-Large-3-675B-Instruct-2512 model with:
- TP=8
- trtllm_mla attention backend
- Multithread loading enabled
- Mistral chat template
- SGLANG_ENABLE_JIT_DEEPGEMM=0
@alisonshao alisonshao force-pushed the feature/mistral-large3-ci branch from 0bc2af2 to 0233520 Compare December 5, 2025 00:02
@alisonshao
alisonshao commented Dec 5, 2025

- Move Mistral test to run immediately after dependency installation
- Add `if: always()` to DeepSeek v3.1 since it's no longer first
- Remove duplicate Mistral test from end of workflow

This allows faster debugging of the Mistral test and ensures other
tests continue running regardless of Mistral test outcome.

Multiple TP ranks calling find_local_hf_snapshot_dir simultaneously
would all detect corrupted/missing files and race to delete the same
cache directory. One rank would succeed, others would fail with
"No such file or directory" errors.

Fix by wrapping the validation and cleanup logic in a file lock
(using suffix "-validation" to avoid conflict with download lock).
After acquiring the lock, re-check if the snapshot dir still exists
since another process may have already cleaned it up.

Download the model before running the test to avoid server launch
timeout issues. The 120-minute timeout allows for full download on
first run. Subsequent runs will use the cached model and skip quickly.

        if "SGLANG_ENABLE_JIT_DEEPGEMM" in os.environ:
            del os.environ["SGLANG_ENABLE_JIT_DEEPGEMM"]

    def test_bench_one_batch(self):

Can we also add an accuracy test for it?

Yes, I can do that. What score threshold should we expect? How do we determine this value?

Just added a test_accuracy_mgsm method using the same eval framework as other text models. The initial threshold is set to 0.90 as a placeholder; I will calibrate it after observing actual model performance.

MGSM accuracy for mistralai/Mistral-Large-3-675B-Instruct-2512: 0.972
https://github.com/sgl-project/sglang/actions/runs/19961215892/job/57241902940#step:5:2471
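The relationship between the placeholder threshold and the reported score can be illustrated with a small unittest sketch. This is not the test added in the PR: `run_mgsm_eval` is a hypothetical stand-in for the real eval call, and it simply returns the 0.972 score reported in this thread so that the example runs without a server.

```python
import unittest

# Placeholder threshold from the discussion above; not yet calibrated.
MGSM_THRESHOLD = 0.90


def run_mgsm_eval():
    """Hypothetical stand-in for the real MGSM eval; returns the score reported in this thread."""
    return 0.972


class TestMistralLarge3Accuracy(unittest.TestCase):
    def test_accuracy_mgsm(self):
        score = run_mgsm_eval()
        # Fail the nightly run if accuracy drops below the threshold.
        self.assertGreaterEqual(score, MGSM_THRESHOLD)


def run_suite():
    """Run the test case programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMistralLarge3Accuracy)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With the reported score of 0.972, the 0.90 placeholder passes with room to spare, which is why the thread treats it as a value to tighten later.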

# validating and cleaning up the same model cache simultaneously.
# This prevents race conditions where multiple ranks detect corruption
# and try to delete the same files at the same time.
with get_lock(model_name_or_path, cache_dir, suffix="-validation"):

Why do we need this change for the Mistral Large 3 model?

It's a general bug fix discovered while debugging the Mistral test. When model weights are corrupted or missing, all 8 TP ranks detect the issue simultaneously and race to delete the cache directory. One rank succeeds; the others fail with "No such file or directory" errors. This was causing CI failures.
The fix adds a file lock so only one rank performs validation and cleanup at a time. Other models just happened to have complete caches, so the bug wasn't triggered.
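The lock-then-recheck pattern described here can be sketched with the standard library alone. This is an illustration under stated assumptions, not the SGLang implementation: it uses `fcntl.flock` (POSIX only) in place of the project's `get_lock` helper, and `validation_lock`, `is_snapshot_complete`, and `validate_or_cleanup` are hypothetical names, with a marker file standing in for real weight validation.

```python
import fcntl
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def validation_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path (POSIX only)."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def is_snapshot_complete(snapshot_dir):
    """Hypothetical check: a marker file stands in for real weight validation."""
    return os.path.exists(os.path.join(snapshot_dir, "COMPLETE"))


def validate_or_cleanup(snapshot_dir, lock_path):
    """Return snapshot_dir if usable, else None so the caller can re-download."""
    with validation_lock(lock_path):
        # Re-check after acquiring the lock: another rank may already
        # have removed the directory while this one was waiting.
        if not os.path.isdir(snapshot_dir):
            return None
        if is_snapshot_complete(snapshot_dir):
            return snapshot_dir
        # Only one rank at a time reaches this point, so the delete cannot race.
        shutil.rmtree(snapshot_dir)
        return None
```

Because the existence check happens inside the lock, a rank that arrives after the cleanup returns `None` instead of failing on a directory that no longer exists.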

Added test_accuracy_mgsm method using the same eval framework as other
text models. Set initial threshold to 0.90 (placeholder). Will calibrate
the threshold after observing actual model performance.
@ispobock ispobock merged commit 6628098 into main Dec 5, 2025
51 of 58 checks passed
@ispobock ispobock deleted the feature/mistral-large3-ci branch December 5, 2025 15:16