
Commit 6152e7e

Update the preview image for modelopt quantization blog and minor format fix (#266)
* update the preview image for modelopt quantization blog
* minor

Signed-off-by: Zhiyu Cheng <[email protected]>
1 parent 9a872a0 commit 6152e7e

File tree

1 file changed: +5 −5 lines changed


blog/2025-12-02-modelopt-quantization.md

Lines changed: 5 additions & 5 deletions
```diff
@@ -2,7 +2,7 @@
 title: "Boost SGLang Inference: Native NVIDIA Model Optimizer Integration for Seamless Quantization and Deployment"
 author: "NVIDIA ModelOpt Team"
 date: "Dec 02, 2025"
-previewImg: /images/blog/nvidia-modelopt-quantization/DSR1-nvfp4-perf.jpg
+previewImg: /images/blog/nvidia-modelopt-quantization/Preview-modelopt-integration.png
 ---

 (Updated on Dec 2)
```
```diff
@@ -22,11 +22,11 @@ SGLang now integrates NVIDIA's Model Optimizer directly, allowing you to call it

 This new capability unlocks a simple, three-step workflow:

-- **Quantize**: Use the new SGLang-ModelOpt interface to apply state-of-the-art quantization techniques that enable accelerated low-precision inference in NVFP4, MXFP4, FP8, etc.
+1. **Quantize**: Use the new SGLang-ModelOpt interface to apply state-of-the-art quantization techniques that enable accelerated low-precision inference in NVFP4, MXFP4, FP8, etc.

-- **Export**: Save the optimized model artifacts, now fully compatible with the SGLang runtime.
+2. **Export**: Save the optimized model artifacts, now fully compatible with the SGLang runtime.

-- **Deploy**: Load the quantized model directly into the SGLang runtime and serve it on NVIDIA platforms, immediately benefiting from lower latency and reduced memory usage.
+3. **Deploy**: Load the quantized model directly into the SGLang runtime and serve it on NVIDIA platforms, immediately benefiting from lower latency and reduced memory usage.


 #### Performance Outcomes
```
```diff
@@ -126,7 +126,7 @@ This native Model Optimizer integration reinforces SGLang's commitment to provid

 We can't wait to see the performance gains you achieve with this new feature. Head over to our [GitHub repository](https://github.com/sgl-project/sglang) to pull the latest version and try it out!

-Also, please join our dedicated Slack channel [#modelopt](https://sgl-fru7574.slack.com/archives/C09NPJSBR32) to discuss topics such as modelopt, quantization, and low-precision numerics! If you haven’t joined our workspace yet, you can join it first [here] (https://slack.sglang.io).
+Also, please join our dedicated Slack channel [#modelopt](https://sgl-fru7574.slack.com/archives/C09NPJSBR32) to discuss topics such as modelopt, quantization, and low-precision numerics! If you haven’t joined our workspace yet, you can join it first [here](https://slack.sglang.io).


 ### Acknowledgement
```
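For context, step 3 ("Deploy") of the workflow described in the changed blog lines might look like the following sketch. This is a hypothetical invocation, not taken from this commit: the checkpoint path and port are illustrative, and the exact `--quantization` value for a given checkpoint should be checked against the SGLang server documentation.

```shell
# Hypothetical sketch: serve a ModelOpt-quantized checkpoint with the
# SGLang runtime. Model path, quantization value, and port are
# illustrative assumptions.
python3 -m sglang.launch_server \
  --model-path ./my-quantized-checkpoint \
  --quantization modelopt \
  --port 30000
```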
