This new capability unlocks a simple, three-step workflow (a code sketch follows the list):

1. **Quantize**: Use the new SGLang-ModelOpt interface to apply state-of-the-art quantization techniques that enable accelerated low-precision inference in NVFP4, MXFP4, FP8, etc.

2. **Export**: Save the optimized model artifacts, now fully compatible with the SGLang runtime.

3. **Deploy**: Load the quantized model directly into the SGLang runtime and serve it on NVIDIA platforms, immediately benefiting from lower latency and reduced memory usage.
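To make the three steps concrete, here is a minimal sketch in Python. It assumes the Model Optimizer (`modelopt`) package and a Hugging Face checkpoint; the model ID, export directory, calibration loop, and server flags are illustrative placeholders, and the exact SGLang-ModelOpt interface may differ from what is shown.

```python
# Hedged sketch of quantize -> export -> deploy. Names and paths are
# illustrative assumptions, not the official SGLang-ModelOpt API.
import torch
import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model
EXPORT_DIR = "llama-3.1-8b-nvfp4"              # placeholder output path

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def forward_loop(m):
    # Tiny calibration pass; use a representative dataset in practice.
    batch = tokenizer("Hello from SGLang!", return_tensors="pt").to("cuda")
    m(**batch)

# 1. Quantize: apply a low-precision recipe (NVFP4 here) via Model Optimizer.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)

# 2. Export: save checkpoint artifacts the SGLang runtime can load.
export_hf_checkpoint(model, export_dir=EXPORT_DIR)

# 3. Deploy: serve the exported checkpoint with SGLang, e.g. from a shell:
#    python -m sglang.launch_server --model-path llama-3.1-8b-nvfp4 \
#        --quantization modelopt
```

The quantization config constant and launch flags above follow Model Optimizer's and SGLang's public examples, but treat them as assumptions and check the current docs for the exact names your versions expect.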
#### Performance Outcomes
We can't wait to see the performance gains you achieve with this new feature. Head over to our [GitHub repository](https://github.com/sgl-project/sglang) to pull the latest version and try it out!
Also, please join our dedicated Slack channel [#modelopt](https://sgl-fru7574.slack.com/archives/C09NPJSBR32) to discuss ModelOpt, quantization, and low-precision numerics! If you haven’t joined our workspace yet, you can sign up [here](https://slack.sglang.io) first.