@Alexhaoge Alexhaoge commented Sep 29, 2025

Motivation

The first commit adds Ascend NPU support to the sglang.check_env script, allowing users to export their environment when raising GitHub issues or for other diagnostic purposes.

The second commit refactors the script so that vendors can write different version-check procedures without breaking each other's implementations, while avoiding naming confusion.

Modifications

We provide two implementations and request a decision from the community.

Legacy approach

commit 1624b08
Add additional branches to the existing functions to get device information and driver/compiler/toolkit versions. Details are as follows:

  • get_cuda_info:
    • Add a new function to output device names due to torch_npu interface differences.
    • Lazily add torch_npu to PACKAGE_LIST to avoid unnecessary package checks on other hardware.
  • _get_cuda_version_info: Use multiple environment variables to locate the CANN path, falling back to the default installation path if none of them works.
  • _get_nvcc_info:
    • Find the CANN toolkit version in $CANN_HOME/version.cfg. The toolkit versioning rule differs from the Bisheng compiler's (unlike CUDA).
    • Find the Bisheng compiler under the CANN path and output the first line of its version string.
  • _get_cuda_driver_version: Use npu-smi info -t board -i 0 to get the driver version, since a single driver serves all NPUs in a server.
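
The branching logic above can be sketched roughly as follows. Note this is an illustrative sketch, not the actual patch: the environment-variable names, the "Software Version" field match, and the fallback string are assumptions based on the description and the sample output below.

```python
import os
import subprocess

# Hypothetical env vars checked for the CANN installation (illustrative).
_CANN_ENV_VARS = ["ASCEND_HOME_PATH", "ASCEND_TOOLKIT_HOME"]
_CANN_DEFAULT = "/usr/local/Ascend/ascend-toolkit/latest"


def find_cann_home() -> str:
    """Locate CANN via environment variables, else the default install path."""
    for var in _CANN_ENV_VARS:
        path = os.environ.get(var)
        if path and os.path.isdir(path):
            return path
    return _CANN_DEFAULT


def get_ascend_driver_version() -> str:
    """Query the NPU driver version via npu-smi (one driver per server)."""
    try:
        out = subprocess.run(
            ["npu-smi", "info", "-t", "board", "-i", "0"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return "Not Available"
    # Scan the board report for the driver's version field (field name assumed).
    for line in out.splitlines():
        if "Software Version" in line:
            return line.split(":", 1)[1].strip()
    return "Not Available"
```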

Class-based refactoring

commit 0acff70
As @Alcanderian pointed out, function names like _get_cuda_version_info become ambiguous once they cover multiple hardware types, so we propose reworking the script as follows:

  • BaseEnv: create a base environment-checker class and move common helper functions into it, such as get_package_versions, get_device_info, and get_hypervisor_vendor.
  • Each hardware type gets a checker subclass. Subclasses should implement get_info (device info) and get_topo (device topology).
  • Dispatch the env checker in __main__ and call BaseEnv.check_env() to print the environment info.
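
A minimal sketch of the proposed class layout (method bodies, the placeholder subclass, and the dispatch condition are illustrative assumptions, not the merged code):

```python
import importlib.util


class BaseEnv:
    """Base environment checker; shared helpers live here."""

    def get_package_versions(self) -> dict:
        # Common helper shared by all vendors (heavily simplified sketch).
        versions = {}
        for pkg in ("torch", "numpy"):
            spec = importlib.util.find_spec(pkg)
            versions[pkg] = "found" if spec else "Module Not Found"
        return versions

    def get_info(self) -> dict:
        raise NotImplementedError  # device / driver / compiler versions

    def get_topo(self) -> str:
        raise NotImplementedError  # device topology

    def check_env(self) -> None:
        for key, value in self.get_package_versions().items():
            print(f"{key}: {value}")
        for key, value in self.get_info().items():
            print(f"{key}: {value}")
        print(self.get_topo())


class AscendEnv(BaseEnv):
    """Ascend NPU checker subclass (bodies are placeholders)."""

    def get_info(self) -> dict:
        return {"NPU available": True}  # would query torch_npu / CANN

    def get_topo(self) -> str:
        return "Ascend Topology: ..."  # would call npu-smi


def dispatch_env_checker() -> BaseEnv:
    # Hypothetical dispatch; the real script would inspect the backend.
    if importlib.util.find_spec("torch_npu"):
        return AscendEnv()
    return BaseEnv()  # or a CudaEnv() subclass, etc.
```

This layout lets each vendor override only get_info and get_topo while package listing and printing stay in one place.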

Accuracy Tests

Script output from an Atlas 800T A2 server with the main-910b docker image:

root@w25:/home# python -m sglang.check_env
/usr/local/python3.11.13/lib/python3.11/site-packages/torch/cuda/init.py:61: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
/usr/local/python3.11.13/lib/python3.11/site-packages/torch/cuda/init.py:61: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
Python: 3.11.13 (main, Jul 26 2025, 07:27:32) [GCC 11.4.0]
NPU available: True
NPU 0,1,2,3,4,5,6,7: Ascend910B3
CANN_HOME: /usr/local/Ascend/ascend-toolkit/latest
CANN: 8.2.0.0.201:8.2.RC1
BiSheng: 2025-07-23T11:24:13+08:00 clang version 15.0.5 (clang-5c68a1cb1231 flang-5c68a1cb1231)
Ascend Driver Version: 25.2.0
PyTorch: 2.6.0+cpu
torch_npu: 2.6.0.post1
sgl-kernel-npu: 0.1.0
sglang: 0.5.3.post3
sgl_kernel: Module Not Found
flashinfer_python: Module Not Found
triton: Module Not Found
transformers: 4.57.1
torchao: 0.9.0
numpy: 1.26.4
aiohttp: 3.13.0
fastapi: 0.119.0
hf_transfer: 0.1.9
huggingface_hub: 0.35.3
interegular: 0.3.3
modelscope: 1.31.0
orjson: 3.11.3
outlines: 0.1.11
packaging: 25.0
psutil: 6.0.0
pydantic: 2.12.2
python-multipart: 0.0.20
pyzmq: 27.1.0
uvicorn: 0.37.0
uvloop: 0.21.0
vllm: 0.8.5.post1+empty
xgrammar: 0.1.25
openai: 1.99.1
tiktoken: 0.12.0
anthropic: Module Not Found
litellm: Module Not Found
decord: Module Not Found
Ascend Topology:
NPU0 NPU1 NPU2 NPU3 NPU4 NPU5 NPU6 NPU7 CPU Affinity
NPU0 X HCCS HCCS HCCS HCCS HCCS HCCS HCCS 192-223
NPU1 HCCS X HCCS HCCS HCCS HCCS HCCS HCCS 192-223
NPU2 HCCS HCCS X HCCS HCCS HCCS HCCS HCCS 128-159
NPU3 HCCS HCCS HCCS X HCCS HCCS HCCS HCCS 128-159
NPU4 HCCS HCCS HCCS HCCS X HCCS HCCS HCCS 0-31
NPU5 HCCS HCCS HCCS HCCS HCCS X HCCS HCCS 0-31
NPU6 HCCS HCCS HCCS HCCS HCCS HCCS X HCCS 64-95
NPU7 HCCS HCCS HCCS HCCS HCCS HCCS HCCS X 64-95

Legend:

X = Self
SYS = Path traversing PCIe and NUMA nodes. Nodes are connected through SMP, such as QPI, UPI.
PHB = Path traversing PCIe and the PCIe host bridge of a CPU.
PIX = Path traversing a single PCIe switch
PXB = Path traversing multipul PCIe switches
HCCS = Connection traversing HCCS.
NA = Unknown relationship.

ulimit soft: 1073741816

Script output on H20 with the lmsysorg/sglang:v0.5.3.post1-cu129-amd64 docker image:

python -m sglang.check_env
/usr/local/lib/python3.12/dist-packages/torch/cuda/init.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
Python: 3.12.11 (main, Jun 4 2025, 08:56:18) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H20
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.9, V12.9.86
CUDA Driver Version: 550.127.08
PyTorch: 2.8.0+cu129
sglang: 0.5.3.post3
sgl_kernel: 0.3.15
flashinfer_python: 0.4.0
triton: 3.4.0
transformers: 4.57.1
torchao: 0.9.0
numpy: 2.3.3
aiohttp: 3.13.0
fastapi: 0.118.2
hf_transfer: 0.1.9
huggingface_hub: 0.35.3
interegular: 0.3.3
modelscope: 1.30.0
orjson: 3.11.3
outlines: 0.1.11
packaging: 25.0
psutil: 7.1.0
pydantic: 2.12.0
python-multipart: 0.0.20
pyzmq: 27.1.0
uvicorn: 0.37.0
uvloop: 0.21.0
vllm: Module Not Found
xgrammar: 0.1.25
openai: 1.99.1
tiktoken: 0.12.0
anthropic: 0.69.0
litellm: Module Not Found
decord: 0.6.0
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 NIC8 NIC9 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 NV18 NV18 NV18 NV18 NV18 NV18 NODE NODE NODE PIX SYS SYS SYS SYS SYS SYS 0-55,112-167 0 N/A
GPU1 NV18 X NV18 NV18 NV18 NV18 NV18 NV18 NODE NODE PIX NODE SYS SYS SYS SYS SYS SYS 0-55,112-167 0 N/A
GPU2 NV18 NV18 X NV18 NV18 NV18 NV18 NV18 NODE PIX NODE NODE SYS SYS SYS SYS SYS SYS 0-55,112-167 0 N/A
GPU3 NV18 NV18 NV18 X NV18 NV18 NV18 NV18 PIX NODE NODE NODE SYS SYS SYS SYS SYS SYS 0-55,112-167 0 N/A
GPU4 NV18 NV18 NV18 NV18 X NV18 NV18 NV18 SYS SYS SYS SYS NODE PIX NODE NODE NODE NODE 56-111,168-223 1 N/A
GPU5 NV18 NV18 NV18 NV18 NV18 X NV18 NV18 SYS SYS SYS SYS PIX NODE NODE NODE NODE NODE 56-111,168-223 1 N/A
GPU6 NV18 NV18 NV18 NV18 NV18 NV18 X NV18 SYS SYS SYS SYS NODE NODE NODE NODE NODE PIX 56-111,168-223 1 N/A
GPU7 NV18 NV18 NV18 NV18 NV18 NV18 NV18 X SYS SYS SYS SYS NODE NODE NODE NODE PIX NODE 56-111,168-223 1 N/A
NIC0 NODE NODE NODE PIX SYS SYS SYS SYS X NODE NODE NODE SYS SYS SYS SYS SYS SYS
NIC1 NODE NODE PIX NODE SYS SYS SYS SYS NODE X NODE NODE SYS SYS SYS SYS SYS SYS
NIC2 NODE PIX NODE NODE SYS SYS SYS SYS NODE NODE X NODE SYS SYS SYS SYS SYS SYS
NIC3 PIX NODE NODE NODE SYS SYS SYS SYS NODE NODE NODE X SYS SYS SYS SYS SYS SYS
NIC4 SYS SYS SYS SYS NODE PIX NODE NODE SYS SYS SYS SYS X NODE NODE NODE NODE NODE
NIC5 SYS SYS SYS SYS PIX NODE NODE NODE SYS SYS SYS SYS NODE X NODE NODE NODE NODE
NIC6 SYS SYS SYS SYS NODE NODE NODE NODE SYS SYS SYS SYS NODE NODE X PIX NODE NODE
NIC7 SYS SYS SYS SYS NODE NODE NODE NODE SYS SYS SYS SYS NODE NODE PIX X NODE NODE
NIC8 SYS SYS SYS SYS NODE NODE NODE PIX SYS SYS SYS SYS NODE NODE NODE NODE X NODE
NIC9 SYS SYS SYS SYS NODE NODE PIX NODE SYS SYS SYS SYS NODE NODE NODE NODE NODE X

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

NIC Legend:

NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4
NIC5: mlx5_5
NIC6: mlx5_6
NIC7: mlx5_7
NIC8: mlx5_8
NIC9: mlx5_9

ulimit soft: 1048576

Benchmarking and Profiling

Not applicable

@sglang-bot sglang-bot merged commit c550ab9 into sgl-project:main Nov 2, 2025
115 of 136 checks passed