
WIP #23598

@iforgetmyname

Quick Start with SGLang

Step 1: Prepare the Hardware and Software

| Hardware | CANN version | torch_npu version | SGLang version (image tag) | HDK version |
|---|---|---|---|---|
| Atlas 800I A3 | CANN 8.5.0 | torch_npu-2.8.0.post3.dev20260131 | cann8.5.0-a3-tbnb20260417 | 25.5 |

Before you begin, make sure the firmware and driver are installed correctly. You can verify this with the following command:

npu-smi info
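As an optional extra check, the sketch below counts the NPU device nodes visible under `/dev`. It assumes each NPU appears as `/dev/davinci<N>` (the naming the `docker run` command in Step 2 relies on) and that an A3 host exposes 16 of them; `count_npu_devices` is a hypothetical helper, not part of CANN or SGLang.

```shell
#!/usr/bin/env bash
# Count NPU device nodes (davinci0, davinci1, ...) under a directory.
# Assumption: each NPU is exposed as <dir>/davinci<N>; davinci_manager is
# skipped because the glob requires a digit right after "davinci".
count_npu_devices() {
  local dev_dir="${1:-/dev}"
  local count=0 node
  for node in "$dev_dir"/davinci[0-9]*; do
    # Without nullglob the pattern stays literal when nothing matches,
    # so only count paths that actually exist.
    [ -e "$node" ] && count=$((count + 1))
  done
  echo "$count"
}

# Assumption: 16 device nodes on an Atlas 800I A3 host; adjust for other hosts.
expected=16
found="$(count_npu_devices)"
if [ "$found" -ne "$expected" ]; then
  echo "warning: found $found NPU device nodes, expected $expected" >&2
fi
```

A mismatch here usually means the driver is not loaded, which `npu-smi info` should also reveal.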

Step 2: Install the Software

Run the image as a container

Use the following commands to start the SGLang container in one step:

# A3
export IMAGE=swr.cn-southwest-2.myhuaweicloud.com/base_image/dockerhub/lmsysorg/sglang:cann8.5.0-a3-tbnb20260417

# A2
# export IMAGE=swr.cn-southwest-2.myhuaweicloud.com/base_image/dockerhub/lmsysorg/sglang:cann8.5.0-910b-tbnb20260417

# The --device flags below expose the 16 NPUs of an A3 host (davinci0-davinci15);
# on other hosts, list only the devices that exist (see npu-smi info).
docker run -itd --shm-size=128g --privileged=true --name sgl-a3 \
--net=host \
-v /mnt:/mnt \
-v /home:/home \
-v /data:/data \
-v /var/queue_schedule:/var/queue_schedule \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /usr/local/sbin:/usr/local/sbin \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
--device=/dev/davinci0:/dev/davinci0  \
--device=/dev/davinci1:/dev/davinci1  \
--device=/dev/davinci2:/dev/davinci2  \
--device=/dev/davinci3:/dev/davinci3  \
--device=/dev/davinci4:/dev/davinci4  \
--device=/dev/davinci5:/dev/davinci5  \
--device=/dev/davinci6:/dev/davinci6  \
--device=/dev/davinci7:/dev/davinci7  \
--device=/dev/davinci8:/dev/davinci8  \
--device=/dev/davinci9:/dev/davinci9  \
--device=/dev/davinci10:/dev/davinci10  \
--device=/dev/davinci11:/dev/davinci11  \
--device=/dev/davinci12:/dev/davinci12  \
--device=/dev/davinci13:/dev/davinci13  \
--device=/dev/davinci14:/dev/davinci14  \
--device=/dev/davinci15:/dev/davinci15  \
--device=/dev/davinci_manager:/dev/davinci_manager \
--device=/dev/hisi_hdc:/dev/hisi_hdc \
--entrypoint=bash \
${IMAGE}

# Enter the container (same name as --name above)
docker exec -it sgl-a3 bash

# Set environment variables
source /usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize/bin/set_env.bash
source /usr/local/Ascend/ascend-toolkit/latest/opp/vendors/custom_transformer/bin/set_env.bash
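The sixteen hand-written `--device` lines in the `docker run` command are easy to mistype. A loop can generate them instead. This is a sketch assuming bash 4+ (for `mapfile`) and NPUs numbered contiguously from `davinci0`; `npu_device_args` and `NUM_NPUS` are illustrative names, not part of the image or of Docker.

```shell
#!/usr/bin/env bash
# Print one --device flag per NPU (davinci0 .. davinci<N-1>), followed by the
# two management devices that the docker run command also maps.
npu_device_args() {
  local num_npus="$1" i
  for ((i = 0; i < num_npus; i++)); do
    printf '%s\n' "--device=/dev/davinci${i}:/dev/davinci${i}"
  done
  printf '%s\n' "--device=/dev/davinci_manager:/dev/davinci_manager"
  printf '%s\n' "--device=/dev/hisi_hdc:/dev/hisi_hdc"
}

# Collect the flags into an array; defaults to 16 NPUs (A3), override with NUM_NPUS.
mapfile -t DEVICE_ARGS < <(npu_device_args "${NUM_NPUS:-16}")

# Example (not executed): pass the array to docker run with the other options.
# docker run -itd --shm-size=128g --privileged=true --net=host --name sgl-a3 \
#   "${DEVICE_ARGS[@]}" --entrypoint=bash "${IMAGE}"
```

Quoting the array as `"${DEVICE_ARGS[@]}"` keeps each flag a separate argument, so the resulting mappings match the explicit list.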

Step 3: Start the Inference Service

Launch command

# Set the CPU frequency governor to performance mode
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
sysctl -w vm.swappiness=0
sysctl -w kernel.numa_balancing=0
# Bind processes to CPU cores (CPU affinity)
export SGLANG_SET_CPU_AFFINITY=1
# Clear proxy settings and load the CANN / ATB environment
unset https_proxy
unset http_proxy
unset HTTPS_PROXY
unset HTTP_PROXY
unset ASCEND_LAUNCH_BLOCKING
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
export PATH="/usr/local/python3.11.13/bin:$PATH"
# source /usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize/bin/set_env.bash
source /usr/local/Ascend/ascend-toolkit/latest/opp/vendors/custom_transformer/bin/set_env.bash
# Reduce NPU memory fragmentation
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export STREAMS_PER_DEVICE=32
# Network interface used by HCCL and Gloo (loopback)
export HCCL_SOCKET_IFNAME=lo
export GLOO_SOCKET_IFNAME=lo
# Communication buffer and DeepEP dispatch settings
# export SGLANG_DEEPEP_BF16_DISPATCH=0
export DEEP_NORMAL_MODE_USE_INT8_QUANT=1
export SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK=16
export HCCL_BUFFSIZE=2000

export IS_DEEPSEEK_V4=1
#--------------------------------------------------------------------
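# hc_pre / hc_post: alternative fused kernels (disabled)
# export USE_FUSED_HC_PRE_PYPTO=1
# export USE_FUSED_HC_POST=1
# hc_pre / hc_post: Ascend C fused kernels
export USE_FUSED_HC_PRE_ASCENDC=1
export USE_FUSED_HC_POST_ASCENDC=1
# Attention (FA) kernel switches
# export USE_FUSED_MLA_PROLOG=1  # currently degrades performance when enabled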
export ASCEND_USE_FIA=1
export USE_PA_DECODE=1
export USE_PA_PREFILL=1
export USE_FUSED_TRANSPOSE_BATCHMATMUL=1

# compressor
export USE_FUSED_COMPRESSOR=1
export LI_KV_DTYPE_INT8=1

# moe_topk
export USE_NPU_MOE_GATING_TOP_K=1

export DEEPEP_NORMAL_LONG_SEQ_ROUND=10
export DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS=1024
export DEEPEP_NORMAL_COMBINE_ENABLE_LONG_SEQ=1

#--------------------------------------------------------------------
export HCCL_OP_EXPANSION_MODE=AIV
# export HCCL_OP_EXPANSION_MODE=HOST
# export HCCL_DETERMINISTIC=true
# export CLOSE_MATMUL_K_SHIFT=1
export TASK_QUEUE_ENABLE=1

MODEL_PATH=xxx  # replace xxx with the path to the model weights
export LD_LIBRARY_PATH=/usr/local/Ascend/cann-8.5.0/opp/vendors/customize/op_api/lib/:${LD_LIBRARY_PATH}

#export ASCEND_LAUNCH_BLOCKING=1
# export INF_NAN_MODE_FORCE_DISABLE=1

export SGLANG_WARMUP_TIMEOUT=3600
export SGLANG_ENABLE_SPEC_V2=1
export SGLANG_ENABLE_OVERLAP_PLAN_STREAM=1
export FORCE_DRAFT_MODEL_NON_QUANT=1

# export SGLANG_DEBUG_MEMORY_POOL=0
python3 -m sglang.launch_server --model-path ${MODEL_PATH} \
    --page-size 128 \
    --tp-size 16 \
    --trust-remote-code \
    --attention-backend ascend \
    --device npu \
    --watchdog-timeout 9000 \
    --host 127.0.0.1 --port 6688 \
    --mem-fraction-static 0.8 \
    --disable-radix-cache --chunked-prefill-size -1 --max-prefill-tokens 65535 --context-length 65535 \
    --max-running-requests 32 \
    --disable-overlap-schedule \
    --dp-size 16 --enable-dp-attention \
    --moe-a2a-backend deepep --deepep-mode auto \
    --quantization modelslim --enable-dp-lm-head \
    --skip-server-warmup 2>&1 | tee launch.log &

# Once the server has finished loading (check launch.log), send a test request to the same port (6688)
curl --location 'http://127.0.0.1:6688/generate' \
    --header 'Content-Type: application/json' \
    --data '{"text": "The capital of China is", "sampling_params": {"temperature": 0, "max_new_tokens": 64}}'
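Because the server is started in the background, a request sent immediately usually fails while the weights are still loading. The sketch below polls a URL until it answers; it assumes the `/health` endpoint served by recent SGLang versions, and `wait_for_server` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Poll a URL until it returns an HTTP 2xx status or the timeout expires.
# Arguments: URL, timeout in seconds (default 3600), poll interval (default 5).
# Returns 0 when the server answers, 1 on timeout.
wait_for_server() {
  local url="$1" timeout_s="${2:-3600}" interval_s="${3:-5}" start=$SECONDS
  # -s: silent, -f: treat HTTP errors as failures, --max-time: cap each attempt.
  until curl -sf --max-time 5 -o /dev/null "$url"; do
    if (( SECONDS - start >= timeout_s )); then
      echo "server at $url not ready after ${timeout_s}s" >&2
      return 1
    fi
    sleep "$interval_s"
  done
}
```

Usage, with the port from the launch command: `wait_for_server "http://127.0.0.1:6688/health" 3600 && curl --location 'http://127.0.0.1:6688/generate' ...`.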

Expected result

"text":" Beijing. It is located in the northern part of the country and serves as the political, cultural, and educational center of China. Beijing is one of the four municipalities under the direct administration of the central government, along with Shanghai, Tianjin, and Chongqing.\n\nBeijing has a rich history dating back over 3,000 years. It has been the capital of several dynasties, including the Yuan, Ming, and Qing dynasties. The city is known for its historical landmarks, such as the Great Wall, the Forbidden City, and the Temple of Heaven.\n\nIn addition to its historical significance, Beijing is also a modern metropolis with a population of over 21 million people. It is home to many important institutions, including the Chinese government, the Communist Party of China, and the People's Liberation Army.\n\nBeijing is also a major transportation hub, with an extensive network of highways, railways, and airports. The city is served by Beijing Capital International Airport, which is one of the busiest",
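Since the request uses greedy decoding (`temperature: 0`), a scripted smoke test can compare the start of the answer with the expected result. This sketch assumes `python3` is available and that the `/generate` response is a JSON object with a top-level `text` field, as in the output above; `check_capital_answer` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Read a /generate response (JSON) from stdin, print its "text" field, and
# return 0 only if the answer starts with " Beijing", as in the expected result.
check_capital_answer() {
  python3 -c '
import json, sys
text = json.load(sys.stdin)["text"]
print(text)
sys.exit(0 if text.startswith(" Beijing") else 1)
'
}
```

Usage: pipe the output of the test request into it, e.g. `curl -s --location 'http://127.0.0.1:6688/generate' ... | check_capital_answer`.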
