bug fix for Ascend npu #11407

ash-sigh · 2025-10-10T02:40:44Z

Motivation

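Fix three bugs on Ascend NPU: an operator error when launching Qwen3-VL-30B-A3B-Instruct (#11374), image_processor limitations in transformers, and an index_head_dim AttributeError in NPU graph mode.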

Modifications

Use torch_npu.npu_scatter_nd_update_ instead of the deprecated torch_npu._npu_reshape_and_cache. Related issue: [Bug] [Ascend] Launching Qwen3-VL-30B-A3B-Instruct got operator error. #11374
Run image_processor on the CPU, because transformers has some limitations on Ascend.
Fix the NPU graph index_head_dim AttributeError caused by the DS-V3.2 changes. Related PR: Support DeepSeek V3.2 Exp #11061
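
For readers unfamiliar with the two patterns named in the list above, here is a minimal plain-Python sketch. It is not the code in this PR and does not call torch_npu: `scatter_update_`, `DummyConfig`, and `get_index_head_dim` are hypothetical names used only for illustration. It assumes the KV cache is a flat list of slots, and that the AttributeError arises when a model config has no `index_head_dim` attribute.

```python
# Plain-Python emulation of a scatter-style KV cache write.
# Assumption: the cache is a flat list of slots, and each new row is
# written to the slot index chosen for it. This is NOT the torch_npu
# kernel, only an illustration of the update pattern.
def scatter_update_(cache, slot_indices, updates):
    """Write updates[i] into cache[slot_indices[i]] in place."""
    if len(slot_indices) != len(updates):
        raise ValueError("slot_indices and updates must have the same length")
    for slot, row in zip(slot_indices, updates):
        cache[int(slot)] = list(row)  # indices are cast to int before use
    return cache


class DummyConfig:
    """Hypothetical model config that does not define index_head_dim."""
    head_dim = 128


def get_index_head_dim(config, default=None):
    """Read index_head_dim defensively instead of raising AttributeError."""
    return getattr(config, "index_head_dim", default)


key_cache = [[0.0, 0.0] for _ in range(4)]
scatter_update_(key_cache, [2, 0], [[1.0, 2.0], [3.0, 4.0]])
print(key_cache)                          # slots 2 and 0 now hold the new rows
print(get_index_head_dim(DummyConfig()))  # None, no exception raised
```

The defensive `getattr` is one common way to keep code paths working for models that predate a newly added config field. Whether the PR uses this exact guard is not stated here.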

Accuracy Tests

Server launch script

export HCCL_OP_EXPANSION_MODE="AIV"
#export CPU_AFFINITY_CONF=1,npu0:192-223,npu1:192-223,npu2:128-159,npu3:128-15
export STREAMS_PER_DEVICE=32

python -m sglang.launch_server \
    --model-path /model/Qwen3-VL-30B-A3B-Instruct/ \
    --tp-size 2  --device npu \
    --attention-backend ascend \
    --mm-attention-backend ascend_attn \
    --trust-remote-code

Accuracy test command

python3 -m sglang.test.few_shot_gsm8k --num-questions 200

Accuracy test result

Downloading from https://raw.githubusercontent.com/openai/grade-school-math/master/grade_school_math/data/test.jsonl to /tmp/test.jsonl
/tmp/test.jsonl: 732kB [00:07, 101kB/s]
100%|███████████████████████████████████████████████████████████████████| 200/200 [00:44<00:00, 4.51it/s]
Accuracy: 0.945
Invalid: 0.000
Latency: 44.533 s
Output throughput: 646.760 token/s
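
As a quick reading aid, the reported figures can be turned back into counts. This sketch assumes that accuracy is the fraction of correct answers out of the 200 questions and that output throughput is total generated tokens divided by latency. Both are assumptions about how the test script defines its metrics.

```python
# Values copied from the accuracy test result above.
num_questions = 200
accuracy = 0.945
latency_s = 44.533
output_throughput = 646.760  # token/s

# Assumption: accuracy = correct / num_questions.
correct = round(accuracy * num_questions)

# Assumption: output_throughput = total_output_tokens / latency_s.
total_output_tokens = round(output_throughput * latency_s)

print(f"correct answers: {correct} of {num_questions}")
print(f"approx. generated tokens: {total_output_tokens}")
```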

Benchmarking and Profiling

python -m sglang.bench_serving --backend sglang --num-prompt 10 --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json

#Input tokens: 1997
#Output tokens: 2798
Starting warmup with 1 sequences...
Warmup completed with 1 sequences. Starting main benchmark run...
100%|███████████████████████████████████████████████| 10/10 [00:22<00:00, 2.20s/it]

============ Serving Benchmark Result ============
Backend: sglang
Traffic request rate: inf
Max request concurrency: not set
Successful requests: 10
Benchmark duration (s): 22.04
Total input tokens: 1997
Total input text tokens: 1997
Total input vision tokens: 0
Total generated tokens: 2798
Total generated tokens (retokenized): 2783
Request throughput (req/s): 0.45
Input token throughput (tok/s): 90.60
Output token throughput (tok/s): 126.94
Total token throughput (tok/s): 217.54
Concurrency: 5.93
----------------End-to-End Latency----------------
Mean E2E Latency (ms): 13079.63
Median E2E Latency (ms): 14950.25
---------------Time to First Token----------------
Mean TTFT (ms): 404.65
Median TTFT (ms): 403.89
P99 TTFT (ms): 450.20
---------------Inter-Token Latency----------------
Mean ITL (ms): 45.50
Median ITL (ms): 46.50
P95 ITL (ms): 50.18
P99 ITL (ms): 52.89
Max ITL (ms): 130.14
==================================================
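
The throughput lines in the table above can be cross-checked against the token counts and the benchmark duration. The sketch below assumes each throughput is a count divided by the duration and that concurrency is the summed end-to-end latency divided by the duration. These formulas are assumptions rather than taken from the benchmark source. Small differences are expected because the duration is printed with only two decimals.

```python
# Values copied from the serving benchmark result above.
duration_s = 22.04
requests = 10
input_tokens = 1997
output_tokens = 2798
mean_e2e_ms = 13079.63

reported = {
    "request_throughput": 0.45,
    "input_throughput": 90.60,
    "output_throughput": 126.94,
    "total_throughput": 217.54,
    "concurrency": 5.93,
}

# Assumed definitions: count / duration, and sum(E2E latency) / duration.
derived = {
    "request_throughput": requests / duration_s,
    "input_throughput": input_tokens / duration_s,
    "output_throughput": output_tokens / duration_s,
    "total_throughput": (input_tokens + output_tokens) / duration_s,
    "concurrency": (mean_e2e_ms / 1000.0) * requests / duration_s,
}

diffs = {name: abs(derived[name] - reported[name]) for name in reported}
for name in reported:
    print(f"{name}: reported={reported[name]:.2f} derived={derived[name]:.3f}")
```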

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.


ash-sigh and others added 2 commits October 10, 2025 09:46

fix bug for Ascend npu

84a77b4

fix npu graph index_head_dim AttributeError due to DS-V3.2 changes(sg…

3c534de

…l-project#11061)

ash-sigh changed the title ~~Bug fix for Ascend npu~~ bug fix for Ascend npu Oct 10, 2025

ping1jing2 mentioned this pull request Oct 10, 2025

[Bug] [Ascend] Launching Qwen3-VL-30B-A3B-Instruct got operator error. #11374

Closed

5 tasks

transfer out_indices to int

338fe1f

ash-sigh closed this Oct 27, 2025
