Qwen3.5 seems to have a problem with tp>1 in triton_attention #20106

gaoxt1983 · 2026-03-08T02:23:55Z

I'm running Qwen3.5 on two L40 48 GB GPUs. The configuration is:

{
  "log_requests": false,
  "enable_cache_report": true,
  "mem_fraction_static": 0.85,
  "max_running_requests": 64,
  "max_prefill_tokens": 8192,
  "chunked_prefill_size": 4096,
  "tp_size": 2,
  "show_time_cost": true,
  "enable_metrics": true,
  "tool_call_parser": "qwen3_coder",
  "trust_remote_code": true,
  "reasoning_parser": "qwen3",
  "nnodes": 1,
  "served_model_name": "duokabushu2"
}

But during inference, the generated text is very short and seems unfinished. I read the log that sglang generated and found only a prefill phase, with no decode phase. Is there something wrong with my configuration or my image?
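
For anyone who wants to check the same symptom on a saved server log, here is a minimal sketch that counts prefill and decode lines. The marker strings "Prefill batch" and "Decode batch" are an assumption about how the log labels these steps, and `count_phases` is a hypothetical helper, not part of sglang; adjust the markers to whatever your log actually prints.

```python
from collections import Counter

# Assumed marker substrings for each phase (not confirmed against the
# sglang source); change them to match the lines in your own log.
PHASE_MARKERS = {"prefill": "Prefill batch", "decode": "Decode batch"}


def count_phases(log_text: str) -> Counter:
    """Count how many log lines mention each phase marker."""
    # Start every phase at zero so a missing phase shows up explicitly.
    counts = Counter({name: 0 for name in PHASE_MARKERS})
    for line in log_text.splitlines():
        for name, marker in PHASE_MARKERS.items():
            if marker in line:
                counts[name] += 1
    return counts
```

Reading the log file into a string and passing it to `count_phases` gives one count per phase. A decode count of zero next to a non-zero prefill count would match what I described above.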

Replies: 0 comments
