Using Prefill node idle cycles for Decoding in PD disaggregation? #21992

usernamehaha2022 · 2026-04-03T02:48:53Z

I've been running SGLang in PD (prefill/decode) disaggregated mode and keep hitting a decoding bottleneck: a single prefill takes only about 300 ms, while decoding takes over 1 s. Since my QPS isn't very high, the Prefill nodes are often idle.

I tried switching back to a non-PD setup, but the E2E latency for the same traffic (across two colocated nodes) is still significantly higher than with the PD setup.

So, without implementing full "dynamic PD role switching," is it possible to utilize the idle time of Prefill nodes to handle some decoding batches?
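For a rough sense of how much idle time is at stake, here is a back-of-the-envelope sketch in Python. It uses the two timings above (about 300 ms per prefill, over 1 s per decode). The QPS value and the `busy_fraction` helper are hypothetical placeholders, not measurements or SGLang APIs, and the model assumes one request is served at a time per node, which ignores the batching a real decode node does.

```python
def busy_fraction(qps: float, service_time_s: float, num_nodes: int = 1) -> float:
    """Estimate the fraction of time each node is busy.

    Simplifying assumption: requests are served one at a time, with no
    batching, so busy time is arrival rate times per-request service time,
    spread evenly over the nodes and capped at 100%.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return min(1.0, qps * service_time_s / num_nodes)


PREFILL_TIME_S = 0.3  # from the post: about 300 ms per prefill
DECODE_TIME_S = 1.0   # from the post: over 1 s per decode, so this is a lower bound
ASSUMED_QPS = 0.5     # hypothetical value, chosen only for illustration

prefill_busy = busy_fraction(ASSUMED_QPS, PREFILL_TIME_S)
decode_busy = busy_fraction(ASSUMED_QPS, DECODE_TIME_S)
prefill_idle = 1.0 - prefill_busy

print(f"prefill busy: {prefill_busy:.0%}, idle: {prefill_idle:.0%}")
print(f"decode busy:  {decode_busy:.0%}")
```

Under these assumptions the Prefill node spends most of its time idle while the Decode node carries more than three times the load, which is the imbalance the question is about. Substituting real QPS and node counts gives a first estimate of how much decode work the idle Prefill time could absorb.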

Replies: 0 comments
