Using Prefill node idle cycles for Decoding in PD disaggregation? #21992
Closed
usernamehaha2022
started this conversation in
Ideas
Replies: 0 comments
I've been deploying SGLang in PD disaggregated mode and keep running into a decode bottleneck: a single prefill takes only about 300 ms, while decoding takes over 1 s. Since my QPS isn't very high, the prefill nodes are often idle.
I tried switching back to a non-PD setup, but the E2E latency for the same traffic (across two colocated nodes) is still significantly higher than with the PD setup.
So, without implementing full "dynamic PD role switching," is it possible to use the idle time of the prefill nodes to handle some decoding batches?
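To put a rough number on the idle time, here is a back-of-the-envelope sketch. It is not a measurement: the only figure from my deployment is the ~300 ms prefill time. The QPS value is a made-up placeholder, and the function name and the assumption that prefills run one at a time and are spread evenly across prefill nodes are purely illustrative.

```python
def prefill_idle_fraction(qps: float, prefill_seconds: float,
                          num_prefill_nodes: int = 1) -> float:
    """Estimate the fraction of wall-clock time a prefill node sits idle.

    Assumes (hypothetically) that prefills are served one at a time and
    that requests are spread evenly across the prefill nodes.
    """
    # Busy time per second of wall clock on each prefill node.
    busy = qps * prefill_seconds / num_prefill_nodes
    # A node cannot be less than 0% idle, so clamp at zero.
    return max(0.0, 1.0 - busy)


# 0.3 s per prefill is from the post; qps=1.0 is a placeholder value.
idle = prefill_idle_fraction(qps=1.0, prefill_seconds=0.3)
print(f"estimated prefill idle fraction: {idle:.0%}")
```

Under these assumptions, any QPS well below 1 / 0.3 ≈ 3.3 requests per second per prefill node leaves a large share of that node's time unused, which is the capacity I'd like to lend to decoding.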