[Fix] Fix resource limit error when applying speculative decoding and aclgraph (#2472)
### What this PR does / why we need it?
When speculative decoding and aclgraph are both enabled and
cudagraph_capture_sizes is left at its default value, an error is
reported that stream resources are insufficient.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@9c99e48
Signed-off-by: withHades <[email protected]>