Does vLLM disaggregated prefilling support running multiple models on a single host (with 8 GPUs)? #14615

Colstuwjx · 2025-03-11T13:52:53Z

Hi team, I've seen the excellent disaggregated prefilling feature and have also checked the test script. I'm wondering whether a single host with 8 H20 GPUs can be reused to deploy multiple LLMs in disaggregated prefilling mode, e.g.:

- CUDA_VISIBLE_DEVICES=0 vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct ... "kv_role":"kv_producer" ...
- CUDA_VISIBLE_DEVICES=1 vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct ... "kv_role":"kv_consumer" ...
- CUDA_VISIBLE_DEVICES=2 vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct ... "kv_role":"kv_consumer" ...
- CUDA_VISIBLE_DEVICES=3 vllm serve Deepseek-R1 ... "kv_role":"kv_producer" ...
- CUDA_VISIBLE_DEVICES=4 vllm serve Deepseek-R1 ... "kv_role":"kv_consumer" ...
- CUDA_VISIBLE_DEVICES=5 vllm serve Deepseek-R1 ... "kv_role":"kv_consumer" ...
- CUDA_VISIBLE_DEVICES=6 vllm serve QWen-32B ... "kv_role":"kv_producer" ...
- CUDA_VISIBLE_DEVICES=7 vllm serve QWen-32B ... "kv_role":"kv_consumer" ...

Is this possible, and if so, is it considered a best practice?
Thanks!
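
For illustration, the layout in the list above can be written down as data and checked before anything is launched. This is only a minimal sketch: the `--port` values starting at 8100 are hypothetical, the `...` placeholders are kept exactly as elided in the post (the remaining `vllm serve` arguments and KV-transfer settings are not filled in), and the helper names `validate` and `build_commands` are made up for this example. It does not check whether each model fits on a single GPU, nor whether vLLM accepts one producer paired with several consumers.

```python
from collections import Counter

# (gpu_index, model, kv_role) rows copied from the list in the question.
LAYOUT = [
    (0, "meta-llama/Meta-Llama-3.1-8B-Instruct", "kv_producer"),
    (1, "meta-llama/Meta-Llama-3.1-8B-Instruct", "kv_consumer"),
    (2, "meta-llama/Meta-Llama-3.1-8B-Instruct", "kv_consumer"),
    (3, "Deepseek-R1", "kv_producer"),
    (4, "Deepseek-R1", "kv_consumer"),
    (5, "Deepseek-R1", "kv_consumer"),
    (6, "QWen-32B", "kv_producer"),
    (7, "QWen-32B", "kv_consumer"),
]

BASE_PORT = 8100  # hypothetical: any free, distinct ports would do


def validate(layout):
    """Raise ValueError if a GPU is reused or a model lacks a producer or consumer."""
    gpu_counts = Counter(gpu for gpu, _, _ in layout)
    duplicates = sorted(gpu for gpu, n in gpu_counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"GPUs assigned more than once: {duplicates}")

    roles_by_model = {}
    for _, model, role in layout:
        roles_by_model.setdefault(model, set()).add(role)
    for model, roles in roles_by_model.items():
        missing = {"kv_producer", "kv_consumer"} - roles
        if missing:
            raise ValueError(f"{model} is missing roles: {sorted(missing)}")


def build_commands(layout, base_port=BASE_PORT):
    """Return one launch line per instance, each with its own GPU and port.

    The "..." parts are left elided, as in the original post.
    """
    validate(layout)
    return [
        f'CUDA_VISIBLE_DEVICES={gpu} vllm serve {model} '
        f'--port {base_port + i} ... "kv_role":"{role}" ...'
        for i, (gpu, model, role) in enumerate(layout)
    ]


if __name__ == "__main__":
    for command in build_commands(LAYOUT):
        print(command)
```

Giving every instance its own GPU and its own port keeps the eight processes from colliding on the host; how producers and consumers are actually paired per model depends on the elided KV-transfer settings.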

Replies: 0 comments