When running multiple kvcached instances under heavy concurrent load, one instance can crash with a fatal AssertionError in
ElasticBlockPool.get_new_blocks().
The root cause is that available_size() (used by vLLM's scheduler to check whether blocks are available) and the actual
alloc() call are not atomic with respect to the shared physical pool. Another instance can consume physical pages between
the check and the allocation. alloc() then returns None, the subsequent assert block_ids is not None fires, and the
EngineCore dies.
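The check-then-allocate window can be illustrated with a toy model. This is only a sketch: SharedPhysicalPool and ToyBlockPool are hypothetical stand-ins, not kvcached classes, and the page counts are made up. It assumes one block maps to one physical page and replays the interleaving deterministically instead of using real processes.

```python
from typing import List, Optional


class SharedPhysicalPool:
    """Hypothetical stand-in for the physical page pool shared by all instances."""

    def __init__(self, num_pages: int) -> None:
        self.free_pages = num_pages

    def take(self, n: int) -> bool:
        # Succeeds only if enough pages are free at the moment of the call.
        if n > self.free_pages:
            return False
        self.free_pages -= n
        return True


class ToyBlockPool:
    """Hypothetical per-instance pool; one block equals one physical page here."""

    def __init__(self, shared: SharedPhysicalPool) -> None:
        self.shared = shared
        self.next_id = 0

    def available_size(self) -> int:
        # A snapshot only: nothing is reserved for the caller.
        return self.shared.free_pages

    def alloc(self, n: int) -> Optional[List[int]]:
        if not self.shared.take(n):
            return None
        ids = list(range(self.next_id, self.next_id + n))
        self.next_id += n
        return ids


shared = SharedPhysicalPool(num_pages=100)
instance_a = ToyBlockPool(shared)
instance_b = ToyBlockPool(shared)

need = 76
check_passed = instance_a.available_size() >= need  # A: check passes (100 >= 76)
stolen = instance_b.alloc(30)                       # B: allocates in between
block_ids = instance_a.alloc(need)                  # A: only 70 pages left, gets None
```

In this replay instance A's check passes, yet its alloc() returns None because instance B took pages in between. That is the state in which the assert fires.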
Environment
- GPU: AMD MI300X (192 GB)
- vLLM: 0.14.0
- kvcached: (repo main)
- Setup: 6× Qwen2.5-7B-Instruct instances, kvcached_gpu_utilization=0.90
Steps to Reproduce
- Launch 6 vLLM instances sharing a kvcached physical pool
- Run a staggered load sweep with long completions (e.g. completion_len=2048, peak_rps=20) across all instances
- When multiple instances are simultaneously draining heavy backlogs (high KV cache usage), one instance crashes
Logs (see attached):
[kvcached][WARNING] kv_cache_manager.py:174 available_size()=71 < need_size=76
ERROR core.py:938 EngineCore encountered a fatal error.
...
scheduler.schedule()
→ kv_cache_manager.allocate_slots()
→ coordinator.allocate_new_blocks()
→ block_pool.get_new_blocks(num_new_blocks) ← AssertionError
Expected behavior:
When alloc() returns None, the engine should handle it gracefully rather than crash.
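One possible shape for graceful handling is sketched below. It is not a patch against kvcached or vLLM: BlockAllocationError, try_schedule, and the two alloc callables are hypothetical names, and whether vLLM's scheduler can defer or preempt a request at this point is an assumption. The idea is to turn the None result into a recoverable error that the caller catches, leaving the request queued for a later scheduling step.

```python
from typing import Callable, List, Optional

AllocFn = Callable[[int], Optional[List[int]]]


class BlockAllocationError(RuntimeError):
    """Hypothetical recoverable error: the shared pool ran out after the size check."""


def get_new_blocks(alloc: AllocFn, num_blocks: int) -> List[int]:
    block_ids = alloc(num_blocks)
    if block_ids is None:
        # Raise a catchable error instead of asserting, so the engine survives.
        raise BlockAllocationError(f"could not allocate {num_blocks} blocks")
    return block_ids


def try_schedule(alloc: AllocFn, num_blocks: int) -> Optional[List[int]]:
    """Return block ids, or None to signal that the request should be retried later."""
    try:
        return get_new_blocks(alloc, num_blocks)
    except BlockAllocationError:
        return None


def failing_alloc(n: int) -> Optional[List[int]]:
    # Simulates a pool that another instance has just drained.
    return None


def working_alloc(n: int) -> Optional[List[int]]:
    return list(range(n))


deferred = try_schedule(failing_alloc, 76)
scheduled = try_schedule(working_alloc, 4)
```

Here deferred is None (the request stays queued) and scheduled holds the allocated block ids, so a lost race costs one scheduling step rather than the whole engine.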
Notes:
The reserved_page_list mechanism partially mitigates this race for pre-mapped pages, but does not cover cases where the
needed allocation exceeds the reservation buffer.
