EngineCore crashes when physical KV pool is exhausted under multi-instance load #262

                                                                                                                                                                                                                                                              
  When multiple kvcached instances run under heavy concurrent load, one instance can crash with a fatal `AssertionError` in
  `ElasticBlockPool.get_new_blocks()`.

  The root cause is that `available_size()` (which vLLM's scheduler uses to check whether blocks are available) and the actual
  `alloc()` call are not atomic with respect to the shared physical pool. Another instance can consume physical pages between
  the check and the allocation. `alloc()` then returns `None`, the subsequent `assert block_ids is not None` fires, and the
  EngineCore is killed.
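  The check-then-allocate race described above can be reproduced in a toy model. This is only a sketch: `SharedPagePool` and `simulate_race` are hypothetical names, not kvcached or vLLM code, and the interleaving is forced by hand rather than produced by real concurrency. The page counts are chosen to mirror the 71 < 76 warning in the logs.

```python
from typing import List, Optional


class SharedPagePool:
    """Hypothetical stand-in for a physical page pool shared by several instances."""

    def __init__(self, num_pages: int) -> None:
        self.free_pages = num_pages

    def available_size(self) -> int:
        return self.free_pages

    def alloc(self, n: int) -> Optional[List[int]]:
        # Returns None (rather than raising) when the request cannot be met.
        if n > self.free_pages:
            return None
        self.free_pages -= n
        return list(range(n))  # placeholder page ids


def simulate_race() -> Optional[List[int]]:
    pool = SharedPagePool(num_pages=76)
    need = 76

    # Instance A checks first, and the check passes.
    assert pool.available_size() >= need

    # Instance B allocates from the same pool before A reaches alloc().
    pool.alloc(5)

    # A's allocation now fails even though its earlier check succeeded:
    # 71 pages are free, 76 are needed.
    return pool.alloc(need)


print(simulate_race())  # None
```

  In the real system the two steps run in different processes, so nothing short of making the check and the allocation one atomic operation, or tolerating a failed `alloc()`, closes the window.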

  **Environment**

  - GPU: AMD MI300X (192 GB)
  - vLLM: 0.14.0
  - kvcached: (repo main)
  - Setup: 6× Qwen2.5-7B-Instruct instances, kvcached_gpu_utilization=0.90
  
  **Steps to Reproduce**

  1. Launch 6 vLLM instances sharing a kvcached physical pool
  2. Run a staggered load sweep with long completions (e.g. completion_len=2048, peak_rps=20) across all instances
  3. When multiple instances are simultaneously draining heavy backlogs (high KV cache usage), one instance crashes

  **Logs (see attached):**

  [kvcached][WARNING] kv_cache_manager.py:174 available_size()=71 < need_size=76
  ERROR core.py:938 EngineCore encountered a fatal error.
    ...
    scheduler.schedule()
    → kv_cache_manager.allocate_slots()
    → coordinator.allocate_new_blocks()
    → block_pool.get_new_blocks(num_new_blocks)   ← AssertionError

  **Expected behavior:**

  When `alloc()` returns `None`, the engine should handle it gracefully rather than crash.
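  One possible shape for graceful handling is to turn the `None` result into a recoverable error and defer the request to a later scheduling step. The sketch below is an assumption about what that could look like, not a patch: `PoolExhausted`, `make_alloc`, and `schedule_step` are hypothetical names, and whether vLLM's scheduler can tolerate a failed allocation at this point is not established here.

```python
from typing import Callable, List, Optional, Tuple

AllocFn = Callable[[int], Optional[List[int]]]


class PoolExhausted(Exception):
    """Hypothetical recoverable error, raised instead of asserting."""


def make_alloc(free_pages: int) -> AllocFn:
    """Build a toy alloc() over a fixed number of free pages."""
    state = {"free": free_pages}

    def alloc(n: int) -> Optional[List[int]]:
        if n > state["free"]:
            return None
        state["free"] -= n
        return list(range(n))  # placeholder block ids

    return alloc


def get_new_blocks(alloc: AllocFn, num_blocks: int) -> List[int]:
    block_ids = alloc(num_blocks)
    if block_ids is None:
        # Replaces `assert block_ids is not None` with an error callers can catch.
        raise PoolExhausted(f"could not allocate {num_blocks} blocks")
    return block_ids


def schedule_step(needs: List[int], alloc: AllocFn) -> Tuple[List[int], List[int]]:
    """Try each request's block count; defer the ones that cannot be served."""
    scheduled: List[int] = []
    deferred: List[int] = []
    for need in needs:
        try:
            get_new_blocks(alloc, need)
            scheduled.append(need)
        except PoolExhausted:
            deferred.append(need)  # retry on a later step instead of crashing
    return scheduled, deferred


print(schedule_step([40, 76, 20], make_alloc(71)))  # ([40, 20], [76])
```

  With 71 free pages, the 76-block request is deferred while the smaller requests proceed, which is the behavior asked for above.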

  **Notes:**

  The `reserved_page_list` mechanism partially mitigates this race for pre-mapped pages, but it does not cover cases where the
  needed allocation exceeds the reservation buffer.
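  The limit of a reservation buffer can be illustrated with a toy model. This is an assumption-laden sketch and not kvcached's actual `reserved_page_list` implementation: `ToyInstance` and its fields are hypothetical, and the model simply treats reserved pages as already owned by the instance, with any remainder drawn from the shared pool.

```python
from typing import Dict, Optional


class ToyInstance:
    """Hypothetical instance holding a few pre-reserved pages plus shared-pool access."""

    def __init__(self, shared: Dict[str, int], reserved: int) -> None:
        self.shared = shared      # {"free": int}, shared across instances
        self.reserved = reserved  # pages already set aside for this instance

    def alloc(self, n: int) -> Optional[int]:
        from_reserved = min(n, self.reserved)
        from_shared = n - from_reserved
        if from_shared > self.shared["free"]:
            # The reservation is too small and the shared pool cannot cover the rest.
            return None
        self.reserved -= from_reserved
        self.shared["free"] -= from_shared
        return n


# Other instances have drained the shared pool.
shared = {"free": 0}

a = ToyInstance(shared, reserved=8)
print(a.alloc(8))   # 8: fits entirely inside the reservation

b = ToyInstance(shared, reserved=8)
print(b.alloc(12))  # None: needs 4 pages beyond the reservation
```

  Requests that fit within the reservation are immune to the race in this model, while larger ones still depend on the state of the shared pool at allocation time.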

<img width="1858" height="1450" alt="Image" src="https://github.com/user-attachments/assets/a0ee7e60-d944-491f-9072-3f488349a400" />
