[Bug] may_append allocates new block one token too late, causing KV cache write to unallocated block #240

 ### Description

  `BlockManager.may_append` allocates a new physical block when `len(seq) % block_size == 1`, but the correct condition should be
  `== 0`. As a result, the new block is allocated **after** the first token that needs it has already been scheduled, so that token's
  KV cache entry is written to an unallocated block.
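The condition described above can be checked with a toy model. This is a minimal sketch, not the project's code: `missing_block_positions` and its parameters are hypothetical, a small `block_size` of 4 is used for readability, and the model adopts this report's premise that `len(seq)` counts tokens already in the KV cache, so the next token is written at position `len(seq)`.

```python
def missing_block_positions(remainder, block_size=4, prefill=4, steps=6):
    """Return the token positions whose target block was not yet allocated."""
    # Blocks allocated for the prefill tokens (ceiling division).
    num_blocks = -(-prefill // block_size)
    seq_len = prefill
    missing = []
    for _ in range(steps):
        # Schedule phase: allocate when the remainder matches the condition under test.
        if seq_len % block_size == remainder:
            num_blocks += 1
        # Assumption from this report: the next token is written at position seq_len.
        if seq_len // block_size >= num_blocks:
            missing.append(seq_len)
        seq_len += 1
    return missing


print(missing_block_positions(remainder=1))  # [4, 8]
print(missing_block_positions(remainder=0))  # []
```

Under that premise, `== 1` misses the first position of every new block and `== 0` misses none. If the scheduler instead counts the pending token in `len(seq)` before calling `may_append`, the write position is `len(seq) - 1` and the result differs, so the call order is worth confirming against the code.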

  ### Root Cause

  In `block_manager.py`:

  ```python
  def can_append(self, seq: Sequence) -> bool:
      # The comparison yields a bool (0 or 1): one free block is required only when a new block is needed.
      return len(self.free_block_ids) >= (len(seq) % self.block_size == 1)

  def may_append(self, seq: Sequence):
      # Allocates only once the sequence is one token past a block boundary.
      if len(seq) % self.block_size == 1:
          seq.block_table.append(self._allocate_block())
  ```

  `may_append` is called during the schedule phase, before the new token is appended to the sequence. At this point, `len(seq)` reflects
  the number of tokens already stored in KV cache blocks.

  For a concrete example with `block_size=256`:

  | Step | `len(seq)` | `% 256` | Current behavior | Correct behavior |
  | --- | --- | --- | --- | --- |
  | Prefill done (256 tokens) | 256 | 0 | ❌ No block allocated | Should allocate (block 0 is full) |
  | 1st decode token generated | 257 | 1 | Allocates (too late!) | Already has block allocated |
  | ... | ... | ... | ... | ... |
  | Block 1 full (512 tokens) | 512 | 0 | ❌ No block allocated | Should allocate (block 1 is full) |
  | Next decode token | 513 | 1 | Allocates (too late!) | Already has block allocated |

  When `len(seq) == 256`, block 0 is full. The upcoming decode token (the 257th token) needs to be written to block 1, but `may_append`
  does not allocate it because `256 % 256` is 0, not 1. The allocation only happens on the next schedule cycle, after the token has
  already been scheduled and presumably written to KV cache. At that point block 1 is not yet in the block table.

  ### Impact

  KV cache writes for the first token of every new block will target an unallocated physical block, which can cause:

  - Silent data corruption in KV cache
  - Memory corruption / out-of-bounds access
  - Incorrect generation results

  ### Fix

  Change both conditions from `== 1` to `== 0`:

  ```python
  def can_append(self, seq: Sequence) -> bool:
      # Require a free block exactly when the current block is full.
      return len(self.free_block_ids) >= (len(seq) % self.block_size == 0)

  def may_append(self, seq: Sequence):
      # Allocate as soon as the current block is full, before the next token is scheduled.
      if len(seq) % self.block_size == 0:
          seq.block_table.append(self._allocate_block())
  ```

  `can_append` must also be fixed because it acts as a guard for `may_append`. If the two disagree on whether a new block is needed,
  the scheduler could skip preemption and crash with an `IndexError` when `may_append` tries to pop from an empty `free_block_ids`.
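The guard relationship between `can_append` and `may_append` can be illustrated with a small stand-in. This is a sketch under assumptions: `ToyBlockManager` and `crashes_on_mismatch` are hypothetical names, the free list is modeled as a `collections.deque`, and allocation is modeled as `popleft()`, which may differ from how `_allocate_block` works in the real `BlockManager`.

```python
from collections import deque


class ToyBlockManager:
    """Hypothetical stand-in for BlockManager; only the guard logic is modeled."""

    def __init__(self, num_free, block_size, guard_remainder, alloc_remainder):
        self.free_block_ids = deque(range(num_free))
        self.block_size = block_size
        self.guard_remainder = guard_remainder
        self.alloc_remainder = alloc_remainder

    def can_append(self, seq_len):
        needs_block = seq_len % self.block_size == self.guard_remainder
        # A bool compares as 0 or 1: "at least one free block, if one is needed".
        return len(self.free_block_ids) >= needs_block

    def may_append(self, seq_len, block_table):
        if seq_len % self.block_size == self.alloc_remainder:
            # popleft() on an empty deque raises IndexError.
            block_table.append(self.free_block_ids.popleft())


def crashes_on_mismatch(guard_remainder, alloc_remainder):
    """Return True if allocation fails even though the guard reported room."""
    manager = ToyBlockManager(0, 4, guard_remainder, alloc_remainder)
    if not manager.can_append(4):
        return False  # Guard reports no room; a scheduler would preempt instead.
    try:
        manager.may_append(4, [])
    except IndexError:
        return True
    return False


print(crashes_on_mismatch(guard_remainder=1, alloc_remainder=0))  # True
print(crashes_on_mismatch(guard_remainder=0, alloc_remainder=0))  # False
```

With no free blocks and a full last block, the mismatched pair passes the guard and then fails on allocation, while the matched pair is stopped by the guard. This is why both conditions need to change together.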
