feat: Add int8 KV cache compression with head-major layout and async pipelining by naalo2 · Pull Request #229 · GeeeekExplorer/nano-vllm

naalo2 · 2026-05-08T07:47:29Z

Closes #228

Summary

Add int8 KV cache compression with head-major memory layout and async stream pipelining to hide KV store latency behind attention computation.
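For readers unfamiliar with the technique, here is a minimal sketch of symmetric per-group int8 quantization in plain Python. It is not the code in tools/quant_attn_kvhead_based.py. The function names and the symmetric max-abs scaling scheme are assumptions made for illustration, and `group_num` is assumed to mean the number of groups each head vector is split into.

```python
from typing import List, Tuple


def quantize_group(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric int8 quantization of one group: returns (codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # An all-zero group gets scale 1.0 so the division below is safe.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_group(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float values from int8 codes and their scale."""
    return [c * scale for c in codes]


def quantize_head(vec: List[float], group_num: int) -> List[Tuple[List[int], float]]:
    """Split one head vector into group_num equal groups and quantize each."""
    if group_num < 1 or len(vec) % group_num != 0:
        raise ValueError("head_dim must be divisible by group_num")
    size = len(vec) // group_num
    return [quantize_group(vec[i:i + size]) for i in range(0, len(vec), size)]


# Hypothetical 8-element head vector split into 2 groups of 4.
head = [1.0, -0.5, 0.0, 0.25, 8.0, -2.0, 4.0, 0.5]
groups = quantize_head(head, group_num=2)
restored = [x for codes, scale in groups for x in dequantize_group(codes, scale)]
```

Each group carries its own scale, so a large value in one group does not reduce the precision of the others; the rounding error per element is at most half of that group's scale.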

Key Changes

config.py: Add quant flag and group_num configuration fields
context.py / sequence.py: Update parameter passing interfaces and methods
attention.py: Add quant computation branch and quant_store operator
model_runner.py: Add quant KV cache initialization branch and CUDA graph capture under quant path
tools/quant_attn_kvhead_based.py: Implement concrete int8 quantization compute operators
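As a rough illustration of the config.py change, the sketch below shows how a quant flag and a group_num field might be declared and validated. Only the names `quant` and `group_num` come from this PR; the class name `QuantConfigSketch`, the defaults, the validation, and the `group_size` helper are hypothetical, and `group_num` is assumed to be the number of quantization groups per head.

```python
from dataclasses import dataclass


@dataclass
class QuantConfigSketch:
    """Hypothetical stand-in for the two fields this PR adds to the config."""
    quant: bool = False   # enable the int8 KV cache path (assumed default: off)
    group_num: int = 1    # assumed meaning: quantization groups per head

    def __post_init__(self) -> None:
        if self.group_num < 1:
            raise ValueError("group_num must be a positive integer")

    def group_size(self, head_dim: int) -> int:
        """Number of elements that share one scale, for a given head_dim."""
        if head_dim % self.group_num != 0:
            raise ValueError("head_dim must be divisible by group_num")
        return head_dim // self.group_num


cfg = QuantConfigSketch(quant=True, group_num=4)
elements_per_scale = cfg.group_size(head_dim=128)
```

Validating divisibility up front keeps a bad group_num from surfacing later as a shape mismatch inside the attention path.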

Benefits

Reduced KV cache memory footprint via int8 quantization
Improved GPU memory access efficiency via head-major coalesced layout
Better throughput via async pipelining that overlaps KV cache stores with attention computation
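The first two benefits can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes a float16 baseline, one float32 scale per group, and a head-major layout of shape [num_kv_heads, num_tokens, head_dim]; none of these details are stated in the PR, and all function names and the example sizes are hypothetical.

```python
def kv_bytes_per_token(num_kv_heads: int, head_dim: int, elem_bytes: int,
                       group_num: int = 0, scale_bytes: int = 0) -> int:
    """Bytes of K plus V stored per token in one layer, including scales."""
    payload = 2 * num_kv_heads * head_dim * elem_bytes
    scales = 2 * num_kv_heads * group_num * scale_bytes
    return payload + scales


def token_major_offset(token: int, head: int, dim: int,
                       num_kv_heads: int, head_dim: int) -> int:
    """Flat element index in a [num_tokens, num_kv_heads, head_dim] layout."""
    return (token * num_kv_heads + head) * head_dim + dim


def head_major_offset(token: int, head: int, dim: int,
                      num_tokens: int, head_dim: int) -> int:
    """Flat element index in a [num_kv_heads, num_tokens, head_dim] layout."""
    return (head * num_tokens + token) * head_dim + dim


# Assumed example sizes: 8 KV heads, head_dim 128, 4 groups per head.
fp16_bytes = kv_bytes_per_token(8, 128, elem_bytes=2)
int8_bytes = kv_bytes_per_token(8, 128, elem_bytes=1, group_num=4, scale_bytes=4)
ratio = int8_bytes / fp16_bytes

# Distance in elements between consecutive tokens of the same head.
token_major_stride = (token_major_offset(1, 0, 0, 8, 128)
                      - token_major_offset(0, 0, 0, 8, 128))
head_major_stride = (head_major_offset(1, 0, 0, 16, 128)
                     - head_major_offset(0, 0, 0, 16, 128))
```

Under these assumptions the int8 cache takes 2304 bytes per token per layer against 4096 for float16, and consecutive tokens of one head sit 128 elements apart in the head-major layout instead of 1024, which is what makes per-head reads contiguous.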

naalo2 added 11 commits May 8, 2026 15:17

change model_runner (e8254ba)
change attention (859c9e4)
add tools (9641e4e)
change sequence (2853e94)
change config (68ec051)
change context (6f69e38)
change context (a8b0555)
my logo,readme,etc (d28caf8)
new logo (0a6e35b)
Delete logo.png (e610fa7)
new logo (ad2fd49)

naalo2 mentioned this pull request May 8, 2026

[change] Int8 KV Cache + Async Pipeline + Head-major reordering for 22% Throughput Boost #228

Open

naalo2 wants to merge 11 commits into GeeeekExplorer:main from naalo2:main
