kvcached Roadmap
Active workstreams, roughly in priority order. Each item links to the relevant issues/PRs; pick one up or file a new issue if something you care about isn't listed.
Production stability
Central kvcached use case; current behavior is a production stability blocker.
Serving stack compatibility
PD (prefill / decode) disaggregation support — #302, #311
Without this, kvcached is locked out of PD-based serving stacks. vLLM exposes PD via several
`KVConnector` implementations, each with its own assumptions about KV layout and block count; we have to validate kvcached against each pattern (`set_stride` crash + `num_blocks` assertion). Pending review + merge; the UCX-over-VMM transfer path still needs an end-to-end correctness run (see `examples/`).

KV cache CPU offloading in vLLM — PR #269
Major capacity multiplier for memory-bound deployments.
`--kv-offloading-size` error (#267)

Cross-version vLLM compatibility — ongoing
vLLM ships every few weeks; each break delays user upgrades. Most recent fix: PR #305 (block-pool integration).
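One common way to tolerate several upstream releases from a single package is to gate each integration patch on the installed vLLM version. A minimal sketch of that pattern follows; the function and patch names (`pick_block_pool_patch`, `block_pool_v1/v2`) are hypothetical, not actual kvcached symbols, and the version threshold is a placeholder.

```python
# Hypothetical sketch: choose which integration patch to apply based on the
# installed vLLM version, so one kvcached release spans several vLLM releases.

def parse_version(v: str) -> tuple[int, ...]:
    """Keep only the numeric dotted prefix ("0.8.1.post1" -> (0, 8, 1))."""
    parts = []
    for p in v.split("."):
        if p.isdigit():
            parts.append(int(p))
        else:
            break
    return tuple(parts)

def pick_block_pool_patch(vllm_version: str) -> str:
    # Placeholder threshold: pretend the block-pool refactor (cf. PR #305)
    # landed in 0.8.0. Tuple comparison handles differing lengths correctly.
    if parse_version(vllm_version) >= (0, 8, 0):
        return "block_pool_v2"
    return "block_pool_v1"
```

In practice the real version could be read at import time via `importlib.metadata.version("vllm")`, keeping the per-version branching in one place instead of scattered across the integration.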
Maintain a patched vLLM fork in the org repo
Carry all kvcached patches in a single org-owned vLLM branch, kept rebased against upstream. Goal: one-line install that gives users vLLM + every kvcached integration patch in sync, instead of asking them to apply patches by hand against whatever vLLM tag they happen to be on.
Performance
Layout overhead study: contiguous vs non-contiguous — see `layout-comparison.md`

Recent e2e numbers on attention-only show non-contig ≈ vanilla vLLM, while contig (today's default) carries the 30–50% overhead also seen in #299 / PR #319. Goal: confirm + generalise so we can flip the default.
`bench serve`, 3 seeds; `contig=false`; drop the `HYBRID_LINEAR` `ValueError`, deprecate the env var.

Detailed performance characterisation across configs
Per-config sweep so users can predict overhead before deploying, and so we have a stable reference for regression tracking. Axes: model size, num_layers, page_size, num_kv_buffers, batch / request rate, layout flag, single-instance vs multi-instance. Output: a reproducible harness + published table / dashboard.
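The cross-product over those axes is easy to enumerate reproducibly. A minimal sketch of the harness's config grid, assuming the axis names above; the value lists are placeholders, not the real sweep ranges:

```python
# Hypothetical sweep grid for the characterisation harness. Axis names follow
# the roadmap text; every value below is an illustrative placeholder.
from itertools import product

AXES = {
    "model_size": ["7B", "13B"],
    "num_layers": [32, 40],
    "page_size": [16, 64],
    "num_kv_buffers": [2, 4],
    "request_rate": [1, 8],      # stands in for the batch / request-rate axis
    "layout": ["contig", "non-contig"],
    "num_instances": [1, 2],     # single-instance vs multi-instance
}

def config_grid(axes=AXES):
    """Yield one config dict per point in the cross-product of all axes."""
    keys = list(axes)
    for values in product(*(axes[k] for k in keys)):
        yield dict(zip(keys, values))
```

Emitting each config as a dict (or a row keyed by these names) gives the stable reference table the item asks for, and makes regressions diffable run-to-run.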
Packaging / maintenance
Removes the per-torch-version rebuild; long-term maintenance + redistribution win.
Hardware backend expansion
All tracks exploratory / PoC stage.
`amd-support-init`, `amd-benchmark`

Docs / examples
Project website — landing page + docs (scope TBD)
Currently only a README; a proper site would help adoption and give a stable link for talks and papers.
MuxServe++ reproduction / model support — #231