
Performance Optimization: Page Allocator Migrates from Python to C++#319

Open
lianghao208 wants to merge 2 commits into ovg-project:main from lianghao208:lianghao_c++

Conversation

@lianghao208
Contributor

issue #299

@lianghao208 lianghao208 force-pushed the lianghao_c++ branch 4 times, most recently from 65f0f81 to ae4eb83 Compare April 30, 2026 06:05
@cui36
Collaborator

cui36 commented Apr 30, 2026

Thanks @lianghao208!

@cui36
Collaborator

cui36 commented May 3, 2026

Hi @lianghao208,

Thanks for the C++ migration. After running it, I want to propose two must-fixes and a design decision.

Must-fix (pushed a new commit)

1. torch_bindings.cpp is missing #include <pybind11/functional.h> (and <pybind11/stl.h> for the new vector / unordered_map return types). Without them, set_should_use_worker_ipc_callback rejects any Python callable, and KVCacheManager.__init__ always raises TypeError at kv_cache_manager.py:111. All 5 tests in test_kvcache_manager.py regress from PASSED on main to ERROR on this PR. (A binding sketch covering both fixes follows this list.)

2. The test fixture in tests/test_kvcache_manager.py accesses Python-only attributes that don't exist on the C++ allocator: enable_page_prealloc, min_reserved_pages, reserved_page_list, and num_total_pages. Replace them with get_num_reserved_pages() / get_num_total_pages().
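
For illustration, a minimal pybind11 sketch of both fixes. This is not the actual torch_bindings.cpp layout; the module name and lambda bodies are placeholders, only the include set and the getter names come from the points above.

```cpp
// Illustrative pybind11 module; module name and function bodies are placeholders.
#include <pybind11/pybind11.h>
#include <pybind11/functional.h>  // converts Python callables to std::function
#include <pybind11/stl.h>         // converts std::vector / std::unordered_map returns

#include <functional>
#include <unordered_map>
#include <vector>

namespace py = pybind11;

PYBIND11_MODULE(example_bindings, m) {
  // Without <pybind11/functional.h>, passing a Python callable here raises TypeError.
  m.def("set_should_use_worker_ipc_callback",
        [](std::function<bool()> cb) { /* store the callback for the allocator */ });

  // Getters that replace the Python-only attributes used by the old test fixture.
  m.def("get_num_reserved_pages", []() { return static_cast<int64_t>(0); });
  m.def("get_num_total_pages", []() { return static_cast<int64_t>(0); });

  // Without <pybind11/stl.h>, container return types like this fail to convert.
  m.def("get_reserved_page_ids", []() { return std::vector<int64_t>{}; });
}
```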

Design decision: still support kvctl limit? (proposed in a new PR: #323)

The remaining issues all stem from one question: whether the engine should still react to external writers of the shm total_size:

  1. kv_cache_manager.py:230-234 comments out the resize poll.
  2. C++ derives the IPC name as kvcached_engine_<pgid>, while Python / kvctl use kvcached_<engine_tag>_<pgid>, so the two sides aren't even looking at the same /dev/shm segment; even if the resize poll from (1) were re-enabled, no signal would cross. (A naming sketch follows this list.)
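
For illustration, a tiny sketch of deriving the segment name the way Python / kvctl expect; the helper name is a placeholder, and I'm assuming <pgid> is the caller's process group id.

```cpp
// Illustrative only: build kvcached_<engine_tag>_<pgid> so the C++ side opens
// the same /dev/shm segment as Python / kvctl. Helper name is a placeholder.
#include <string>
#include <unistd.h>

std::string ipc_segment_name(const std::string& engine_tag) {
  return "kvcached_" + engine_tag + "_" + std::to_string(getpgrp());
}
```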

If kvctl stays a published feature (it has docs in examples/02_memory_control/, a console-script entry in setup.py, and a working test on main), all three need fixing. Cost: ~7 μs/alloc for the shm poll, but the microbench still retains ~83% of the C++ migration's speedup over main (alloc+free: main 65 μs → PR #319 23 μs → with fixes 30 μs). More optimization details in PR #323.

A future optimization that keeps kvctl working and recovers the ~7 μs would be a watcher thread that polls shm at low frequency and exposes a single atomic flag to the alloc hot path.
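
A minimal sketch of that idea, purely illustrative: the class, member names, and the assumption that the mapped total_size field can be read as an atomic 64-bit value are mine, not kvcached's.

```cpp
// Illustrative watcher-thread sketch: poll the shm total_size at low frequency
// and expose a single atomic flag so the alloc hot path avoids the ~7 us poll.
#include <atomic>
#include <chrono>
#include <thread>

class ShmResizeWatcher {
 public:
  explicit ShmResizeWatcher(const std::atomic<int64_t>* shm_total_size)
      : shm_total_size_(shm_total_size),
        last_seen_(shm_total_size->load(std::memory_order_relaxed)),
        poller_([this] { poll_loop(); }) {}

  ~ShmResizeWatcher() {
    running_.store(false, std::memory_order_relaxed);
    poller_.join();
  }

  // Alloc hot path: a single atomic load instead of touching shm.
  bool resize_pending() const {
    return resize_pending_.load(std::memory_order_acquire);
  }
  void clear_pending() { resize_pending_.store(false, std::memory_order_release); }

 private:
  void poll_loop() {
    while (running_.load(std::memory_order_relaxed)) {
      int64_t cur = shm_total_size_->load(std::memory_order_relaxed);
      if (cur != last_seen_) {
        last_seen_ = cur;
        resize_pending_.store(true, std::memory_order_release);
      }
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
  }

  const std::atomic<int64_t>* shm_total_size_;  // mapped from the kvctl shm segment
  int64_t last_seen_;
  std::atomic<bool> resize_pending_{false};
  std::atomic<bool> running_{true};
  std::thread poller_;
};
```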

Note that regardless of whether kvctl is kept, passive elasticity (unmap-on-free, physical GPU memory floating between co-located instances via CUDA VMM) is unaffected. Only the active control plane is at stake here.

@cui36 cui36 mentioned this pull request May 3, 2026
@lianghao208
Contributor Author

@cui36 Thanks for the review, the points all make sense to me. I see that PR #323 has already fixed all of the corresponding problems.

cui36 added a commit that referenced this pull request May 11, 2026
Apply diff from 98d9bb3 -> 65a7d0a (lianghao208/kvcached:lianghao_c++):

- csrc/page_allocator.cpp: refactor free_page/free_pages/resize/trim
  to use scoped lock_guard blocks instead of manual lock/unlock,
  making the slow-path unmap exception-safe. Also drop the
  max_reserved_pages_ auto-expansion in alloc_page().

- kvcached/kv_cache_manager.py: cap available_size() by physical
  free pages (avail_physical + reserved) in addition to virtual
  free pages, so capacity reported under memory pressure stays honest.

Functionally equivalent to PR #319 head; the local override commits
(restore-resize, bench scripts, overhead notes) sit on top.
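
For illustration of the scoped-lock refactor mentioned in that commit, a minimal sketch; the class, member names, and the unmap helper are placeholders, not actual page_allocator.cpp code.

```cpp
// Illustrative only: scoped lock_guard instead of manual lock()/unlock(), so an
// exception from the slow-path unmap can no longer leave the mutex held.
#include <mutex>
#include <vector>

class PageAllocatorSketch {
 public:
  void free_page(int64_t page_id) {
    std::lock_guard<std::mutex> guard(mutex_);  // released on return or throw
    free_pages_.push_back(page_id);
    if (free_pages_.size() > reserved_target_) {
      // Slow path: if this throws, the guard still unlocks, which the old
      // manual lock()/unlock() sequence did not guarantee.
      unmap_excess_pages();
    }
  }

 private:
  void unmap_excess_pages() { /* placeholder for the CUDA VMM unmap slow path */ }

  std::mutex mutex_;
  std::vector<int64_t> free_pages_;
  size_t reserved_target_ = 0;
};
```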