
Performance Optimization: Page Allocator Migrates from Python to C++#319

Open
lianghao208 wants to merge 2 commits into ovg-project:main from lianghao208:lianghao_c++

Conversation

@lianghao208
Contributor

issue #299

@lianghao208 lianghao208 force-pushed the lianghao_c++ branch 4 times, most recently from 65f0f81 to ae4eb83 Compare April 30, 2026 06:05
@cui36
Collaborator

cui36 commented Apr 30, 2026

Thanks @lianghao208!

@cui36
Collaborator

cui36 commented May 3, 2026

Hi @lianghao208,

Thanks for the C++ migration. After running it, I want to propose two must-fixes and a design decision.

Must-fix (pushed a new commit)

1. torch_bindings.cpp is missing #include <pybind11/functional.h> (and <pybind11/stl.h> for the new vector / unordered_map return types). Without them, set_should_use_worker_ipc_callback rejects any Python callable, and KVCacheManager.__init__ always raises TypeError at kv_cache_manager.py:111. All 5 tests in test_kvcache_manager.py regress from PASSED on main to ERROR on this PR. (A binding sketch covering both fixes follows this list.)

2. The test fixture in tests/test_kvcache_manager.py accesses Python-only attributes that don't exist on the C++ allocator: enable_page_prealloc, min_reserved_pages, reserved_page_list, and num_total_pages. Replace them with get_num_reserved_pages() / get_num_total_pages().
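
For illustration, a minimal pybind11 sketch of both fixes. This is not the actual torch_bindings.cpp layout; the module name and lambda bodies are placeholders, only the include set and the getter names come from the points above.

```cpp
// Illustrative pybind11 module; module name and function bodies are placeholders.
#include <pybind11/pybind11.h>
#include <pybind11/functional.h>  // converts Python callables to std::function
#include <pybind11/stl.h>         // converts std::vector / std::unordered_map returns

#include <functional>
#include <unordered_map>
#include <vector>

namespace py = pybind11;

PYBIND11_MODULE(example_bindings, m) {
  // Without <pybind11/functional.h>, passing a Python callable here raises TypeError.
  m.def("set_should_use_worker_ipc_callback",
        [](std::function<bool()> cb) { /* store the callback for the allocator */ });

  // Getters that replace the Python-only attributes used by the old test fixture.
  m.def("get_num_reserved_pages", []() { return static_cast<int64_t>(0); });
  m.def("get_num_total_pages", []() { return static_cast<int64_t>(0); });

  // Without <pybind11/stl.h>, container return types like this fail to convert.
  m.def("get_reserved_page_ids", []() { return std::vector<int64_t>{}; });
}
```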

Design decision: still support kvctl limit? (proposed in a new PR: #323)

The remaining issues all stem from one question: whether the engine should still react to external writers of the shm total_size:

  1. kv_cache_manager.py:230-234 comments out the resize poll.
  2. C++ derives the IPC name as kvcached_engine_<pgid>, while Python / kvctl use kvcached_<engine_tag>_<pgid>, so the two sides aren't even looking at the same /dev/shm segment; even if the resize poll from (1) were re-enabled, no signal would cross. (A naming sketch follows this list.)
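
For illustration, a tiny sketch of deriving the segment name the way Python / kvctl expect; the helper name is a placeholder, and I'm assuming <pgid> is the caller's process group id.

```cpp
// Illustrative only: build kvcached_<engine_tag>_<pgid> so the C++ side opens
// the same /dev/shm segment as Python / kvctl. Helper name is a placeholder.
#include <string>
#include <unistd.h>

std::string ipc_segment_name(const std::string& engine_tag) {
  return "kvcached_" + engine_tag + "_" + std::to_string(getpgrp());
}
```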

If kvctl stays a published feature (it has docs in examples/02_memory_control/, a console-script entry in setup.py, and a working test on main), all three need fixing. Cost: ~7 μs/alloc for the shm poll, but the microbench still retains ~83% of the C++ migration's speedup over main (alloc+free: main 65 μs → PR #319 23 μs → with fixes 30 μs). More optimization details in PR #323.

A future optimization that keeps kvctl working and recovers the ~7 μs would be a watcher thread that polls shm at low frequency and exposes a single atomic flag to the alloc hot path.
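
A minimal sketch of that idea, purely illustrative: the class, member names, and the assumption that the mapped total_size field can be read as an atomic 64-bit value are mine, not kvcached's.

```cpp
// Illustrative watcher-thread sketch: poll the shm total_size at low frequency
// and expose a single atomic flag so the alloc hot path avoids the ~7 us poll.
#include <atomic>
#include <chrono>
#include <thread>

class ShmResizeWatcher {
 public:
  explicit ShmResizeWatcher(const std::atomic<int64_t>* shm_total_size)
      : shm_total_size_(shm_total_size),
        last_seen_(shm_total_size->load(std::memory_order_relaxed)),
        poller_([this] { poll_loop(); }) {}

  ~ShmResizeWatcher() {
    running_.store(false, std::memory_order_relaxed);
    poller_.join();
  }

  // Alloc hot path: a single atomic load instead of touching shm.
  bool resize_pending() const {
    return resize_pending_.load(std::memory_order_acquire);
  }
  void clear_pending() { resize_pending_.store(false, std::memory_order_release); }

 private:
  void poll_loop() {
    while (running_.load(std::memory_order_relaxed)) {
      int64_t cur = shm_total_size_->load(std::memory_order_relaxed);
      if (cur != last_seen_) {
        last_seen_ = cur;
        resize_pending_.store(true, std::memory_order_release);
      }
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
  }

  const std::atomic<int64_t>* shm_total_size_;  // mapped from the kvctl shm segment
  int64_t last_seen_;
  std::atomic<bool> resize_pending_{false};
  std::atomic<bool> running_{true};
  std::thread poller_;
};
```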

Note that regardless of whether kvctl is kept, passive elasticity (unmap-on-free, physical GPU memory floating between co-located instances via CUDA VMM) is unaffected. Only the active control plane is at stake here.

@cui36 cui36 mentioned this pull request May 3, 2026
@lianghao208
Contributor Author

@cui36 Thanks for the review, the points all make sense to me. I see that PR #323 has already fixed all of the corresponding problems.

cui36 added a commit that referenced this pull request May 11, 2026
Apply diff from 98d9bb3 -> 65a7d0a (lianghao208/kvcached:lianghao_c++):

- csrc/page_allocator.cpp: refactor free_page/free_pages/resize/trim
  to use scoped lock_guard blocks instead of manual lock/unlock,
  making the slow-path unmap exception-safe. Also drop the
  max_reserved_pages_ auto-expansion in alloc_page().

- kvcached/kv_cache_manager.py: cap available_size() by physical
  free pages (avail_physical + reserved) in addition to virtual
  free pages, so capacity reported under memory pressure stays honest.

Functionally equivalent to PR #319 head; the local override commits
(restore-resize, bench scripts, overhead notes) sit on top.
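
For illustration of the scoped-lock refactor mentioned in that commit, a minimal sketch; the class, member names, and the unmap helper are placeholders, not actual page_allocator.cpp code.

```cpp
// Illustrative only: scoped lock_guard instead of manual lock()/unlock(), so an
// exception from the slow-path unmap can no longer leave the mutex held.
#include <mutex>
#include <vector>

class PageAllocatorSketch {
 public:
  void free_page(int64_t page_id) {
    std::lock_guard<std::mutex> guard(mutex_);  // released on return or throw
    free_pages_.push_back(page_id);
    if (free_pages_.size() > reserved_target_) {
      // Slow path: if this throws, the guard still unlocks, which the old
      // manual lock()/unlock() sequence did not guarantee.
      unmap_excess_pages();
    }
  }

 private:
  void unmap_excess_pages() { /* placeholder for the CUDA VMM unmap slow path */ }

  std::mutex mutex_;
  std::vector<int64_t> free_pages_;
  size_t reserved_target_ = 0;
};
```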