
Conversation

richardhuo-nv
Contributor

@richardhuo-nv richardhuo-nv commented Aug 19, 2025

Overview:

Integrates the Dynamo KVBM connector API with TRT-LLM, including fixes made after merging the vLLM connector API integration and addressing review comments from #2440.

Details:

The PR is based on the ongoing changes from NVIDIA/TensorRT-LLM#7228, which added TRT-LLM connector API compatibility.

Changes in this PR:

  1. Added support for an external KV cache layout that is fully contiguous, since TRT-LLM's cache layout is fully contiguous.
  2. Instead of providing bytes_per_block from the KVBM leader bindings, the KVBM leader now reads bytes_per_block from the worker for a more accurate estimate. This is necessary because there is no straightforward or reliable way to extract bytes_per_block directly from the TRT-LLM KV cache.
  3. Extended the leader–worker barrier, which now requires two synchronization steps:
    Worker → Leader: send bytes_per_block.
    Leader → Worker: send num_host_blocks and num_disk_blocks.
  4. Added the basic Rust-based leader–worker integration for TRT-LLM, with some compatibility changes:
    1). When the leader calls update_state_after_alloc, no num_external_tokens is passed. For now, a HashMap tracks each request's num_external_tokens as recorded when get_num_new_matched_tokens is called.
    2). build_connector_metadata now uses a new implementation of apply_scheduler_output, namely apply_scheduler_output_with_computed_position, to trigger offloading. This is required because TRT-LLM's scheduler output does not include the number of scheduled tokens, so offloading decisions must be based on the computed position.
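The two-step barrier in item 3 can be sketched as follows. This is an illustrative in-process Python model, not the actual KVBM code: the queue-based transport and the names `run_barrier`, `host_bytes`, and `disk_bytes` are hypothetical; only bytes_per_block, num_host_blocks, and num_disk_blocks come from the description above.

```python
import queue
import threading

# Hypothetical sketch of the two-step leader-worker barrier.
# Transport is a pair of in-process queues purely for illustration.

def worker(to_leader: queue.Queue, from_leader: queue.Queue, results: dict):
    # Step 1 (Worker -> Leader): report the measured bytes_per_block,
    # since only the worker can size the TRT-LLM KV cache reliably.
    bytes_per_block = 2 * 1024 * 1024  # e.g. measured from the allocated cache
    to_leader.put(bytes_per_block)
    # Step 2 (Leader -> Worker): receive the block budgets computed by the leader.
    results["budgets"] = from_leader.get()

def leader(to_leader: queue.Queue, from_leader: queue.Queue,
           host_bytes: int, disk_bytes: int):
    # Wait for the worker-reported block size, then derive block counts.
    bytes_per_block = to_leader.get()
    from_leader.put({
        "num_host_blocks": host_bytes // bytes_per_block,
        "num_disk_blocks": disk_bytes // bytes_per_block,
    })

def run_barrier(host_bytes=64 * 1024 * 1024, disk_bytes=256 * 1024 * 1024):
    to_leader, from_leader, results = queue.Queue(), queue.Queue(), {}
    t = threading.Thread(target=worker, args=(to_leader, from_leader, results))
    t.start()
    leader(to_leader, from_leader, host_bytes, disk_bytes)
    t.join()
    return results["budgets"]
```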

Issues:
A separate Python module and Rust crate will need to be built specifically for TRT-LLM. For now, TRT-LLM is included in vllm_integration to reuse some core KVBM integration code.

New changes after last review:

  1. Updated the vLLM integration to read bytes_per_block from the worker instead of from the scheduler.
  2. The KVBM worker can now initialize in either blocking or non-blocking mode.
    In vLLM, the engine core registers the KV cache in the workers first and then initializes the leader. The engine's readiness depends on the scheduler's readiness, so worker initialization must be non-blocking and scheduler initialization must be blocking.
    In TRT-LLM, the engine core initializes the leader and registers the KV cache in the workers at the same time and in the same process. The engine's readiness depends on the worker's readiness, so worker initialization must be blocking and scheduler initialization must be non-blocking.
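The blocking/non-blocking split above can be sketched as follows. This is a hedged illustration, not the actual bindings: the class name `KvbmWorker` and methods `_init` and `wait_ready` are hypothetical stand-ins for the real worker initialization path.

```python
import threading

class KvbmWorker:
    """Sketch: a worker whose initialization can run blocking (TRT-LLM path,
    where engine readiness depends on the worker) or non-blocking (vLLM path,
    where the KV cache is registered before the leader is up)."""

    def __init__(self, blocking: bool):
        self.ready = threading.Event()
        if blocking:
            # TRT-LLM path: the constructor returns only once the worker is ready.
            self._init()
        else:
            # vLLM path: initialization proceeds in the background.
            threading.Thread(target=self._init, daemon=True).start()

    def _init(self):
        # ... register the KV cache, complete the leader-worker barrier ...
        self.ready.set()

    def wait_ready(self, timeout=None):
        # Returns True once initialization has completed.
        return self.ready.wait(timeout)
```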

Where should the reviewer start?

A good starting point is kvbm_connector_leader.py and kvbm_connector_worker.py, followed by the Rust bindings in the trtllm_leader.rs and trtllm_worker.rs implementations.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx


copy-pr-bot bot commented Aug 19, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the feat label Aug 19, 2025
@richardhuo-nv richardhuo-nv force-pushed the rihuo/connector-api-trtllm branch 2 times, most recently from fc0fcb0 to 8eeee78 on August 20, 2025 00:21
@richardhuo-nv richardhuo-nv force-pushed the rihuo/connector-api-trtllm branch 8 times, most recently from 6f41dd4 to 29e9507 on August 22, 2025 10:08
@richardhuo-nv richardhuo-nv force-pushed the rihuo/connector-api-trtllm branch 8 times, most recently from e16d1ec to 85319a1 on August 29, 2025 00:30
@richardhuo-nv richardhuo-nv force-pushed the rihuo/connector-api-trtllm branch from 4b44b5c to 9038bfb on August 29, 2025 03:51
Signed-off-by: richardhuo-nv <[email protected]>
@richardhuo-nv richardhuo-nv force-pushed the rihuo/connector-api-trtllm branch from f9c0ea1 to 8c7dd17 on August 29, 2025 04:31
Contributor

@oandreeva-nv oandreeva-nv left a comment


lgtm


Contributor

@ziqifan617 ziqifan617 left a comment


LGTM!

@richardhuo-nv richardhuo-nv merged commit a68c2f8 into main Aug 30, 2025
13 checks passed
@richardhuo-nv richardhuo-nv deleted the rihuo/connector-api-trtllm branch August 30, 2025 02:27
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: Krishnan Prashanth <[email protected]>