richardhuo-nv
Contributor

@richardhuo-nv richardhuo-nv commented Oct 6, 2025

Overview:

KVBM needs a standalone metrics port and registry to achieve modularity, so that users can get the metrics without meeting the distributed runtime requirements.

Also removed the KVBM worker metrics, since they cause a port conflict when we have multiple TPs.
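
The port conflict can be reproduced outside KVBM. The snippet below is a minimal sketch, not the project's code: `try_bind` is a hypothetical helper, and it assumes each worker process tries to listen on the same fixed TCP port on the same host.

```python
import socket
from typing import Optional


def try_bind(port: int, host: str = "127.0.0.1") -> Optional[socket.socket]:
    """Return a listening socket bound to (host, port), or None if the bind fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
        return sock
    except OSError:
        # Typically EADDRINUSE: another socket already owns this port.
        sock.close()
        return None


# The first "worker" takes a free port chosen by the OS.
first = try_bind(0)
port = first.getsockname()[1]

# A second "worker" asking for the same port is refused.
second = try_bind(port)
print("second worker bound:", second is not None)

first.close()
```

With one metrics server per worker and a single configured port, every worker after the first would hit the failing branch, which is the situation the description refers to.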

Details:

  1. Add an env var DYN_KVBM_METRICS to control whether a standalone metrics port is set up.
  2. Remove the worker metrics for now, since they cause a port conflict in both distributed runtime and standalone modes. In the future, we should look into aggregating the metrics from all workers so that a user can get them from a single port.
(venv) root@ipp2-1661:/workspace# curl http://localhost:6881/metrics
# HELP kvbm_matched_tokens The number of matched tokens
# TYPE kvbm_matched_tokens counter
kvbm_matched_tokens 0
# HELP kvbm_offload_blocks_d2h The number of offload blocks from device to host
# TYPE kvbm_offload_blocks_d2h counter
kvbm_offload_blocks_d2h 6
# HELP kvbm_offload_requests The number of offload requests
# TYPE kvbm_offload_requests counter
kvbm_offload_requests 5
# HELP kvbm_onboard_blocks_d2d The number of onboard blocks from disk to device
# TYPE kvbm_onboard_blocks_d2d counter
kvbm_onboard_blocks_d2d 0
# HELP kvbm_onboard_blocks_h2d The number of onboard blocks from host to device
# TYPE kvbm_onboard_blocks_h2d counter
kvbm_onboard_blocks_h2d 0
# HELP kvbm_onboard_requests The number of onboard requests
# TYPE kvbm_onboard_requests counter
kvbm_onboard_requests 0
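
For checking this output in a script or test, the text can be parsed into a dictionary. The following is a rough sketch under stated assumptions: `parse_counters` is a hypothetical helper, and it only handles unlabeled samples without timestamps, which is what the output above contains.

```python
def parse_counters(text: str) -> dict:
    """Parse unlabeled `name value` samples from Prometheus text exposition output."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Two of the samples from the curl output above.
SAMPLE = """\
# HELP kvbm_offload_blocks_d2h The number of offload blocks from device to host
# TYPE kvbm_offload_blocks_d2h counter
kvbm_offload_blocks_d2h 6
# HELP kvbm_offload_requests The number of offload requests
# TYPE kvbm_offload_requests counter
kvbm_offload_requests 5
"""

counters = parse_counters(SAMPLE)
print(counters["kvbm_offload_blocks_d2h"], counters["kvbm_offload_requests"])
```

Labeled series or samples with timestamps would need a real parser; this sketch is only meant for quick sanity checks against the endpoint.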

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features
    • Added standalone KVBM metrics mode with its own HTTP endpoint; enable via env flag and optional port (default 6881).
  • Documentation
    • Updated vLLM and TRT-LLM guides with standalone metrics instructions and examples; removed references to port 6882 and firewall steps.
  • Chores
    • Simplified Prometheus config to a single KVBM metrics target; removed the separate leader scrape job.
  • Refactor
    • Updated Grafana dashboard to include kvbm_* series and removed obsolete panels for cleaner visualization.

@richardhuo-nv richardhuo-nv requested review from a team as code owners October 6, 2025 16:38

copy-pr-bot bot commented Oct 6, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

coderabbitai bot commented Oct 6, 2025

Walkthrough

Introduces standalone KVBM metrics mode served on a single port (default 6881) gated by environment variables. Updates Rust leaders/recorders and Python connectors to branch on this mode, removes worker-side metric instrumentation, adjusts Grafana queries and panels, drops Prometheus leader scrape, and revises docs accordingly.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Core metrics (standalone server + registry) | lib/llm/src/block_manager/metrics_kvbm.rs | Adds KvbmMetricsRegistry and KvbmMetrics::new_with_standalone launching an HTTP /metrics server; removes save_kv_layer_requests; adds graceful shutdown; exposes labeled counter creation; implements Drop. |
| Rust leaders: vLLM + TRT-LLM (standalone toggle) | lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs, lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs, lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_leader.rs | Extend constructors with metrics_standalone flag; parse DYN_KVBM_METRICS_STANDALONE and DYN_KVBM_METRICS_PORT; initialize metrics via KvbmMetricsRegistry when standalone, otherwise namespace-based. |
| Rust workers (remove KvbmMetrics usage) | lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs, lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs | Delete kvbm_metrics field, initialization, and save_kv_layer_requests increment; remove related imports. |
| Python utils and connectors (env-gated port setup) | lib/bindings/python/src/dynamo/llm/utils.py, .../trtllm_integration/connector/kvbm_connector_leader.py, .../trtllm_integration/connector/kvbm_connector_worker.py, .../vllm_integration/connector_leader.py, .../vllm_integration/connector_worker.py | Add is_standslone_kvbm_metrics_enabled(); remove maybe_sleep(); guard find_and_set_available_port_from_env calls based on standalone flag; remove worker-side port setup. |
| Dashboards (Grafana) | deploy/metrics/grafana_dashboards/grafana-kvbm-dashboard.json | Switch targets to editorMode "code"; extend PromQL with kvbm_* ORs; remove "Save KV Layer Requests" panel and related blocks; structural cleanup. |
| Prometheus config | deploy/metrics/prometheus.yml | Remove kvbm-leader-metrics job (port 6882); keep kvbm-worker-metrics (port 6881). |
| Docs | docs/guides/run_kvbm_in_trtllm.md, docs/guides/run_kvbm_in_vllm.md | Update instructions to single-port metrics; document DYN_KVBM_METRICS_STANDALONE and optional DYN_KVBM_METRICS_PORT; add standalone examples; remove 6882 references and ufw notes. |
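
The env gating summarized above could look roughly like the following. This is a hedged sketch rather than the code in this PR: only the variable names DYN_KVBM_METRICS_STANDALONE and DYN_KVBM_METRICS_PORT and the default port 6881 come from the summary. The function names, the set of accepted truthy strings, and the fallback to the default on an invalid port are assumptions made for illustration.

```python
import os
from typing import Mapping

DEFAULT_KVBM_METRICS_PORT = 6881  # default stated in the PR summary

# Assumption: which strings count as "enabled" is not specified in the source.
_TRUTHY = {"1", "true", "yes", "on"}


def standalone_metrics_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: True when the standalone flag is set to a truthy value."""
    return env.get("DYN_KVBM_METRICS_STANDALONE", "").strip().lower() in _TRUTHY


def metrics_port(env: Mapping[str, str] = os.environ) -> int:
    """Hypothetical helper: read the optional port, falling back to the default."""
    raw = env.get("DYN_KVBM_METRICS_PORT", "").strip()
    if not raw:
        return DEFAULT_KVBM_METRICS_PORT
    try:
        port = int(raw)
    except ValueError:
        # Assumption: invalid values fall back instead of raising.
        return DEFAULT_KVBM_METRICS_PORT
    return port if 0 < port < 65536 else DEFAULT_KVBM_METRICS_PORT


example_env = {"DYN_KVBM_METRICS_STANDALONE": "1", "DYN_KVBM_METRICS_PORT": "7000"}
print(standalone_metrics_enabled(example_env), metrics_port(example_env))
```

Passing the environment as a mapping keeps the helpers easy to test without mutating `os.environ`.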

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User as User/Operator
  participant Env as Env Vars
  participant Py as Python Connector (Leader)
  participant Rust as Rust Leader
  participant Reg as KvbmMetricsRegistry
  participant Srv as Metrics HTTP Server
  participant Prom as Prometheus

  User->>Env: Set DYN_KVBM_METRICS_STANDALONE [=1] and optional DYN_KVBM_METRICS_PORT
  Py->>Env: Read standalone flag
  alt Standalone enabled
    Py->>Rust: KvConnectorLeader::new(..., metrics_standalone=true)
    Rust->>Env: parse_dyn_kvbm_metrics_port() (default 6881)
    Rust->>Reg: Init registry and counters (kvbm_* prefixed)
    Reg-->>Rust: Registry handle
    Rust->>Srv: Spawn /metrics on selected port (background)
    Prom-->>Srv: Scrape /metrics
    Srv-->>Prom: Expose kvbm_* metrics
  else Namespace mode
    Py->>Rust: KvConnectorLeader::new(..., metrics_standalone=false)
    Rust->>Rust: Use namespace-based metrics
  end

  Note over Rust,Srv: Drop KvbmMetrics triggers graceful shutdown of server
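
The standalone branch of the diagram (registry, background /metrics server, shutdown on drop) can be illustrated with a small standard-library analogue. The PR implements this in Rust; the Python below is only a sketch of the same shape, and `CounterRegistry` and `MetricsServer` are hypothetical names, not the project's API.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CounterRegistry:
    """Thread-safe counters rendered in Prometheus text format."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}  # name -> (help text, value)

    def register(self, name, help_text):
        with self._lock:
            self._counters.setdefault(name, (help_text, 0))

    def inc(self, name, amount=1):
        with self._lock:
            help_text, value = self._counters[name]
            self._counters[name] = (help_text, value + amount)

    def render(self):
        with self._lock:
            items = sorted(self._counters.items())
        lines = []
        for name, (help_text, value) in items:
            lines.append(f"# HELP {name} {help_text}")
            lines.append(f"# TYPE {name} counter")
            lines.append(f"{name} {value}")
        return "\n".join(lines) + "\n"


class MetricsServer:
    """Serves registry.render() on /metrics from a background thread."""

    def __init__(self, registry, port, host="127.0.0.1"):
        class Handler(BaseHTTPRequestHandler):
            def do_GET(self):
                if self.path != "/metrics":
                    self.send_error(404)
                    return
                body = registry.render().encode()
                self.send_response(200)
                self.send_header("Content-Type", "text/plain; version=0.0.4")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

            def log_message(self, *args):
                pass  # keep the example quiet

        self._httpd = ThreadingHTTPServer((host, port), Handler)
        self.port = self._httpd.server_address[1]
        self._thread = threading.Thread(target=self._httpd.serve_forever, daemon=True)
        self._thread.start()

    def close(self):
        # Graceful shutdown: stop the serve loop, release the port, join the thread.
        self._httpd.shutdown()
        self._httpd.server_close()
        self._thread.join()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


registry = CounterRegistry()
registry.register("kvbm_offload_requests", "The number of offload requests")
registry.inc("kvbm_offload_requests", 5)

# Port 0 lets the OS pick a free port for the demo; the PR's default is 6881.
with MetricsServer(registry, port=0) as server:
    conn = http.client.HTTPConnection("127.0.0.1", server.port, timeout=5)
    conn.request("GET", "/metrics")
    body = conn.getresponse().read().decode()
    conn.close()

print(body)
```

The context manager plays the role that `Drop` plays in the diagram's note: leaving the `with` block stops the server and frees the port.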

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

A hop, a skip, a port made one,
I twirl my ears—metrics spun! 🥕
Standalone streams, no leader twin,
Prometheus sniffs, Grafana grins.
Counters chirp where rabbits run—
kvbm_* beneath the sun.
Thump-thump: review’s done!

Pre-merge checks

❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 47.37% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Description Check | ⚠️ Warning | The pull request description includes the required template headings but leaves the “Where should the reviewer start?” section as an unfilled placeholder, so a key part of the template is incomplete. | Please populate the “Where should the reviewer start?” section with specific files or modules that warrant focused review, ensuring each template section contains substantive content. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The title concisely captures the main feature of adding a standalone KVBM metrics endpoint for improved modularity and references the issue identifier without extraneous details or noise. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
lib/bindings/python/src/dynamo/llm/utils.py (1)

36-44: Rename helper to fix typo

Line 36 spells the helper is_standslone_kvbm_metrics_enabled, which is easy to misread and will propagate the typo across future call sites. Rename it (and the imports) to is_standalone_kvbm_metrics_enabled while the surface area is still tiny.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de6fdf0 and 30d911d.

📒 Files selected for processing (15)
  • deploy/metrics/grafana_dashboards/grafana-kvbm-dashboard.json (7 hunks)
  • deploy/metrics/prometheus.yml (0 hunks)
  • docs/guides/run_kvbm_in_trtllm.md (2 hunks)
  • docs/guides/run_kvbm_in_vllm.md (1 hunks)
  • lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (5 hunks)
  • lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (2 hunks)
  • lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_leader.rs (4 hunks)
  • lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs (0 hunks)
  • lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (0 hunks)
  • lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_leader.py (2 hunks)
  • lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_worker.py (0 hunks)
  • lib/bindings/python/src/dynamo/llm/utils.py (1 hunks)
  • lib/bindings/python/src/dynamo/llm/vllm_integration/connector_leader.py (2 hunks)
  • lib/bindings/python/src/dynamo/llm/vllm_integration/connector_worker.py (0 hunks)
  • lib/llm/src/block_manager/metrics_kvbm.rs (3 hunks)
💤 Files with no reviewable changes (5)
  • lib/bindings/python/src/dynamo/llm/vllm_integration/connector_worker.py
  • lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs
  • lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs
  • lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_worker.py
  • deploy/metrics/prometheus.yml
🧰 Additional context used
🧬 Code graph analysis (6)
lib/bindings/python/src/dynamo/llm/vllm_integration/connector_leader.py (1)
lib/bindings/python/src/dynamo/llm/utils.py (2)
  • find_and_set_available_port_from_env (9-31)
  • is_standslone_kvbm_metrics_enabled (36-44)
lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_leader.rs (4)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (3)
  • parse_dyn_kvbm_metrics_port (637-651)
  • new (91-184)
  • new (559-590)
lib/llm/src/block_manager/metrics_kvbm.rs (3)
  • new_with_standalone (85-163)
  • default (241-243)
  • new (41-81)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (1)
  • new (88-206)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (7)
  • new (190-227)
  • new (340-371)
  • new (1085-1098)
  • new (1109-1122)
  • new (1132-1142)
  • new (1458-1463)
  • new (1467-1472)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (3)
lib/llm/src/block_manager/metrics_kvbm.rs (4)
  • new_with_standalone (85-163)
  • default (241-243)
  • new (41-81)
  • new (211-216)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (1)
  • new (88-206)
lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_leader.rs (2)
  • new (65-148)
  • new (455-473)
lib/llm/src/block_manager/metrics_kvbm.rs (3)
lib/runtime/src/metrics/prometheus_names.rs (1)
  • sanitize_prometheus_name (381-407)
lib/runtime/src/pipeline/context.rs (1)
  • registry (237-239)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (2)
  • new (91-184)
  • new (559-590)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (2)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (3)
  • parse_dyn_kvbm_metrics_port (637-651)
  • new (91-184)
  • new (559-590)
lib/llm/src/block_manager/metrics_kvbm.rs (4)
  • new_with_standalone (85-163)
  • default (241-243)
  • new (41-81)
  • new (211-216)
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_leader.py (1)
lib/bindings/python/src/dynamo/llm/utils.py (2)
  • find_and_set_available_port_from_env (9-31)
  • is_standslone_kvbm_metrics_enabled (36-44)
🪛 markdownlint-cli2 (0.18.1)
docs/guides/run_kvbm_in_vllm.md

91-91: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: clippy (.)
  • GitHub Check: tests (.)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: clippy (launch/dynamo-run)

@richardhuo-nv richardhuo-nv changed the title chore: DIS-678 kvbm modularity: standalone metrics endpont chore: DIS-678 kvbm modularity: standalone metrics endpoint Oct 6, 2025
Contributor

@keivenchang keivenchang left a comment


For the metric names, can you use the common prometheus_names.rs?

@rmccorm4
Contributor

rmccorm4 commented Oct 6, 2025

/ok to test 30d911d

@pull-request-size pull-request-size bot added size/XL and removed size/L labels Oct 6, 2025
@richardhuo-nv
Contributor Author

/ok to test 4b9b9dd

@richardhuo-nv
Contributor Author

For the metric names, can you use the common prometheus_names.rs?

sure, added

@pull-request-size pull-request-size bot added size/L and removed size/XL labels Oct 6, 2025
@richardhuo-nv
Contributor Author

/ok to test 3416649

@richardhuo-nv richardhuo-nv changed the title chore: DIS-678 kvbm modularity: standalone metrics endpoint feat: DIS-678 kvbm modularity: standalone metrics endpoint Oct 7, 2025
@github-actions github-actions bot added feat and removed chore labels Oct 7, 2025
Signed-off-by: richardhuo-nv <[email protected]>

metrics implementation

Signed-off-by: richardhuo-nv <[email protected]>

add env for metrics

Signed-off-by: richardhuo-nv <[email protected]>

add port handling

Signed-off-by: richardhuo-nv <[email protected]>

add port handling

Signed-off-by: richardhuo-nv <[email protected]>

fix

fix

fix

fix

fix

Signed-off-by: richardhuo-nv <[email protected]>
Signed-off-by: richardhuo-nv <[email protected]>
Signed-off-by: richardhuo-nv <[email protected]>
Signed-off-by: richardhuo-nv <[email protected]>
Signed-off-by: richardhuo-nv <[email protected]>

fix port

Signed-off-by: richardhuo-nv <[email protected]>

fix docs

Signed-off-by: richardhuo-nv <[email protected]>

fix port

Signed-off-by: richardhuo-nv <[email protected]>
Signed-off-by: richardhuo-nv <[email protected]>
Signed-off-by: richardhuo-nv <[email protected]>
Signed-off-by: richardhuo-nv <[email protected]>
@richardhuo-nv
Contributor Author

/ok to test e6607d8

Signed-off-by: richardhuo-nv <[email protected]>
@richardhuo-nv
Contributor Author

/ok to test 765c697

Signed-off-by: richardhuo-nv <[email protected]>
@pull-request-size pull-request-size bot added size/XL and removed size/L labels Oct 7, 2025
Contributor

@biswapanda biswapanda left a comment


lgtm from deploy perspective

@richardhuo-nv richardhuo-nv merged commit cf83794 into main Oct 8, 2025
29 checks passed
@richardhuo-nv richardhuo-nv deleted the rihuo/prom_kvbm branch October 8, 2025 17:18