Skip to content

Conversation

tzulingk
Copy link
Contributor

@tzulingk tzulingk commented Aug 16, 2025

Overview:

Implement add_labels for Endpoint and add model name to StatLoggerFactory and DynamoStatLoggerPublisher.

Details:

Simply adding the model labels to the Endpoint doesn’t make the metrics include the model label.
This happens because the code registers the same metric multiple times. The first registration doesn’t have the model label, and by the time we add the model label to the Endpoint through add_labels, it’s already too late.

The first time is added through

/// components/backends/vllm/src/dynamo/vllm/main.py init()
factory = StatLoggerFactory(component, config.engine_args.data_parallel_rank or 0)

The second time is added through

generate_endpoint.serve_endpoint(handler.generate, graceful_shutdown=False),
clear_endpoint.serve_endpoint(handler.clear_kv_blocks),

So we need to pass the model name into StatLoggerFactory and DynamoStatLoggerPublisher

Where should the reviewer start?

components/backends/vllm/src/dynamo/vllm/publisher.py: Add model name to DynamoStatLoggerPublisher and StatLoggerFactory
lib/runtime/src/component.rs: Implement add_labels for Endpoint.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

DIS-444 The model label is missing from the backend metrics

Example

# Start frontend
python -m dynamo.frontend &

# Enable per-component system/metrics http servers
export DYN_SYSTEM_ENABLED=true

# Start vllm worker 1, with metrics enabled, and unique port for metrics server
DYN_SYSTEM_PORT=9090 CUDA_VISIBLE_DEVICES=0 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager --connector none &

# Start vllm worker 2, with metrics enabled, and unique port for metrics server
DYN_SYSTEM_PORT=9091 CUDA_VISIBLE_DEVICES=1 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager --connector none &

# Check metrics outputs for each worker
curl -s localhost:9090/metrics | grep dynamo_component_requests_total | grep generate | grep model
# dynamo_component_requests_total{dynamo_component="backend",dynamo_endpoint="generate",dynamo_namespace="dynamo",model="Qwen/Qwen3-0.6B"} 0

curl -s localhost:9091/metrics | grep dynamo_component_requests_total | grep generate | grep model
# dynamo_component_requests_total{dynamo_component="backend",dynamo_endpoint="generate",dynamo_namespace="dynamo",model="Qwen/Qwen3-0.6B"} 0

# Send inference request
curl -X POST http://localhost:8080/v1/chat/completions   -H 'Content-Type: application/
json'   -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [
      { "role": "user", "content": "Tell me a story about a brave cat" }
    ],
    "stream": false,
    "max_tokens": 1028
  }'

# Check metrics outputs to see that one worker got updated, and the other did not
curl -s localhost:9090/metrics | grep dynamo_component_requests_total | grep generate | grep model
# dynamo_component_requests_total{dynamo_component="backend",dynamo_endpoint="generate",dynamo_namespace="dynamo",model="Qwen/Qwen3-0.6B"} 0
(venv) root@4u4g-0073:/workspace# curl -s localhost:9091/metrics | grep dynamo_component_requests_total | grep generate | grep model
# dynamo_component_requests_total{dynamo_component="backend",dynamo_endpoint="generate",dynamo_namespace="dynamo",model="Qwen/Qwen3-0.6B"} 1

# Success: one of the workers received the request and incremented the metric
# With a 2nd request (assuming round robin routing), you will see the other worker's metric increment this time.  

Summary by CodeRabbit

  • New Features

    • Endpoints can now be labeled (e.g., model) for per-model routing and metrics across runtime, metrics, and vLLM.
    • Python: added Endpoint.add_labels(labels) to attach constant labels.
    • Example clients/servers updated to annotate “generate” endpoints with model labels.
  • Tests

    • Added unit tests verifying label injection and propagation into Prometheus metrics.

Copy link
Contributor

coderabbitai bot commented Aug 16, 2025

Walkthrough

Introduces endpoint label support (notably a “model” label) across runtime, bindings, and examples. Extends WorkerMetricsPublisher::create_endpoint to accept an optional model. Adds Endpoint.add_labels and stored_labels APIs, propagates labels into metrics, and updates VLLM and metrics components to pass model labels during initialization.

Changes

Cohort / File(s) Summary of changes
VLLM integration
components/backends/vllm/src/dynamo/vllm/main.py, components/backends/vllm/src/dynamo/vllm/publisher.py
Pass model label to generate/clear_kv_blocks endpoints via add_labels; StatLoggerFactory and DynamoStatLoggerPublisher accept/propagate optional model to WorkerMetricsPublisher.create_endpoint.
LLM KV publisher + mocker
lib/llm/src/kv_router/publisher.rs, lib/llm/src/mocker/engine.rs
WorkerMetricsPublisher::create_endpoint signature extended with Option model; if provided, adds ("model", ) label to endpoint; call sites updated (mocker passes Some("mock_model")).
Python bindings
lib/bindings/python/rust/llm/kv.rs, lib/bindings/python/rust/lib.rs
Expose WorkerMetricsPublisher.create_endpoint(component, model=None) to Python; add Endpoint.add_labels(labels) to Python API, mapping errors to PyErr and returning a new Endpoint.
Runtime Endpoint + metrics plumbing
lib/runtime/src/component.rs, lib/runtime/src/pipeline/network/ingress/push_handler.rs, lib/runtime/src/metrics.rs
New Endpoint APIs: stored_labels() and add_labels(&[(&str,&str)]) with validation; ingress metrics now include endpoint-provided labels; add unit tests validating label propagation into Prometheus (duplicate test present).
Metrics component
components/metrics/src/main.rs
Conditionally adds ("model", model_name) to target endpoint; service path/subject derive from possibly-labeled endpoint; errors from add_labels propagate on init.
Examples and demos
lib/runtime/examples/hello_world/..., lib/runtime/examples/service_metrics/..., lib/runtime/examples/system_metrics/src/lib.rs
Update server/client setups to attach ("model", ...) via add_labels before building clients/servers; control flow now includes fallible labeling step.

Sequence Diagram(s)

sequenceDiagram
  participant App as App (VLLM/Examples)
  participant RT as Runtime Endpoint
  participant Pub as WorkerMetricsPublisher
  participant Met as Metrics Registry

  App->>RT: endpoint("generate")
  alt Model provided
    App->>RT: add_labels([("model", model)])
  end
  App->>Pub: create_endpoint(component, model)
  Pub->>RT: endpoint("kv_router")
  alt Model provided
    Pub->>RT: add_labels([("model", model)])
  end
  Pub->>Met: register metrics with RT.stored_labels()
  Met-->>Pub: ok
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

A rabbit hops through routes so wide,
Tucks “model” labels by its side.
Endpoints tagged, the metrics sing,
Grafana charts do everything.
With gentle paws it threads the line—
From VLLM to crates so fine.
Hop, label, hop—now all align! 🐇✨

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (8)
lib/runtime/src/component.rs (2)

435-436: Doc nit: fix the chaining example

The example suggests calling add_labels directly on a Namespace. It should show the full chain to Endpoint.

Apply this diff to clarify:

-    ///   This allows for method chaining like: runtime.namespace(...).add_labels(...)?
+    ///   This allows for method chaining like:
+    ///   runtime.namespace(...).component(...).endpoint(...).add_labels(...)? 

439-474: Label validation and builder-style return are solid; consider future cross-hierarchy checks

  • Duplicate-key detection within input and refusal to overwrite existing endpoint labels are correct and safe.
  • Builder-style self consumption enables fluent chaining.

Optional: If you plan to support labels at Namespace/Component as well, consider whether conflicts should be validated across the entire hierarchy (Namespace, Component, Endpoint) during metrics registration.

Would you like a follow-up PR sketch that aggregates/validates labels across hierarchy before metric registration?

components/metrics/src/main.rs (1)

140-147: Correctly attaches model label at endpoint selection time

Labeling the endpoint with ("model", model) before using it ensures that metrics registered downstream observe the label. Error propagation via ? is appropriate.

Consider defining a shared constant (e.g., const MODEL_LABEL: &str = "model";) to avoid typos and make the key reusable across the codebase (VLLM publisher, examples, etc.).

lib/runtime/examples/hello_world/src/bin/server.rs (1)

76-77: Nice showcase of per-endpoint model labeling

Good example demonstrating how to tag an endpoint with a model label before starting the handler. Consider centralizing "model" as a constant to keep examples consistent and typo-safe.

lib/runtime/examples/hello_world/src/bin/client.rs (1)

35-36: Client-side labeling is consistent, but optional

Adding the model label on the client Endpoint is fine for symmetry. In most deployments, only the server-side endpoint labels impact the server’s metrics. Keeping it here for demonstration is OK.

lib/llm/src/mocker/engine.rs (1)

192-195: Avoid hardcoding the model string in create_endpoint

Passing Some("mock_model".to_string()) works, but consider moving "mock_model" to a module-level constant or deriving it from test config to keep it consistent across mock components.

lib/runtime/examples/system_metrics/src/lib.rs (1)

97-99: Endpoint labeling is correct; consider propagating labels to custom metrics too

You add the model label to the endpoint, but MySystemStatsMetrics::from_endpoint currently calls create_intcounter(..., &[]) so this custom metric may miss the endpoint labels. If Endpoint.create_* does not auto-merge stored labels, consider passing stored_labels() for consistency.

For example (outside the changed hunk):

impl MySystemStatsMetrics {
    pub fn from_endpoint(endpoint: &dynamo_runtime::component::Endpoint) -> anyhow::Result<Self> {
        let labels = endpoint.stored_labels();
        let data_bytes_processed = endpoint.create_intcounter(
            "my_custom_bytes_processed_total",
            "Example of a custom metric. Total number of data bytes processed by system handler",
            &labels,
        )?;
        Ok(Self { data_bytes_processed })
    }
}
lib/llm/src/kv_router/publisher.rs (1)

514-519: Optionally skip empty/whitespace-only model names to avoid meaningless labels.

Guard against accidentally setting an empty label value.

Apply:

-        let endpoint = {
-            let mut e = component.endpoint(KV_METRICS_ENDPOINT);
-            if let Some(model_name) = model {
-                e = e.add_labels(&[("model", model_name.as_str())])?;
-            }
-            e
-        };
+        let endpoint = {
+            let mut e = component.endpoint(KV_METRICS_ENDPOINT);
+            if let Some(model_name) = model.filter(|m| !m.trim().is_empty()) {
+                e = e.add_labels(&[("model", model_name.as_str())])?;
+            }
+            e
+        };
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between fb10ffb and c23f86f.

📒 Files selected for processing (15)
  • components/backends/vllm/src/dynamo/vllm/main.py (3 hunks)
  • components/backends/vllm/src/dynamo/vllm/publisher.py (3 hunks)
  • components/metrics/src/main.rs (1 hunks)
  • lib/bindings/python/rust/lib.rs (1 hunks)
  • lib/bindings/python/rust/llm/kv.rs (1 hunks)
  • lib/llm/src/kv_router/publisher.rs (2 hunks)
  • lib/llm/src/mocker/engine.rs (1 hunks)
  • lib/runtime/examples/hello_world/src/bin/client.rs (1 hunks)
  • lib/runtime/examples/hello_world/src/bin/server.rs (1 hunks)
  • lib/runtime/examples/service_metrics/src/bin/service_client.rs (1 hunks)
  • lib/runtime/examples/service_metrics/src/bin/service_server.rs (1 hunks)
  • lib/runtime/examples/system_metrics/src/lib.rs (1 hunks)
  • lib/runtime/src/component.rs (1 hunks)
  • lib/runtime/src/metrics.rs (1 hunks)
  • lib/runtime/src/pipeline/network/ingress/push_handler.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-05T01:04:24.775Z
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The `create_endpoint` method in `WorkerMetricsPublisher` has backward compatibility maintained through pyo3 signature annotation `#[pyo3(signature = (component, dp_rank = None))]`, making the `dp_rank` parameter optional with a default value of `None`.

Applied to files:

  • lib/llm/src/kv_router/publisher.rs
  • lib/llm/src/mocker/engine.rs
  • components/backends/vllm/src/dynamo/vllm/publisher.py
  • lib/bindings/python/rust/llm/kv.rs
📚 Learning: 2025-08-09T06:10:00.225Z
Learnt from: tzulingk
PR: ai-dynamo/dynamo#2389
File: components/metrics/src/main.rs:63-66
Timestamp: 2025-08-09T06:10:00.225Z
Learning: In components/metrics/src/main.rs, the Args struct's model_name field should be a required String (not Option<String>) because the metrics aggregator must be tied to a specific model when collecting metrics for a component/endpoint.

Applied to files:

  • lib/llm/src/kv_router/publisher.rs
  • lib/llm/src/mocker/engine.rs
  • lib/bindings/python/rust/llm/kv.rs
🧬 Code Graph Analysis (9)
lib/runtime/src/pipeline/network/ingress/push_handler.rs (2)
lib/bindings/python/rust/lib.rs (1)
  • endpoint (473-479)
lib/runtime/src/component.rs (1)
  • endpoint (226-233)
lib/llm/src/kv_router/publisher.rs (5)
lib/bindings/python/rust/llm/kv.rs (1)
  • create_endpoint (59-74)
lib/bindings/python/src/dynamo/_core.pyi (4)
  • create_endpoint (423-427)
  • component (187-191)
  • Component (193-210)
  • endpoint (206-210)
lib/bindings/python/rust/lib.rs (2)
  • component (558-564)
  • endpoint (473-479)
lib/runtime/src/component.rs (3)
  • component (488-490)
  • component (635-654)
  • endpoint (226-233)
lib/llm/src/local_model.rs (1)
  • model_name (87-90)
lib/bindings/python/rust/lib.rs (3)
lib/runtime/src/component.rs (1)
  • add_labels (439-474)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • Endpoint (212-240)
lib/bindings/python/rust/llm/entrypoint.rs (1)
  • to_pyerr (251-256)
lib/runtime/src/metrics.rs (2)
lib/runtime/src/component.rs (8)
  • namespace (218-220)
  • namespace (657-664)
  • component (488-490)
  • component (635-654)
  • endpoint (226-233)
  • name (222-224)
  • name (484-486)
  • name (670-675)
lib/runtime/src/distributed.rs (1)
  • namespace (185-187)
components/backends/vllm/src/dynamo/vllm/publisher.py (3)
lib/bindings/python/src/dynamo/_core.pyi (2)
  • Component (193-210)
  • WorkerMetricsPublisher (411-436)
lib/llm/src/kv_router/publisher.rs (1)
  • create_endpoint (501-530)
lib/bindings/python/rust/llm/kv.rs (1)
  • create_endpoint (59-74)
lib/bindings/python/rust/llm/kv.rs (2)
lib/bindings/python/src/dynamo/_core.pyi (3)
  • component (187-191)
  • create_endpoint (423-427)
  • Component (193-210)
lib/llm/src/kv_router/publisher.rs (1)
  • create_endpoint (501-530)
lib/runtime/examples/service_metrics/src/bin/service_client.rs (3)
lib/runtime/src/component.rs (3)
  • client (565-571)
  • component (488-490)
  • component (635-654)
lib/runtime/src/transports/nats.rs (1)
  • client (68-70)
lib/runtime/tests/soak.rs (1)
  • client (146-209)
components/backends/vllm/src/dynamo/vllm/main.py (4)
lib/bindings/python/rust/lib.rs (3)
  • component (558-564)
  • endpoint (473-479)
  • add_labels (542-553)
lib/runtime/src/component.rs (4)
  • component (488-490)
  • component (635-654)
  • endpoint (226-233)
  • add_labels (439-474)
lib/bindings/python/src/dynamo/_core.pyi (2)
  • component (187-191)
  • endpoint (206-210)
components/backends/vllm/src/dynamo/vllm/publisher.py (1)
  • StatLoggerFactory (132-165)
lib/runtime/src/component.rs (1)
lib/bindings/python/rust/lib.rs (2)
  • v (680-680)
  • add_labels (542-553)
🪛 GitHub Actions: Rust pre-merge checks
lib/bindings/python/rust/lib.rs

[error] 536-536: Cargo fmt -- --check failed. Code is not properly formatted in lib/bindings/python/rust/lib.rs. Run 'cargo fmt' to fix formatting.

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2474/merge) by tzulingk.
lib/bindings/python/rust/lib.rs

[error] 1-1: Trailing whitespace detected. Changes auto-fixed by the trailing-whitespace hook.


[error] 1-1: Ruff linting failed and auto-fixed issues in this file during pre-commit.

components/backends/vllm/src/dynamo/vllm/publisher.py

[error] 1-1: Ruff linting failed and auto-fixed issues in this file during pre-commit.

🪛 Ruff (0.12.2)
components/backends/vllm/src/dynamo/vllm/publisher.py

4-4: optparse.Option imported but unused

Remove unused import: optparse.Option

(F401)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (12)
lib/bindings/python/rust/lib.rs (1)

539-553: API addition looks good and preserves event_loop correctly

Accepts List[Tuple[str, str]], converts to (&str, &str) pairs, delegates to inner Endpoint.add_labels, and returns a new Endpoint reusing the same event_loop. This is the right shape for chaining from Python.

lib/runtime/src/component.rs (1)

421-427: stored_labels: simple, zero-copy view is correct

Returning Vec<(&str, &str)> from internal String storage is appropriate for passing to metrics constructors without cloning. Good use of iterators.

lib/runtime/examples/service_metrics/src/bin/service_server.rs (1)

78-79: Model label at endpoint registration looks correct

Adding the model label before endpoint_builder ensures metrics registered for this endpoint will include the label. Nice.

lib/runtime/src/pipeline/network/ingress/push_handler.rs (1)

58-96: Propagating endpoint labels into all handler metrics is the right fix; watch for label-name clashes

Using endpoint.stored_labels() and passing them to all metric constructors will correctly attach constant labels (e.g., model) at registration time.

One caveat: ensure Endpoint.add_labels prevents or rejects keys that conflict with variable label names of a metric (e.g., "error_type" for errors_total). A collision there would cause registration failures or inconsistent series.

If not already enforced, consider adding validation to Endpoint.add_labels to disallow conflicts with reserved/variable label names for metrics created via this API.

components/backends/vllm/src/dynamo/vllm/main.py (3)

138-143: Good: label endpoints with model before serving prefill routes

Attaching [("model", config.model)] at endpoint construction ensures metrics and routing pick up the model label from the start.


175-180: Good: label endpoints with model before serving decode routes

Mirrors the prefill change and aligns with the metrics pipeline expecting labels at registration time.


189-191: Passing model into StatLoggerFactory addresses first-registration gap

Providing config.model to StatLoggerFactory ensures the initial registration of backend metrics includes the model label, preventing duplicate registrations with and without the label.

lib/bindings/python/rust/llm/kv.rs (1)

58-64: Good: model plumbed through PyO3 with backward-compatible signature. Also pass-through to Rust publisher looks correct.

The optional model parameter in the PyO3 signature aligns with the Rust API change and preserves back-compat via a default None. Forwarding the argument in the async block is correct.

Also applies to: 69-70

lib/runtime/src/metrics.rs (1)

1309-1350: LGTM: test asserts end-to-end label propagation via Endpoint.add_labels + stored_labels.

The test validates auto-labels and custom labels in Prometheus output and exercises the exact path we depend on. Solid coverage.

lib/runtime/examples/service_metrics/src/bin/service_client.rs (1)

36-40: LGTM: model label applied at endpoint before client creation.

This ensures metrics created later can inherit the model label. No conflicts with auto-labels.

lib/llm/src/kv_router/publisher.rs (1)

501-521: LGTM: optional model label added at endpoint construction time.

Registering the endpoint with the model label before starting the handler satisfies the PR objective for labeled metrics. Error propagation with ? is correct.

components/backends/vllm/src/dynamo/vllm/publisher.py (1)

135-142: LGTM: factory plumbs model through; defaults maintain back-compat.

The model is stored on the factory and forwarded to the publisher; dp_rank logic remains intact.

Also applies to: 146-149

@tzulingk tzulingk merged commit 5772890 into main Aug 21, 2025
12 of 13 checks passed
@tzulingk tzulingk deleted the tzulingk/model_label_dis_444 branch August 21, 2025 17:32
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
Co-authored-by: Keiven Chang <[email protected]>
Signed-off-by: Hannah Zhang <[email protected]>
nv-anants pushed a commit that referenced this pull request Aug 28, 2025
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
Co-authored-by: Keiven Chang <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants