Contributor

@rmccorm4 rmccorm4 commented Oct 3, 2025

Overview:

@keivenchang is looking at exposing the full set of SGLang metrics when running through dynamo (python -m dynamo.sglang ...), ideally without having to map and redefine every SGLang metric 1:1 and update that mapping every time a new metric is added.

This exposes the built-in sglang metrics server when running dynamo+sglang.

Details:

Build with these local changes

pushd lib/bindings/python
maturin develop --uv

popd
uv pip install .[sglang]

Run with these changes

python -m dynamo.frontend &

# KEY: Set --enable-metrics
python -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --enable-metrics &

# See server come up in sglang worker log output
# 2025-10-03T17:19:07.084539Z  INFO utils.launch_dummy_health_check_server: Dummy health check server scheduled on existing loop at 127.0.0.1:30000   

# Send inference request
curl localhost:8000/v1/chat/completions -H 'Content-Type: application/json' -d '
{
  "model": "Qwen/Qwen3-0.6B",
  "messages": [{"role": "user", "content": "Write me a DND campaign"}],
  "stream": true,
  "max_tokens": 2,
  "ignore_eos": true
}'

# Check metrics
curl localhost:30000/metrics

NOTE: The metrics server output seems to be empty until at least one inference request has been received.
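Because the endpoint can return an empty body before the first request, a script that scrapes it may want to distinguish "no samples yet" from "server down". The following is a minimal sketch, assuming the endpoint serves the Prometheus text format at the address used in the curl command above; the helper names are hypothetical and not part of dynamo or SGLang.

```python
import urllib.request

# Address taken from the curl command in the description; adjust if the worker binds elsewhere.
METRICS_URL = "http://localhost:30000/metrics"


def fetch_metrics_text(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Fetch the raw metrics body. Raises URLError/OSError if the server is unreachable."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def sample_lines(text: str) -> list:
    """Return metric sample lines, skipping blank lines and '# HELP' / '# TYPE' comments."""
    return [
        line
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]


def has_samples(text: str) -> bool:
    """True once the body contains at least one metric sample."""
    return bool(sample_lines(text))
```

Calling `has_samples(fetch_metrics_text())` before and after the curl request above should show whether the empty-output behavior is reproducible in a given setup.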

Summary by CodeRabbit

  • New Features
    • Automatically starts a lightweight health check server when metrics are enabled.
    • Uses the configured host and port from existing settings for the health check server.

Contributor

coderabbitai bot commented Oct 3, 2025

Walkthrough

Adds a conditional startup of a dummy health-check server in sglang runtime initialization when metrics are enabled, passing host, port, and enable_metrics from server_args. No other public interfaces or exports are modified.

Changes

Cohort / File(s) | Summary
SGLang runtime health check integration (components/src/dynamo/sglang/main.py) | On init, if server_args.enable_metrics is true, calls launch_dummy_health_check_server(host, port, enable_metrics) to start a dummy health-check server.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller as Runtime.init()
    participant Config as server_args
    participant HC as DummyHealthCheckServer

    Caller->>Config: Read enable_metrics, host, port
    alt enable_metrics == true
        Caller->>HC: launch_dummy_health_check_server(host, port, enable_metrics)
        note right of HC: Health-check server starts
    else
        note over Caller,Config: No health-check server started
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I twitch my ears at startup’s chime,
A tiny server wakes in time—
If metrics sing, I hop to check,
A heartbeat thumps on port and spec.
When silence falls, I softly stay,
Nose to wind, and bound away.

Pre-merge checks

❌ Failed checks (2 warnings)
Check name | Status | Explanation | Resolution
Description Check | ⚠️ Warning | The pull request description follows the template for Overview and Details but omits the required sections “Where should the reviewer start?” and “Related Issues,” so it does not fully conform to the repository’s description template. | Please add a “Where should the reviewer start?” section that points to the key files changed and include a “Related Issues” section listing any linked issue numbers or action keywords.
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name | Status | Explanation
Title Check | ✅ Passed | The title succinctly describes the introduction of a proof-of-concept SGLang metrics server and specifies the default port, clearly reflecting the core change in the pull request.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
components/src/dynamo/sglang/main.py (1)

68-71: Add error handling for health check server launch.

The health check server is launched without any error handling. If the server fails to start (e.g., port already in use), it could cause silent failures or unexpected behavior.

Consider wrapping the call in a try-except block:

 if server_args.enable_metrics:
-    launch_dummy_health_check_server(
-        server_args.host, server_args.port, server_args.enable_metrics
-    )
+    try:
+        launch_dummy_health_check_server(
+            server_args.host, server_args.port, server_args.enable_metrics
+        )
+        logging.info(f"Metrics server started on {server_args.host}:{server_args.port}")
+    except Exception as e:
+        logging.warning(f"Failed to start metrics server: {e}")
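The suggestion above can be tried in isolation before touching main.py. This is a minimal sketch, assuming the launcher takes (host, port, enable_metrics) as in the diff; `try_launch` is a hypothetical helper, and it catches OSError rather than every Exception, which is a narrower choice than the diff makes. If the real launcher only schedules the server on an event loop, as the "scheduled on existing loop" log line in the description may indicate, a bind failure would surface later and a wrapper like this would not catch it.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def try_launch(
    launch: Callable[[str, int, bool], object],
    host: str,
    port: int,
    enable_metrics: bool = True,
) -> bool:
    """Call `launch` and report success; log instead of raising on socket-level errors.

    `launch` stands in for launch_dummy_health_check_server (signature assumed from the diff).
    """
    try:
        launch(host, port, enable_metrics)
    except OSError as exc:  # e.g. address already in use
        logger.warning("Failed to start metrics server on %s:%s: %s", host, port, exc)
        return False
    logger.info("Metrics server launch requested on %s:%s", host, port)
    return True
```

Returning a boolean lets the caller decide whether a missing metrics server should be fatal or only logged.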
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1faf015 and 2678f28.

📒 Files selected for processing (1)
  • components/src/dynamo/sglang/main.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: trtllm
  • GitHub Check: vllm
  • GitHub Check: sglang
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
components/src/dynamo/sglang/main.py (2)

12-12: Verify import of launch_dummy_health_check_server
Confirm that launch_dummy_health_check_server is defined in sglang.srt.utils in the SGLang package your project uses and that it accepts (host, port, enable_metrics) as arguments.


68-71: Ensure metrics server is launched for all SGLang engine initializations or confirm scoped exposure
Metrics startup is only in init (main.py:64–71). These functions also create sgl.Engine but omit metrics:

  • init_prefill (main.py:160–164)
  • init_multimodal_worker (main.py:305–316)
  • init_multimodal_prefill_worker (main.py:356–360)

If metrics should cover all engine instances, add the same if server_args.enable_metrics block to each; otherwise confirm that only the main decode worker requires metrics.
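If the block does need to appear in every init path, one option is to route all of them through a single helper so the condition lives in one place. A minimal sketch, assuming each init function has a server_args object with host, port, and enable_metrics attributes; `maybe_launch_metrics_server` is a hypothetical name, and the launcher is passed in so the helper does not depend on a specific SGLang import.

```python
from typing import Callable


def maybe_launch_metrics_server(
    server_args,
    launch: Callable[[str, int, bool], object],
) -> bool:
    """Start the metrics/health-check server only when metrics are enabled.

    Intended to be called once from each init function right after the engine
    is created. Returns True if `launch` was invoked.
    """
    if not server_args.enable_metrics:
        return False
    launch(server_args.host, server_args.port, server_args.enable_metrics)
    return True
```

Each of init, init_prefill, init_multimodal_worker, and init_multimodal_prefill_worker could then make the same one-line call instead of repeating the conditional.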

Comment on lines +68 to +71
if server_args.enable_metrics:
    launch_dummy_health_check_server(
        server_args.host, server_args.port, server_args.enable_metrics
    )

Contributor

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Verify the port parameter matches the intended metrics port.

According to the PR description, the metrics server should run on port 30000 by default. However, this code passes server_args.port, which is likely the main SGLang server port (typically 8000 or similar).

Please confirm whether:

  1. server_args.port contains the correct metrics port (30000), or
  2. A separate metrics port configuration should be used (e.g., server_args.metrics_port)

Run the following script to check how server_args.port is configured and if there's a separate metrics port:


🏁 Script executed:

#!/bin/bash
# Description: Check server_args configuration and port usage

# Find where server_args is defined and what ports are available
ast-grep --pattern 'class $_ {
  $$$
  port = $$$
  $$$
}'

# Search for metrics_port or similar configuration
rg -nP '(metrics_port|port.*30000|30000)' --type=py -C3

# Check the args parsing to see available port configurations
rg -nP 'def\s+parse_args' --type=py -A 50

Length of output: 47344


Introduce a dedicated --metrics-port CLI option

The metrics server currently binds to server_args.port (the main API port), causing conflicts. Add a --metrics-port flag (default 30000) in ServerArgs.add_cli_args (components/src/dynamo/sglang/args.py) and update the launch_dummy_health_check_server call in components/src/dynamo/sglang/main.py (around lines 68–71) to use server_args.metrics_port.

🤖 Prompt for AI Agents
In components/src/dynamo/sglang/main.py around lines 68 to 71, the metrics
server is being started using server_args.port which conflicts with the main API
port; update the code to call launch_dummy_health_check_server(server_args.host,
server_args.metrics_port, server_args.enable_metrics) and add a new CLI flag in
components/src/dynamo/sglang/args.py by introducing a --metrics-port argument on
ServerArgs.add_cli_args with a default of 30000 (and parse/store it as
metrics_port on ServerArgs); ensure any type parsing matches other port args
(int) and update any help text accordingly.
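The shape of the proposed flag can be sketched with plain argparse. This is an illustration only, assuming the option is named --metrics-port with a default of 30000 as proposed above; the parser, the --port default of 8000, and the `metrics_address` helper are hypothetical and do not reflect the real ServerArgs.add_cli_args.

```python
import argparse
from typing import Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Stand-alone parser mirroring the flags discussed in this review (names assumed)."""
    parser = argparse.ArgumentParser(prog="worker-sketch")
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8000, help="Main API port (placeholder default).")
    parser.add_argument("--enable-metrics", action="store_true")
    parser.add_argument("--metrics-port", type=int, default=30000, help="Port for the metrics server.")
    return parser


def metrics_address(args: argparse.Namespace) -> Optional[Tuple[str, int]]:
    """Return (host, port) for the metrics server, or None when metrics are disabled."""
    if not args.enable_metrics:
        return None
    if args.metrics_port == args.port:
        # Reject the conflict the review comment warns about.
        raise ValueError("--metrics-port must differ from --port")
    return args.host, args.metrics_port
```

For example, `metrics_address(build_parser().parse_args(["--enable-metrics"]))` yields the host and 30000, which would then be passed to the launcher in place of server_args.port.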

server_args, dynamo_args = config.server_args, config.dynamo_args

engine = sgl.Engine(server_args=server_args)
if server_args.enable_metrics:

Contributor Author

Some things to consider either now or later:

  1. Do we want this health check server up always even for non-metrics purposes? Then we can remove the if server_args.enable_metrics:
  2. Do we want metrics server up always / by default?
    • If so, we can default server_args.enable_metrics = True in our worker code
    • If not, we can also consider the worker-specific env vars that toggle metrics today like DYN_SYSTEM_ENABLED=true
    • However, our current UX proposition is to match sglang as closely as possible on CLI commands for seamless transition - so toggling sglang engine metrics with a unique dynamo env var here seems like an anti-pattern to me.

Contributor

In the future, when we expose metrics via the Rust endpoint, will we still need the dummy health check?

Contributor

I would use the same flag as SGLang, but ideally the metrics would be surfaced via our endpoint without needing a separate server. That depends on what we find from @keivenchang's work.
