-
Notifications
You must be signed in to change notification settings - Fork 629
feat: Add PoC of sglang metrics server (default port 30000) #3407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughAdds a conditional startup of a dummy health-check server in sglang runtime initialization when metrics are enabled, passing host, port, and enable_metrics from server_args. No other public interfaces or exports are modified. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller as Runtime.init()
participant Config as server_args
participant HC as DummyHealthCheckServer
Caller->>Config: Read enable_metrics, host, port
alt enable_metrics == true
Caller->>HC: launch_dummy_health_check_server(host, port, enable_metrics)
note right of HC: Health-check server starts
else
note over Caller,Config: No health-check server started
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
Pre-merge checks❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
components/src/dynamo/sglang/main.py (1)
68-71
: Add error handling for health check server launch.The health check server is launched without any error handling. If the server fails to start (e.g., port already in use), it could cause silent failures or unexpected behavior.
Consider wrapping the call in a try-except block:
if server_args.enable_metrics: - launch_dummy_health_check_server( - server_args.host, server_args.port, server_args.enable_metrics - ) + try: + launch_dummy_health_check_server( + server_args.host, server_args.port, server_args.enable_metrics + ) + logging.info(f"Metrics server started on {server_args.host}:{server_args.port}") + except Exception as e: + logging.warning(f"Failed to start metrics server: {e}")
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
components/src/dynamo/sglang/main.py
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: trtllm
- GitHub Check: vllm
- GitHub Check: sglang
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
components/src/dynamo/sglang/main.py (2)
12-12
: Verify import of launch_dummy_health_check_server
Confirm thatlaunch_dummy_health_check_server
is defined insglang.srt.utils
in the SGLang package your project uses and that it accepts(host, port, enable_metrics)
as arguments.
68-71
: Ensure metrics server is launched for all SGLang engine initializations or confirm scoped exposure
Metrics startup is only ininit
(main.py:64–71). These functions also createsgl.Engine
but omit metrics:
init_prefill
(main.py:160–164)init_multimodal_worker
(main.py:305–316)init_multimodal_prefill_worker
(main.py:356–360)
If metrics should cover all engine instances, add the sameif server_args.enable_metrics
block to each; otherwise confirm that only the main decode worker requires metrics.
if server_args.enable_metrics: | ||
launch_dummy_health_check_server( | ||
server_args.host, server_args.port, server_args.enable_metrics | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Verify the port parameter matches the intended metrics port.
According to the PR description, the metrics server should run on port 30000 by default. However, this code passes server_args.port
, which is likely the main SGLang server port (typically 8000 or similar).
Please confirm whether:
server_args.port
contains the correct metrics port (30000), or- A separate metrics port configuration should be used (e.g.,
server_args.metrics_port
)
Run the following script to check how server_args.port
is configured and if there's a separate metrics port:
🏁 Script executed:
#!/bin/bash
# Description: Check server_args configuration and port usage
# Find where server_args is defined and what ports are available
ast-grep --pattern 'class $_ {
$$$
port = $$$
$$$
}'
# Search for metrics_port or similar configuration
rg -nP '(metrics_port|port.*30000|30000)' --type=py -C3
# Check the args parsing to see available port configurations
rg -nP 'def\s+parse_args' --type=py -A 50
Length of output: 47344
Introduce a dedicated --metrics-port
CLI option
The metrics server currently binds to server_args.port
(the main API port), causing conflicts. Add a --metrics-port
flag (default 30000) in ServerArgs.add_cli_args
(components/src/dynamo/sglang/args.py
) and update the launch_dummy_health_check_server
call in components/src/dynamo/sglang/main.py
(around lines 68–71) to use server_args.metrics_port
.
🤖 Prompt for AI Agents
In components/src/dynamo/sglang/main.py around lines 68 to 71, the metrics
server is being started using server_args.port which conflicts with the main API
port; update the code to call launch_dummy_health_check_server(server_args.host,
server_args.metrics_port, server_args.enable_metrics) and add a new CLI flag in
components/src/dynamo/sglang/args.py by introducing a --metrics-port argument on
ServerArgs.add_cli_args with a default of 30000 (and parse/store it as
metrics_port on ServerArgs); ensure any type parsing matches other port args
(int) and update any help text accordingly.
server_args, dynamo_args = config.server_args, config.dynamo_args | ||
|
||
engine = sgl.Engine(server_args=server_args) | ||
if server_args.enable_metrics: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some things to consider either now or later:
- Do we want this health check server up always even for non-metrics purposes? Then we can remove the
if server_args.enable_metrics:
- Do we want metrics server up always / by default?
- If so, we can default
server_args.enable_metrics = True
in our worker code - If not, we can also consider the worker-specific env vars that toggle metrics today like DYN_SYSTEM_ENABLED=true
- However, our current UX proposition is to match sglang as closely as possible on CLI commands for seamless transition - so toggling sglang engine metrics with a unique dynamo env var here seems like an anti-pattern to me.
- If so, we can default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the future when we expose metrics via rust endpoint - do we need the dummy health check any more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use the same flag as sglang - but ideally it would be surfaced via our endpoint and not need a seperate server - but that is dependent on wait we find from @keivenchang 's work
Overview:
@keivenchang is looking at exposing full set of SGLang metrics when running through dynamo (
python -m dynamo.sglang ...
), and ideally without having to 1:1 map and redefine every single metric SGLang has, and constantly maintain/update everytime a new metric is added.This exposes the built-in sglang metrics server when running dynamo+sglang.
Details:
Build with these local changes
Run with these changes
NOTE: The metrics server output seems to be empty until a single inference request has been received.
Summary by CodeRabbit