feat: add /dump_config to the frontend #3455
base: main
Conversation
Force-pushed from d772904 to 9e6e97d
Signed-off-by: William Arnold <[email protected]>
Force-pushed from d97d5f0 to 4c3b6f8
@coderabbitai review
✅ Actions performed: Review triggered.
Walkthrough
Adds a configurable, optional dump_config HTTP endpoint across the stack. Introduces get_config_endpoint and supporting data assembly in Python, wires endpoint serving into the sglang/trtllm/vllm runtimes, and plumbs a new "extremely unsafe" exposure flag from the CLI through the Rust builders to the HTTP service. Switches the HTTP service to DistributedRuntime and adds a dump_config router.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant HTTP as HTTP Service
participant DRT as DistributedRuntime
participant ETCD as Etcd (via DRT)
participant Inst as Worker Instance(s)
participant Py as get_config_endpoint
User->>HTTP: GET /dump_config
HTTP->>DRT: Query instances (endpoint="dump_config")
DRT->>ETCD: List/Discover instances
ETCD-->>DRT: Instance set
alt Instances found
loop For each instance
HTTP->>Inst: Request dump_config
Inst->>Py: get_config_endpoint(config)
Py-->>Inst: JSON config or error
Inst-->>HTTP: Response
end
HTTP-->>User: Aggregated JSON array (configs/errors)
else None found
HTTP-->>User: Message: no instances
end
sequenceDiagram
autonumber
actor Admin as CLI User
participant CLI as dynamo/frontend/main.py
participant PyBind as EntrypointArgs (Py/Rust)
participant LMB as LocalModelBuilder
participant HTTPB as HttpServiceConfigBuilder
participant HTTP as HTTP Service
Admin->>CLI: --extremely-unsafe-do-not-use-in-prod-expose-dump-config
CLI->>PyBind: EntrypointArgs{ flag: true }
PyBind->>LMB: extremely_unsafe_do_not_use_in_prod_expose_dump_config(true)
LMB-->>PyBind: LocalModel{ flag: true }
PyBind->>HTTPB: .extremely_unsafe_do_not_use_in_prod_expose_dump_config(true)
HTTPB->>HTTP: Build with dump_config route enabled
HTTP-->>Admin: Service exposes /dump_config
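For illustration only, here is a minimal client-side sketch in Python of the flow in the first diagram above. The frontend address and the exact shape of per-instance error entries are assumptions, not confirmed by this PR:

```python
import json
import urllib.request

# Hypothetical usage sketch: fetch the aggregated config dump from a locally
# running frontend that has the unsafe flag enabled. The URL is an assumption.
with urllib.request.urlopen("http://localhost:8000/dump_config") as resp:
    entries = json.loads(resp.read())

# The handler aggregates one entry per discovered instance; per the review
# below, entries for failed instances carry error information instead.
for entry in entries:
    if isinstance(entry, dict) and "error" in entry:
        print("instance failed:", entry["error"])
    else:
        print(json.dumps(entry, indent=2))
```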
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/src/dynamo/sglang/main.py (1)
348-376: Make endpoint name consistent with other workers
Every other worker registers the config dump handle under "dump_config", but here we call component.endpoint("config"). The frontend discovery loop filters on the "dump_config" endpoint (see get_config_handler_inner). This mismatch means multimodal workers never surface their config, breaking the cross-runtime dump. Please switch this endpoint name to "dump_config" for consistency.
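As a tiny self-contained illustration of the mismatch (the constant and function names here are hypothetical, not from the diff): the worker-side registration name must equal the name the frontend filters on, otherwise discovery silently skips the instance.

```python
# Hypothetical illustration of the name mismatch described above.
DUMP_CONFIG_ENDPOINT = "dump_config"  # name the frontend filter expects

def is_discoverable(registered_name: str) -> bool:
    # Mirrors the frontend's filtering logic in spirit: only instances
    # registered under the expected endpoint name are surfaced.
    return registered_name == DUMP_CONFIG_ENDPOINT

assert is_discoverable("dump_config")   # other workers: surfaced
assert not is_discoverable("config")    # this sglang path: silently skipped
```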
🧹 Nitpick comments (3)
components/src/dynamo/common/config_dump/config_dumper.py (2)
80-88: Consider removing unused request parameter or document its purpose.
The request parameter is currently unused. If it's reserved for future endpoint signature compatibility, consider adding a comment or using _request to signal the intentional omission.
Apply this diff if the parameter serves no purpose:
-async def get_config_endpoint(config: Any, request=None):
+async def get_config_endpoint(config: Any):
Or document its purpose:
-async def get_config_endpoint(config: Any, request=None):
+async def get_config_endpoint(config: Any, request=None):  # request reserved for endpoint signature compatibility
82-83: Address TODO: dict serialization through endpoint.
The TODO comment indicates that returning the dict directly doesn't work correctly through the endpoint. This is a known limitation that forces JSON string serialization.
Would you like me to investigate the endpoint serialization behavior or open an issue to track this technical debt?
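A minimal sketch of the workaround this TODO implies, assuming the handler ends up yielding a JSON string rather than a raw dict (the handler signature and transport shape are assumptions inferred from the Rust-side parsing discussed later in this review):

```python
import json
from typing import Any

async def get_config_endpoint_sketch(config: Any, request=None):
    # Workaround sketch: encode the dict to a JSON string up front, because
    # (per the TODO) returning the dict directly does not survive the
    # endpoint transport intact. The real get_config_endpoint may differ.
    data = config if isinstance(config, dict) else {"config": repr(config)}
    yield json.dumps(data)
```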
lib/llm/src/http/service/dump_config.rs (1)
207-207: Use appropriate log level.
This appears to be debug output rather than a warning condition.
- tracing::warn!("message: {}", message_str);
+ tracing::debug!("Received config message: {}", message_str);
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- .gitignore (1 hunks)
- components/src/dynamo/common/config_dump/__init__.py (2 hunks)
- components/src/dynamo/common/config_dump/config_dumper.py (6 hunks)
- components/src/dynamo/frontend/main.py (2 hunks)
- components/src/dynamo/sglang/main.py (15 hunks)
- components/src/dynamo/trtllm/main.py (5 hunks)
- components/src/dynamo/vllm/main.py (6 hunks)
- lib/bindings/python/rust/llm/entrypoint.rs (4 hunks)
- lib/llm/src/entrypoint/input/http.rs (1 hunks)
- lib/llm/src/http/service.rs (1 hunks)
- lib/llm/src/http/service/dump_config.rs (1 hunks)
- lib/llm/src/http/service/service_v2.rs (10 hunks)
- lib/llm/src/local_model.rs (7 hunks)
🧰 Additional context used
🪛 Ruff (0.13.3)
components/src/dynamo/common/config_dump/config_dumper.py
80-80: Unused function argument: request
(ARG001)
170-170: Consider moving this statement to an else block
(TRY300)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: trtllm (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: vllm (amd64)
🔇 Additional comments (17)
.gitignore (1)
106-108: Good call ignoring Direnv config.
Keeping .envrc out of version control prevents accidental leakage of local env overrides and plays nicely with direnv setups.
components/src/dynamo/common/config_dump/config_dumper.py (3)
114-119: Approved: Logging level adjustment.
Changing from info to debug for CONFIG_DUMP output appropriately reduces production log noise while preserving diagnostic capability.
129-180: LGTM: Clean separation of data assembly and encoding.
The extraction of _get_config_dump_data separates concerns well: data collection returns a dict, encoding happens at the boundary. Error handling preserves diagnostic data even on failure.
198-212: Approved: Dict passthrough support.
The early return for dict types (lines 205-206) is necessary for the refactored flow where _get_config_dump_data returns dicts before encoding. The updated return type annotation correctly reflects this.
lib/llm/src/http/service.rs (1)
24-24: LGTM: Module exposure.
The addition of the dump_config module to the public API surface is straightforward and aligns with the PR's objective to expose configuration dumping functionality.
components/src/dynamo/vllm/main.py (1)
177-177: Approved: Consistent dump_config endpoint pattern.
The implementation correctly:
- Creates the dump_config endpoint in both prefill and decode initialization paths
- Uses functools.partial(get_config_endpoint, config) to bind configuration
- Serves concurrently with main endpoints via asyncio.gather
- Applies consistent metrics labeling
This pattern is repeated across vllm, sglang, and trtllm components, ensuring uniform configuration dumping capability; a rough sketch of the shape follows below.
Also applies to: 234-234, 210-213, 324-327
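The sketch below is an illustration rather than the actual vllm wiring: the endpoint()/serve_endpoint() method names and the "generate" endpoint name are assumptions about the runtime API, while functools.partial and asyncio.gather follow the pattern described above.

```python
import asyncio
import functools

async def serve_worker(component, config, generate_handler, get_config_endpoint):
    # Sketch of the pattern noted above: one main endpoint plus a
    # dump_config endpoint, served concurrently in the same task group.
    generate_ep = component.endpoint("generate")
    dump_config_ep = component.endpoint("dump_config")

    await asyncio.gather(
        generate_ep.serve_endpoint(generate_handler),
        # functools.partial binds this worker's config so the shared
        # get_config_endpoint handler can be reused across runtimes.
        dump_config_ep.serve_endpoint(functools.partial(get_config_endpoint, config)),
    )
```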
components/src/dynamo/common/config_dump/__init__.py (1)
15-15: LGTM: Public API expansion.
Correctly exports get_config_endpoint for consumption by vllm, sglang, trtllm, and other components requiring configuration dump capabilities.
Also applies to: 29-29
lib/llm/src/entrypoint/input/http.rs (2)
54-57: Approved: Unsafe flag propagation with clear naming.
The method name extremely_unsafe_do_not_use_in_prod_expose_dump_config appropriately signals the security implications. Propagating this flag from local_model to http_service_builder enables runtime control over dump_config endpoint exposure.
64-64: Architectural shift: etcd to distributed runtime.
The change from with_etcd_client(etcd_client.clone()) to with_drt(Some(distributed_runtime.clone())) represents a broader architectural evolution noted in the PR summary. This enables the dump_config functionality while maintaining existing etcd-based watch flows.
122-122
: Approved: Complete Python-Rust binding plumbing.The flag is correctly:
- Added to
EntrypointArgs
struct (line 122)- Accepted as optional parameter in constructor (lines 129, 146)
- Stored in the struct (line 172)
- Propagated to
LocalModelBuilder
with safe defaultunwrap_or(false)
(lines 207-209)This ensures Python callers can optionally enable dump_config exposure while defaulting to the secure behavior.
Also applies to: 129-129, 146-146, 172-172, 206-210
components/src/dynamo/frontend/main.py (2)
213-217: Approved: Clear production warning in CLI.
The argument help text prominently warns users with "DO NOT USE IN PROD!" and explicitly lists the sensitive information exposed (config, environment variables, system info, GPU info, installed packages). This transparency is critical for informed decision-making.
282-285: Approved: Conditional flag propagation.
Only adding the kwarg when the flag is set maintains clean configuration and avoids polluting EntrypointArgs with unnecessary parameters.
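A small sketch of this CLI shape, with the flag name and help text taken from the review and everything else (the parser wiring and the kwargs consumer) kept hypothetical:

```python
import argparse

def build_entrypoint_kwargs(argv=None) -> dict:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--extremely-unsafe-do-not-use-in-prod-expose-dump-config",
        action="store_true",
        dest="expose_dump_config",
        help="DO NOT USE IN PROD! Exposes config, environment variables, "
        "system info, GPU info, and installed packages via /dump_config.",
    )
    args = parser.parse_args(argv)

    kwargs = {}
    # Only forward the kwarg when the flag was actually set, so
    # EntrypointArgs is not polluted with an always-present parameter.
    if args.expose_dump_config:
        kwargs["extremely_unsafe_do_not_use_in_prod_expose_dump_config"] = True
    return kwargs

# e.g. EntrypointArgs(..., **build_entrypoint_kwargs()) in the real frontend CLI
```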
components/src/dynamo/trtllm/main.py (1)
5-5: Approved: Consistent trtllm dump_config implementation.
The implementation mirrors the vllm pattern:
- functools and get_config_endpoint imports (lines 5, 38)
- config_to_dump dict assembling both engine and dynamo args (lines 286-288)
- Endpoint served concurrently with asyncio.gather and functools.partial (lines 379-389, 392-399)
- Branching for publish_events_and_metrics preserves existing conditional serving logic
This consistency across backends (vllm, sglang, trtllm) ensures uniform configuration dumping behavior.
Also applies to: 38-38, 286-288, 361-361, 379-399
lib/llm/src/http/service/dump_config.rs (2)
63-120: LGTM! Good resilient error handling.
The handler properly continues collecting configs even when individual instances fail, providing a complete view with error information for failed instances. The per-instance error handling ensures one failure doesn't break the entire response.
51-61: Require authentication/authorization for dump_config endpoint
Exposes full configuration (including credentials/internal endpoints). Ensure the globally composed router applies an auth middleware restricting access to admin/debug users, and sanitize any secrets/PII before returning.
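If the sanitization happens on the Python side where the dump is assembled, a minimal redaction pass might look like the following (the key markers and placeholder value are illustrative, not a complete deny-list):

```python
from typing import Any

SENSITIVE_MARKERS = ("key", "token", "secret", "password", "credential")

def redact(value: Any) -> Any:
    # Recursively mask values whose keys look sensitive before the dump is
    # serialized and returned over /dump_config.
    if isinstance(value, dict):
        return {
            k: "***REDACTED***"
            if any(marker in k.lower() for marker in SENSITIVE_MARKERS)
            else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value
```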
lib/llm/src/http/service/service_v2.rs (2)
91-105: LGTM! Clean DistributedRuntime integration.
The state changes properly integrate DistributedRuntime while maintaining backward compatibility. Deriving the etcd_client from drt (line 92) ensures consistency and avoids duplicate client instances.
342-350: Good safety guardrails for debug endpoint.
The dump_config endpoint is properly gated behind an explicitly unsafe flag with a clear warning. The warning message effectively communicates the risk.
Note: Ensure this flag is never enabled in production deployments. Consider adding runtime environment checks (e.g., reject if NODE_ENV=production) as an additional safeguard.
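One possible guard on the Python side of the frontend CLI (the DYNAMO_ENV variable name is hypothetical; an equivalent check could also live in the Rust service builder):

```python
import os

def reject_unsafe_flag_in_production(expose_dump_config: bool) -> None:
    # Illustrative safeguard: refuse to enable the debug endpoint when the
    # deployment environment looks like production.
    if expose_dump_config and os.environ.get("DYNAMO_ENV", "").lower() == "production":
        raise SystemExit(
            "--extremely-unsafe-do-not-use-in-prod-expose-dump-config "
            "must not be set when DYNAMO_ENV=production"
        )
```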
// TODO: this is very hacky and needs to be improved, should I be tracking all the endpoints as they come up?
// Wait for the client to discover instances from etcd
client
    .wait_for_instances()
    .await
    .map_err(|e| format!("Failed to wait for instances: {}", e))?;

// Additional wait: Give the background monitor_instance_source task time to populate instance_avail
// The Client spawns a background task that updates instance_avail from instance_source,
// but it runs asynchronously. We need to ensure it has run at least once.
let max_retries = 50; // 50 * 10ms = 500ms max wait
for _ in 0..max_retries {
    let avail_ids = client.instance_ids_avail();
    if avail_ids.contains(&instance.instance_id) {
        break;
    }
    tokio::time::sleep(tokio::time::Duration::from_millis(10)).await;
}
Address the acknowledged "very hacky" implementation.
The TODO comment and polling loop indicate a known design issue. The current approach polls for instance availability with hardcoded retries (50 × 10ms = 500ms max), which is both brittle and inefficient.
Consider these improvements:
- Replace polling with proper event-driven instance discovery (e.g., watch for instance availability events)
- Make timeout/retry configuration explicit rather than hardcoded
- Document why this approach is necessary if a better solution isn't feasible yet
- // TODO: this is very hacky and needs to be improved, should I be tracking all the endpoints as they come up?
- // Wait for the client to discover instances from etcd
client
.wait_for_instances()
.await
.map_err(|e| format!("Failed to wait for instances: {}", e))?;
- // Additional wait: Give the background monitor_instance_source task time to populate instance_avail
- // The Client spawns a background task that updates instance_avail from instance_source,
- // but it runs asynchronously. We need to ensure it has run at least once.
- let max_retries = 50; // 50 * 10ms = 500ms max wait
- for _ in 0..max_retries {
+ // Wait for instance to become available with configurable timeout
+ const POLL_INTERVAL_MS: u64 = 10;
+ const MAX_WAIT_MS: u64 = 500;
+ let max_retries = MAX_WAIT_MS / POLL_INTERVAL_MS;
+
+ for retry in 0..max_retries {
let avail_ids = client.instance_ids_avail();
if avail_ids.contains(&instance.instance_id) {
+ tracing::debug!("Instance became available after {}ms", retry * POLL_INTERVAL_MS);
break;
}
- tokio::time::sleep(tokio::time::Duration::from_millis(10)).await;
+ tokio::time::sleep(tokio::time::Duration::from_millis(POLL_INTERVAL_MS)).await;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
client
    .wait_for_instances()
    .await
    .map_err(|e| format!("Failed to wait for instances: {}", e))?;

// Wait for instance to become available with configurable timeout
const POLL_INTERVAL_MS: u64 = 10;
const MAX_WAIT_MS: u64 = 500;
let max_retries = MAX_WAIT_MS / POLL_INTERVAL_MS;

for retry in 0..max_retries {
    let avail_ids = client.instance_ids_avail();
    if avail_ids.contains(&instance.instance_id) {
        tracing::debug!(
            "Instance became available after {}ms",
            retry * POLL_INTERVAL_MS
        );
        break;
    }
    tokio::time::sleep(tokio::time::Duration::from_millis(POLL_INTERVAL_MS)).await;
}
🤖 Prompt for AI Agents
lib/llm/src/http/service/dump_config.rs lines 140-157: the current hardcoded
polling loop that waits up to 50×10ms for client.instance_ids_avail() is brittle
and flagged as "very hacky" — replace it by adding or using an event-driven wait
on the client (e.g., implement/consume a client.wait_for_instance(instance_id,
timeout) that awaits a notification/watch from the instance source), remove the
ad-hoc loop, make the timeout/backoff configurable via function params or config
values (do not hardcode 50/10ms), and add a short comment documenting why the
wait is needed and what the chosen timeout semantics are if an immediate
event-driven solution isn’t possible.
// I'm not sure why I can't nest more than one level, but when
// I do, it passes through weird json
let message = response
    .get_mut("data")
    .map(|v| v.take())
    .ok_or_else(|| format!("No data field in response {:?}", response))?
    .get_mut("message")
    .map(|v| v.take())
    .ok_or_else(|| format!("No message field in response {:?}", response))?;
if let Some(message_str) = message.as_str() {
    // The message is itself a json string
    tracing::warn!("message: {}", message_str);
    let message_json = serde_json::from_str(message_str)
        .map_err(|e| format!("Failed to parse message as json: {}", e))?;
    Ok(message_json)
} else {
    Err(format!("message field is not a string {:?}", response))
}
🛠️ Refactor suggestion | 🟠 Major
Simplify the nested JSON extraction protocol.
The response structure requires double-parsing: extracting a JSON string from response.data.message and then parsing that string as JSON. This suggests a protocol design issue.
The nested structure should be simplified:
- Current: {"data": {"message": "{\"config\": ...}"}} (JSON string inside JSON)
- Better: {"data": {"config": {...}}} (direct JSON object)
If the nested structure is unavoidable due to existing protocol constraints:
- Add documentation explaining why this structure is necessary
- Consider creating a helper function to encapsulate this extraction logic
- Improve error messages to help debug protocol mismatches
Example helper:
fn extract_config_from_response(mut response: serde_json::Value) -> Result<serde_json::Value, String> {
    let message = response
        .get_mut("data")
        .and_then(|v| v.get_mut("message"))
        .map(|v| v.take())
        .ok_or_else(|| format!("Missing data.message field in response: {:?}", response))?;
    let message_str = message
        .as_str()
        .ok_or_else(|| format!("message field is not a string: {:?}", message))?;
    serde_json::from_str(message_str)
        .map_err(|e| format!("Failed to parse message as JSON: {}", e))
}
🤖 Prompt for AI Agents
In lib/llm/src/http/service/dump_config.rs around lines 196 to 213, the code
double-parses a JSON string held in response.data.message which is brittle and
hard to follow; replace the inline nested extraction with a small helper
function (e.g., extract_config_from_response) that takes ownership or a mutable
response Value, safely navigates to data.message using chained and_then/get_mut
or get/and_then, validates that message is a string with clear error messages,
parses that string with serde_json::from_str and returns Result<Value, String>;
add a short doc comment above the helper explaining why the nested string format
exists (or note that it should be flattened in protocol), and update the caller
to use the helper so the extraction logic and improved error messages are
centralized.
Overview:
Adds a method to the frontend to dump the config of every instance. Follow-up to feat: add flag for dumping dynamo engine config and environment #3286.
Waiting on feat: dynamic endpoint registration #3418 due to storing DistributedRuntime in state.
Details:
TODO
--unsafe-debug-only-in-testing-expose-dump-config
Where should the reviewer start?
Mostly in service_v2.rs, config.rs
Related Issues:
Summary by CodeRabbit
New Features
Refactor
Chores