@Aphoh Aphoh commented Oct 7, 2025

Overview:

Details:

  • Adds an endpoint to the frontend which dumps the config of every running instance

TODO

  • vllm
  • trtllm
  • sglang
  • gate behind --unsafe-debug-only-in-testing-expose-dump-config

Where should the reviewer start?

Mostly in service_v2.rs, config.rs

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • New Features

    • Optional HTTP endpoint to fetch live configuration, disabled by default. Enable via a new, explicitly unsafe CLI flag and runtime setting.
    • Configuration dump is available across supported backends and served alongside primary endpoints.
  • Refactor

    • HTTP service initialization now uses a distributed runtime for dynamic routing, replacing the previous approach.
  • Chores

    • Updated ignore rules to exclude Direnv files.


copy-pr-bot bot commented Oct 7, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the feat label Oct 7, 2025
@Aphoh Aphoh force-pushed the warnold/create-config-dump-endpoint-v2 branch 2 times, most recently from d772904 to 9e6e97d on October 7, 2025 at 18:29
Aphoh added 9 commits October 7, 2025 17:35 (each signed off by William Arnold <[email protected]>)
@Aphoh Aphoh force-pushed the warnold/create-config-dump-endpoint-v2 branch from d97d5f0 to 4c3b6f8 on October 8, 2025 at 02:42
@pull-request-size pull-request-size bot added size/L and removed size/XL labels Oct 8, 2025
@Aphoh Aphoh commented Oct 8, 2025

@coderabbitai review

coderabbitai bot commented Oct 8, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot commented Oct 8, 2025

Walkthrough

Adds a configurable, optional dump_config HTTP endpoint across the stack. Introduces get_config_endpoint and supporting data assembly in Python, wires endpoint serving into sglang/trtllm/vllm runtimes, and plumbs a new “extremely unsafe” exposure flag from CLI through Rust builders to the HTTP service. Switches HTTP service to DistributedRuntime and adds a dump_config router.

Changes

  • Repo hygiene — .gitignore
    Ignore Direnv's .envrc.
  • Config dump API (Python core) — components/src/dynamo/common/config_dump/__init__.py, components/src/dynamo/common/config_dump/config_dumper.py
    Add get_config_endpoint; refactor dump creation via _get_config_dump_data; adjust encoding, error handling, and logging; broaden _preprocess_for_encode typing.
  • Python frontends: expose dump_config endpoint — components/src/dynamo/sglang/main.py, components/src/dynamo/trtllm/main.py, components/src/dynamo/vllm/main.py
    Import and serve get_config_endpoint on a dump_config endpoint alongside the main endpoints using asyncio.gather; minor naming inconsistency (config vs dump_config) in one sglang path.
  • CLI flag plumbing (CLI → engine) — components/src/dynamo/frontend/main.py, lib/bindings/python/rust/llm/entrypoint.rs, lib/llm/src/local_model.rs
    Add CLI flag --extremely-unsafe-do-not-use-in-prod-expose-dump-config; propagate it via EntrypointArgs to LocalModelBuilder and store it on LocalModel; default false with a public getter/setter.
  • HTTP service: DRT + dump_config route — lib/llm/src/http/service/service_v2.rs, lib/llm/src/http/service.rs, lib/llm/src/http/service/dump_config.rs, lib/llm/src/entrypoint/input/http.rs
    Add DistributedRuntime to the service state/config and builder (with_drt), deprecating direct etcd wiring in favor of the DRT; introduce an opt-in dump_config router (path from env, guarded by the flag) whose handler aggregates per-instance configs by invoking each instance's dump_config endpoint; expose the dump_config module; forward the flag from the local model into the HTTP builder.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant HTTP as HTTP Service
  participant DRT as DistributedRuntime
  participant ETCD as Etcd (via DRT)
  participant Inst as Worker Instance(s)
  participant Py as get_config_endpoint

  User->>HTTP: GET /dump_config
  HTTP->>DRT: Query instances (endpoint="dump_config")
  DRT->>ETCD: List/Discover instances
  ETCD-->>DRT: Instance set
  alt Instances found
    loop For each instance
      HTTP->>Inst: Request dump_config
      Inst->>Py: get_config_endpoint(config)
      Py-->>Inst: JSON config or error
      Inst-->>HTTP: Response
    end
    HTTP-->>User: Aggregated JSON array (configs/errors)
  else None found
    HTTP-->>User: Message: no instances
  end
sequenceDiagram
  autonumber
  actor Admin as CLI User
  participant CLI as dynamo/frontend/main.py
  participant PyBind as EntrypointArgs (Py/Rust)
  participant LMB as LocalModelBuilder
  participant HTTPB as HttpServiceConfigBuilder
  participant HTTP as HTTP Service

  Admin->>CLI: --extremely-unsafe-do-not-use-in-prod-expose-dump-config
  CLI->>PyBind: EntrypointArgs{ flag: true }
  PyBind->>LMB: extremely_unsafe_do_not_use_in_prod_expose_dump_config(true)
  LMB-->>PyBind: LocalModel{ flag: true }
  PyBind->>HTTPB: .extremely_unsafe_do_not_use_in_prod_expose_dump_config(true)
  HTTPB->>HTTP: Build with dump_config route enabled
  HTTP-->>Admin: Service exposes /dump_config

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A rabbit taps configs with gentle thump,
New routes hop out: dump, dump, dump!
A flag so bold (unsafe, beware),
DRT guides the warren’s lair.
From workers’ burrows, truths now flow—
Carrots of JSON, in one neat row. 🥕

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 69.77%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title Check ✅ Passed — The title succinctly describes the primary change by highlighting the addition of the /dump_config endpoint to the frontend, matching the core functionality introduced in the diff.
  • Description Check ✅ Passed — The description includes all required template sections—Overview, Details, Where should the reviewer start, and Related Issues—and clearly outlines the purpose, scope, and entry points for review, with related issues referenced using action keywords.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/src/dynamo/sglang/main.py (1)

348-376: Make endpoint name consistent with other workers

Every other worker registers the config dump handler under "dump_config", but here we call component.endpoint("config"). The frontend discovery loop filters on the "dump_config" endpoint (see get_config_handler_inner), so this mismatch means multimodal workers never surface their config, breaking the cross-runtime dump. Please switch this endpoint name to "dump_config" for consistency; a diff-style sketch follows.
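
For illustration, the fix in diff form (the variable name is a stand-in for the actual binding in sglang/main.py):

-    dump_config_endpoint = component.endpoint("config")
+    dump_config_endpoint = component.endpoint("dump_config")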

🧹 Nitpick comments (3)
components/src/dynamo/common/config_dump/config_dumper.py (2)

80-88: Consider removing unused request parameter or document its purpose.

The request parameter is currently unused. If it's reserved for future endpoint signature compatibility, consider adding a comment or using _request to signal the intentional omission.

Apply this diff if the parameter serves no purpose:

-async def get_config_endpoint(config: Any, request=None):
+async def get_config_endpoint(config: Any):

Or document its purpose:

-async def get_config_endpoint(config: Any, request=None):
+async def get_config_endpoint(config: Any, request=None):  # request reserved for endpoint signature compatibility

82-83: Address TODO: dict serialization through endpoint.

The TODO comment indicates that returning the dict directly doesn't work correctly through the endpoint. This is a known limitation that forces JSON string serialization.

Would you like me to investigate the endpoint serialization behavior or open an issue to track this technical debt?

lib/llm/src/http/service/dump_config.rs (1)

207-207: Use appropriate log level.

This appears to be debug output rather than a warning condition.

-        tracing::warn!("message: {}", message_str);
+        tracing::debug!("Received config message: {}", message_str);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13bf5d9 and 547b046.

📒 Files selected for processing (13)
  • .gitignore (1 hunks)
  • components/src/dynamo/common/config_dump/__init__.py (2 hunks)
  • components/src/dynamo/common/config_dump/config_dumper.py (6 hunks)
  • components/src/dynamo/frontend/main.py (2 hunks)
  • components/src/dynamo/sglang/main.py (15 hunks)
  • components/src/dynamo/trtllm/main.py (5 hunks)
  • components/src/dynamo/vllm/main.py (6 hunks)
  • lib/bindings/python/rust/llm/entrypoint.rs (4 hunks)
  • lib/llm/src/entrypoint/input/http.rs (1 hunks)
  • lib/llm/src/http/service.rs (1 hunks)
  • lib/llm/src/http/service/dump_config.rs (1 hunks)
  • lib/llm/src/http/service/service_v2.rs (10 hunks)
  • lib/llm/src/local_model.rs (7 hunks)
🧰 Additional context used
🪛 Ruff (0.13.3)
components/src/dynamo/common/config_dump/config_dumper.py

80-80: Unused function argument: request

(ARG001)


170-170: Consider moving this statement to an else block

(TRY300)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: vllm (amd64)
🔇 Additional comments (17)
.gitignore (1)

106-108: Good call ignoring Direnv config.

Keeping .envrc out of version control prevents accidental leakage of local env overrides and plays nicely with direnv setups.

components/src/dynamo/common/config_dump/config_dumper.py (3)

114-119: Approved: Logging level adjustment.

Changing from info to debug for CONFIG_DUMP output appropriately reduces production log noise while preserving diagnostic capability.


129-180: LGTM: Clean separation of data assembly and encoding.

The extraction of _get_config_dump_data separates concerns well—data collection returns a dict, encoding happens at the boundary. Error handling preserves diagnostic data even on failure.
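
For illustration, the described split might look like the sketch below. The three function names come from the diff; the bodies are assumptions, not the actual implementation:

import json
from typing import Any

def _preprocess_for_encode(obj: Any) -> Any:
    # Stub for the real preprocessor; per this review it now passes
    # dicts through unchanged.
    if isinstance(obj, dict):
        return obj
    return str(obj)

def _get_config_dump_data(config: Any) -> dict:
    """Assemble the dump as a plain dict; encoding happens at the boundary."""
    data: dict[str, Any] = {}
    try:
        data["config"] = _preprocess_for_encode(config)
    except Exception as exc:
        # Preserve diagnostic data even when collection fails.
        data["error"] = repr(exc)
    return data

async def get_config_endpoint(config: Any, request=None):
    # Encode only at the boundary, after the dict has been assembled.
    return json.dumps(_get_config_dump_data(config))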


198-212: Approved: Dict passthrough support.

The early return for dict types (lines 205-206) is necessary for the refactored flow where _get_config_dump_data returns dicts before encoding. The updated return type annotation correctly reflects this.

lib/llm/src/http/service.rs (1)

24-24: LGTM: Module exposure.

The addition of the dump_config module to the public API surface is straightforward and aligns with the PR's objective to expose configuration dumping functionality.

components/src/dynamo/vllm/main.py (1)

177-177: Approved: Consistent dump_config endpoint pattern.

The implementation correctly:

  • Creates the dump_config endpoint in both prefill and decode initialization paths
  • Uses functools.partial(get_config_endpoint, config) to bind configuration
  • Serves concurrently with main endpoints via asyncio.gather
  • Applies consistent metrics labeling

This pattern is repeated across the vllm, sglang, and trtllm components, ensuring uniform configuration dumping capability; a sketch follows after this comment.

Also applies to: 234-234, 210-213, 324-327
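
A minimal sketch of that pattern, assuming the component/endpoint API referenced in this review (serve_endpoint and the handler are illustrative stand-ins):

import asyncio
import functools

from dynamo.common.config_dump import get_config_endpoint

async def serve(component, handler, config):
    generate_endpoint = component.endpoint("generate")
    dump_config_endpoint = component.endpoint("dump_config")
    # Serve the config dump concurrently with the main endpoint.
    await asyncio.gather(
        generate_endpoint.serve_endpoint(handler.generate),
        dump_config_endpoint.serve_endpoint(
            functools.partial(get_config_endpoint, config)
        ),
    )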

components/src/dynamo/common/config_dump/__init__.py (1)

15-15: LGTM: Public API expansion.

Correctly exports get_config_endpoint for consumption by vllm, sglang, trtllm, and other components requiring configuration dump capabilities.

Also applies to: 29-29

lib/llm/src/entrypoint/input/http.rs (2)

54-57: Approved: Unsafe flag propagation with clear naming.

The method name extremely_unsafe_do_not_use_in_prod_expose_dump_config appropriately signals the security implications. Propagating this flag from local_model to http_service_builder enables runtime control over dump_config endpoint exposure.


64-64: Architectural shift: etcd to distributed runtime.

The change from with_etcd_client(etcd_client.clone()) to with_drt(Some(distributed_runtime.clone())) represents a broader architectural evolution noted in the PR summary. This enables the dump_config functionality while maintaining existing etcd-based watch flows.
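
In outline, the handoff reads like the fragment below; with_drt and the flag setter are the methods named in this review, while the getter name and surrounding variables are assumptions:

let http_service_builder = http_service_builder
    .with_drt(Some(distributed_runtime.clone()))
    .extremely_unsafe_do_not_use_in_prod_expose_dump_config(
        local_model.expose_dump_config(), // getter name assumed
    );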

lib/bindings/python/rust/llm/entrypoint.rs (1)

122-122: Approved: Complete Python-Rust binding plumbing.

The flag is correctly:

  • Added to EntrypointArgs struct (line 122)
  • Accepted as optional parameter in constructor (lines 129, 146)
  • Stored in the struct (line 172)
  • Propagated to LocalModelBuilder with safe default unwrap_or(false) (lines 207-209)

This ensures Python callers can optionally enable dump_config exposure while defaulting to the secure behavior.

Also applies to: 129-129, 146-146, 172-172, 206-210
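
Schematically (the field name matches the review; the struct context and the commented builder call are assumptions):

#[pyclass]
pub struct EntrypointArgs {
    // ... existing fields ...
    pub extremely_unsafe_do_not_use_in_prod_expose_dump_config: Option<bool>,
}

// When handing off to LocalModelBuilder, default to the safe value:
// builder.extremely_unsafe_do_not_use_in_prod_expose_dump_config(
//     args.extremely_unsafe_do_not_use_in_prod_expose_dump_config
//         .unwrap_or(false),
// );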

components/src/dynamo/frontend/main.py (2)

213-217: Approved: Clear production warning in CLI.

The argument help text prominently warns users with "DO NOT USE IN PROD!" and explicitly lists the sensitive information exposed (config, environment variables, system info, GPU info, installed packages). This transparency is critical for informed decision-making.


282-285: Approved: Conditional flag propagation.

Only adding the kwarg when the flag is set maintains clean configuration and avoids polluting EntrypointArgs with unnecessary parameters.
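
For reference, a sketch of the CLI side; the flag string matches the PR, while the parser variable and kwargs plumbing are assumptions:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--extremely-unsafe-do-not-use-in-prod-expose-dump-config",
    action="store_true",
    help=(
        "DO NOT USE IN PROD! Exposes config, environment variables, "
        "system info, GPU info, and installed packages."
    ),
)
args = parser.parse_args()

# Only pass the kwarg when the flag is set, keeping EntrypointArgs clean.
kwargs = {}
if args.extremely_unsafe_do_not_use_in_prod_expose_dump_config:
    kwargs["extremely_unsafe_do_not_use_in_prod_expose_dump_config"] = True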

components/src/dynamo/trtllm/main.py (1)

5-5: Approved: Consistent trtllm dump_config implementation.

The implementation mirrors the vllm pattern:

  • functools and get_config_endpoint imports (lines 5, 38)
  • config_to_dump dict assembling both engine and dynamo args (lines 286-288)
  • Endpoint served concurrently with asyncio.gather and functools.partial (lines 379-389, 392-399)
  • Branching for publish_events_and_metrics preserves existing conditional serving logic

This consistency across backends (vllm, sglang, trtllm) ensures uniform configuration dumping behavior.

Also applies to: 38-38, 286-288, 361-361, 379-399

lib/llm/src/http/service/dump_config.rs (2)

63-120: LGTM! Good resilient error handling.

The handler properly continues collecting configs even when individual instances fail, providing a complete view with error information for failed instances. The per-instance error handling ensures one failure doesn't break the entire response.
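
The resilience pattern amounts to something like this sketch; fetch_instance_config and the instance fields are illustrative, not the handler's actual code:

let mut results = Vec::new();
for instance in instances {
    // A failure on one instance becomes an error entry, not a failed response.
    match fetch_instance_config(&client, &instance).await {
        Ok(config) => results.push(serde_json::json!({
            "instance_id": instance.instance_id,
            "config": config,
        })),
        Err(e) => results.push(serde_json::json!({
            "instance_id": instance.instance_id,
            "error": e.to_string(),
        })),
    }
}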


51-61: Require authentication/authorization for dump_config endpoint

Exposes full configuration (including credentials/internal endpoints). Ensure the globally composed router applies an auth middleware restricting access to admin/debug users, and sanitize any secrets/PII before returning.
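
One possible shape for such a guard, written framework-agnostically as a header check invoked from the handler; the header name and environment variable are hypothetical:

use http::{HeaderMap, StatusCode};

fn authorize_dump_config(headers: &HeaderMap) -> Result<(), StatusCode> {
    // Hypothetical shared secret sourced from the environment.
    let expected =
        std::env::var("DUMP_CONFIG_ADMIN_TOKEN").map_err(|_| StatusCode::FORBIDDEN)?;
    match headers.get("x-admin-token").and_then(|v| v.to_str().ok()) {
        Some(token) if token == expected => Ok(()),
        _ => Err(StatusCode::FORBIDDEN),
    }
}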

lib/llm/src/http/service/service_v2.rs (2)

91-105: LGTM! Clean DistributedRuntime integration.

The state changes properly integrate DistributedRuntime while maintaining backward compatibility. Deriving the etcd_client from drt (line 92) ensures consistency and avoids duplicate client instances.
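
That derivation presumably amounts to a small accessor like the one below (type paths and method names assumed):

impl State {
    // Derive the etcd client from the DistributedRuntime instead of
    // storing a duplicate handle.
    fn etcd_client(&self) -> Option<etcd::Client> {
        self.drt.as_ref().and_then(|drt| drt.etcd_client())
    }
}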


342-350: Good safety guardrails for debug endpoint.

The dump_config endpoint is properly gated behind an explicitly unsafe flag with a clear warning. The warning message effectively communicates the risk.

Note: Ensure this flag is never enabled in production deployments. Consider adding runtime environment checks (e.g., reject if NODE_ENV=production) as an additional safeguard.
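
The gating described above likely reduces to a conditional merge along these lines (flag, router, and warning text are paraphrased assumptions):

if flags.extremely_unsafe_do_not_use_in_prod_expose_dump_config {
    tracing::warn!(
        "dump_config endpoint ENABLED: it exposes configuration, environment \
         variables, and system details. Never enable this in production!"
    );
    app = app.merge(dump_config_router); // router construction assumed
}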

Comment on lines +140 to +157
// TODO: this is very hacky and needs to be improved, should I be tracking all the endpoints as they come up?
// Wait for the client to discover instances from etcd
client
    .wait_for_instances()
    .await
    .map_err(|e| format!("Failed to wait for instances: {}", e))?;

// Additional wait: Give the background monitor_instance_source task time to populate instance_avail
// The Client spawns a background task that updates instance_avail from instance_source,
// but it runs asynchronously. We need to ensure it has run at least once.
let max_retries = 50; // 50 * 10ms = 500ms max wait
for _ in 0..max_retries {
    let avail_ids = client.instance_ids_avail();
    if avail_ids.contains(&instance.instance_id) {
        break;
    }
    tokio::time::sleep(tokio::time::Duration::from_millis(10)).await;
}

⚠️ Potential issue | 🟠 Major

Address the acknowledged "very hacky" implementation.

The TODO comment and polling loop indicate a known design issue. The current approach polls for instance availability with hardcoded retries (50 × 10ms = 500ms max), which is both brittle and inefficient.

Consider these improvements:

  1. Replace polling with proper event-driven instance discovery (e.g., watch for instance availability events)
  2. Make timeout/retry configuration explicit rather than hardcoded
  3. Document why this approach is necessary if a better solution isn't feasible yet
-    // TODO: this is very hacky and needs to be improved, should I be tracking all the endpoints as they come up?
-    // Wait for the client to discover instances from etcd
     client
         .wait_for_instances()
         .await
         .map_err(|e| format!("Failed to wait for instances: {}", e))?;

-    // Additional wait: Give the background monitor_instance_source task time to populate instance_avail
-    // The Client spawns a background task that updates instance_avail from instance_source,
-    // but it runs asynchronously. We need to ensure it has run at least once.
-    let max_retries = 50; // 50 * 10ms = 500ms max wait
-    for _ in 0..max_retries {
+    // Wait for instance to become available with configurable timeout
+    const POLL_INTERVAL_MS: u64 = 10;
+    const MAX_WAIT_MS: u64 = 500;
+    let max_retries = MAX_WAIT_MS / POLL_INTERVAL_MS;
+    
+    for retry in 0..max_retries {
         let avail_ids = client.instance_ids_avail();
         if avail_ids.contains(&instance.instance_id) {
+            tracing::debug!("Instance became available after {}ms", retry * POLL_INTERVAL_MS);
             break;
         }
-        tokio::time::sleep(tokio::time::Duration::from_millis(10)).await;
+        tokio::time::sleep(tokio::time::Duration::from_millis(POLL_INTERVAL_MS)).await;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// TODO: this is very hacky and needs to be improved, should I be tracking all the endpoints as they come up?
// Wait for the client to discover instances from etcd
client
    .wait_for_instances()
    .await
    .map_err(|e| format!("Failed to wait for instances: {}", e))?;

// Additional wait: Give the background monitor_instance_source task time to populate instance_avail
// The Client spawns a background task that updates instance_avail from instance_source,
// but it runs asynchronously. We need to ensure it has run at least once.
let max_retries = 50; // 50 * 10ms = 500ms max wait
for _ in 0..max_retries {
    let avail_ids = client.instance_ids_avail();
    if avail_ids.contains(&instance.instance_id) {
        break;
    }
    tokio::time::sleep(tokio::time::Duration::from_millis(10)).await;
}

client
    .wait_for_instances()
    .await
    .map_err(|e| format!("Failed to wait for instances: {}", e))?;

// Wait for instance to become available with configurable timeout
const POLL_INTERVAL_MS: u64 = 10;
const MAX_WAIT_MS: u64 = 500;
let max_retries = MAX_WAIT_MS / POLL_INTERVAL_MS;

for retry in 0..max_retries {
    let avail_ids = client.instance_ids_avail();
    if avail_ids.contains(&instance.instance_id) {
        tracing::debug!(
            "Instance became available after {}ms",
            retry * POLL_INTERVAL_MS
        );
        break;
    }
    tokio::time::sleep(tokio::time::Duration::from_millis(POLL_INTERVAL_MS)).await;
}
🤖 Prompt for AI Agents
lib/llm/src/http/service/dump_config.rs lines 140-157: the current hardcoded
polling loop that waits up to 50×10ms for client.instance_ids_avail() is brittle
and flagged as "very hacky" — replace it by adding or using an event-driven wait
on the client (e.g., implement/consume a client.wait_for_instance(instance_id,
timeout) that awaits a notification/watch from the instance source), remove the
ad-hoc loop, make the timeout/backoff configurable via function params or config
values (do not hardcode 50/10ms), and add a short comment documenting why the
wait is needed and what the chosen timeout semantics are if an immediate
event-driven solution isn’t possible.

Comment on lines +196 to +213
// I'm not sure why I can't nest more than one level, but when
// I do, it passes through weird json
let message = response
    .get_mut("data")
    .map(|v| v.take())
    .ok_or_else(|| format!("No data field in response {:?}", response))?
    .get_mut("message")
    .map(|v| v.take())
    .ok_or_else(|| format!("No message field in response {:?}", response))?;
if let Some(message_str) = message.as_str() {
    // The message is itself a json string
    tracing::warn!("message: {}", message_str);
    let message_json = serde_json::from_str(message_str)
        .map_err(|e| format!("Failed to parse message as json: {}", e))?;
    Ok(message_json)
} else {
    Err(format!("message field is not a string {:?}", response))
}

🛠️ Refactor suggestion | 🟠 Major

Simplify the nested JSON extraction protocol.

The response structure requires double-parsing: extracting a JSON string from response.data.message and then parsing that string as JSON. This suggests a protocol design issue.

The nested structure should be simplified:

  • Current: {"data": {"message": "{\"config\": ...}"}} (JSON string inside JSON)
  • Better: {"data": {"config": {...}}} (direct JSON object)

If the nested structure is unavoidable due to existing protocol constraints:

  1. Add documentation explaining why this structure is necessary
  2. Consider creating a helper function to encapsulate this extraction logic
  3. Improve error messages to help debug protocol mismatches

Example helper:

use serde_json::Value;

fn extract_config_from_response(mut response: Value) -> Result<Value, String> {
    let message = response
        .get_mut("data")
        .and_then(|v| v.get_mut("message"))
        .map(|v| v.take())
        .ok_or_else(|| format!("Missing data.message field in response: {:?}", response))?;

    let message_str = message
        .as_str()
        .ok_or_else(|| format!("message field is not a string: {:?}", message))?;

    serde_json::from_str(message_str)
        .map_err(|e| format!("Failed to parse message as JSON: {}", e))
}
🤖 Prompt for AI Agents
In lib/llm/src/http/service/dump_config.rs around lines 196 to 213, the code
double-parses a JSON string held in response.data.message which is brittle and
hard to follow; replace the inline nested extraction with a small helper
function (e.g., extract_config_from_response) that takes ownership or a mutable
response Value, safely navigates to data.message using chained and_then/get_mut
or get/and_then, validates that message is a string with clear error messages,
parses that string with serde_json::from_str and returns Result<Value, String>;
add a short doc comment above the helper explaining why the nested string format
exists (or note that it should be flattened in protocol), and update the caller
to use the helper so the extraction logic and improved error messages are
centralized.
