Skip to content

Conversation

ishandhanani
Copy link
Contributor

@ishandhanani ishandhanani commented Oct 3, 2025

see #3275

Overview

Introduced dynamic HTTP endpoints and a Native API handler to enhance the component's functionality and API flexibility.

Changes Made

  • Added NativeApiHandler class for managing SGLang native API endpoints.
  • Integrated NativeApiHandler initialization and task management into the main component setup.
  • Extended the serve_endpoint function to support dynamic http_endpoint_path configuration.
  • Introduced a new module, dynamic_endpoint.rs, for handling dynamically configured HTTP endpoints.
  • Updated Instance struct to include HTTP endpoint paths.

Examples

  1. Curling the new dynamic get_model_info
❯ curl -X POST http://localhost:8000/get_model_info
{"responses":[{"data":{"data":[{"model_path":"Qwen/Qwen3-0.6B","tokenizer_path":"Qwen/Qwen3-0.6B","preferred_sampling_params":null,"weight_version":"default"}]}}]}
  1. Curling a non-existant endpoint (404)
❯ curl -s -w "%{http_code}\n" -X POST http://localhost:8000/graham_rust_legend
{"message":"Endpoint not found"}404

Follow up

  • Implement more endpoints and handle p/d specific implementations as well
  • Add docs for this

@ishandhanani ishandhanani requested review from a team as code owners October 3, 2025 22:16
Copy link

copy-pr-bot bot commented Oct 3, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the feat label Oct 3, 2025
@ishandhanani ishandhanani marked this pull request as draft October 3, 2025 22:16
Copy link
Contributor

coderabbitai bot commented Oct 3, 2025

Walkthrough

Adds a new "test" endpoint in sglang main to run alongside the primary endpoint, registers a custom route "/test", and serves it concurrently. Introduces Endpoint.register_custom_endpoint in Rust/Python bindings with path validation and optional etcd registration. Updates two imports for chat_templates to a new module path.

Changes

Cohort / File(s) Change summary
Test endpoint orchestration
components/src/dynamo/sglang/main.py
Creates a secondary endpoint named "test", registers the custom route "/test", and serves it concurrently with the main endpoint using the same handler and readiness behavior.
Import path update for chat templates
components/src/dynamo/sglang/request_handlers/multimodal_encode_worker_handler.py, components/src/dynamo/sglang/utils/multimodal_chat_processor.py
Switches imports from sglang.srt.conversation to sglang.srt.parser.conversation; no logic changes.
Endpoint API: custom endpoint registration (Rust + Python bindings)
lib/bindings/python/rust/lib.rs, lib/bindings/python/src/dynamo/_core.pyi
Adds Endpoint::register_custom_endpoint(endpoint_path) with path validation and optional etcd key creation; exposes it to Python in the type stubs.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant App as sglang main.py
    participant EP as Endpoint (Python)
    participant Rust as Endpoint (Rust)
    participant Etcd as etcd (optional)
    participant Srv as Server

    App->>EP: component.endpoint("default")
    App->>EP: component.endpoint("test")
    note right of App: Startup

    App->>EP: test_endpoint.register_custom_endpoint("/test")
    EP->>Rust: register_custom_endpoint("/test")
    alt Valid path
        Rust->>Etcd: Put key for "/test" (if client available)
        Etcd-->>Rust: Ack
        Rust-->>EP: Ok
    else Invalid path
        Rust-->>EP: Error (PyValueError)
    end

    par Serve endpoints
        App->>Srv: serve_endpoint(default, handler.generate)
        App->>Srv: serve_endpoint(test, handler.generate)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I hopped to the edge of the routing glade,
A "/test" trail blazed where none was laid.
Etcd whispered keys in the breeze,
Endpoints twinned like mirrored trees.
With tidy imports trimmed just right—
I thump approval, ears alight. 🐇✨

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description provides helpful background and examples but does not adhere to the repository’s required template structure or heading levels. It uses “## Overview” and “## Changes Made” instead of the expected “#### Overview:” and “#### Details:”, and it omits the “#### Where should the reviewer start?” section. The “Related Issues” section is also not formatted with the proper heading and action keyword as specified in the template. Please update the description to match the template by using the exact section headings (“#### Overview:”, “#### Details:”, “#### Where should the reviewer start?”, “#### Related Issues:”) and include the linked issue with an appropriate action keyword (e.g., “Closes #3275”).
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title “feat: dynamic endpoint registration” accurately captures the primary change of adding support for registering custom endpoints dynamically and is concise and specific to the main feature introduced in the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c48f49a and b317939.

📒 Files selected for processing (5)
  • components/src/dynamo/sglang/main.py (2 hunks)
  • components/src/dynamo/sglang/request_handlers/multimodal_encode_worker_handler.py (1 hunks)
  • components/src/dynamo/sglang/utils/multimodal_chat_processor.py (1 hunks)
  • lib/bindings/python/rust/lib.rs (1 hunks)
  • lib/bindings/python/src/dynamo/_core.pyi (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-21T17:23:02.836Z
Learnt from: michaelfeil
PR: ai-dynamo/dynamo#2591
File: lib/bindings/python/rust/http.rs:0-0
Timestamp: 2025-08-21T17:23:02.836Z
Learning: In lib/bindings/python/rust/http.rs, the enable_endpoint method uses EndpointType::all() to dynamically support all available endpoint types with case-insensitive matching, which is more maintainable than hardcoded match statements for endpoint type mappings.

Applied to files:

  • lib/bindings/python/rust/lib.rs
🧬 Code graph analysis (3)
lib/bindings/python/src/dynamo/_core.pyi (1)
lib/bindings/python/rust/lib.rs (1)
  • register_custom_endpoint (648-678)
components/src/dynamo/sglang/main.py (2)
lib/bindings/python/rust/lib.rs (4)
  • component (763-769)
  • endpoint (628-634)
  • register_custom_endpoint (648-678)
  • serve_endpoint (681-734)
lib/bindings/python/src/dynamo/_core.pyi (4)
  • component (86-90)
  • endpoint (105-109)
  • register_custom_endpoint (118-122)
  • serve_endpoint (124-136)
lib/bindings/python/rust/lib.rs (4)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • register_custom_endpoint (118-122)
lib/runtime/src/component.rs (3)
  • drt (177-179)
  • drt (397-399)
  • drt (548-550)
lib/runtime/src/distributed.rs (1)
  • etcd_client (269-271)
lib/bindings/python/rust/llm/entrypoint.rs (1)
  • to_pyerr (286-291)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: tests (.)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: clippy (.)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: clippy (launch/dynamo-run)
🔇 Additional comments (3)
components/src/dynamo/sglang/request_handlers/multimodal_encode_worker_handler.py (1)

8-8: LGTM: Consistent import path update.

This import path change is consistent with the update in multimodal_chat_processor.py (line 6), suggesting a coordinated refactoring of the sglang module structure.

components/src/dynamo/sglang/utils/multimodal_chat_processor.py (1)

6-6: Import path change validated: no remaining imports of sglang.srt.conversation, and the new path aligns with existing usage.

lib/bindings/python/rust/lib.rs (1)

647-678: Document custom etcd key usage and tighten path validation

  • Clarify why raw endpoint_path (e.g. /foo) is written directly to etcd instead of the dyn:// scheme (is this only for HTTP routing/discovery?).
  • Strengthen validation: reject a lone /, consecutive slashes (e.g. //foo), and disallow whitespace or control characters.
  • Consider logging or warning when etcd_client is None to indicate registration was skipped.
  • Wrap the kv_create error to include the endpoint_path in the Python exception for better debugging.

@pull-request-size pull-request-size bot added size/L and removed size/M labels Oct 3, 2025
@pull-request-size pull-request-size bot added size/M and removed size/L labels Oct 3, 2025
@pull-request-size pull-request-size bot added size/L and removed size/M labels Oct 3, 2025
@ai-dynamo ai-dynamo deleted a comment from coderabbitai bot Oct 4, 2025
Copy link
Contributor

@Aphoh Aphoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick refactor and it should look cleaner. Didn't deeply check the functionality.

@ishandhanani
Copy link
Contributor Author

/ok to test 4502810

@ishandhanani
Copy link
Contributor Author

Rust checks will stop being mad after #3458 goes into main

Copy link
Contributor

@Aphoh Aphoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few small nitpicks

Copy link
Contributor

@Aphoh Aphoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there: change a panic to a Result::Err and remove an extra dependency

@grahamking grahamking self-requested a review October 7, 2025 18:24
/// queries each one, and returns `{"responses": [instance1_result, instance2_result, ...]}`.
///
/// Returns 404 if no instances have registered the endpoint.
async fn inner_dynamic_endpoint_handler(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general design question - correct me if I'm understanding the changes properly or not

Currently, this change more or less lets you call an arbitrary route curl -X POST localhost:8000/my_custom_route.

  • If the route is something the exists from another handler (ex: /health, /v1/models, etc.), presumably that specific handler will be invoked instead? Or will it also match this one?
  • If the route doesn't match any of the other handlers, it will hit this one. At that point in time, per request, we query ETCD to see if any endpoint instances have set an http_endpoint_path key, and if so we will try to route the request to all of those matched instances round robin

This is my understanding of the current changes.

Assuming it's roughly accurate - my next question is why not use something like the watcher pattern we have elsewhere for discovery, that will be watching in background for specific keys (like /http_endpoint_path), and maybe dynamically adds/removes routes to the http server and associates corresponding endpoints/instances in some map for them there, rather than on a per request basis here?

I'm a little hesitant about the additional per-request checking here rather than something more discovery-oriented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a blocking comment yet, just looking to understand the approach here better, and get more context on use case, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreeing with Ryan here. I would rather attach a route for the endpoint when it appears, instead of attaching a wildcard route and hitting etcd on every request.

Copy link
Contributor Author

@ishandhanani ishandhanani Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the route is something the exists from another handler (ex: /health, /v1/models, etc.), presumably that specific handler will be invoked instead? Or will it also match this one?

I believe it will go to the specific handler.

my next question is why not use something like the watcher pattern we have elsewhere for discovery
associates corresponding endpoints/instances in some map for them there, rather than on a per request basis here?

This is actually how I implemented it before. But I feel like it made things a little bit messy. Why have a separate map when it can all just be under a single endpoint entry? If the endpoint goes down, this will also take the etcd entry down which will also take the http_endpoint section as well.

hitting etcd on every request.

Endpoints that are implemented in this fashion are not meant to be endpoints where we serve heavy traffic by any means. Checking etcd here doesn't seem like it costs much. Why have another watcher?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Endpoints that are implemented in this fashion are not meant to be endpoints where we serve heavy traffic by any means. Checking etcd here doesn't seem like it costs much. Why have another watcher?

I think right now it's a "hey you can expose any endpoint here" feature - so I won't be surprised at all if someone tries calling some native endpoint for some other use case that maybe dynamo doesn't natively support yet but the framework does as a stopgap solution until we do support it. And if so, then we lose this assumption right?

I think I'm less worried optimiznig for the extreme heavy load case on one of these custom endpoints (in terms of expecting it to happen) and moreso just general code smell of doing unnecessary work and checking something on every request if we can instead only act to do the bare minimum when necessary (on discovery). At the end of the day we have limited resources (threads, CPUs, etc.) and the less we use them, the more resources the heavy load endpoints (chat, completions, nats, etcd, etc.) have to work freely with and less we have to worry about later.

For example, if any of these native endpoints are things that may not get heavy load, but may get polled say every second or every few seconds, that could be non trivial at some point. Though this implementation is completely custom support for anything, so I can't really guess what all it would be used for.

This is actually how I implemented it before. But I feel like it made things a little bit messy.

Do you have a draft/commit to refer to the original solution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not complete but check out b317939

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke to Ryan offline. I think my approach before stemmed from a lack of understanding the watcher pattern and how our ModelWatcher worked.

Refactored in latest commit

@ishandhanani ishandhanani requested a review from rmccorm4 October 7, 2025 21:44
@pull-request-size pull-request-size bot added size/XL and removed size/L labels Oct 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants