feat: dynamic endpoint registration #3418
base: main
Walkthrough
Adds a new "test" endpoint in the sglang main.py that runs alongside the primary endpoint, registers a custom route "/test" for it, and serves both endpoints concurrently. Introduces Endpoint.register_custom_endpoint in the Rust/Python bindings with path validation and optional etcd registration. Updates two chat_templates imports to a new module path.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant App as sglang main.py
participant EP as Endpoint (Python)
participant Rust as Endpoint (Rust)
participant Etcd as etcd (optional)
participant Srv as Server
App->>EP: component.endpoint("default")
App->>EP: component.endpoint("test")
note right of App: Startup
App->>EP: test_endpoint.register_custom_endpoint("/test")
EP->>Rust: register_custom_endpoint("/test")
alt Valid path
Rust->>Etcd: Put key for "/test" (if client available)
Etcd-->>Rust: Ack
Rust-->>EP: Ok
else Invalid path
Rust-->>EP: Error (PyValueError)
end
par Serve endpoints
App->>Srv: serve_endpoint(default, handler.generate)
App->>Srv: serve_endpoint(test, handler.generate)
end
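The startup flow in the diagram can be sketched in Python roughly as follows. This is a minimal illustration, not the PR's code: `StubEndpoint` is a hypothetical stand-in for the real `Endpoint` binding, and whether `register_custom_endpoint` is synchronous, as well as the leading-slash check, are assumptions. Only the method names `register_custom_endpoint` and `serve_endpoint` come from the walkthrough.

```python
import asyncio


class StubEndpoint:
    """Hypothetical stand-in for the Endpoint binding; records calls only."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.custom_paths: list[str] = []
        self.served = False

    def register_custom_endpoint(self, path: str) -> None:
        # Assumption: an invalid path surfaces as ValueError, mirroring the
        # PyValueError branch in the diagram.
        if not path.startswith("/"):
            raise ValueError(f"invalid endpoint path: {path!r}")
        self.custom_paths.append(path)

    async def serve_endpoint(self, handler) -> None:
        # A real endpoint would serve until shutdown; the stub yields once.
        await asyncio.sleep(0)
        self.served = True


async def generate(request):
    """Placeholder handler standing in for handler.generate."""
    return request


async def main() -> list[StubEndpoint]:
    default_endpoint = StubEndpoint("default")
    test_endpoint = StubEndpoint("test")
    # Register the custom route before serving, as in the "Startup" note.
    test_endpoint.register_custom_endpoint("/test")
    # Serve both endpoints concurrently (the "par" block in the diagram).
    await asyncio.gather(
        default_endpoint.serve_endpoint(generate),
        test_endpoint.serve_endpoint(generate),
    )
    return [default_endpoint, test_endpoint]
```

Running `asyncio.run(main())` returns both stub endpoints once each has been served.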
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ❌ 1 failed (warning), ✅ 2 passed
Actionable comments posted: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- components/src/dynamo/sglang/main.py (2 hunks)
- components/src/dynamo/sglang/request_handlers/multimodal_encode_worker_handler.py (1 hunk)
- components/src/dynamo/sglang/utils/multimodal_chat_processor.py (1 hunk)
- lib/bindings/python/rust/lib.rs (1 hunk)
- lib/bindings/python/src/dynamo/_core.pyi (1 hunk)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-21T17:23:02.836Z
Learnt from: michaelfeil
PR: ai-dynamo/dynamo#2591
File: lib/bindings/python/rust/http.rs:0-0
Timestamp: 2025-08-21T17:23:02.836Z
Learning: In lib/bindings/python/rust/http.rs, the enable_endpoint method uses EndpointType::all() to dynamically support all available endpoint types with case-insensitive matching, which is more maintainable than hardcoded match statements for endpoint type mappings.
Applied to files:
lib/bindings/python/rust/lib.rs
🧬 Code graph analysis (3)
lib/bindings/python/src/dynamo/_core.pyi (1)
- lib/bindings/python/rust/lib.rs (1): register_custom_endpoint (648-678)
components/src/dynamo/sglang/main.py (2)
- lib/bindings/python/rust/lib.rs (4): component (763-769), endpoint (628-634), register_custom_endpoint (648-678), serve_endpoint (681-734)
- lib/bindings/python/src/dynamo/_core.pyi (4): component (86-90), endpoint (105-109), register_custom_endpoint (118-122), serve_endpoint (124-136)
lib/bindings/python/rust/lib.rs (4)
- lib/bindings/python/src/dynamo/_core.pyi (1): register_custom_endpoint (118-122)
- lib/runtime/src/component.rs (3): drt (177-179), drt (397-399), drt (548-550)
- lib/runtime/src/distributed.rs (1): etcd_client (269-271)
- lib/bindings/python/rust/llm/entrypoint.rs (1): to_pyerr (286-291)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Build and Test - dynamo
- GitHub Check: tests (.)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: clippy (.)
- GitHub Check: tests (lib/bindings/python)
- GitHub Check: clippy (lib/bindings/python)
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: clippy (launch/dynamo-run)
🔇 Additional comments (3)
components/src/dynamo/sglang/request_handlers/multimodal_encode_worker_handler.py (1)
8-8: LGTM: Consistent import path update.
This import path change is consistent with the update in multimodal_chat_processor.py (line 6), suggesting a coordinated refactoring of the sglang module structure.
components/src/dynamo/sglang/utils/multimodal_chat_processor.py (1)
6-6: Import path change validated: no remaining imports of `sglang.srt.conversation`, and the new path aligns with existing usage.
lib/bindings/python/rust/lib.rs (1)
647-678: Document custom etcd key usage and tighten path validation
- Clarify why the raw `endpoint_path` (e.g. `/foo`) is written directly to etcd instead of the `dyn://` scheme (is this only for HTTP routing/discovery?).
- Strengthen validation: reject a lone `/`, consecutive slashes (e.g. `//foo`), and disallow whitespace or control characters.
- Consider logging or warning when `etcd_client` is `None` to indicate registration was skipped.
- Wrap the `kv_create` error to include the `endpoint_path` in the Python exception for better debugging.
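The stricter validation suggested above could look roughly like the following. This is a sketch in Python for brevity (the PR's actual check lives in the Rust binding and is not shown here). The function name `validate_endpoint_path` and the leading-slash requirement are assumptions. The remaining rules follow the bullet list.

```python
def validate_endpoint_path(path: str) -> str:
    """Return `path` unchanged if it is acceptable, else raise ValueError.

    Hypothetical helper; the error messages include the offending path so
    the caller can see which registration failed.
    """
    # Assumption: custom routes are absolute paths.
    if not path.startswith("/"):
        raise ValueError(f"endpoint path must start with '/': {path!r}")
    # Reject a lone "/".
    if path == "/":
        raise ValueError("endpoint path must not be a lone '/'")
    # Reject consecutive slashes such as "//foo" or "/a//b".
    if "//" in path:
        raise ValueError(f"endpoint path has consecutive slashes: {path!r}")
    # Reject whitespace and ASCII control characters (including DEL).
    if any(ch.isspace() or ord(ch) < 0x20 or ord(ch) == 0x7F for ch in path):
        raise ValueError(
            f"endpoint path has whitespace or control characters: {path!r}"
        )
    return path
```

Raising `ValueError` matches the `PyValueError` branch in the sequence diagram, so a Python caller would see the same exception type either way.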
Quick refactor and it should look cleaner. Didn't deeply check the functionality.
/ok to test 4502810
Rust checks will stop being mad after #3458 goes into main
Few small nitpicks
Almost there: change a panic to a Result::Err and remove an extra dependency
/// queries each one, and returns `{"responses": [instance1_result, instance2_result, ...]}`.
///
/// Returns 404 if no instances have registered the endpoint.
async fn inner_dynamic_endpoint_handler(
A general design question. Correct me if I'm misunderstanding the changes.
Currently, this change more or less lets you call an arbitrary route, e.g. `curl -X POST localhost:8000/my_custom_route`.
- If the route is something that exists from another handler (ex: `/health`, `/v1/models`, etc.), presumably that specific handler will be invoked instead? Or will it also match this one?
- If the route doesn't match any of the other handlers, it will hit this one. At that point, per request, we query etcd to see if any endpoint instances have set an `http_endpoint_path` key, and if so we will try to route the request to all of those matched instances round robin.
This is my understanding of the current changes.
Assuming it's roughly accurate, my next question is: why not use something like the watcher pattern we have elsewhere for discovery? It would watch in the background for specific keys (like `/http_endpoint_path`), dynamically add/remove routes on the HTTP server, and associate the corresponding endpoints/instances with those routes in some map, rather than doing this on a per-request basis here.
I'm a little hesitant about the additional per-request checking here rather than something more discovery-oriented.
This is not a blocking comment yet, just looking to understand the approach here better, and get more context on use case, etc.
Agreeing with Ryan here. I would rather attach a route for the endpoint when it appears, instead of attaching a wildcard route and hitting etcd on every request.
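To make the suggested alternative concrete, here is a minimal sketch of a route table that is updated only when discovery events arrive, so the request path never touches etcd. It is written in Python for brevity and everything in it is hypothetical: `RouteTable`, `on_put`, `on_delete`, and `pick` are illustrative names, and the real watcher events, key layout, and HTTP server integration are not shown in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class RouteTable:
    """In-memory map from HTTP path to the instance ids that registered it."""

    routes: Dict[str, Set[str]] = field(default_factory=dict)
    _cursor: Dict[str, int] = field(default_factory=dict)

    def on_put(self, path: str, instance_id: str) -> None:
        # Called by the background watcher when an instance registers a path.
        self.routes.setdefault(path, set()).add(instance_id)

    def on_delete(self, path: str, instance_id: str) -> None:
        # Called by the watcher when an instance's key disappears.
        instances = self.routes.get(path)
        if instances is None:
            return
        instances.discard(instance_id)
        if not instances:
            # Last instance gone: drop the route entirely.
            del self.routes[path]
            self._cursor.pop(path, None)

    def pick(self, path: str) -> Optional[str]:
        # Per-request lookup: a dictionary read, no etcd round trip.
        instances = self.routes.get(path)
        if not instances:
            return None  # the caller would answer 404
        ordered = sorted(instances)
        index = self._cursor.get(path, 0) % len(ordered)
        self._cursor[path] = index + 1
        return ordered[index]
```

The design trade-off is the one raised in the thread: the watcher adds a second piece of state to keep in sync, but the cost of discovery is paid once per registration change instead of once per request.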
> If the route is something that exists from another handler (ex: /health, /v1/models, etc.), presumably that specific handler will be invoked instead? Or will it also match this one?
I believe it will go to the specific handler.
> my next question is why not use something like the watcher pattern we have elsewhere for discovery
> associates corresponding endpoints/instances in some map for them there, rather than on a per request basis here?
This is actually how I implemented it before, but I felt it made things a little bit messy. Why have a separate map when it can all just live under a single endpoint entry? If the endpoint goes down, its etcd entry goes down with it, which also removes the `http_endpoint` section.
> hitting etcd on every request.
Endpoints that are implemented in this fashion are not meant to serve heavy traffic by any means. Checking etcd here doesn't seem like it costs much. Why have another watcher?
> Endpoints that are implemented in this fashion are not meant to be endpoints where we serve heavy traffic by any means. Checking etcd here doesn't seem like it costs much. Why have another watcher?
I think right now it's a "hey, you can expose any endpoint here" feature, so I won't be surprised at all if someone tries calling some native endpoint for a use case that dynamo doesn't natively support yet but the framework does, as a stopgap until we do support it. And if so, then we lose this assumption, right?
I'm less worried about optimizing for the extreme heavy-load case on one of these custom endpoints (in terms of expecting it to happen) and more about the general code smell of doing unnecessary work and checking something on every request, when we could instead do the bare minimum only when necessary (on discovery). At the end of the day we have limited resources (threads, CPUs, etc.), and the less we use them, the more resources the heavy-load endpoints (chat, completions, nats, etcd, etc.) have to work freely with and the less we have to worry about later.
For example, if any of these native endpoints don't get heavy load but do get polled, say, every second or every few seconds, that could be non-trivial at some point. Though this implementation is completely custom support for anything, so I can't really guess what it would all be used for.
> This is actually how I implemented it before. But I feel like it made things a little bit messy.
Do you have a draft/commit to refer to for the original solution?
Not complete but check out b317939
Spoke to Ryan offline. I think my earlier approach stemmed from not understanding the watcher pattern and how our ModelWatcher works.
Refactored in the latest commit.
see #3275
Overview
Introduced dynamic HTTP endpoints and a Native API handler to enhance the component's functionality and API flexibility.
Changes Made
- `NativeApiHandler` class for managing SGLang native API endpoints.
- `NativeApiHandler` initialization and task management into the main component setup.
- `serve_endpoint` function to support dynamic `http_endpoint_path` configuration.
- `dynamic_endpoint.rs`, for handling dynamically configured HTTP endpoints.
- `Instance` struct to include HTTP endpoint paths.
Examples
get_model_info
Follow up