-
Notifications
You must be signed in to change notification settings - Fork 3.3k
Adaptive Routing with Worker Load Based Strategy and Rule Based Strategy #11058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Adaptive Routing with Worker Load Based Strategy and Rule Based Strategy #11058
Conversation
Summary of ChangesHello @YouNeedCryDear, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly upgrades the routing capabilities of the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a sophisticated adaptive routing mechanism with distinct strategies for load-based and rule-based routing, which is a significant enhancement for flexibility and performance. The implementation is well-structured, with a clear separation of concerns. The addition of comprehensive tests is also commendable. My review focuses on improving robustness, correctness in edge cases, and consistency across the new routing logic and tests.
slin1237
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not entire following the design
can you write the description in the pr
7dcb5a5 to
1698c57
Compare
use advanced routing logic in router manager use get candidate workers in both regular and PD router implementation add unit test for advanced routing add integration test for advanced routing Revert "use advanced routing logic in router manager" This reverts commit 03d58a8a827b74aa62574c7ac0fd73e01b5468d3. revert all router-to-router strategy remove router strategy test
add rule based routing policy update reference for rule based and load aware policy fix test case remove integration test add routingcontext including header request and change input for other policies use context in load aware and rule based policy add get router stats to router trait update router selection logic based on router stats add header to worker selection and get router stats function format
1698c57 to
13752ac
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces two new routing policies, LoadAwarePolicy and RuleBasedPolicy, and a RoutingContext system to enable more advanced, adaptive routing strategies based on worker load and request headers. The changes are well-structured and include comprehensive tests for the new policies. I've provided a few suggestions to improve code idiom and reduce duplication, primarily focusing on using Rust's iterator patterns more effectively for better performance and maintainability. Overall, this is a solid enhancement to the router's capabilities.
This PR introduces advanced routing capabilities to the SGLang router with two new policies and a unified routing context system that enables header-based worker selection.
Motivation
Modifications
Key highlights:
- RuleBasedPolicy - ordering by priority → cost → load with header filtering
- LoadAwarePolicy - Simple minimum-load worker selection
- Unified context bundling headers, request_text, and model_id
- Enables header-aware routing decisions
- get_router_stats() method for multi-router scenarios
- Enhanced select_router_for_request() using worker statistics
- Priority, cost, and load-based router selection
- x-worker-priority - Minimum priority threshold
- x-max-cost - Maximum cost threshold
- x-prefer-pd - Prefer PD routers
Breaking changes:
// Before
fn select_worker(&self, workers: &[Arc], request_text: Option<&str>) -> Option;
// After
fn select_worker(&self, workers: &[Arc], context: &RoutingContext) -> Option;
Accuracy Tests
Benchmarking and Profiling
Checklist