Contributor

@YouNeedCryDear YouNeedCryDear commented Sep 29, 2025

This PR introduces advanced routing capabilities to the SGLang router with two new policies and a unified routing context system that enables header-based worker selection.

Motivation

Modifications

Key highlights:

  1. Two new routing policies:
    - RuleBasedPolicy - Filters workers by request headers, then orders them by priority → cost → load
    - LoadAwarePolicy - Selects the worker with the lowest current load
  2. RoutingContext system:
    - Unified context bundling headers, request_text, and model_id
    - Enables header-aware routing decisions
  3. Router-level enhancements:
    - get_router_stats() method for multi-router scenarios
    - Enhanced select_router_for_request() using worker statistics
    - Priority, cost, and load-based router selection
  4. Request headers:
    - x-worker-priority - Minimum priority threshold
    - x-max-cost - Maximum cost threshold
    - x-prefer-pd - Prefer PD routers

Breaking changes:
// Before
fn select_worker(&self, workers: &[Arc<dyn Worker>], request_text: Option<&str>) -> Option<usize>;

// After
fn select_worker(&self, workers: &[Arc<dyn Worker>], context: &RoutingContext) -> Option<usize>;
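To illustrate what the signature change means for an existing policy, here is a hedged sketch. The field types of `RoutingContext`, the `Worker` and `LoadBalancingPolicy` trait shapes, the `usize` index return type, and the toy `TextLenPolicy` are all assumptions made for this example; only the three bundled fields (headers, request_text, model_id) come from the description above:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Assumed minimal worker interface for this sketch.
pub trait Worker: Send + Sync {
    fn is_healthy(&self) -> bool;
}

/// Assumed shape of the context: borrows the three pieces of request data.
pub struct RoutingContext<'a> {
    pub headers: Option<&'a HashMap<String, String>>,
    pub request_text: Option<&'a str>,
    pub model_id: Option<&'a str>,
}

impl<'a> RoutingContext<'a> {
    /// Hypothetical helper for call sites that only had `request_text` before.
    pub fn from_text(request_text: Option<&'a str>) -> Self {
        Self { headers: None, request_text, model_id: None }
    }
}

/// Assumed policy trait using the new signature.
pub trait LoadBalancingPolicy {
    fn select_worker(
        &self,
        workers: &[Arc<dyn Worker>],
        context: &RoutingContext<'_>,
    ) -> Option<usize>;
}

/// Test double with a fixed health flag.
pub struct StaticWorker {
    pub healthy: bool,
}

impl Worker for StaticWorker {
    fn is_healthy(&self) -> bool {
        self.healthy
    }
}

/// Toy policy: spreads requests over healthy workers by text length.
/// A policy that used to read `request_text` now reads `context.request_text`.
pub struct TextLenPolicy;

impl LoadBalancingPolicy for TextLenPolicy {
    fn select_worker(
        &self,
        workers: &[Arc<dyn Worker>],
        context: &RoutingContext<'_>,
    ) -> Option<usize> {
        // Collect the indices of healthy workers only.
        let healthy: Vec<usize> = workers
            .iter()
            .enumerate()
            .filter(|(_, w)| w.is_healthy())
            .map(|(i, _)| i)
            .collect();
        if healthy.is_empty() {
            return None;
        }
        let len = context.request_text.map_or(0, str::len);
        Some(healthy[len % healthy.len()])
    }
}
```

Under these assumptions, migrating a policy is mostly mechanical: replace the `request_text` parameter with `context` and read `context.request_text` where the text was used. Policies that ignore headers and model_id need no other change.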

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist
Contributor

Summary of Changes

Hello @YouNeedCryDear, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the routing capabilities of the sglang-router by introducing an adaptive routing engine. This engine dynamically chooses between a fast, load-based routing approach for standard requests and a sophisticated, rule-based strategy for requests with advanced headers. This enhancement provides greater flexibility and control over worker selection, optimizing resource utilization and request handling based on specific criteria like worker priority, cost, and PD preference.

Highlights

  • New Routing Strategies: Introduced three new routing strategies: LoadBasedRoutingStrategy for simple load-based routing, RuleBasedRoutingStrategy for complex header-based routing, and AdaptiveRoutingEngine to intelligently select between the two based on request headers.
  • Enhanced RouterTrait: The RouterTrait now includes get_candidate_workers and get_load_stats methods, enabling routers to expose worker information crucial for advanced routing decisions.
  • Dynamic Router Selection: The RouterManager has been refactored to utilize the new AdaptiveRoutingEngine, allowing for dynamic and intelligent selection of the optimal router based on request characteristics and advanced headers like x-worker-priority and x-max-cost.
  • Robust Error Handling: New RouterError types have been added to provide more specific error reporting for various routing failures, and lock poisoning scenarios are handled more gracefully.
  • Comprehensive Testing: New unit and integration test files (advanced_routing_tests.rs) have been added to thoroughly validate the functionality of the new routing strategies, configurations, and worker scoring logic.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a sophisticated adaptive routing mechanism with distinct strategies for load-based and rule-based routing, which is a significant enhancement for flexibility and performance. The implementation is well-structured, with a clear separation of concerns. The addition of comprehensive tests is also commendable. My review focuses on improving robustness, correctness in edge cases, and consistency across the new routing logic and tests.

Collaborator

@slin1237 slin1237 left a comment

I'm not entirely following the design.
Can you write the description in the PR?

@slin1237 slin1237 self-assigned this Sep 29, 2025
@YouNeedCryDear YouNeedCryDear force-pushed the router/priority-routing branch 2 times, most recently from 7dcb5a5 to 1698c57 on October 2, 2025 23:25
use advanced routing logic in router manager

use get candidate workers in both regular and PD router implementation

add unit test for advanced routing

add integration test for advanced routing

Revert "use advanced routing logic in router manager"

This reverts commit 03d58a8a827b74aa62574c7ac0fd73e01b5468d3.

revert all router-to-router strategy

remove router strategy test

add rule based routing policy

update reference for rule based and load aware policy

fix test case

remove integration test

add routingcontext including header request and change input for other policies

use context in load aware and rule based policy

add get router stats to router trait

update router selection logic based on router stats

add header to worker selection and get router stats function

format
@YouNeedCryDear YouNeedCryDear force-pushed the router/priority-routing branch from 1698c57 to 13752ac on October 2, 2025 23:37
@YouNeedCryDear
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces two new routing policies, LoadAwarePolicy and RuleBasedPolicy, and a RoutingContext system to enable more advanced, adaptive routing strategies based on worker load and request headers. The changes are well-structured and include comprehensive tests for the new policies. I've provided a few suggestions to improve code idiom and reduce duplication, primarily focusing on using Rust's iterator patterns more effectively for better performance and maintainability. Overall, this is a solid enhancement to the router's capabilities.
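As an illustration of the iterator-style selection this review refers to, a minimum-load pick can be written as a single iterator chain. This is a sketch only: `WorkerLoad` and `select_min_load` are hypothetical names, and the PR's actual LoadAwarePolicy code is not reproduced here.

```rust
/// Hypothetical snapshot of one worker's state.
#[derive(Debug, Clone, Copy)]
pub struct WorkerLoad {
    pub healthy: bool,
    pub load: usize, // assumption: number of in-flight requests
}

/// Return the index of the healthy worker with the smallest load.
/// On ties, the earliest worker in the slice wins, because
/// `Iterator::min_by_key` returns the first minimum it encounters.
pub fn select_min_load(workers: &[WorkerLoad]) -> Option<usize> {
    workers
        .iter()
        .enumerate()
        .filter(|(_, w)| w.healthy)
        .min_by_key(|(_, w)| w.load)
        .map(|(idx, _)| idx)
}
```

Compared with a hand-written loop that tracks a running minimum, the chain avoids mutable state and makes the empty and all-unhealthy cases fall out naturally as `None`.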
