Add vLLM Semantic Router (vllm-sr) integration #61
Conversation
@carlory Thanks for the PR. There is a small error in the workflow, and I am looking into that.
@carlory Hi, thank you for the submission again. The workflow failed for two reasons.
Thank you!
@yl231 What's the relationship between
Thank you. Let me test it with all the existing models in the pipeline first. The uploaded file is outdated. I will regenerate it.
cached results:
@yl231 Could you provide the full cached results for the other models, if possible? I don't have access to them right now.
…ame and router_cls_name from pipeline_params Signed-off-by: carlory <[email protected]>
Signed-off-by: carlory <[email protected]>
Add trailing comma and reformat long return statement to meet formatting standards. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Yes, I could do so! Could you give me a list of what you want?
model list: claude-3-haiku-20240307 and gemini-2.0-flash-001.
…full Signed-off-by: carlory <[email protected]>
@yl231 I regenerated router_inference/predictions/vllm-sr.json with the full dataset.
Router Evaluation Results
Router:
RouterArena Metrics
Optimality Metrics
Evaluation completed by RouterArena automated workflow
@carlory I have run the inference and evaluation for vLLM-SR, and you now rank in first place! Congratulations!
yl231 left a comment:
Thank you for your submission!
Do you want this result to be posted on the leaderboard? If so, I will update the README.md to reflect it. Thanks again!
Cool. Thank you! Please update the README file. @yl231
thanks @carlory, nice work! i think we can investigate using more signals in the arena
btw we recently re-trained the long-context multilingual BERT, which just got merged into main and may largely improve our accuracy as well. we can continuously update the scores in a follow-up
Selected Models (used in this PR): I removed some models because they need more memory than my laptop can provide.
Ok, I can do it in a follow-up PR.
Summary
This PR adds support for the vLLM Semantic Router (vllm-sr) to RouterArena, along with significant infrastructure improvements for batch evaluation and parallel processing:
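For context, one common way batch evaluation with parallel processing is set up for I/O-bound router calls is a thread pool. This is a minimal sketch under stated assumptions, not this PR's actual infrastructure; `run_inference`, the worker count, and the return shape are all illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def run_inference(prompt):
    # Placeholder for a per-prompt router/model call (assumption: one call
    # per prompt; the real pipeline may batch differently).
    return f"routed:{prompt}"

def evaluate_batch(prompts, max_workers=8):
    # Router calls are typically network-bound, so threads parallelize them
    # well; pool.map preserves input order in its results.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_inference, prompts))
```

Because `map` preserves ordering, results can be zipped back to prompts directly, which matters when writing a predictions file keyed by example.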
Key Changes
vLLM-SR Router Implementation (router_inference/router/vllm_sr.py)
Results & Configuration (router_inference/config/vllm-sr.json)
Close vllm-project/semantic-router#1005
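The PR body names the adapter file but does not show it. A minimal sketch of what a router adapter like `router_inference/router/vllm_sr.py` could look like, assuming a hypothetical interface: the class name, `score_fn`, `route`/`predict_batch` methods, and the prediction schema are all assumptions, not the actual implementation.

```python
# Hypothetical router adapter sketch; names and schema are assumptions.
class VllmSRRouter:
    """Routes each prompt to one candidate model via a scoring function."""

    def __init__(self, model_list, score_fn):
        # e.g. model_list = ["claude-3-haiku-20240307", "gemini-2.0-flash-001"]
        self.model_list = model_list
        # score_fn(prompt, model) -> float; stands in for the semantic scoring
        # the real router performs.
        self.score_fn = score_fn

    def route(self, prompt):
        # Pick the highest-scoring model for this prompt.
        return max(self.model_list, key=lambda m: self.score_fn(prompt, m))

    def predict_batch(self, prompts):
        # Assumed prediction shape: {index: chosen_model}; the real
        # predictions/vllm-sr.json format may differ.
        return {i: self.route(p) for i, p in enumerate(prompts)}
```

A toy scorer (e.g. first-letter match) is enough to exercise the routing logic; in practice the scorer would wrap the semantic-router classification call.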
🤖 Generated with Claude Code