Add vLLM Semantic Router (vllm-sr) integration #61
Conversation
@carlory Thanks for the PR. There is a small error in the workflow, and I am looking into that.
@carlory Hi, thank you for the submission again. The workflow failed for two reasons.
Thank you!
@yl231 What's the relationship between
Thank you. Let me test it with all the existing models in the pipeline first. The uploaded file is outdated. I will regenerate it.
cached results:
@yl231 Could you provide the full cached results for the other models, if possible? I don't have access to them right now.
…ame and router_cls_name from pipeline_params Signed-off-by: carlory <[email protected]>
Signed-off-by: carlory <[email protected]>
Add trailing comma and reformat long return statement to meet formatting standards. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Yes, I could do so! Could you give me a list of what you want?
model list: claude-3-haiku-20240307 and gemini-2.0-flash-001.
…full Signed-off-by: carlory <[email protected]>
@yl231 I regenerated router_inference/predictions/vllm-sr.json with the full dataset.
Router Evaluation Results
Router:
RouterArena Metrics
Optimality Metrics
Evaluation completed by RouterArena automated workflow
@carlory I have run the inference and evaluation for vLLM-SR, and you now rank in first place! Congratulations!
yl231 left a comment:
Thank you for your submission!
Do you want this result to be posted on the leaderboard? If so, I will update the README.md to reflect it. Thanks again!
Cool. Thank you! Please update the README file. @yl231
thanks @carlory, nice work! i think we can investigate using more signals in the arena
btw we recently re-trained the long-context multilingual BERT, which just got merged into main and may largely improve our accuracy as well. we can continuously update the scores in a follow-up
Selected Models (used in this PR): I removed some models because they need more memory than my laptop can provide.
Ok, I can do it in a follow-up PR.
Summary
This PR adds support for the vLLM Semantic Router (vllm-sr) to RouterArena, along with significant infrastructure improvements for batch evaluation and parallel processing:
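For context, one common way batch evaluation with parallel processing is set up for I/O-bound router calls is a thread pool. This is a minimal sketch under stated assumptions, not this PR's actual infrastructure; `run_inference`, the worker count, and the return shape are all illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def run_inference(prompt):
    # Placeholder for a per-prompt router/model call (assumption: one call
    # per prompt; the real pipeline may batch differently).
    return f"routed:{prompt}"

def evaluate_batch(prompts, max_workers=8):
    # Router calls are typically network-bound, so threads parallelize them
    # well; pool.map preserves input order in its results.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_inference, prompts))
```

Because `map` preserves ordering, results can be zipped back to prompts directly, which matters when writing a predictions file keyed by example.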
Key Changes
vLLM-SR Router Implementation (router_inference/router/vllm_sr.py)
Results & Configuration (router_inference/config/vllm-sr.json)
Close vllm-project/semantic-router#1005
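The PR body names the adapter file but does not show it. A minimal sketch of what a router adapter like `router_inference/router/vllm_sr.py` could look like, assuming a hypothetical interface: the class name, `score_fn`, `route`/`predict_batch` methods, and the prediction schema are all assumptions, not the actual implementation.

```python
# Hypothetical router adapter sketch; names and schema are assumptions.
class VllmSRRouter:
    """Routes each prompt to one candidate model via a scoring function."""

    def __init__(self, model_list, score_fn):
        # e.g. model_list = ["claude-3-haiku-20240307", "gemini-2.0-flash-001"]
        self.model_list = model_list
        # score_fn(prompt, model) -> float; stands in for the semantic scoring
        # the real router performs.
        self.score_fn = score_fn

    def route(self, prompt):
        # Pick the highest-scoring model for this prompt.
        return max(self.model_list, key=lambda m: self.score_fn(prompt, m))

    def predict_batch(self, prompts):
        # Assumed prediction shape: {index: chosen_model}; the real
        # predictions/vllm-sr.json format may differ.
        return {i: self.route(p) for i, p in enumerate(prompts)}
```

A toy scorer (e.g. first-letter match) is enough to exercise the routing logic; in practice the scorer would wrap the semantic-router classification call.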
🤖 Generated with Claude Code