
feat: Add R2-Router submission #68

Merged
jiarong0907 merged 5 commits into RouteWorks:main from jqxue1999:r2-router-submission
Feb 16, 2026

@jqxue1999
Contributor

Summary

  • R2-Router: A category-aware LLM router that uses Ridge regression to predict per-query quality scores for 4 LLMs (Qwen3-235B, Qwen3-80B, Gemini 2.5 Flash, Claude 3 Haiku) across 9 token budgets
  • Routes queries via risk = (1-λ)×quality - λ×cost with shrinkage toward category means (λ=0.999)
  • Trained on R2-Bench (30,968 queries × 9 LLMs × 16 budgets)
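As an illustration of the routing rule in the summary, here is a minimal Python sketch. It assumes the quantity is a score to maximize over (model, token budget) candidates, that quality lies in [0, 1], and that cost is in dollars per query; the PR does not confirm any of these. The `Candidate` class, the `shrink` helper, and its `alpha` weight are hypothetical names for illustration, not the submission's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    model: str      # LLM name (hypothetical label)
    budget: int     # token budget
    quality: float  # predicted per-query quality, assumed in [0, 1]
    cost: float     # estimated cost in dollars (assumed unit)


def shrink(pred: float, category_mean: float, alpha: float) -> float:
    """Pull a per-query prediction toward its category mean.

    alpha is a hypothetical shrinkage weight: 0 keeps the raw
    prediction, 1 replaces it with the category mean.
    """
    return (1.0 - alpha) * pred + alpha * category_mean


def route(candidates: list[Candidate], lam: float) -> Candidate:
    """Pick the candidate maximizing (1 - lam) * quality - lam * cost."""
    return max(candidates, key=lambda c: (1.0 - lam) * c.quality - lam * c.cost)


# Made-up numbers, only to show how lam shifts the choice.
pool = [
    Candidate("big-model", 800, quality=0.80, cost=2e-4),
    Candidate("small-model", 200, quality=0.70, cost=2e-5),
]
cheap_choice = route(pool, lam=0.999)   # cost-heavy weighting
quality_choice = route(pool, lam=0.9)   # quality-heavy weighting
```

With λ=0.999 the cost term carries almost all of the weight, so a quality gap only matters when the cost gap is very small. In this toy pool the cheaper candidate wins at λ=0.999 and the stronger one wins at λ=0.9.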

Files

| File | Description |
| --- | --- |
| `router_inference/config/r2-router.json` | Router config (4 models, λ=0.999) |
| `router_inference/predictions/r2-router.json` | 8400 regular + 2427 optimality entries |
| `router_inference/predictions/r2-router-robustness.json` | 420 robustness entries |
| `universal_model_names.py` | Added Qwen3-235B and Qwen3-80B model names |
| `model_cost/model_cost.json` | Added pricing for both Qwen3 models |

Estimated Metrics

| Metric | Value |
| --- | --- |
| Arena Score (β=0.1) | ~72.57 |
| Accuracy | ~71.94% |
| Cost/1K queries | ~$0.040 |

Validation

All checks pass:

  • check_config_prediction_files.py r2-router full --check-generated-result
  • check_config_prediction_files.py r2-router robustness --check-generated-result

R2-Router is a category-aware LLM router that uses Ridge regression
to predict per-query quality scores for 4 LLMs (Qwen3-235B, Qwen3-80B,
Gemini Flash, Claude Haiku) across 9 token budgets. Routes via
risk = (1-lambda)*quality - lambda*cost with shrinkage toward category means.

- Config: 4-model pool with lambda=0.999
- Predictions: 8400 regular + 2427 optimality entries (10827 total)
- Robustness: 420 entries
- Model registrations: Qwen3-235B and Qwen3-80B added to universal_model_names
- Cost configs: Added pricing for both Qwen3 models

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <[email protected]>
Co-Authored-By: Happy <[email protected]>
@github-actions

ghost commented Feb 11, 2026

Router Evaluation Results

Router: r2-router
Dataset Split: full

RouterArena Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.7212 |
| Accuracy | 71.94% |
| Total Cost | $0.602186 |
| Avg Cost per Query | $0.000072 |
| Avg Cost per 1K Queries | $0.0717 |
| Number of Queries | 8400 |
| Robustness Score | 0.8381 |
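The three cost rows above are related by simple arithmetic, which the following short Python sketch reproduces from the reported total cost and query count. The function name `per_query_costs` is hypothetical; it is not part of the RouterArena workflow.

```python
def per_query_costs(total_cost: float, n_queries: int) -> tuple[float, float]:
    """Return (average cost per query, average cost per 1K queries)."""
    avg = total_cost / n_queries
    return avg, avg * 1000


# Values taken from the table above.
avg, per_1k = per_query_costs(0.602186, 8400)
print(f"avg per query: ${avg:.6f}")   # avg per query: $0.000072
print(f"per 1K queries: ${per_1k:.4f}")  # per 1K queries: $0.0717
```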

Optimality Metrics

| Metric | Value |
| --- | --- |
| Opt.Sel (Optimal Selection) | 0.6292 |
| Opt.Cost (Cost Efficiency) | 0.8777 |
| Opt.Acc (Accuracy vs Optimal) | 1.0000 |

Evaluation completed by RouterArena automated workflow

- gemini-2.5-flash → gemini-2.0-flash-001 (actual OpenRouter API)
- claude-3-haiku-20240307 → claude-haiku-4.5 (actual OpenRouter API)
- Added claude-haiku-4.5 to universal_model_names and model_cost.json

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <[email protected]>
Co-Authored-By: Happy <[email protected]>
jiarong0907 requested a review from yl231 on February 11, 2026 03:45
@github-actions

ghost commented Feb 11, 2026

Router Evaluation Results

Router: r2-router
Dataset Split: full

RouterArena Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.7221 |
| Accuracy | 71.94% |
| Total Cost | $0.542999 |
| Avg Cost per Query | $0.000065 |
| Avg Cost per 1K Queries | $0.0646 |
| Number of Queries | 8400 |
| Robustness Score | 0.8381 |

Optimality Metrics

| Metric | Value |
| --- | --- |
| Opt.Sel (Optimal Selection) | 0.1634 |
| Opt.Cost (Cost Efficiency) | 0.5949 |
| Opt.Acc (Accuracy vs Optimal) | 1.0000 |

Evaluation completed by RouterArena automated workflow

@jiarong0907
Contributor

Hi @jqxue1999, thanks for evaluating your router with RouterArena. If the results look good to you, we will go ahead and post them on our website and README.

Contributor

yl231 left a comment

Looks good to me!

@jqxue1999
Contributor Author

Hi, thanks again for providing the RouterArena evaluation support.

We are currently preparing a new set of results with some additional models integrated, and the performance may improve further. Would it be possible to hold off on posting the current results for now?

We will share the updated evaluation with you very soon.

Thanks a lot for your patience and support!

@jiarong0907
Contributor

OK, sounds good. I have converted this PR to a draft.

jiarong0907 marked this pull request as draft on February 13, 2026 03:06
Ahasannn and others added 2 commits February 12, 2026 23:41
- Switch from Ridge regression to Global KNN (K=28, cosine, distance-weighted)
- Train on sub_10 split (809 queries), route all 8400
- Pool: Qwen3-235B (72.8%), Gemini 2.5 Flash (19.8%), Ministral-3B (7.4%)
- Acc=70.64%, Cost=$0.0496/1kq, Arena(β=0.1)=71.21

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <[email protected]>
Co-Authored-By: Happy <[email protected]>
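To make the "Global KNN (K=28, cosine, distance-weighted)" description in this commit concrete, here is a minimal pure-Python sketch of a distance-weighted k-nearest-neighbor estimate under cosine distance. It assumes queries are represented as embedding vectors and that "Global" means a single neighbor index over all training queries; neither is confirmed by the PR. The function names and the `eps` guard against zero distances are hypothetical, and inverse-distance weighting is one common reading of "distance-weighted".

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; both vectors are assumed to be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(
    query: list[float],
    train_vecs: list[list[float]],
    train_scores: list[float],
    k: int = 28,
    eps: float = 1e-9,
) -> float:
    """Inverse-distance-weighted mean score of the k nearest training queries."""
    # Sort (distance, score) pairs by distance and keep the k closest.
    nearest = sorted(
        (cosine_distance(query, v), s) for v, s in zip(train_vecs, train_scores)
    )[:k]
    # Closer neighbors get larger weights; eps avoids division by zero.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    weighted_sum = sum(w * s for w, (_, s) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Toy data: three 2-D "embeddings" with made-up quality scores.
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [1.0, 0.0, 0.5]
estimate = knn_predict([1.0, 0.0], vecs, scores, k=3)
```

In a router, one such estimate would be computed per candidate model (and per budget, if budgets are used), and the resulting quality estimates would feed the quality/cost trade-off.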
jqxue1999 force-pushed the r2-router-submission branch from 0d89c02 to d9c5f98 on February 13, 2026 05:03
@github-actions

ghost commented Feb 13, 2026

Router Evaluation Results

Router: r2-router
Dataset Split: full

RouterArena Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.7103 |
| Accuracy | 70.65% |
| Total Cost | $0.540387 |
| Avg Cost per Query | $0.000064 |
| Avg Cost per 1K Queries | $0.0643 |
| Number of Queries | 8400 |
| Robustness Score | 0.5000 |

Optimality Metrics

| Metric | Value |
| --- | --- |
| Opt.Sel (Optimal Selection) | 0.4823 |
| Opt.Cost (Cost Efficiency) | 0.8388 |
| Opt.Acc (Accuracy vs Optimal) | 1.0000 |

Evaluation completed by RouterArena automated workflow

@github-actions

ghost commented Feb 13, 2026

Router Evaluation Results

Router: r2-router
Dataset Split: full

RouterArena Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.7104 |
| Accuracy | 70.67% |
| Total Cost | $0.540387 |
| Avg Cost per Query | $0.000064 |
| Avg Cost per 1K Queries | $0.0643 |
| Number of Queries | 8400 |
| Robustness Score | 0.5000 |

Optimality Metrics

| Metric | Value |
| --- | --- |
| Opt.Sel (Optimal Selection) | 0.4823 |
| Opt.Cost (Cost Efficiency) | 0.8388 |
| Opt.Acc (Accuracy vs Optimal) | 1.0000 |

Evaluation completed by RouterArena automated workflow

2 similar comments

- Models: 235b, 80b, 30b, coder-next, gemini-flash, haiku
- Budgets: concise, budget_200, budget_400, budget_800
- Training: sub_10 only (809 queries), Global KNN (cosine, distance-weighted)
- Results: Acc=71.20%, Cost=$0.037/1kq, Arena(β=0.1)=71.94
- Beats Method 1 and Method 2 across all β values

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <[email protected]>
Co-Authored-By: Happy <[email protected]>
@github-actions

ghost commented Feb 13, 2026

Router Evaluation Results

Router: r2-router
Dataset Split: full

RouterArena Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.7160 |
| Accuracy | 71.23% |
| Total Cost | $0.513358 |
| Avg Cost per Query | $0.000061 |
| Avg Cost per 1K Queries | $0.0611 |
| Number of Queries | 8400 |
| Robustness Score | 0.4571 |

Optimality Metrics

| Metric | Value |
| --- | --- |
| Opt.Sel (Optimal Selection) | 0.3238 |
| Opt.Cost (Cost Efficiency) | 0.7416 |
| Opt.Acc (Accuracy vs Optimal) | 1.0000 |

Evaluation completed by RouterArena automated workflow

jqxue1999 marked this pull request as ready for review on February 13, 2026 16:06
@jqxue1999
Contributor Author

Thanks for your patience.

We’ve finished preparing the updated results and they’re ready now. Please feel free to proceed with posting them.

Let us know if you need anything else from our side.

Thanks again!

jiarong0907 merged commit 1f30199 into RouteWorks:main on Feb 16, 2026
6 checks passed