feat: Add R2-Router submission by jqxue1999 · Pull Request #68 · RouteWorks/RouterArena

jqxue1999 · 2026-02-11T02:59:26Z

Summary

- R2-Router: a category-aware LLM router that uses Ridge regression to predict per-query quality scores for 4 LLMs (Qwen3-235B, Qwen3-80B, Gemini 2.5 Flash, Claude 3 Haiku) across 9 token budgets
- Routes queries via risk = (1-λ)×quality - λ×cost with shrinkage toward category means (λ=0.999)
- Trained on R2-Bench (30,968 queries × 9 LLMs × 16 budgets)
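
The routing rule above can be sketched in a few lines. This is only an illustration under stated assumptions, not the submitted implementation: the shrinkage weight `alpha`, the function names, and the array layout (models × budgets) are hypothetical, and the score is treated as higher-is-better even though the summary calls it a risk.

```python
import numpy as np

def shrink(pred_quality, category_mean, alpha):
    """Pull per-query predictions toward the category mean.

    alpha is a hypothetical shrinkage weight in [0, 1]; the PR does not state it.
    """
    pred_quality = np.asarray(pred_quality, dtype=float)
    category_mean = np.asarray(category_mean, dtype=float)
    return alpha * category_mean + (1.0 - alpha) * pred_quality

def route(pred_quality, category_mean, cost, lam=0.999, alpha=0.5):
    """Pick the (model, budget) cell with the best quality/cost trade-off.

    All inputs are arrays of shape (n_models, n_budgets).
    """
    quality = shrink(pred_quality, category_mean, alpha)
    # Score from the summary: (1 - lambda) * quality - lambda * cost.
    score = (1.0 - lam) * quality - lam * np.asarray(cost, dtype=float)
    flat_best = int(np.argmax(score))
    model_idx, budget_idx = np.unravel_index(flat_best, score.shape)
    return (int(model_idx), int(budget_idx)), score
```

With λ close to 1 the cost term dominates unless per-query costs are very small relative to quality, which is why the scale of `cost` matters when reading λ=0.999.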

Files

| File | Description |
| --- | --- |
| `router_inference/config/r2-router.json` | Router config (4 models, λ=0.999) |
| `router_inference/predictions/r2-router.json` | 8400 regular + 2427 optimality entries |
| `router_inference/predictions/r2-router-robustness.json` | 420 robustness entries |
| `universal_model_names.py` | Added Qwen3-235B and Qwen3-80B model names |
| `model_cost/model_cost.json` | Added pricing for both Qwen3 models |

Estimated Metrics

| Metric | Value |
| --- | --- |
| Arena Score (β=0.1) | ~72.57 |
| Accuracy | ~71.94% |
| Cost/1K queries | ~$0.040 |

Validation

All checks pass:

- `check_config_prediction_files.py r2-router full --check-generated-result` ✓
- `check_config_prediction_files.py r2-router robustness --check-generated-result` ✓

R2-Router is a category-aware LLM router that uses Ridge regression to predict per-query quality scores for 4 LLMs (Qwen3-235B, Qwen3-80B, Gemini Flash, Claude Haiku) across 9 token budgets. Routes via risk = (1-lambda)*quality - lambda*cost with shrinkage toward category means.

- Config: 4-model pool with lambda=0.999
- Predictions: 8400 regular + 2427 optimality entries (10827 total)
- Robustness: 420 entries
- Model registrations: Qwen3-235B and Qwen3-80B added to universal_model_names
- Cost configs: Added pricing for both Qwen3 models

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)
Co-Authored-By: Claude <[email protected]>
Co-Authored-By: Happy <[email protected]>

ghost · 2026-02-11T03:28:02Z

Router Evaluation Results

Router: r2-router
Dataset Split: full

RouterArena Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.7212 |
| Accuracy | 71.94% |
| Total Cost | $0.602186 |
| Avg Cost per Query | $0.000072 |
| Avg Cost per 1K Queries | $0.0717 |
| Number of Queries | 8400 |
| Robustness Score | 0.8381 |

Optimality Metrics

| Metric | Value |
| --- | --- |
| Opt.Sel (Optimal Selection) | 0.6292 |
| Opt.Cost (Cost Efficiency) | 0.8777 |
| Opt.Acc (Accuracy vs Optimal) | 1.0000 |

Evaluation completed by RouterArena automated workflow
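
As a sanity check on the cost rows reported above, the per-query and per-1K figures follow directly from the total cost and the query count. A small sketch (the helper name `cost_per_1k` is made up for illustration and is not part of RouterArena):

```python
def cost_per_1k(total_cost_usd: float, n_queries: int) -> float:
    """Average cost in USD for 1,000 queries, given the total over n_queries."""
    return total_cost_usd / n_queries * 1000

total_cost = 0.602186   # "Total Cost" from the evaluation above
n_queries = 8400        # "Number of Queries" from the evaluation above

per_query = total_cost / n_queries
per_1k = cost_per_1k(total_cost, n_queries)

print(f"avg cost per query: ${per_query:.6f}")
print(f"avg cost per 1K queries: ${per_1k:.4f}")
```

The same arithmetic applies to the later evaluation runs in this thread, using their own total cost values.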

- gemini-2.5-flash → gemini-2.0-flash-001 (actual OpenRouter API)
- claude-3-haiku-20240307 → claude-haiku-4.5 (actual OpenRouter API)
- Added claude-haiku-4.5 to universal_model_names and model_cost.json

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)
Co-Authored-By: Claude <[email protected]>
Co-Authored-By: Happy <[email protected]>

ghost · 2026-02-11T04:15:20Z

Router Evaluation Results

Router: r2-router
Dataset Split: full

RouterArena Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.7221 |
| Accuracy | 71.94% |
| Total Cost | $0.542999 |
| Avg Cost per Query | $0.000065 |
| Avg Cost per 1K Queries | $0.0646 |
| Number of Queries | 8400 |
| Robustness Score | 0.8381 |

Optimality Metrics

| Metric | Value |
| --- | --- |
| Opt.Sel (Optimal Selection) | 0.1634 |
| Opt.Cost (Cost Efficiency) | 0.5949 |
| Opt.Acc (Accuracy vs Optimal) | 1.0000 |

Evaluation completed by RouterArena automated workflow

jiarong0907 · 2026-02-11T05:39:45Z

Hi @jqxue1999, thanks for evaluating your router with RouterArena. If the results look good to you, we will go ahead and post them on our website and README.

yl231

Looks good to me!

jqxue1999 · 2026-02-13T02:06:02Z

Hi, thanks again for providing the RouterArena evaluation support.

We are currently preparing a new set of results with some additional models integrated, and the performance may improve further. Would it be possible to hold off on posting the current results for now?

We will share the updated evaluation with you very soon.

Thanks a lot for your patience and support!

jiarong0907 · 2026-02-13T03:06:14Z

> Hi, thanks again for providing the RouterArena evaluation support.
>
> We are currently preparing a new set of results with some additional models integrated, and the performance may improve further. Would it be possible to hold off on posting the current results for now?
>
> We will share the updated evaluation with you very soon.
>
> Thanks a lot for your patience and support!

OK, sounds good. I have converted this PR to draft.

- Switch from Ridge regression to Global KNN (K=28, cosine, distance-weighted)
- Train on sub_10 split (809 queries), route all 8400
- Pool: Qwen3-235B (72.8%), Gemini 2.5 Flash (19.8%), Ministral-3B (7.4%)
- Acc=70.64%, Cost=$0.0496/1kq, Arena(β=0.1)=71.21

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)
Co-Authored-By: Claude <[email protected]>
Co-Authored-By: Happy <[email protected]>
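
For readers unfamiliar with the approach named in this commit, a distance-weighted KNN quality predictor with cosine distance might look roughly like the sketch below. It is not the code from this PR: the function name, the inverse-distance weighting, the `eps` guard, and the input layout (one embedding per query, one quality score per model) are assumptions made for illustration.

```python
import numpy as np

def knn_predict_quality(train_emb, train_scores, query_emb, k=28, eps=1e-8):
    """Predict per-model quality for one query from its nearest training queries.

    train_emb:    (n_train, dim) query embeddings
    train_scores: (n_train, n_models) observed quality per model
    query_emb:    (dim,) embedding of the query to route
    """
    train_emb = np.asarray(train_emb, dtype=float)
    train_scores = np.asarray(train_scores, dtype=float)
    query_emb = np.asarray(query_emb, dtype=float)

    # Cosine distance = 1 - cosine similarity of L2-normalized vectors.
    train_unit = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    query_unit = query_emb / np.linalg.norm(query_emb)
    dist = 1.0 - train_unit @ query_unit

    k = min(k, len(dist))
    nearest = np.argsort(dist)[:k]

    # Inverse-distance weights (assumed scheme); eps avoids division by zero.
    weights = 1.0 / (dist[nearest] + eps)
    return (weights[:, None] * train_scores[nearest]).sum(axis=0) / weights.sum()
```

The predicted per-model scores could then be combined with per-model cost to choose a model, in the same spirit as the quality/cost trade-off described in the PR summary.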

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <[email protected]> Co-Authored-By: Happy <[email protected]>

ghost · 2026-02-13T05:06:51Z

Router Evaluation Results

Router: r2-router
Dataset Split: full

RouterArena Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.7103 |
| Accuracy | 70.65% |
| Total Cost | $0.540387 |
| Avg Cost per Query | $0.000064 |
| Avg Cost per 1K Queries | $0.0643 |
| Number of Queries | 8400 |
| Robustness Score | 0.5000 |

Optimality Metrics

| Metric | Value |
| --- | --- |
| Opt.Sel (Optimal Selection) | 0.4823 |
| Opt.Cost (Cost Efficiency) | 0.8388 |
| Opt.Acc (Accuracy vs Optimal) | 1.0000 |

Evaluation completed by RouterArena automated workflow

ghost · 2026-02-13T05:27:58Z

Router Evaluation Results

Router: r2-router
Dataset Split: full

RouterArena Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.7104 |
| Accuracy | 70.67% |
| Total Cost | $0.540387 |
| Avg Cost per Query | $0.000064 |
| Avg Cost per 1K Queries | $0.0643 |
| Number of Queries | 8400 |
| Robustness Score | 0.5000 |

Optimality Metrics

| Metric | Value |
| --- | --- |
| Opt.Sel (Optimal Selection) | 0.4823 |
| Opt.Cost (Cost Efficiency) | 0.8388 |
| Opt.Acc (Accuracy vs Optimal) | 1.0000 |

Evaluation completed by RouterArena automated workflow

ghost · 2026-02-13T05:48:51Z

Router Evaluation Results

Router: r2-router
Dataset Split: full

RouterArena Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.7104 |
| Accuracy | 70.67% |
| Total Cost | $0.540387 |
| Avg Cost per Query | $0.000064 |
| Avg Cost per 1K Queries | $0.0643 |
| Number of Queries | 8400 |
| Robustness Score | 0.5000 |

Optimality Metrics

| Metric | Value |
| --- | --- |
| Opt.Sel (Optimal Selection) | 0.4823 |
| Opt.Cost (Cost Efficiency) | 0.8388 |
| Opt.Acc (Accuracy vs Optimal) | 1.0000 |

Evaluation completed by RouterArena automated workflow

ghost · 2026-02-13T06:08:59Z

Router Evaluation Results

Router: r2-router
Dataset Split: full

RouterArena Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.7104 |
| Accuracy | 70.67% |
| Total Cost | $0.540387 |
| Avg Cost per Query | $0.000064 |
| Avg Cost per 1K Queries | $0.0643 |
| Number of Queries | 8400 |
| Robustness Score | 0.5000 |

Optimality Metrics

| Metric | Value |
| --- | --- |
| Opt.Sel (Optimal Selection) | 0.4823 |
| Opt.Cost (Cost Efficiency) | 0.8388 |
| Opt.Acc (Accuracy vs Optimal) | 1.0000 |

Evaluation completed by RouterArena automated workflow

- Models: 235b, 80b, 30b, coder-next, gemini-flash, haiku
- Budgets: concise, budget_200, budget_400, budget_800
- Training: sub_10 only (809 queries), Global KNN (cosine, distance-weighted)
- Results: Acc=71.20%, Cost=$0.037/1kq, Arena(β=0.1)=71.94
- Beats Method 1 and Method 2 across all β values

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)
Co-Authored-By: Claude <[email protected]>
Co-Authored-By: Happy <[email protected]>

ghost · 2026-02-13T16:04:47Z

Router Evaluation Results

Router: r2-router
Dataset Split: full

RouterArena Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.7160 |
| Accuracy | 71.23% |
| Total Cost | $0.513358 |
| Avg Cost per Query | $0.000061 |
| Avg Cost per 1K Queries | $0.0611 |
| Number of Queries | 8400 |
| Robustness Score | 0.4571 |

Optimality Metrics

| Metric | Value |
| --- | --- |
| Opt.Sel (Optimal Selection) | 0.3238 |
| Opt.Cost (Cost Efficiency) | 0.7416 |
| Opt.Acc (Accuracy vs Optimal) | 1.0000 |

Evaluation completed by RouterArena automated workflow

jqxue1999 · 2026-02-13T16:09:20Z

Thanks for your patience.

We’ve finished preparing the updated results and they’re ready now. Please feel free to proceed with posting them.

Let us know if you need anything else from our side.

Thanks again!

jiarong0907 requested a review from yl231 February 11, 2026 03:45

yl231 approved these changes Feb 13, 2026


jiarong0907 marked this pull request as draft February 13, 2026 03:06

Ahasannn and others added 2 commits February 12, 2026 23:41

fix: Add trailing newline to model_cost.json

0d89c02

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <[email protected]> Co-Authored-By: Happy <[email protected]>

jqxue1999 force-pushed the r2-router-submission branch from 0d89c02 to d9c5f98 Compare February 13, 2026 05:03

jqxue1999 marked this pull request as ready for review February 13, 2026 16:06

jiarong0907 merged commit 1f30199 into RouteWorks:main Feb 16, 2026
6 checks passed

jiarong0907 merged 5 commits into RouteWorks:main from jqxue1999:r2-router-submission

4 participants
