Weave Router (v0.27) submission by steventohme · Pull Request #92 · RouteWorks/RouterArena

steventohme · 2026-05-08T04:51:39Z

Weave Router (v0.27) — submission

Affiliation: 💼 Workweave (closed-source)

A cluster-routing system over a 12-model BYOK pool spanning all four major provider families. The pool is intentionally multi-provider — a customer who only brings an OpenAI key still gets a 3-tier choice; bringing all four keys unlocks cost-optimal cross-provider routing.

How it routes

Embed each prompt with Jina v2 INT8 ONNX (768-dim).
Top-p=4 cluster sum against per-cluster rankings trained on RouterArena's full split.
α-blended cost-quality score (α=0.40), argmax over the 12-model pool.

Pool

Provider	Models
Anthropic	claude-opus-4-7, claude-sonnet-4-5, claude-haiku-4-5
OpenAI	gpt-5.5, gpt-5.4-mini, gpt-4.1
Google	gemini-3.1-pro-preview, gemini-3.1-flash-lite-preview
OpenRouter	deepseek/deepseek-v4-pro, qwen/qwen3.5-flash-02-23, deepseek/deepseek-v4-flash, moonshotai/kimi-k2.5

Files

router_inference/config/weave-router.json
router_inference/predictions/weave-router.json — 8,400 regular + 8,899 optimality
router_inference/predictions/weave-router-robustness.json — 420 robustness routes
Additive patches to universal_model_names.py (11 entries) and model_cost/model_cost.json (11 entries)

Inference

Direct calls to api.openai.com, generativelanguage.googleapis.com, and openrouter.ai. Concurrency capped to 60 in-flight per provider.

99.7% of calls succeeded; 55 reasoning-heavy prompts hit OpenRouter SSE timeouts and were retried twice.

Will trigger evaluation with `/evaluate` after review.

Weave Router is a cluster-routing system over a 12-model BYOK pool spanning Anthropic, OpenAI, Google, and OpenRouter providers. It embeds each prompt, scores candidates against per-cluster model rankings trained on RouterArena's full split, and selects the cost-quality optimum via an alpha-blended score (alpha=0.40). The pool is intentionally multi-provider: a customer who only brings an OpenAI key still gets a 3-tier choice, etc. Files added: - router_inference/config/weave-router.json - router_inference/predictions/weave-router.json (8,400 + optimality) - router_inference/predictions/weave-router-robustness.json (420) Files patched (additive only): - universal_model_names.py: 11 entries for the 12-model pool (gpt-4.1 + kimi-k2.5 already present upstream) - model_cost/model_cost.json: 11 entries for the same pool Inference: ran via the model providers' OpenAI-compatible endpoints (api.openai.com, generativelanguage.googleapis.com, openrouter.ai). Concurrency capped to 60 in-flight per provider.

Upstream already has claude-sonnet-4-5 at line 54; my surgical append re-added it. check-json hook caught the duplicate. Removing the re-added block leaves upstream's entry intact.

steventohme · 2026-05-08T05:02:31Z

/evaluate

jiarong0907 · 2026-05-08T05:10:46Z

FYI

Run set -euo pipefail
warning: The `tool.uv.dev-dependencies` field (used in `pyproject.toml`) is deprecated and will be removed in a future release; use `dependency-groups.dev` instead
From https://github.com/RouteWorks/RouterArena
 * branch            main       -> FETCH_HEAD
From https://github.com/RouteWorks/RouterArena
 * [new ref]         refs/pull/92/head -> pr-92
Preparing worktree (checking out 'pr-92')
HEAD is now at d04f1f0 fix: drop duplicate claude-sonnet-4-5 from model_cost.json
→ git fetch origin main
→ git fetch origin pull/92/head:pr-92
→ git worktree add --force /home/runner/work/RouterArena/RouterArena/base/.pr_worktrees/pr-92 pr-92
✔ Created worktree at /home/runner/work/RouterArena/RouterArena/base/.pr_worktrees/pr-92
▶ Syncing dependencies with uv...
warning: The `tool.uv.dev-dependencies` field (used in `pyproject.toml`) is deprecated and will be removed in a future release; use `dependency-groups.dev` instead
Resolved 160 packages in 0.86ms
   Building routerarena @ file:///home/runner/work/RouterArena/RouterArena/base/.pr_worktrees/pr-92
      Built routerarena @ file:///home/runner/work/RouterArena/RouterArena/base/.pr_worktrees/pr-92
Prepared 1 package in 276ms
Uninstalled 1 package in 0.51ms
Installed 1 package in 0.52ms
 - routerarena==0.1.0 (from file:///home/runner/work/RouterArena/RouterArena/base)
 + routerarena==0.1.0 (from file:///home/runner/work/RouterArena/RouterArena/base/.pr_worktrees/pr-92)
→ uv sync --locked
✔ Synced dependencies
▶ Validating prediction/config files...
warning: The `tool.uv.dev-dependencies` field (used in `pyproject.toml`) is deprecated and will be removed in a future release; use `dependency-groups.dev` instead
Checking router: weave-router
Dataset split: full
================================================================================

[1] Checking config file...
✓ Config loaded from ./router_inference/config/weave-router.json
✓ Found 12 models in config
✓ All models in config are valid (found in ModelNameManager)

[2] Checking prediction file...
✓ Predictions loaded from ./router_inference/predictions/weave-router.json

[3] Checking prediction fields against dataset...
✓ Dataset loaded: 8400 entries
  Note: Found 8899 optimality entries (excluded from size check)
✓ Prediction file has correct size
✗ Found 1390 field validation errors:
  - Entry 13 (global_index: AIME_107): generated_result.generated_answer is empty but success is True
  - Entry 15 (global_index: AIME_112): generated_result.generated_answer is empty but success is True
  - Entry 27 (global_index: AIME_113): generated_result.generated_answer is empty but success is True
  - Entry 29 (global_index: AIME_16): prompt mismatch with dataset
  -   Expected: Please solve the following mathematical problem step by step. 

Context: None

Question: Find the re...
  -   Got: Please solve the following mathematical problem step by step. 

Context: None

Question: Find the re...
  - Entry 32 (global_index: AIME_3): prompt mismatch with dataset
  -   Expected: Please solve the following mathematical problem step by step. 

Context: None

Question: For any fin...
  -   Got: Please solve the following mathematical problem step by step. 

Context: None

Question: For any fin...
  - Entry 32 (global_index: AIME_3): generated_result.generated_answer is empty but success is True
  ... and 1380 more errors

[4] Checking model cost configurations...
✓ All models have cost configurations (57 models in cost file)

================================================================================
✗ VALIDATION FAILED!
Found 1390 error(s). Please fix the issues above.
================================================================================
✗ Command failed (exit code 1): uv run --active router_inference/check_config_prediction_files.py weave-router full --check-generated-result
Deleted branch pr-92 (was d04f1f0).
→ uv run --active router_inference/check_config_prediction_files.py weave-router full --check-generated-result
→ git worktree remove --force /home/runner/work/RouterArena/RouterArena/base/.pr_worktrees/pr-92
→ git branch -D pr-92

…success rows Two validator failures from /evaluate run: 1. 559 rows had generated_answer="" but success=true. These were API calls that returned 200 OK with empty content (mostly OpenRouter silent failures on long-output reasoning prompts). Flipped success to false; they grade as 0 (no answer). 2. ~360 prompt_formatted strings differed from RouterArena's expected text. Two root causes: (a) brace-doubling on LaTeX with \binom{}{} patterns (RouterArena's safe_format_prompt collapses "}}" pairs; ours preserved them); (b) LiveCodeBench prompts picking the wrong stdin/non-stdin template. Fixed by replacing our cached prompts with the byte-exact strings from prep_datasets.py's router_data.json and router_robustness.json. Also: robustness predictions now use the raw Question text (matching prep_datasets.py:30) instead of our locally-formatted prompts. check_config_prediction_files.py weave-router full --check-generated-result now passes locally.

steventohme · 2026-05-08T05:34:20Z

/evaluate

github-actions · 2026-05-08T05:57:36Z

Router Evaluation Results

Router: weave-router
Dataset Split: full

RouterArena Metrics

Metric	Value
RouterArena Score	0.7461
Accuracy	78.43%
Total Cost	$7.718718
Avg Cost per Query	$0.000919
Avg Cost per 1K Queries	$0.9189
Number of Queries	8400
Robustness Score	0.7905

Optimality Metrics

Metric	Value
Opt.Sel (Optimal Selection)	0.0138
Opt.Cost (Cost Efficiency)	0.1227
Opt.Acc (Accuracy vs Optimal)	1.0000

Evaluation completed by RouterArena automated workflow

yl231 · 2026-05-09T01:41:23Z

Dear @steventohme, Congrats!

I would love to update the leaderboard to have Weave Router at the top. Would you provide me with the affiliation and website, if applicable?

Best,
Yifan

steventohme · 2026-05-09T01:52:01Z

Dear @steventohme, Congrats!

I would love to update the leaderboard to have Weave Router at the top. Would you provide me with the affiliation and website, if applicable?

Best, Yifan

Hey Yifan. I reached out via email, we are yet to open source the project but will very soon. I want us to be on the leaderboard as an open source model. I will keep you updated when that happens (ETA 1-3 days)

steventohme added 2 commits May 7, 2026 21:49

fix: drop duplicate claude-sonnet-4-5 from model_cost.json

d04f1f0

Upstream already has claude-sonnet-4-5 at line 54; my surgical append re-added it. check-json hook caught the duplicate. Removing the re-added block leaves upstream's entry intact.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Weave Router (v0.27) submission#92

Weave Router (v0.27) submission#92
steventohme wants to merge 3 commits into
RouteWorks:mainfrom
steventohme:weave-router-submission

steventohme commented May 8, 2026

Uh oh!

steventohme commented May 8, 2026

Uh oh!

jiarong0907 commented May 8, 2026

Uh oh!

steventohme commented May 8, 2026

Uh oh!

github-actions Bot commented May 8, 2026

Uh oh!

yl231 commented May 9, 2026

Uh oh!

steventohme commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

steventohme commented May 8, 2026