Weave Router (v0.27) submission #92
Open
steventohme wants to merge 3 commits into
Weave Router is a cluster-routing system over a 12-model BYOK pool spanning
Anthropic, OpenAI, Google, and OpenRouter providers. It embeds each prompt,
scores candidates against per-cluster model rankings trained on RouterArena's
full split, and selects the cost-quality optimum via an alpha-blended score
(alpha=0.40).
The pool is intentionally multi-provider: a customer who only brings an
OpenAI key still gets a 3-tier choice, while bringing all four keys unlocks
cost-optimal cross-provider routing.
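As a rough illustration of the routing idea described above, the following is a minimal sketch, not the Weave Router implementation. It assumes the prompt embedding is matched to the nearest cluster centroid by Euclidean distance and that the blend is `alpha * quality - (1 - alpha) * normalized_cost`; the submission states only that the score is alpha-blended with alpha=0.40. The names `centroids`, `cluster_quality`, `cost`, and the three toy models are hypothetical.

```python
import math

ALPHA = 0.40  # value from the submission; how it enters the score is assumed here


def nearest_cluster(embedding, centroids):
    """Return the index of the centroid closest to the embedding (Euclidean, assumed)."""
    return min(range(len(centroids)), key=lambda i: math.dist(embedding, centroids[i]))


def pick_model(embedding, centroids, cluster_quality, cost, alpha=ALPHA):
    """Pick the model with the highest blended score for the prompt's cluster.

    Assumed blend: alpha * quality - (1 - alpha) * (cost / max cost).
    """
    cluster = nearest_cluster(embedding, centroids)
    max_cost = max(cost.values())

    def blended(model):
        quality = cluster_quality[cluster][model]
        return alpha * quality - (1 - alpha) * cost[model] / max_cost

    return max(cost, key=blended)


# Toy data: two clusters and three hypothetical models.
centroids = [(0.0, 0.0), (1.0, 1.0)]
cluster_quality = [
    {"small": 0.70, "medium": 0.75, "large": 0.80},  # cluster where all models do well
    {"small": 0.10, "medium": 0.70, "large": 0.95},  # cluster where the small model struggles
]
cost = {"small": 1.0, "medium": 3.0, "large": 10.0}

easy_choice = pick_model((0.1, 0.0), centroids, cluster_quality, cost)
hard_choice = pick_model((0.9, 1.0), centroids, cluster_quality, cost)
```

With this toy data the first prompt routes to the cheapest model, while the second routes to the mid-priced one because the quality gap outweighs the cost penalty at alpha=0.40.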
Files added:
- router_inference/config/weave-router.json
- router_inference/predictions/weave-router.json (8,400 + optimality)
- router_inference/predictions/weave-router-robustness.json (420)
Files patched (additive only):
- universal_model_names.py: 11 entries for the 12-model pool
(gpt-4.1 + kimi-k2.5 already present upstream)
- model_cost/model_cost.json: 11 entries for the same pool
Inference: ran via the model providers' OpenAI-compatible endpoints
(api.openai.com, generativelanguage.googleapis.com, openrouter.ai).
Concurrency capped to 60 in-flight per provider.
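A per-provider in-flight cap like the one mentioned above is commonly implemented with one semaphore per provider. The sketch below assumes an asyncio-based client; `run_capped`, `tasks_by_provider`, and `call` are hypothetical names, and the submission does not say how its cap was actually enforced.

```python
import asyncio

MAX_IN_FLIGHT = 60  # per-provider cap stated in the submission


async def run_capped(tasks_by_provider, call, limit=MAX_IN_FLIGHT):
    """Run call(provider, item) for every item, at most `limit` at once per provider."""
    # One semaphore per provider, so a slow provider cannot starve the others.
    semaphores = {provider: asyncio.Semaphore(limit) for provider in tasks_by_provider}

    async def guarded(provider, item):
        async with semaphores[provider]:
            return await call(provider, item)

    jobs = [
        guarded(provider, item)
        for provider, items in tasks_by_provider.items()
        for item in items
    ]
    # gather returns results in the same order as `jobs`.
    return await asyncio.gather(*jobs)
```

Here `call` would be whatever coroutine sends a single request to a provider endpoint.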
Upstream already has claude-sonnet-4-5 at line 54; my surgical append re-added it. check-json hook caught the duplicate. Removing the re-added block leaves upstream's entry intact.
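For readers unfamiliar with this kind of check, here is a minimal sketch of how a repeated key in a JSON object can be detected with Python's standard `json` module. It assumes the duplicate was a repeated key in a JSON file; it is not the check-json hook's own code, and `find_duplicate_keys` is a hypothetical helper.

```python
import json


def find_duplicate_keys(text):
    """Return keys that appear more than once within any single JSON object."""
    duplicates = []

    def check_pairs(pairs):
        # `pairs` holds every (key, value) of one object, repeats included.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=check_pairs)
    return duplicates
```

A plain `json.loads` silently keeps the last value for a repeated key, which is why an explicit check like this is needed to notice the re-added entry.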
Author
/evaluate
Contributor
FYI
…success rows
Two validator failures from /evaluate run:
1. 559 rows had generated_answer="" but success=true. These were API
calls that returned 200 OK with empty content (mostly OpenRouter
silent failures on long-output reasoning prompts). Flipped success
to false; they grade as 0 (no answer).
2. ~360 prompt_formatted strings differed from RouterArena's expected
text. Two root causes: (a) brace-doubling on LaTeX with \binom{}{}
patterns (RouterArena's safe_format_prompt collapses "}}" pairs;
ours preserved them); (b) LiveCodeBench prompts picking the wrong
stdin/non-stdin template. Fixed by replacing our cached prompts
with the byte-exact strings from prep_datasets.py's router_data.json
and router_robustness.json.
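The first fix in the list above amounts to a small pass over the prediction rows. This is a sketch under the assumption that the predictions load as a list of dicts; the field names `generated_answer` and `success` come from the commit message, while `flip_empty_successes` is a hypothetical helper.

```python
def flip_empty_successes(rows):
    """Mark rows with an empty generated_answer as failed; return how many changed."""
    flipped = 0
    for row in rows:
        # Only rows that claim success but carry no answer text are touched.
        if row.get("success") is True and row.get("generated_answer") == "":
            row["success"] = False
            flipped += 1
    return flipped
```

Returning the count makes it easy to confirm that the number of flipped rows matches what the validator reported.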
Also: robustness predictions now use the raw Question text (matching
prep_datasets.py:30) instead of our locally-formatted prompts.
check_config_prediction_files.py weave-router full --check-generated-result
now passes locally.
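The brace-doubling mismatch in item 2(a) can be reproduced in miniature with Python's `str.format`, which turns `{{` and `}}` into single braces. This is only an illustration of the general mechanism, not RouterArena's `safe_format_prompt`; the template text and the helper `mismatched_prompts` are hypothetical.

```python
def mismatched_prompts(ours, expected):
    """Return the ids whose prompt differs from the expected text byte for byte."""
    return sorted(
        key
        for key, text in expected.items()
        if ours.get(key, "").encode("utf-8") != text.encode("utf-8")
    )


# A template whose LaTeX braces are doubled so str.format treats them as literals.
template = r"Compute \binom{{n}}{{k}} for {question}"

# str.format collapses the doubled braces while filling the placeholder.
collapsed = template.format(question="n=5, k=2")

# Plain substitution fills the placeholder but leaves the doubled braces in place.
preserved = template.replace("{question}", "n=5, k=2")

bad_ids = mismatched_prompts({"q1": preserved}, {"q1": collapsed})
```

Comparing encoded bytes rather than normalized text mirrors the byte-exact requirement mentioned in the commit message.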
Author
/evaluate
Router Evaluation Results
RouterArena Metrics
Optimality Metrics
Evaluation completed by RouterArena automated workflow
Contributor
Dear @steventohme, Congrats! I would love to update the leaderboard to have Weave Router at the top. Would you provide me with the affiliation and website, if applicable? Best,
Author
Hey Yifan. I reached out via email. We have yet to open-source the project but will very soon. I want us to be on the leaderboard as an open-source model, and I will keep you updated when that happens (ETA 1-3 days).
Weave Router (v0.27) — submission
Affiliation: 💼 Workweave (closed-source)
A cluster-routing system over a 12-model BYOK pool spanning all four major provider families. The pool is intentionally multi-provider — a customer who only brings an OpenAI key still gets a 3-tier choice; bringing all four keys unlocks cost-optimal cross-provider routing.
How it routes
Pool
Files
- router_inference/config/weave-router.json
- router_inference/predictions/weave-router.json: 8,400 regular + 8,899 optimality
- router_inference/predictions/weave-router-robustness.json: 420 robustness routes
- universal_model_names.py (11 entries) and model_cost/model_cost.json (11 entries)
Inference
Direct calls to api.openai.com, generativelanguage.googleapis.com, and openrouter.ai. Concurrency capped to 60 in-flight per provider.
99.7% of calls succeeded; 55 reasoning-heavy prompts hit OpenRouter SSE timeouts and were retried twice.
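A bounded retry like the one described for the timed-out prompts could look like the sketch below. It assumes timeouts surface as `TimeoutError` and that "retried twice" means up to two further attempts after the first; `call_with_retries` is a hypothetical helper, not the submission's code.

```python
def call_with_retries(call, retries=2, retry_on=(TimeoutError,)):
    """Call `call()`; on a listed exception, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return call()
        except retry_on:
            # Re-raise once the final allowed attempt has also failed.
            if attempt == retries:
                raise
```

Exceptions not listed in `retry_on` propagate immediately, so only the failure mode being targeted is retried.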
Will trigger evaluation with `/evaluate` after review.