Applied autoresearch to model routing policy — ZeroAPI eval loop #508

dorukardahan · 2026-04-10T18:19:08Z


I took the measure→experiment→promote cycle from autoresearch and applied it to AI model routing policy instead of model training.

The problem: I pay for 5 AI subscriptions (OpenAI, Kimi, GLM, MiniMax, Qwen — $109/mo total). Each has different strengths. Manually picking the right model per task is guesswork.

What ZeroAPI does: An OpenClaw gateway plugin that classifies tasks via keyword matching (<1ms, no LLM call) and routes to the benchmark leader for that category. 155 models scored across 15 benchmarks from Artificial Analysis.
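
To make the keyword-matching step concrete, here is a minimal sketch of how a classifier like this could work. It is not the actual ZeroAPI code: the category names, keywords, model IDs, and the `classify`/`route` helpers are all hypothetical, and the real plugin also handles risk levels, vision detection, and fallbacks that this sketch leaves out.

```typescript
// Hypothetical sketch: category names, keywords, and model IDs are illustrative,
// not taken from the ZeroAPI repository.
type KeywordCategory = "code" | "research" | "math";
type Category = KeywordCategory | "default";

interface RoutingConfig {
  keywords: Record<KeywordCategory, string[]>;
  leaders: Record<Category, string>;
}

const exampleConfig: RoutingConfig = {
  keywords: {
    code: ["refactor", "typescript", "stack trace"],
    research: ["summarize", "sources", "compare"],
    math: ["prove", "integral", "equation"],
  },
  leaders: {
    code: "provider-a/code-model",
    research: "provider-b/research-model",
    math: "provider-c/math-model",
    default: "provider-d/general-model",
  },
};

// Count keyword hits per category; the category with the most hits wins.
// Ties keep the earlier category, and zero hits fall through to "default".
function classify(prompt: string, config: RoutingConfig): Category {
  const text = prompt.toLowerCase();
  let best: Category = "default";
  let bestHits = 0;
  for (const category of Object.keys(config.keywords) as KeywordCategory[]) {
    const hits = config.keywords[category].filter(
      (word) => text.indexOf(word) !== -1
    ).length;
    if (hits > bestHits) {
      best = category;
      bestHits = hits;
    }
  }
  return best;
}

// Look up the configured leader for the chosen category.
function route(
  prompt: string,
  config: RoutingConfig
): { category: Category; model: string } {
  const category = classify(prompt, config);
  return { category, model: config.leaders[category] };
}
```

Because this is plain substring matching over a small keyword table, there is no network or LLM call on the routing path, which is what keeps classification cheap.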

The autoresearch connection: All routing constants (category keywords, risk levels, vision detection, TTFT thresholds, fallback ordering) live in a JSON config. Every routing decision is logged. A built-in eval script (scripts/eval.ts) analyzes the logs and reports:

Category distribution (what % of prompts go where)
Risk override rate (are high-risk keywords too aggressive?)
Provider diversity (are fallbacks actually being used?)
Keyword hit rates (which keywords never match?)
Concrete tuning suggestions
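
As a rough illustration of the first four report items, the sketch below computes them from an in-memory list of log entries. The `RoutingLogEntry` shape, the field names, and the `evaluate` function are assumptions made for this example; the real log format and scripts/eval.ts may look quite different.

```typescript
// Hypothetical log record; the real ZeroAPI log schema may differ.
interface RoutingLogEntry {
  category: string;
  provider: string;
  riskOverride: boolean;
  matchedKeywords: string[];
}

interface EvalReport {
  categoryShare: Record<string, number>; // fraction of prompts per category
  riskOverrideRate: number; // fraction of prompts where a risk rule overrode routing
  providerCounts: Record<string, number>; // how often each provider was chosen
  deadKeywords: string[]; // configured keywords that never matched
}

function evaluate(
  entries: RoutingLogEntry[],
  configuredKeywords: string[]
): EvalReport {
  const total = entries.length;
  const categoryCounts: Record<string, number> = {};
  const providerCounts: Record<string, number> = {};
  const seenKeywords: Record<string, boolean> = {};
  let overrides = 0;

  for (const entry of entries) {
    categoryCounts[entry.category] = (categoryCounts[entry.category] || 0) + 1;
    providerCounts[entry.provider] = (providerCounts[entry.provider] || 0) + 1;
    if (entry.riskOverride) overrides += 1;
    for (const keyword of entry.matchedKeywords) seenKeywords[keyword] = true;
  }

  // Convert raw counts to fractions; an empty log yields an empty distribution.
  const categoryShare: Record<string, number> = {};
  for (const category of Object.keys(categoryCounts)) {
    categoryShare[category] = categoryCounts[category] / total;
  }

  return {
    categoryShare,
    riskOverrideRate: total === 0 ? 0 : overrides / total,
    providerCounts,
    deadKeywords: configuredKeywords.filter((k) => !seenKeywords[k]),
  };
}
```

A keyword that shows up in `deadKeywords` over a long enough window is a candidate for removal, and a single provider dominating `providerCounts` suggests the fallbacks are rarely exercised.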

The loop: run eval → change one config constant → restart → wait for traffic → re-eval → keep what improves routing, revert what doesn't.

Same pattern as autoresearch but manual instead of automated — routing policy needs human judgment, not overnight hill-climbing.
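
Since the keep-or-revert call stays with a human, one small aid is a side-by-side diff of the metrics from the eval run before a change and the run after it. The sketch below is only an assumption about how that comparison could be done; `Metrics`, `MetricDelta`, and `diffMetrics` are hypothetical names, and it deliberately stops short of making the decision.

```typescript
// Hypothetical helper: compares two flat metric snapshots (before/after one
// config change). It reports deltas only; judging them is left to a human.
type Metrics = Record<string, number>;

interface MetricDelta {
  name: string;
  before: number;
  after: number;
  delta: number; // after minus before
}

function diffMetrics(before: Metrics, after: Metrics): MetricDelta[] {
  return Object.keys(before)
    // Only compare metrics that exist in both snapshots.
    .filter((name) => Object.prototype.hasOwnProperty.call(after, name))
    .sort()
    .map((name) => ({
      name,
      before: before[name],
      after: after[name],
      delta: after[name] - before[name],
    }));
}
```

Changing one constant per iteration matters here: if several constants move at once, a shift in a metric cannot be attributed to any single change.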

Repo: https://github.com/dorukardahan/ZeroAPI

Open source, MIT. Works with any OpenClaw setup.
