feat: capability and signal-aware routing by Spherrrical · Pull Request #973 · katanemo/plano

Spherrrical · 2026-06-22T20:21:07Z

Summary

Adds a two-tier routing-signal surface to Plano so the router can serve multimodal requests and route on richer signals, while keeping the Plano-Orchestrator text-blind.

Tier 1 — hard capability filters (deterministic, request-shape driven): prune models that cannot physically serve a request (vision input, /v1/images/generations, /v1/audio/speech, context-window fit). Capabilities are fetched at runtime from models.dev — the same pattern as DigitalOcean cost / Prometheus latency, no committed snapshot — with optional per-model capabilities: overrides in config. Precedence: user config > models.dev > conservative default.
Tier 2 — soft routing signal: rank the already-capable pool by an internal, benchmark-seeded long_context_quality score (separate from capabilities; not user-authored).
empty_pool_behavior is the only lever that lets routing_preferences win over a capability filter: error (the default) rejects the request with a 422, while warning lets the request proceed.
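
To make the two-tier flow concrete, here is a minimal sketch and not the code in this PR. The names `Candidate`, `Capability`, `EmptyPoolBehavior`, `RouteError`, and `route` are hypothetical. It assumes that "proceed" under warning means falling back to the unfiltered pool, and that models without a long_context_quality score rank last.

```rust
use std::collections::HashSet;

/// Hypothetical capability flags; the real types in the PR may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Capability {
    VisionInput,
    ImageOutput,
    AudioOutput,
}

/// Hypothetical candidate model with resolved capabilities.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub name: String,
    pub capabilities: HashSet<Capability>,
    pub context_window: u32,
    /// Tier 2 score, higher is better; None means no score is known.
    pub long_context_quality: Option<f64>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmptyPoolBehavior {
    Error,
    Warning,
}

#[derive(Debug, PartialEq)]
pub enum RouteError {
    /// Would surface to the client as HTTP 422.
    NoCapableModel,
}

/// Returns model names, best first.
pub fn route(
    candidates: &[Candidate],
    required: &HashSet<Capability>,
    required_context_tokens: u32,
    behavior: EmptyPoolBehavior,
) -> Result<Vec<String>, RouteError> {
    // Tier 1: keep only models that have every required capability
    // and whose context window fits the request.
    let mut pool: Vec<&Candidate> = candidates
        .iter()
        .filter(|c| {
            required.is_subset(&c.capabilities)
                && c.context_window >= required_context_tokens
        })
        .collect();

    if pool.is_empty() {
        match behavior {
            EmptyPoolBehavior::Error => return Err(RouteError::NoCapableModel),
            // Assumption: "proceed" means routing over the unfiltered pool.
            EmptyPoolBehavior::Warning => pool = candidates.iter().collect(),
        }
    }

    // Tier 2: stable sort, descending by score; unscored models go last.
    pool.sort_by(|a, b| {
        let sa = a.long_context_quality.unwrap_or(f64::NEG_INFINITY);
        let sb = b.long_context_quality.unwrap_or(f64::NEG_INFINITY);
        sb.total_cmp(&sa)
    });

    Ok(pool.into_iter().map(|c| c.name.clone()).collect())
}
```

In this sketch the hard filter always runs before ranking, so a high Tier 2 score can never rescue a model that cannot serve the request unless the operator opts into warning.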

What's included

RequiredCapabilities from request shape + has_vision() / required_context_tokens() on ProviderRequest; intersection filter in determine_route() before ranking, plus no-match-path validation.
ModelCapabilitiesService (brightstaff) mirroring ModelMetricsService: runtime fetch + optional refresh loop, empty-on-failure.
long_context_quality selection policy; internal long_context_quality.yaml dataset with provenance.
Image-out / audio-out API variants and types; binary passthrough in llm_gateway (no JSON mangling/token accounting for audio/* etc.).
Tier-attributed routing telemetry: capability-filter latency/outcome and pool-size metrics, plus an LCQ (long_context_quality) staleness gauge; groundwork for per-modality fastest routing (latency keyed by (model, modality), cold-start seeding).
plano_config_schema.yaml (capabilities block, long_context_quality, overrides.empty_pool_behavior, overrides.model_capabilities_source), config-generator passthrough + tests, and vision/long-context/image/TTS presets in the config reference docs.
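
As an illustration of the request-shape items above, the sketch below derives required capabilities from the endpoint path and request contents, and decides when a response body should be passed through as raw bytes. It is a sketch under assumptions, not the code in this PR: `RequestShape`, the field layout of `RequiredCapabilities`, `required_capabilities`, and `is_binary_passthrough` are hypothetical. Computing the context requirement as prompt tokens plus maximum output tokens is a guess, as is treating `image/*` as a passthrough type alongside `audio/*`.

```rust
/// Hypothetical, simplified view of an incoming request.
pub struct RequestShape<'a> {
    pub path: &'a str,
    /// True when any message carries an image part.
    pub has_image_parts: bool,
    pub prompt_tokens: u32,
    pub max_output_tokens: u32,
}

/// Hypothetical field layout; the PR's RequiredCapabilities may differ.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct RequiredCapabilities {
    pub vision_input: bool,
    pub image_output: bool,
    pub audio_output: bool,
    pub min_context_tokens: u32,
}

/// Derive hard requirements from the request shape alone.
pub fn required_capabilities(req: &RequestShape) -> RequiredCapabilities {
    // Ignore any query string and trailing slash before matching the path.
    let path = req.path.split('?').next().unwrap_or(req.path);
    let path = path.trim_end_matches('/');

    RequiredCapabilities {
        vision_input: req.has_image_parts,
        image_output: path.ends_with("/v1/images/generations"),
        audio_output: path.ends_with("/v1/audio/speech"),
        // Assumption: the model must fit the prompt plus the requested output.
        min_context_tokens: req.prompt_tokens.saturating_add(req.max_output_tokens),
    }
}

/// Decide whether a response body should skip JSON parsing and
/// token accounting, based on its Content-Type header value.
pub fn is_binary_passthrough(content_type: &str) -> bool {
    // Drop parameters such as "; charset=utf-8" and normalize case.
    let media_type = content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    media_type.starts_with("audio/") || media_type.starts_with("image/")
}
```

Keeping both decisions as pure functions of the request or response shape means they can be unit-tested without a running gateway.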

Notes for reviewers

Capabilities deliberately follow the runtime-fetch model (no vendored JSON in the repo / WASM filters). Tradeoff: on a cold start where models.dev is unreachable, capabilities fall back to text-only until the first successful fetch, so under empty_pool_behavior: error a multimodal request can be rejected with a 422. Pin capabilities with an explicit capabilities: block to guarantee offline behavior.
Image/audio endpoints are currently filtered out of the client-side SupportedAPIsFromClient mapping so they aren't misparsed as chat; capability filtering keys off the endpoint path. Full client-side request transforms are a follow-up.
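
The cold-start tradeoff in the first note follows from the precedence order (user config > models.dev > conservative default). A minimal sketch, assuming a hypothetical `Capabilities` struct, `Source` enum, and `resolve` function, and assuming a per-model override replaces the fetched entry wholesale rather than merging field by field:

```rust
use std::collections::HashMap;

/// Hypothetical modality flags for one model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Capabilities {
    pub vision_input: bool,
    pub image_output: bool,
    pub audio_output: bool,
}

impl Capabilities {
    /// Conservative default: text-only.
    pub const TEXT_ONLY: Capabilities = Capabilities {
        vision_input: false,
        image_output: false,
        audio_output: false,
    };
}

/// Where a resolved capability set came from.
#[derive(Debug, PartialEq, Eq)]
pub enum Source {
    UserConfig,
    ModelsDev,
    ConservativeDefault,
}

/// Resolve capabilities with precedence:
/// user config > models.dev > conservative default.
/// `fetched` is empty until the first successful fetch (empty-on-failure).
pub fn resolve(
    model: &str,
    user_overrides: &HashMap<String, Capabilities>,
    fetched: &HashMap<String, Capabilities>,
) -> (Capabilities, Source) {
    if let Some(caps) = user_overrides.get(model) {
        return (*caps, Source::UserConfig);
    }
    if let Some(caps) = fetched.get(model) {
        return (*caps, Source::ModelsDev);
    }
    (Capabilities::TEXT_ONLY, Source::ConservativeDefault)
}
```

With an empty `fetched` map and no override, a vision-capable model resolves to text-only, which is exactly the case where a multimodal request hits an empty pool. An explicit override short-circuits the lookup and behaves the same whether or not models.dev is reachable.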

Test plan

cargo fmt --all -- --check
cargo clippy -p common -p hermesllm -p brightstaff --all-targets -- -D warnings (+ clippy on the wasm32-wasip1 llm_gateway target)
cargo test --lib for common (39), hermesllm (167), brightstaff (171), llm_gateway (6)
WASM build (llm_gateway, prompt_gateway) + verified the capabilities blob is no longer embedded
CLI pytest test/test_config_generator.py (27, incl. new schema cases); full-reference config validates against the schema
Reviewer: sanity-check models.dev fetch behavior + empty_pool_behavior semantics against expectations

Adds a two-tier routing-signal surface. Tier 1 hard capability filters (modality + context-window fit) prune models that cannot serve a request; capabilities are fetched at runtime from models.dev (like cost/latency metrics, no committed snapshot) with user-config overrides. Tier 2 ranks the capable pool by internal long-context-quality scores. Includes vision-input routing, /v1/images/generations and /v1/audio/speech (binary passthrough), empty_pool_behavior hard gate, tier-attributed telemetry, per-modality fastest groundwork, schema + docs presets.

Spherrrical changed the title from "feat: capability and signal-aware routing (multimodality)" to "feat: capability and signal-aware routing" on Jun 22, 2026

Spherrrical wants to merge 1 commit into main from musa/capability-signal-aware-routing
