feat: capability and signal-aware routing #973

Open
Spherrrical wants to merge 1 commit into main from musa/capability-signal-aware-routing
@Spherrrical

Collaborator

Summary

Adds a two-tier routing-signal surface to Plano so the router can serve multimodal requests and route on richer signals, while keeping the Plano-Orchestrator text-only.

  • Tier 1 — hard capability filters (deterministic, request-shape driven): prune models that cannot physically serve a request (vision input, /v1/images/generations, /v1/audio/speech, context-window fit). Capabilities are fetched at runtime from models.dev — the same pattern as DigitalOcean cost / Prometheus latency, no committed snapshot — with optional per-model capabilities: overrides in config. Precedence: user config > models.dev > conservative default.
  • Tier 2 — soft routing signal: rank the already-capable pool by an internal, benchmark-seeded long_context_quality score (separate from capabilities; not user-authored).
  • empty_pool_behavior controls what happens when the capability filter leaves no models: error (the default) returns a 422, warning proceeds anyway. It is the only lever that lets routing_preferences win over a capability filter.
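
To make the two tiers concrete, here is a minimal sketch of filter-then-rank. It is not the code in this PR: `Capability`, `Candidate`, `RouteError`, and `route` are hypothetical names, and the `Warning` branch assumes "proceed" means falling back to the unfiltered candidate list.

```rust
use std::collections::HashSet;

/// Hypothetical capability tags (illustrative, not the PR's types).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Capability {
    VisionInput,
    ImageOutput,
    AudioOutput,
}

/// Hypothetical candidate model with its Tier 1 and Tier 2 inputs.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub name: String,
    pub capabilities: HashSet<Capability>,
    pub context_window: u32,
    /// Tier 2 soft signal; higher is better.
    pub long_context_quality: f32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmptyPoolBehavior {
    Error,
    Warning,
}

#[derive(Debug, PartialEq)]
pub enum RouteError {
    /// The caller would surface this as HTTP 422.
    NoCapableModel,
}

/// Returns model names ordered best-first.
pub fn route(
    candidates: &[Candidate],
    required: &HashSet<Capability>,
    required_context_tokens: u32,
    empty_pool_behavior: EmptyPoolBehavior,
) -> Result<Vec<String>, RouteError> {
    // Tier 1: hard filter on capabilities and context-window fit.
    let mut pool: Vec<&Candidate> = candidates
        .iter()
        .filter(|c| {
            required.is_subset(&c.capabilities)
                && c.context_window >= required_context_tokens
        })
        .collect();

    if pool.is_empty() {
        match empty_pool_behavior {
            EmptyPoolBehavior::Error => return Err(RouteError::NoCapableModel),
            // Assumption: "proceed" means routing over the unfiltered list.
            EmptyPoolBehavior::Warning => pool = candidates.iter().collect(),
        }
    }

    // Tier 2: rank the surviving pool by the soft score, descending.
    pool.sort_by(|a, b| b.long_context_quality.total_cmp(&a.long_context_quality));
    Ok(pool.into_iter().map(|c| c.name.clone()).collect())
}
```

In this sketch the soft score never re-admits a model that Tier 1 pruned; only `EmptyPoolBehavior::Warning` relaxes the filter, and only when the pool is empty.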

What's included

  • RequiredCapabilities from request shape + has_vision() / required_context_tokens() on ProviderRequest; intersection filter in determine_route() before ranking, plus no-match-path validation.
  • ModelCapabilitiesService (brightstaff) mirroring ModelMetricsService: runtime fetch + optional refresh loop, empty-on-failure.
  • long_context_quality selection policy; internal long_context_quality.yaml dataset with provenance.
  • Image-out / audio-out API variants and types; binary passthrough in llm_gateway (no JSON mangling/token accounting for audio/* etc.).
  • Tier-attributed routing telemetry: capability-filter latency/outcome + pool-size metrics and an LCQ staleness gauge; per-modality fastest groundwork (latency keyed by (model, modality), cold-start seeding).
  • plano_config_schema.yaml (capabilities block, long_context_quality, overrides.empty_pool_behavior, overrides.model_capabilities_source), config-generator passthrough + tests, and vision/long-context/image/TTS presets in the config reference docs.
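
The first item above derives required capabilities from the request shape. A rough sketch of that idea follows. `RequestShape` and its fields are hypothetical stand-ins for `ProviderRequest`, the four-characters-per-token estimate is an assumption for illustration only, and the field layout of `RequiredCapabilities` is guessed rather than taken from the diff.

```rust
/// Hypothetical, simplified view of an incoming request.
pub struct RequestShape<'a> {
    pub path: &'a str,
    pub has_image_parts: bool,
    pub prompt_chars: usize,
    pub max_output_tokens: u32,
}

/// Assumed layout; the real struct in the PR may differ.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct RequiredCapabilities {
    pub vision_input: bool,
    pub image_output: bool,
    pub audio_output: bool,
    pub min_context_tokens: u32,
}

impl RequestShape<'_> {
    /// True when the request carries image content parts.
    pub fn has_vision(&self) -> bool {
        self.has_image_parts
    }

    /// Crude estimate: about 4 characters per prompt token (an assumption),
    /// rounded up, plus the requested output budget.
    pub fn required_context_tokens(&self) -> u32 {
        let prompt_tokens = u32::try_from((self.prompt_chars + 3) / 4).unwrap_or(u32::MAX);
        prompt_tokens.saturating_add(self.max_output_tokens)
    }
}

/// Derives hard requirements from the endpoint path and the request body.
pub fn required_capabilities(req: &RequestShape<'_>) -> RequiredCapabilities {
    match req.path {
        "/v1/images/generations" => RequiredCapabilities {
            image_output: true,
            ..Default::default()
        },
        "/v1/audio/speech" => RequiredCapabilities {
            audio_output: true,
            ..Default::default()
        },
        // Chat-style endpoints: vision and context fit come from the body.
        _ => RequiredCapabilities {
            vision_input: req.has_vision(),
            min_context_tokens: req.required_context_tokens(),
            ..Default::default()
        },
    }
}
```

Keying image and audio output off the path alone matches the reviewer note that those endpoints are not parsed as chat requests.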

Notes for reviewers

  • Capabilities deliberately follow the runtime-fetch model (no vendored JSON in the repo / WASM filters). Tradeoff: on a cold start where models.dev is unreachable, capabilities fall back to text-only until the first fetch — under empty_pool_behavior: error that can 422 a multimodal request. Pin with an explicit capabilities: block to guarantee offline behavior.
  • Image/audio endpoints are currently filtered out of the client-side SupportedAPIsFromClient mapping so they aren't misparsed as chat; capability filtering keys off the endpoint path. Full client-side request transforms are a follow-up.
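
The cold-start tradeoff in the first note follows from the precedence order (user config > models.dev > conservative default). A minimal sketch of that lookup, with a hypothetical `CapabilityResolver` standing in for whatever `ModelCapabilitiesService` does internally:

```rust
use std::collections::HashMap;

/// Assumed capability flags for one model (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModelCapabilities {
    pub vision_input: bool,
    pub image_output: bool,
    pub audio_output: bool,
}

impl ModelCapabilities {
    /// Conservative default used when nothing is known about a model.
    pub const TEXT_ONLY: Self = Self {
        vision_input: false,
        image_output: false,
        audio_output: false,
    };
}

/// Hypothetical resolver; field names are not from the PR.
pub struct CapabilityResolver {
    /// Per-model `capabilities:` overrides from user config.
    pub config_overrides: HashMap<String, ModelCapabilities>,
    /// Data fetched at runtime; empty until the first successful fetch.
    pub fetched: HashMap<String, ModelCapabilities>,
}

impl CapabilityResolver {
    /// Precedence: user config, then fetched data, then text-only.
    pub fn resolve(&self, model: &str) -> ModelCapabilities {
        self.config_overrides
            .get(model)
            .or_else(|| self.fetched.get(model))
            .copied()
            .unwrap_or(ModelCapabilities::TEXT_ONLY)
    }
}
```

With both maps empty, every model resolves to text-only, which is the state in which a multimodal request can hit a 422 under `empty_pool_behavior: error`. An entry in `config_overrides` is returned regardless of fetch state, which is why pinning guarantees offline behavior.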

Test plan

  • cargo fmt --all -- --check
  • cargo clippy -p common -p hermesllm -p brightstaff --all-targets -- -D warnings (+ clippy on the wasm32-wasip1 llm_gateway target)
  • cargo test --lib for common (39), hermesllm (167), brightstaff (171), llm_gateway (6)
  • WASM build (llm_gateway, prompt_gateway) + verified the capabilities blob is no longer embedded
  • CLI pytest test/test_config_generator.py (27, incl. new schema cases); full-reference config validates against the schema
  • Reviewer: sanity-check models.dev fetch behavior + empty_pool_behavior semantics against expectations

Adds a two-tier routing-signal surface. Tier 1 hard capability filters
(modality + context-window fit) prune models that cannot serve a request;
capabilities are fetched at runtime from models.dev (like cost/latency
metrics, no committed snapshot) with user-config overrides. Tier 2 ranks
the capable pool by internal long-context-quality scores.

Includes vision-input routing, /v1/images/generations and /v1/audio/speech
(binary passthrough), empty_pool_behavior hard gate, tier-attributed
telemetry, per-modality fastest groundwork, schema + docs presets.
Spherrrical changed the title from "feat: capability and signal-aware routing (multimodality)" to "feat: capability and signal-aware routing" on Jun 22, 2026
