
Ensemble: cost/token cap that degrades to a single model (P1) #621

Description

@moonming

Summary (P1)

An ensemble issues N panel calls plus 1 judge call per request, so it costs several times as much as a single model and consumes several times as many tokens. Today there is no pre-flight cost/token control: budgets act only after the per-sub-call usage events are emitted, so a single ensemble request cannot be capped or degraded before it spends. Add a guardrail that degrades the ensemble to a single model (or fails) when a configured cap is exceeded.
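To make the multiplier concrete, here is a rough pre-flight projection of one ensemble request's token footprint. It is a minimal sketch, assuming every panel member receives the full prompt and may emit up to `max_output_tokens`, and that the judge receives the prompt plus all N panel answers; the helper names are hypothetical, and real tokenization and prompt templates will differ.

```python
def projected_ensemble_tokens(prompt_tokens: int, max_output_tokens: int, panel_size: int) -> int:
    """Upper-bound token estimate for one ensemble request (hypothetical helper).

    Assumes each of the N panel calls sees the whole prompt and may produce up to
    max_output_tokens, and that the judge sees the prompt plus every panel answer.
    """
    panel = panel_size * (prompt_tokens + max_output_tokens)
    judge_input = prompt_tokens + panel_size * max_output_tokens
    judge = judge_input + max_output_tokens
    return panel + judge


def projected_single_tokens(prompt_tokens: int, max_output_tokens: int) -> int:
    """Upper-bound token estimate for the same request served by one model."""
    return prompt_tokens + max_output_tokens


# A 3-member panel on a 4,000-token prompt with answers capped at 1,000 tokens.
ensemble = projected_ensemble_tokens(4_000, 1_000, 3)
single = projected_single_tokens(4_000, 1_000)
print(ensemble, single, round(ensemble / single, 1))  # → 23000 5000 4.6
```

Under these assumptions the request costs about 4.6× a single call, more than the N+1 = 4 call count alone suggests, because the judge's input grows with the panel size.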

This is the cost-budget guardrail in api7/AISIX-Cloud#804's Phase-2 roadmap ("cost-budget guardrail (max_cost_usd / token cap → degrade to a single model)").

Why

  • Ensembles are gated to Team+ and marketed as "several low-cost models stand in for a frontier model", but without a cap a large panel on long prompts can exhaust a budget far faster than operators expect.
  • The N× multiplier makes ensembles the model kind that most needs a spend ceiling.

Proposed shape

  • DP-native (token cap), v1: an optional cap on EnsembleConfig (e.g. max_total_tokens or max_panel_calls). When the projected or accumulated token usage exceeds it, the executor degrades: it skips the panel and serves a single configured model (e.g. the judge or the first panel member), or fails with a clear error. No pricing data is needed.
  • CP-backed (max_cost_usd): the managed budget controller already prices usage; wire a per-request max_cost_usd ceiling for ensembles that triggers the same degrade path. Needs CP pricing (the OSS proxy emits cost_usd=0).
  • Make the degrade observable (a header / telemetry flag) so operators can see when it fired.
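The proposed shape can be sketched as a single pre-flight decision that the executor makes before fanning out. This is a minimal sketch under stated assumptions: `EnsembleCaps`, `plan_request`, the `degrade_to` field and the `x-ensemble-degraded` header are all hypothetical names, not existing EnsembleConfig fields or AISIX headers. The cost cap is only enforced when a projected cost is supplied, mirroring the note that the OSS proxy has no pricing data.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical response header that makes the degrade observable.
DEGRADED_HEADER = "x-ensemble-degraded"


@dataclass
class EnsembleCaps:
    # Hypothetical cap fields mirroring the proposal; none of these exist yet.
    max_total_tokens: Optional[int] = None  # DP-native token cap
    max_panel_calls: Optional[int] = None   # DP-native call cap
    max_cost_usd: Optional[float] = None    # CP-backed, needs pricing
    degrade_to: Optional[str] = None        # fallback model; None means fail


@dataclass
class Plan:
    mode: str                    # "ensemble", "single" or "reject"
    model: Optional[str] = None  # model served when degraded
    reason: Optional[str] = None # which cap fired
    headers: dict = field(default_factory=dict)


def plan_request(caps: EnsembleCaps, panel_size: int, projected_tokens: int,
                 projected_cost_usd: Optional[float] = None) -> Plan:
    """Decide before any upstream call whether to run the full ensemble."""
    reason = None
    if caps.max_panel_calls is not None and panel_size > caps.max_panel_calls:
        reason = "max_panel_calls"
    elif caps.max_total_tokens is not None and projected_tokens > caps.max_total_tokens:
        reason = "max_total_tokens"
    elif (caps.max_cost_usd is not None and projected_cost_usd is not None
          and projected_cost_usd > caps.max_cost_usd):
        # Skipped when no pricing is available (projected_cost_usd is None).
        reason = "max_cost_usd"

    if reason is None:
        return Plan("ensemble")
    if caps.degrade_to is None:
        # No fallback configured: fail with a clear error naming the cap.
        return Plan("reject", reason=reason)
    # Degrade: skip the panel, serve one model, and flag it for operators.
    return Plan("single", model=caps.degrade_to, reason=reason,
                headers={DEGRADED_HEADER: reason})


caps = EnsembleCaps(max_total_tokens=20_000, degrade_to="judge-model")
plan = plan_request(caps, panel_size=3, projected_tokens=23_000)
print(plan.mode, plan.model, plan.headers)
# → single judge-model {'x-ensemble-degraded': 'max_total_tokens'}
```

Checking the caps in a fixed order keeps the reported reason deterministic; "judge-model" is a placeholder for whichever degrade target is chosen.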

Open questions

  • Degrade target: the judge only, the first panel member, or an operator-configured fallback model (which could reference a routing model to get failover for free)?
  • Pre-flight estimate vs. mid-flight stop: panels run concurrently, so a mid-flight stop can only skip the judge call.
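The mid-flight limitation can be illustrated with a small asyncio sketch. It assumes the panel calls are fanned out together and that the executor checks accumulated usage only once they return; `fake_panel_call`, `fake_judge_call` and `run_ensemble` are hypothetical stand-ins for upstream calls, and falling back to the first panel answer is just one of the degrade targets listed above.

```python
import asyncio


async def fake_panel_call(model: str, prompt: str) -> tuple[str, int]:
    # Stand-in for a real upstream call; returns (answer, tokens_used).
    await asyncio.sleep(0)
    return f"{model}: answer", len(prompt.split()) + 50


async def fake_judge_call(prompt: str, answers: list[str]) -> tuple[str, int]:
    # Stand-in for the judge call that merges the panel answers.
    await asyncio.sleep(0)
    return "judged answer", len(prompt.split()) + 100


async def run_ensemble(prompt: str, panel: list[str], max_total_tokens: int):
    # All panel calls start together, so their spend is committed
    # before any usage check can run.
    results = await asyncio.gather(*(fake_panel_call(m, prompt) for m in panel))
    spent = sum(tokens for _, tokens in results)
    if spent > max_total_tokens:
        # Mid-flight stop: the only call left to skip is the judge,
        # so serve the first panel member's answer instead.
        return results[0][0], spent, True
    answer, judge_tokens = await fake_judge_call(prompt, [a for a, _ in results])
    return answer, spent + judge_tokens, False


print(asyncio.run(run_ensemble("a b c d", ["m1", "m2", "m3"], 100)))
# → ('m1: answer', 162, True)
```

Here the cap is already exceeded by the panel alone (3 × 54 tokens), so stopping mid-flight saves only the judge's tokens; a pre-flight estimate is the only way to avoid the panel spend itself.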

Scope

DP executor (aisix-proxy) for the token-cap + degrade mechanism; CP for the max_cost_usd ceiling + surfacing the control in the dashboard form. Tracking: api7/AISIX-Cloud#804 (Phase 2).

Metadata


Assignees

No one assigned

    Labels

    cross-repo (Requires changes in DP + CP + Dashboard UI + e2e), enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
