feat: weaver — multi-model ensemble for higher-quality answers #158
Open
yudduy wants to merge 1 commit into
Adds a coordinator-side orchestration mode that runs multiple candidate generations in parallel and scores them with verifier models, returning the highest-scoring response with a confidence metric. Inspired by the Stanford Weaver paper (https://scalingintelligence.stanford.edu/pubs/weaver.pdf).

Coordinator:
- /v1/chat/completions accepts {"weaver": {"mode": "deep"|"deep_plus"}}
- Streams typed SSE events (weaver_init, candidate_delta, verifier_score, weaver_final) with the full trace
- Verifier selection picks diverse model families from the catalog, falls back to self-verification with a warning if no eligible verifiers exist
- Adds Family + CanVerify to SupportedModel (memory + postgres backends)
- Billing charges candidate and verifier calls independently

Provider:
- Reasoning parser opt-in by family (Gemma4, DeepSeek R1, Qwen3/QwQ); Qwen2.5 no longer mis-applies the qwen3 parser

Console UI:
- Fast / Deep / Deep+ mode selector in ChatInput
- Live candidate strip + tabbed trace modal
- New auth-checked proxy routes for device/provider/stats endpoints

Known follow-ups (not blocking):
- Billing reserve→refund is two non-atomic writes (no two-phase commit)
- No model-level SupportsWeaver flag (any catalog model is eligible)
- Confidence metric is 0 when only one candidate succeeds
- E2E + weaver path lacks a dedicated test
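The verifier-selection rule in the commit message (diverse families, with a self-verification fallback) could look roughly like the sketch below. This is not the code from this PR: only `Family` and `CanVerify` on `SupportedModel` come from the description, while the `ID` field, the `selectVerifiers` name, the one-verifier-per-family rule, and skipping the candidate's own family are assumptions made for illustration.

```go
package main

import "sort"

// SupportedModel mirrors only the catalog fields this sketch needs.
// Family and CanVerify are named in the PR; ID is an assumed field.
type SupportedModel struct {
	ID        string
	Family    string
	CanVerify bool
}

// selectVerifiers (hypothetical helper) returns up to n verifiers, at most
// one per family, skipping the candidate's own family (an assumption).
// If nothing is eligible it falls back to the candidate itself and reports
// selfVerify=true so the caller can surface a warning.
func selectVerifiers(catalog []SupportedModel, candidate SupportedModel, n int) (verifiers []SupportedModel, selfVerify bool) {
	// Sort a copy by ID so the selection is deterministic.
	sorted := append([]SupportedModel(nil), catalog...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })

	seen := map[string]bool{candidate.Family: true}
	for _, m := range sorted {
		if len(verifiers) >= n {
			break
		}
		if !m.CanVerify || seen[m.Family] {
			continue
		}
		seen[m.Family] = true
		verifiers = append(verifiers, m)
	}
	if len(verifiers) == 0 {
		// No eligible verifier: self-verification fallback.
		return []SupportedModel{candidate}, true
	}
	return verifiers, false
}
```

A caller would check the second return value and emit the warning mentioned above when it is true.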
@yudduy is attempting to deploy a commit to the EigenLabs Team on Vercel. A member of the Team first needs to authorize it.
Stanford Weaver paper: https://scalingintelligence.stanford.edu/pubs/weaver.pdf
Small models as a hive mind — d-inference is already a fleet of them, so ensembles felt like a natural fit. Opens the door to other ensemble strategies later.
What it does
New "Deep" / "Deep+" chat mode. Runs N candidate generations in parallel, scores them with verifier models, returns the winner + trace via SSE. Provider ↔ coordinator protocol unchanged.
API: "weaver": {"mode": "deep" | "deep_plus"} on /v1/chat/completions.
UI: Fast / Deep / Deep+ buttons in the composer.
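For reviewers who want to see the request shape concretely, a handler might decode and validate the `weaver` field along these lines. This is a sketch under assumptions, not the handler in this PR: the `ChatRequest` and `WeaverOptions` types and `parseWeaverMode` are hypothetical names, the `model` field is assumed from the usual chat-completions body, and treating a missing `weaver` object as "fast" is an assumption based on the UI labels.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WeaverOptions models the "weaver" object from the PR description.
type WeaverOptions struct {
	Mode string `json:"mode"`
}

// ChatRequest lists only the fields this sketch inspects (assumed shape).
type ChatRequest struct {
	Model  string         `json:"model"`
	Weaver *WeaverOptions `json:"weaver,omitempty"`
}

// parseWeaverMode returns "deep" or "deep_plus" when requested, and
// "fast" (an assumed label) when the weaver object is absent.
func parseWeaverMode(body []byte) (string, error) {
	var req ChatRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return "", fmt.Errorf("decode request: %w", err)
	}
	if req.Weaver == nil {
		return "fast", nil
	}
	switch req.Weaver.Mode {
	case "deep", "deep_plus":
		return req.Weaver.Mode, nil
	default:
		return "", fmt.Errorf("unsupported weaver mode %q", req.Weaver.Mode)
	}
}
```

Rejecting unknown modes up front keeps a typo such as "deepplus" from silently running the single-model path.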
Changes
- Family + CanVerify on SupportedModel (memory + postgres)
Known follow-ups
- SupportsWeaver flag
Tested
- go test green across api, store, cmd/coordinator