fix(openai-compat): surface availability state for OpenAI-compatible backends#149
Conversation
…backends OpenAI-compatible /v1/models endpoints (vllm, llama.cpp, Infinity, etc.) return only id/object/created/owned_by — no size, no state field. As a result, openAIParser left model.Size = 0 and metadata["state"] unset, so MapModelState() always fell through to "unknown" for these backends. Meanwhile, the unified converter read ep.State (the legacy string set once at discovery and never transitioned), not ep.GetEffectiveState() which consults the typed ModelState field populated by the lifecycle unifier. Combined, endpoints that successfully routed traffic still reported state: "unknown" forever in /olla/models. Two-part fix: - openAIParser: set Size = 1 sentinel. For OpenAI-compat backends, presence in the discovery response IS the availability signal — these servers only list models that are loaded and ready to serve. - unified_converter: read string(ep.GetEffectiveState()) instead of ep.State, so health-driven transitions and the typed state machine both surface in the API response. All tests pass: go test ./... Co-Authored-By: Claude Opus 4.7 <[email protected]>
WalkthroughThe PR modifies model discovery and availability handling in two coordinated places. The OpenAI parser now assigns a sentinel ChangesModel availability state normalisation
🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/adapter/converter/unified_converter.go`:
- Around line 77-85: convertModel builds availability using
ep.GetEffectiveState() but matchesAvailabilityFilter still reads the legacy
ep.State, causing inconsistent filtering vs the returned availability.State;
update matchesAvailabilityFilter to use ep.GetEffectiveState() (or the
normalized typed value it returns) instead of ep.State so the
/olla/models?available=... filter aligns with the availability entries produced
by convertModel; locate matchesAvailabilityFilter and change its state-check
logic to call ep.GetEffectiveState() (or compare against the same enum/string
used when creating EndpointStatus) so both filtering and payload use the same
effective state source.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d6f7ebf1-17f6-42bf-9e29-c7b6898716bf
📒 Files selected for processing (2)
internal/adapter/converter/unified_converter.gointernal/adapter/registry/profile/parsers.go
| // Use GetEffectiveState() rather than ep.State directly: the lifecycle | ||
| // unifier updates the typed ModelState field, while ep.State (the legacy | ||
| // string) is only set at discovery time and never transitions. Reading | ||
| // the effective state ensures health-driven transitions surface in the | ||
| // API response. GetEffectiveState() also normalises legacy string values | ||
| // ("loaded", "not-loaded", "available") to the typed enum. | ||
| availability = append(availability, EndpointStatus{ | ||
| Endpoint: ep.EndpointName, // Use endpoint name instead of URL | ||
| State: ep.State, | ||
| State: string(ep.GetEffectiveState()), |
There was a problem hiding this comment.
Keep availability filtering aligned with effective state
convertModel now uses effective state, but matchesAvailabilityFilter still checks legacy state (ep.State) at Line 172. This can produce inconsistent /olla/models?available=... results versus the availability.state returned in the payload.
Suggested fix
func matchesAvailabilityFilter(model *domain.UnifiedModel, available *bool) bool {
if available == nil {
return true
}
isAvailable := false
for _, ep := range model.SourceEndpoints {
- if ep.State == "loaded" {
+ if string(ep.GetEffectiveState()) == "loaded" {
isAvailable = true
break
}
}
return *available == isAvailable
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Use GetEffectiveState() rather than ep.State directly: the lifecycle | |
| // unifier updates the typed ModelState field, while ep.State (the legacy | |
| // string) is only set at discovery time and never transitions. Reading | |
| // the effective state ensures health-driven transitions surface in the | |
| // API response. GetEffectiveState() also normalises legacy string values | |
| // ("loaded", "not-loaded", "available") to the typed enum. | |
| availability = append(availability, EndpointStatus{ | |
| Endpoint: ep.EndpointName, // Use endpoint name instead of URL | |
| State: ep.State, | |
| State: string(ep.GetEffectiveState()), | |
| func matchesAvailabilityFilter(model *domain.UnifiedModel, available *bool) bool { | |
| if available == nil { | |
| return true | |
| } | |
| isAvailable := false | |
| for _, ep := range model.SourceEndpoints { | |
| if string(ep.GetEffectiveState()) == "loaded" { | |
| isAvailable = true | |
| break | |
| } | |
| } | |
| return *available == isAvailable | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/adapter/converter/unified_converter.go` around lines 77 - 85,
convertModel builds availability using ep.GetEffectiveState() but
matchesAvailabilityFilter still reads the legacy ep.State, causing inconsistent
filtering vs the returned availability.State; update matchesAvailabilityFilter
to use ep.GetEffectiveState() (or the normalized typed value it returns) instead
of ep.State so the /olla/models?available=... filter aligns with the
availability entries produced by convertModel; locate matchesAvailabilityFilter
and change its state-check logic to call ep.GetEffectiveState() (or compare
against the same enum/string used when creating EndpointStatus) so both
filtering and payload use the same effective state source.
Summary
OpenAI-compatible backends (vLLM, llama.cpp, Infinity, sglang, lmdeploy, etc.) always show
state: "unknown"in/olla/models, even when they are healthy, discovered, and actively serving traffic. This makes/olla/modelsunusable as a model-availability signal for clients that filter onavailability[].state == "available".Reproduction
http://host:8000).curl http://olla/olla/models.availability[].state: "unknown"for every endpoint, indefinitely.Root cause
There are two cooperating bugs:
Bug 1 —
openAIParsernever populates state-inferring fields. The standard OpenAI/v1/modelsresponse contains onlyid,object,created,owned_by— nosize, nostate.openAIParser.Parseininternal/adapter/registry/profile/parsers.gotherefore leavesmodelInfo.Size = 0and never writesmetadata["state"].ModelExtractor.MapModelState(ininternal/adapter/unifier/model_builder.go) checksmetadata["state"], thenmetadata["loaded"], thenmodelSize > 0, and falls through toreturn "unknown". For every OpenAI-compatible backend, the fall-through is the only branch ever taken.Bug 2 — converter reads stale string field, not effective state.
UnifiedConverter.convertModelininternal/adapter/converter/unified_converter.goreadsep.Statedirectly.SourceEndpointhas two parallel state fields:State(legacy string set once at discovery) andModelState(typed enum updated by the lifecycle unifier). The lifecycle unifier writes only toModelState, so any health-driven transitions never surface in the API response. The domain already providesSourceEndpoint.GetEffectiveState()which consults both fields and normalises legacy strings — the converter just wasn't using it.Fix
Two small changes:
internal/adapter/registry/profile/parsers.go— InopenAIParser.Parse, setmodelInfo.Size = 1as a sentinel for any successfully-discovered model. For OpenAI-compatible backends, presence in the/v1/modelsresponse IS the availability signal — these servers only list models that are loaded and ready to serve.MapModelStatethen returns"available"via the existingmodelSize > 0branch.internal/adapter/converter/unified_converter.go— ReplaceState: ep.StatewithState: string(ep.GetEffectiveState())so the converter consults both the typed state machine and the legacy string field, with the existing fallback semantics defined inSourceEndpoint.GetEffectiveState().Why these changes are safe
Sizesentinel is only consumed byMapModelState(for the unknown/available branch) and by parameter-count estimation; neither makes behavioural assumptions about absolute byte values that a sentinel of1would violate.GetEffectiveState()already exists, is used in tests, and falls through toModelStateUnknownwhen neither field has data — so the previous behaviour is preserved for genuinely unknown endpoints.Test plan
go test ./...— all packages pass (28 test packages, zero failures)state: "available"after the patch (state: "unknown"before)size/statenatively) are unaffected — these flow throughollamaParser, notopenAIParser, and the converter change readsGetEffectiveState()which preserves Ollama's"loaded"/"not-loaded"semantics via the existing switch inSourceEndpoint.GetEffectiveState().Summary by CodeRabbit