fix(openai-compat): surface availability state for OpenAI-compatible backends#149

Open
petersimmons1972 wants to merge 1 commit into thushan:main from petersimmons1972:fix/openai-compat-state-unknown

@petersimmons1972 petersimmons1972 commented May 20, 2026

Summary

OpenAI-compatible backends (vLLM, llama.cpp, Infinity, sglang, lmdeploy, etc.) always show state: "unknown" in /olla/models, even when they are healthy, discovered, and actively serving traffic. This makes /olla/models unusable as a model-availability signal for clients that filter on availability[].state == "available".

Reproduction

  1. Configure Olla with at least one OpenAI-compatible endpoint (e.g., a vLLM server at http://host:8000).
  2. Wait for model discovery to complete (Olla logs confirm models discovered, requests route successfully).
  3. curl http://olla/olla/models.
  4. Observe availability[].state: "unknown" for every endpoint, indefinitely.
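
To make the client-side impact concrete, here is a minimal sketch of a consumer that filters on availability[].state == "available". The struct shapes, the top-level "data" key, and the model and endpoint names are assumptions for illustration; the real /olla/models payload may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// endpointStatus and unifiedModel mirror only the fields discussed in this PR.
// The exact JSON shape (including the "data" wrapper) is assumed.
type endpointStatus struct {
	Endpoint string `json:"endpoint"`
	State    string `json:"state"`
}

type unifiedModel struct {
	ID           string           `json:"id"`
	Availability []endpointStatus `json:"availability"`
}

// availableModels returns the IDs of models that have at least one endpoint
// reporting state "available".
func availableModels(payload []byte) ([]string, error) {
	var body struct {
		Data []unifiedModel `json:"data"`
	}
	if err := json.Unmarshal(payload, &body); err != nil {
		return nil, err
	}
	var ids []string
	for _, m := range body.Data {
		for _, ep := range m.Availability {
			if ep.State == "available" {
				ids = append(ids, m.ID)
				break
			}
		}
	}
	return ids, nil
}

func main() {
	// Hypothetical payloads: the same model before and after the patch.
	before := []byte(`{"data":[{"id":"llama-3","availability":[{"endpoint":"vllm-1","state":"unknown"}]}]}`)
	after := []byte(`{"data":[{"id":"llama-3","availability":[{"endpoint":"vllm-1","state":"available"}]}]}`)

	b, _ := availableModels(before)
	a, _ := availableModels(after)
	fmt.Println(len(b), len(a)) // prints "0 1"
}
```

With every endpoint stuck at "unknown", a client like this sees no usable models even though requests route successfully.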

Root cause

There are two cooperating bugs:

Bug 1 — openAIParser never populates state-inferring fields. The standard OpenAI /v1/models response contains only id, object, created, owned_by — no size, no state. openAIParser.Parse in internal/adapter/registry/profile/parsers.go therefore leaves modelInfo.Size = 0 and never writes metadata["state"]. ModelExtractor.MapModelState (in internal/adapter/unifier/model_builder.go) checks metadata["state"], then metadata["loaded"], then modelSize > 0, and falls through to return "unknown". For every OpenAI-compatible backend, the fall-through is the only branch ever taken.
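
The fall-through can be sketched as follows. This is a simplified stand-in, not the actual ModelExtractor.MapModelState: the function name, signature, and the values returned from the "loaded" branch are assumptions; only the check order and the "available"/"unknown" results come from the description above.

```go
package main

import "fmt"

// mapModelState is a hypothetical, simplified version of the state mapping.
// It checks metadata["state"], then metadata["loaded"], then size > 0,
// and otherwise falls through to "unknown".
func mapModelState(metadata map[string]any, size int64) string {
	if s, ok := metadata["state"].(string); ok && s != "" {
		return s
	}
	if loaded, ok := metadata["loaded"].(bool); ok {
		// Assumed return values for this branch.
		if loaded {
			return "loaded"
		}
		return "not-loaded"
	}
	if size > 0 {
		return "available"
	}
	return "unknown"
}

func main() {
	// An OpenAI-compatible model before the patch: no metadata, Size = 0.
	fmt.Println(mapModelState(nil, 0)) // prints "unknown"
	// The same model with the Size = 1 sentinel.
	fmt.Println(mapModelState(nil, 1)) // prints "available"
}
```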

Bug 2 — converter reads stale string field, not effective state. UnifiedConverter.convertModel in internal/adapter/converter/unified_converter.go reads ep.State directly. SourceEndpoint has two parallel state fields: State (legacy string set once at discovery) and ModelState (typed enum updated by the lifecycle unifier). The lifecycle unifier writes only to ModelState, so any health-driven transitions never surface in the API response. The domain already provides SourceEndpoint.GetEffectiveState() which consults both fields and normalises legacy strings — the converter just wasn't using it.
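
A rough sketch of the two parallel fields and why reading the legacy one goes stale. The type names, enum values, and the precedence inside effectiveState are assumptions for illustration; the real SourceEndpoint.GetEffectiveState() may order its checks differently and handles more legacy values (such as "not-loaded") than shown here.

```go
package main

import "fmt"

// ModelState is a hypothetical typed state enum.
type ModelState string

const (
	ModelStateUnknown   ModelState = "unknown"
	ModelStateAvailable ModelState = "available"
	ModelStateLoaded    ModelState = "loaded"
)

// sourceEndpoint stands in for domain.SourceEndpoint with only the two
// state fields discussed above.
type sourceEndpoint struct {
	State      string     // legacy string, set once at discovery
	ModelState ModelState // typed state, updated by the lifecycle unifier
}

// effectiveState prefers the typed field and falls back to normalising
// the legacy string (assumed precedence).
func (e sourceEndpoint) effectiveState() ModelState {
	if e.ModelState != "" {
		return e.ModelState
	}
	switch e.State {
	case "loaded":
		return ModelStateLoaded
	case "available":
		return ModelStateAvailable
	default:
		return ModelStateUnknown
	}
}

func main() {
	// Discovery wrote "unknown"; the lifecycle unifier later set ModelState.
	ep := sourceEndpoint{State: "unknown", ModelState: ModelStateAvailable}
	fmt.Println(ep.State, ep.effectiveState()) // prints "unknown available"
}
```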

Fix

Two small changes:

  1. internal/adapter/registry/profile/parsers.go — In openAIParser.Parse, set modelInfo.Size = 1 as a sentinel for any successfully-discovered model. For OpenAI-compatible backends, presence in the /v1/models response IS the availability signal — these servers only list models that are loaded and ready to serve. MapModelState then returns "available" via the existing modelSize > 0 branch.

  2. internal/adapter/converter/unified_converter.go — Replace State: ep.State with State: string(ep.GetEffectiveState()) so the converter consults both the typed state machine and the legacy string field, with the existing fallback semantics defined in SourceEndpoint.GetEffectiveState().
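
The parser side of the fix might look roughly like this. It is a sketch under stated assumptions, not the code in parsers.go: modelInfo, presenceSentinelSize, and parseOpenAIModels are hypothetical names; only the standard /v1/models list shape and the Size = 1 sentinel come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelInfo is a hypothetical stand-in for domain.ModelInfo.
type modelInfo struct {
	Name string
	Size int64
}

// presenceSentinelSize marks "listed by the backend, real size unknown".
const presenceSentinelSize int64 = 1

// parseOpenAIModels decodes a standard OpenAI-style /v1/models response and
// assigns the sentinel size to every listed model.
func parseOpenAIModels(payload []byte) ([]modelInfo, error) {
	var resp struct {
		Data []struct {
			ID string `json:"id"`
		} `json:"data"`
	}
	if err := json.Unmarshal(payload, &resp); err != nil {
		return nil, fmt.Errorf("decode /v1/models response: %w", err)
	}
	models := make([]modelInfo, 0, len(resp.Data))
	for _, m := range resp.Data {
		if m.ID == "" {
			continue // skip entries without an id
		}
		models = append(models, modelInfo{Name: m.ID, Size: presenceSentinelSize})
	}
	return models, nil
}

func main() {
	// Hypothetical discovery response from a vLLM-style server.
	payload := []byte(`{"object":"list","data":[{"id":"qwen2.5-7b","object":"model","created":0,"owned_by":"vllm"}]}`)
	models, err := parseOpenAIModels(payload)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", models)
}
```

A named constant keeps the sentinel searchable and makes it harder for later code to mistake it for a real byte count.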

Why these changes are safe

  • The Size sentinel is consumed only by MapModelState (to choose between "unknown" and "available") and by parameter-count estimation; neither depends on Size being a real byte count, so a sentinel of 1 does not break them.
  • GetEffectiveState() already exists, is used in tests, and falls through to ModelStateUnknown when neither field has data — so the previous behaviour is preserved for genuinely unknown endpoints.
  • No schema change, no migration, no config flag.

Test plan

  • go test ./... — all packages pass (28 test packages, zero failures)
  • Built local Docker image, ran against production-like config with vLLM, llama-cpp, and Infinity endpoints — all three reported state: "available" after the patch (state: "unknown" before)
  • Reviewer to confirm Ollama backends (which DO populate size/state natively) are unaffected — these flow through ollamaParser, not openAIParser, and the converter change reads GetEffectiveState() which preserves Ollama's "loaded"/"not-loaded" semantics via the existing switch in SourceEndpoint.GetEffectiveState().

Summary by CodeRabbit

  • Bug Fixes
    • Improved endpoint availability state detection for more accurate status reporting across integrations.
    • Fixed model availability recognition for OpenAI-compatible backends during model discovery so that discovered models are no longer reported with state "unknown".

…backends

OpenAI-compatible /v1/models endpoints (vllm, llama.cpp, Infinity, etc.)
return only id/object/created/owned_by — no size, no state field. As a
result, openAIParser left model.Size = 0 and metadata["state"] unset, so
MapModelState() always fell through to "unknown" for these backends.
Meanwhile, the unified converter read ep.State (the legacy string set
once at discovery and never transitioned), not ep.GetEffectiveState()
which consults the typed ModelState field populated by the lifecycle
unifier. Combined, endpoints that successfully routed traffic still
reported state: "unknown" forever in /olla/models.

Two-part fix:
- openAIParser: set Size = 1 sentinel. For OpenAI-compat backends,
  presence in the discovery response IS the availability signal — these
  servers only list models that are loaded and ready to serve.
- unified_converter: read string(ep.GetEffectiveState()) instead of
  ep.State, so health-driven transitions and the typed state machine
  both surface in the API response.

All tests pass: go test ./...

Co-Authored-By: Claude Opus 4.7 <[email protected]>
@petersimmons1972 petersimmons1972 requested a review from thushan as a code owner May 20, 2026 18:35

coderabbitai Bot commented May 20, 2026

Walkthrough

The PR modifies model discovery and availability handling in two coordinated places. The OpenAI parser now assigns a sentinel Size: 1 value so that discovered models map to "available" instead of "unknown", and the unified converter now derives endpoint state from a typed lifecycle method rather than a legacy string field.

Changes

Model availability state normalisation

Layer / File(s) | Summary
OpenAI parser sentinel size (internal/adapter/registry/profile/parsers.go) | openAIParser.Parse sets Size: 1 when building domain.ModelInfo for OpenAI-compatible models. Comments note that these backends lack size/state reporting and rely on presence as the availability signal.
Converter state derivation (internal/adapter/converter/unified_converter.go) | convertModel now populates endpoint availability.state from ep.GetEffectiveState() (stringified) instead of the legacy ep.State field, with added documentation describing the typed lifecycle transition and normalisation of legacy values.

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The PR title clearly and concisely describes the main fix: surfacing availability state for OpenAI-compatible backends, which directly aligns with the primary changes addressing the state reporting bug.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/adapter/converter/unified_converter.go`:
- Around line 77-85: convertModel builds availability using
ep.GetEffectiveState() but matchesAvailabilityFilter still reads the legacy
ep.State, causing inconsistent filtering vs the returned availability.State;
update matchesAvailabilityFilter to use ep.GetEffectiveState() (or the
normalized typed value it returns) instead of ep.State so the
/olla/models?available=... filter aligns with the availability entries produced
by convertModel; locate matchesAvailabilityFilter and change its state-check
logic to call ep.GetEffectiveState() (or compare against the same enum/string
used when creating EndpointStatus) so both filtering and payload use the same
effective state source.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d6f7ebf1-17f6-42bf-9e29-c7b6898716bf

📥 Commits

Reviewing files that changed from the base of the PR and between 6d6ac4d and 1017200.

📒 Files selected for processing (2)
  • internal/adapter/converter/unified_converter.go
  • internal/adapter/registry/profile/parsers.go

Comment on lines +77 to +85
 		// Use GetEffectiveState() rather than ep.State directly: the lifecycle
 		// unifier updates the typed ModelState field, while ep.State (the legacy
 		// string) is only set at discovery time and never transitions. Reading
 		// the effective state ensures health-driven transitions surface in the
 		// API response. GetEffectiveState() also normalises legacy string values
 		// ("loaded", "not-loaded", "available") to the typed enum.
 		availability = append(availability, EndpointStatus{
 			Endpoint: ep.EndpointName, // Use endpoint name instead of URL
-			State:    ep.State,
+			State:    string(ep.GetEffectiveState()),
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep availability filtering aligned with effective state

convertModel now uses effective state, but matchesAvailabilityFilter still checks legacy state (ep.State) at Line 172. This can produce inconsistent /olla/models?available=... results versus the availability.state returned in the payload.

Suggested fix
 func matchesAvailabilityFilter(model *domain.UnifiedModel, available *bool) bool {
 	if available == nil {
 		return true
 	}

 	isAvailable := false
 	for _, ep := range model.SourceEndpoints {
-		if ep.State == "loaded" {
+		if string(ep.GetEffectiveState()) == "loaded" {
 			isAvailable = true
 			break
 		}
 	}
 	return *available == isAvailable
 }
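
The mismatch this comment describes can be reproduced in isolation. The sketch below uses hypothetical stand-in types (sourceEndpoint, effectiveState, matchesLegacy, matchesEffective) rather than the real domain package, and assumes the typed field takes precedence over the legacy string.

```go
package main

import "fmt"

// ModelState is a hypothetical typed state enum.
type ModelState string

const (
	ModelStateUnknown ModelState = "unknown"
	ModelStateLoaded  ModelState = "loaded"
)

// sourceEndpoint stands in for domain.SourceEndpoint.
type sourceEndpoint struct {
	State      string     // legacy string
	ModelState ModelState // typed state
}

// effectiveState prefers the typed field (assumed precedence).
func (e sourceEndpoint) effectiveState() ModelState {
	if e.ModelState != "" {
		return e.ModelState
	}
	if e.State == "loaded" {
		return ModelStateLoaded
	}
	return ModelStateUnknown
}

// matchesLegacy filters on the legacy string only.
func matchesLegacy(eps []sourceEndpoint, available *bool) bool {
	if available == nil {
		return true
	}
	isAvailable := false
	for _, ep := range eps {
		if ep.State == "loaded" {
			isAvailable = true
			break
		}
	}
	return *available == isAvailable
}

// matchesEffective filters on the same source the payload uses.
func matchesEffective(eps []sourceEndpoint, available *bool) bool {
	if available == nil {
		return true
	}
	isAvailable := false
	for _, ep := range eps {
		if ep.effectiveState() == ModelStateLoaded {
			isAvailable = true
			break
		}
	}
	return *available == isAvailable
}

func main() {
	// Only the typed field has been updated by the lifecycle unifier.
	eps := []sourceEndpoint{{ModelState: ModelStateLoaded}}
	want := true
	fmt.Println(matchesLegacy(eps, &want), matchesEffective(eps, &want)) // prints "false true"
}
```

Whichever states end up counting as available, the point is that the filter and the payload should read the same source.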
