Reconcile registry model states on auth changes by destinoantagonista-wq · Pull Request #2121 · router-for-me/CLIProxyAPI

destinoantagonista-wq · 2026-03-13T19:44:14Z

Add Manager.ReconcileRegistryModelStates to clear stale per-model runtime failures for models currently registered in the global model registry. The method finds models supported for an auth, resets non-clean ModelState entries, updates aggregated availability, persists changes, and pushes a snapshot to the scheduler. Introduce modelStateIsClean helper to determine when a model state needs resetting. Call ReconcileRegistryModelStates from Service paths that register/refresh models (applyCoreAuthAddOrUpdate and refreshModelRegistrationForAuth) to keep the scheduler and global registry aligned after model re-registration.
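The reset step described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the `ModelState` fields, the `reconcile` function, and the exact definition of "clean" are assumptions made for the example, and the real method also updates aggregated availability, persists, and notifies the scheduler.

```go
package main

import (
	"fmt"
	"time"
)

// ModelState is a simplified, hypothetical stand-in for the per-model runtime state.
type ModelState struct {
	Unavailable    bool
	StatusMessage  string
	LastError      error
	NextRetryAfter time.Time
}

// modelStateIsClean reports whether a state carries no failure information.
// The fields checked here are assumptions for this sketch.
func modelStateIsClean(s *ModelState) bool {
	if s == nil {
		return true
	}
	return !s.Unavailable && s.StatusMessage == "" && s.LastError == nil && s.NextRetryAfter.IsZero()
}

// reconcile resets every non-clean state whose model is currently supported
// and reports whether anything changed.
func reconcile(states map[string]*ModelState, supported map[string]struct{}) bool {
	changed := false
	for model, state := range states {
		if _, ok := supported[model]; !ok {
			continue // not in the registry snapshot: left untouched in this sketch
		}
		if modelStateIsClean(state) {
			continue
		}
		*state = ModelState{} // drop the stale failure details
		changed = true
	}
	return changed
}

func main() {
	states := map[string]*ModelState{
		"model-a": {Unavailable: true, StatusMessage: "quota exceeded"},
		"model-b": {},
	}
	supported := map[string]struct{}{"model-a": {}, "model-b": {}}
	fmt.Println("changed:", reconcile(states, supported))
}
```

Returning a `changed` flag lets the caller skip persistence and the scheduler update when every supported state was already clean.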

gemini-code-assist · 2026-03-13T19:44:29Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a mechanism to maintain consistency between the scheduler and the global model registry, particularly after authentication-related model re-registrations. By actively reconciling model states, it prevents scenarios where models might appear available in the registry but remain blocked by the scheduler due to outdated failure states, thereby improving the reliability and responsiveness of model availability.

Highlights

New reconciliation logic: Introduced Manager.ReconcileRegistryModelStates to clear stale per-model runtime failures for models registered in the global model registry, ensuring alignment between the scheduler and registry after model re-registration.
State cleanliness check: Added a modelStateIsClean helper function to determine when a model state requires resetting.
Integration into service paths: Integrated calls to ReconcileRegistryModelStates within applyCoreAuthAddOrUpdate and refreshModelRegistrationForAuth service paths to proactively reconcile model states upon auth changes.


Changelog

sdk/cliproxy/auth/conductor.go
- Added ReconcileRegistryModelStates method to clear stale model states.
- Added modelStateIsClean helper function to check if a model state is clean.
sdk/cliproxy/service.go
- Called s.coreManager.ReconcileRegistryModelStates in applyCoreAuthAddOrUpdate.
- Called s.coreManager.ReconcileRegistryModelStates in refreshModelRegistrationForAuth (twice).

Activity

No human activity has been recorded for this pull request yet.


gemini-code-assist

Code Review

This pull request introduces a reconciliation mechanism for model states in the auth manager to prevent stale failure states from blocking available models. The implementation is generally sound, but I've identified a potential data inconsistency issue due to an ignored error during persistence, and a minor logic simplification that would improve code clarity. The new ReconcileRegistryModelStates function is correctly integrated into the service's auth update and refresh flows.

gemini-code-assist · 2026-03-13T19:46:26Z

sdk/cliproxy/auth/conductor.go

+				auth.Status = StatusActive
+			}
+			auth.UpdatedAt = now
+			_ = m.persist(ctx, auth)


The error returned by m.persist(ctx, auth) is being ignored. If persistence fails, the in-memory state of the auth manager will be out of sync with the backing store. This could lead to inconsistent behavior or loss of state changes after a service restart. The error should be handled, for example by logging it.

if err := m.persist(ctx, auth); err != nil {
    logEntryWithRequestID(ctx).WithField("auth_id", auth.ID).Warnf("failed to persist auth changes during model state reconciliation: %v", err)
}

gemini-code-assist · 2026-03-13T19:46:26Z

sdk/cliproxy/auth/conductor.go

+			baseModel := canonicalModelKey(modelKey)
+			if baseModel == "" {
+				baseModel = strings.TrimSpace(modelKey)
+			}


The logic to determine baseModel has a potentially confusing fallback. If canonicalModelKey(modelKey) returns an empty string, it falls back to strings.TrimSpace(modelKey). The supported map is keyed by canonical model keys. If canonicalModelKey fails (returns empty), it's likely that the model key is invalid or not in a form that can be canonicalized. Falling back to the raw (but trimmed) modelKey will likely result in a lookup miss in the supported map anyway if the key contains suffixes or other non-canonical parts. This makes the code harder to reason about. It would be clearer to skip reconciliation for model keys that cannot be canonicalized.

Suggested change

-			baseModel := canonicalModelKey(modelKey)
-			if baseModel == "" {
-				baseModel = strings.TrimSpace(modelKey)
-			}
+			baseModel := canonicalModelKey(modelKey)
+			if baseModel == "" {
+				continue
+			}

xkonjin

Code Review

Summary: Clean and well-motivated fix. The reconciliation logic correctly handles the case where models reappear in the registry after a refresh but the scheduler still has stale failure states for them. A few issues worth addressing:

🔴 Bug: `persist` error silently swallowed

_ = m.persist(ctx, auth)

The persist call is inside the lock section and the error is discarded entirely. If persistence fails (e.g. disk I/O, serialization error), the in-memory state gets reset but the on-disk state is left stale. On restart, the old failure state would be restored, defeating the reconciliation.

Suggestion: log the error at warning level at minimum, even if you don't want to return it:

if err := m.persist(ctx, auth); err != nil {
    m.log.Warn("reconcile: persist failed", "authID", authID, "err", err)
}

🟡 Race: `m.persist` called under lock

Calling m.persist while holding m.mu is risky if persist does any I/O. This can block other operations that need the lock. The pattern elsewhere in this codebase (e.g. applyCoreAuthAddOrUpdate) typically clones first and persists outside the lock. Consider:

m.mu.Lock()
// ... mutate auth ...
snapshot = auth.Clone()
m.mu.Unlock()

if err := m.persist(ctx, snapshot); err != nil {
    // log
}
// then upsert scheduler

🟡 `context.Background()` in `refreshModelRegistrationForAuth`

The two new calls in service.go use context.Background() instead of propagating a real context. If the calling context has a deadline or cancellation, it won't flow through to persist. Use the caller's context where available, or at least context.WithTimeout.

💡 `canonicalModelKey` double-call on same `modelKey`

baseModel := canonicalModelKey(modelKey)
if baseModel == "" {
    baseModel = strings.TrimSpace(modelKey)
}

If canonicalModelKey already calls strings.TrimSpace internally (which it likely does for normalization), this fallback is a no-op. Worth verifying the invariants — if canonicalModelKey can legitimately return empty for a valid key, that case should be documented.

💡 No test coverage for the reconcile path

The PR lacks tests for ReconcileRegistryModelStates. Given the subtlety of the state transitions (quota reset, error reset, status back to Active), a table-driven test covering the main cases (no supported models, already-clean states, mixed clean/dirty, persist error) would add meaningful confidence.

Overall: the fix is correct in principle, but the persist-under-lock + silent error pattern needs addressing before merge.

luispater

Summary
This moves model-state cleanup into the auth re-registration paths, which is the right direction. However, there is still a stale-state hole when the failed model disappears or is renamed during registry refresh.

Key findings

Blocking: ReconcileRegistryModelStates() only resets entries for models still returned by GetModelsForClient(). If the stale failure belongs to a model that was removed or renamed during refresh, the loop skips it, changed stays false, and auth-level availability/status is never recomputed. That leaves stale Unavailable / StatusMessage / LastError on the auth even though the refreshed registry snapshot is healthy. This still affects management output and websocket availability fallback paths.

Test plan

Reviewed the changed paths in sdk/cliproxy/auth/conductor.go and sdk/cliproxy/service.go
Ran go test ./sdk/cliproxy/auth ./sdk/cliproxy/...

Follow-ups

I think this should either prune unsupported ModelStates during reconciliation or recompute aggregated auth state using only the currently supported registry models before persisting.
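The pruning option could look roughly like the sketch below. It is illustrative only: the `Auth` and `ModelState` shapes are simplified, `pruneAndRecompute` is a hypothetical name, and the aggregation rule (the auth is unavailable while any remaining model state is unavailable) is an assumption that may differ from the real aggregation logic.

```go
package main

import "fmt"

// ModelState and Auth are simplified, hypothetical shapes for this sketch.
type ModelState struct {
	Unavailable bool
}

type Auth struct {
	Unavailable bool
	ModelStates map[string]*ModelState
}

// pruneAndRecompute drops states for models that are no longer supported,
// then recomputes the auth-level flag from what remains.
// It reports whether the auth was modified.
func pruneAndRecompute(auth *Auth, supported map[string]struct{}) bool {
	changed := false
	for model := range auth.ModelStates {
		if _, ok := supported[model]; !ok {
			delete(auth.ModelStates, model) // removed or renamed model
			changed = true
		}
	}

	// Assumed rule: unavailable while any remaining model state is unavailable.
	unavailable := false
	for _, state := range auth.ModelStates {
		if state != nil && state.Unavailable {
			unavailable = true
			break
		}
	}
	if auth.Unavailable != unavailable {
		auth.Unavailable = unavailable
		changed = true
	}
	return changed
}

func main() {
	auth := &Auth{
		Unavailable: true,
		ModelStates: map[string]*ModelState{
			"old-model": {Unavailable: true}, // no longer in the registry
			"new-model": {},
		},
	}
	supported := map[string]struct{}{"new-model": {}}
	fmt.Println("changed:", pruneAndRecompute(auth, supported))
	fmt.Println("auth unavailable:", auth.Unavailable)
}
```

Because the aggregate is recomputed even when no supported state was reset, a stale failure on a removed model can no longer keep the auth flagged as unavailable.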

destinoantagonista-wq · 2026-03-14T14:40:24Z

Updated this PR with the follow-up fixes from review.

What changed:

prune unsupported ModelStates during reconciliation so renamed/removed models cannot keep stale auth-level status behind
recompute and clear aggregated auth availability when reconciliation leaves no active model states
stop swallowing persist errors in the reconcile path; persist failures now log a warning with request_id/auth_id context

Validation:

go test ./sdk/cliproxy/auth ./sdk/cliproxy/...

I did not apply the canonicalModelKey simplification suggestion because canonicalModelKey already returns the trimmed raw model when suffix parsing does not change it, and the current branch intentionally prunes unsupported entries instead of skipping them.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e08f68ed7c


chatgpt-codex-connector · 2026-03-14T14:49:05Z

sdk/cliproxy/auth/conductor.go

+			if !hasModelError(auth, now) {
+				auth.LastError = nil
+				auth.StatusMessage = ""
+				auth.Status = StatusActive
+			}


Keep disabled auths from being marked active

When ReconcileRegistryModelStates removes or resets stale ModelStates, it unconditionally sets auth.Status = StatusActive if no model errors remain. This path is now invoked from applyCoreAuthAddOrUpdate/refreshModelRegistrationForAuth after model registration, including cases where an auth is intentionally disabled (auth.Disabled == true), so a disabled auth can end up persisted/reporting as active despite still being disabled. That status regression is user-visible in management output and can break workflows that rely on status semantics; this reset should be skipped when the auth is disabled (or already StatusDisabled).


gemini-code-assist bot reviewed Mar 13, 2026

xkonjin reviewed Mar 13, 2026

luispater requested changes Mar 14, 2026

fix(auth): tighten registry model reconciliation (f09ed25)

chore(auth): drop reconcile test file from pr (e08f68e)

destinoantagonista-wq requested a review from luispater March 14, 2026 14:46

chatgpt-codex-connector bot reviewed Mar 14, 2026

destinoantagonista-wq wants to merge 3 commits into router-for-me:main from destinoantagonista-wq:main


3 participants
