Backend should return structured retry metadata for rate limits #2606

@Al629176

Summary

Backend should debug and fix the rate-limit response path so chat receives structured retry metadata and can show a clear recovery action instead of a vague “brief wait” message.

Problem

When an upstream AI provider rate-limits a request, the app currently shows:

“Your AI provider is rate-limiting requests. This is a transient upstream limit, not a thread-level block — you can retry in this thread.”

and:

“Rate limit exceeded. Please retry after a brief wait.”

Expected behavior: the backend should classify the rate-limit source and return structured metadata the frontend can use, such as provider/source, whether the error is retryable, retry-after timing when available, and whether fallback provider/model routing is possible.
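One possible shape for that metadata is sketched below. This is an illustration, not the project's actual schema: the field names `retryable`, `source`, `provider`, and `retry_after_ms` follow the wording of this issue, while `kind`, `fallback_available`, `message`, the two `source` values, and the guard function are hypothetical.

```typescript
// Hypothetical payload shape; not a confirmed backend schema.
type RateLimitSource = "upstream_provider" | "openhuman_budget";

interface RateLimitErrorPayload {
  kind: "rate_limit";
  source: RateLimitSource;
  provider: string | null; // upstream provider id, when known
  retryable: boolean;
  retry_after_ms: number | null; // null when no timing was supplied
  fallback_available: boolean;
  message: string; // human-readable copy, kept for older clients
}

// Runtime guard so a client can tell a structured payload from a legacy string.
function isRateLimitErrorPayload(value: unknown): value is RateLimitErrorPayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.kind === "rate_limit" &&
    (v.source === "upstream_provider" || v.source === "openhuman_budget") &&
    (typeof v.provider === "string" || v.provider === null) &&
    typeof v.retryable === "boolean" &&
    (typeof v.retry_after_ms === "number" || v.retry_after_ms === null) &&
    typeof v.fallback_available === "boolean" &&
    typeof v.message === "string"
  );
}
```

Keeping `retry_after_ms` nullable lets the backend say "no timing known" explicitly rather than inventing a default wait.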

Actual behavior: the user only gets generic copy. There is no concrete retry time, no clear provider/source, no fallback instruction, and no structured recovery action.

Impact: users may think the thread is broken, retry too quickly, or abandon the conversation. It also makes it hard for the frontend to build a proper retry/countdown UI, because the backend does not appear to expose enough structured detail.

Steps to reproduce:

  1. Trigger an upstream model/provider rate-limit response during chat.
  2. Observe the backend/provider error mapping.
  3. Confirm the chat receives a generic rate-limit message instead of structured retry metadata.
  4. Confirm the UI cannot show a countdown, retry button state, provider/source, or fallback option.

Version / platform: desktop app, screenshot captured May 25, 2026. Exact app version unknown.

Scope (backend)

The backend developer should:

  • Trace which provider/backend layer produces this rate-limit response.
  • Preserve upstream Retry-After or equivalent cooldown metadata when present.
  • Normalize rate-limit errors into a typed response shape for chat/tool execution.
  • Distinguish upstream provider throttling from OpenHuman budget/rate limits.
  • Indicate whether the same thread can retry and whether fallback routing is available.
  • Add backend logs with provider, model/workload, retry metadata, and request correlation ID, without logging secrets or prompt contents.
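As a rough illustration of the "preserve Retry-After" and "normalize" items above, the sketch below parses an HTTP `Retry-After` value into milliseconds and wraps an upstream 429 in typed metadata. The `ThrottledUpstreamCall` type and both function names are hypothetical, and the actual backend layer may be written in a different language. The header itself allows either delay-seconds or an HTTP-date; as a simplification, this sketch only accepts dates ending in `GMT`.

```typescript
// Hypothetical input: what a provider client layer might capture from a throttled call.
interface ThrottledUpstreamCall {
  provider: string;
  httpStatus: number;
  retryAfterHeader: string | null;
}

// Parses Retry-After (delay-seconds or HTTP-date) into milliseconds.
// Returns null when the value is absent or not understood, so no wait time is invented.
function parseRetryAfterMs(header: string | null, nowMs: number): number | null {
  if (header === null) return null;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // Simplification: only treat values ending in "GMT" as HTTP-dates.
  if (!/GMT$/.test(trimmed)) return null;
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, dateMs - nowMs); // a date in the past means "retry now"
}

// Wraps an upstream HTTP 429 in typed metadata; returns null for other statuses.
function normalizeUpstreamRateLimit(call: ThrottledUpstreamCall, nowMs: number) {
  if (call.httpStatus !== 429) return null;
  return {
    source: "upstream_provider" as const,
    provider: call.provider,
    retryable: true,
    retry_after_ms: parseRetryAfterMs(call.retryAfterHeader, nowMs),
  };
}
```

Passing `nowMs` in explicitly, rather than calling `Date.now()` inside, keeps the date branch deterministic in tests.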

Frontend follow-up can then use the structured response to render countdown/retry/fallback UI.
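To show why the structured fields matter for that follow-up, here is a minimal sketch of how a client could derive its retry UI state from them. The `RetryHint` and `RetryUiState` types, the mode names, and `retryUiState` are assumptions made for illustration; the real frontend may model this differently.

```typescript
// Hypothetical subset of the structured error the backend would return.
interface RetryHint {
  retryable: boolean;
  retry_after_ms: number | null;
  fallback_available: boolean;
}

type RetryUiState =
  | { mode: "countdown"; secondsLeft: number }
  | { mode: "retry_now" }
  | { mode: "fallback_only" }
  | { mode: "blocked" };

// Maps backend metadata plus time elapsed since the error to a UI state.
function retryUiState(hint: RetryHint, elapsedMs: number): RetryUiState {
  if (!hint.retryable) {
    return hint.fallback_available ? { mode: "fallback_only" } : { mode: "blocked" };
  }
  // No timing from the backend: allow an immediate retry instead of guessing a wait.
  if (hint.retry_after_ms === null) return { mode: "retry_now" };
  const remainingMs = hint.retry_after_ms - elapsedMs;
  if (remainingMs <= 0) return { mode: "retry_now" };
  return { mode: "countdown", secondsLeft: Math.ceil(remainingMs / 1000) };
}
```

None of these states can be computed from the current generic message, which is the gap this issue asks the backend to close.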

Acceptance criteria

  • Backend source identified — The PR documents where the rate-limit response originates and which provider/backend layer produces it.
  • Structured error returned — Rate-limit responses include typed metadata such as retryable, source, provider, retry_after_ms/equivalent, and fallback availability when known.
  • Retry-After preserved — Upstream retry timing is preserved when available instead of being collapsed into “brief wait.”
  • Limit type distinguished — Backend distinguishes upstream provider throttling from OpenHuman budget/rate limits.
  • Same-thread retry supported — Backend does not mark the thread as permanently blocked when the error is transient.
  • Fallback signal added — If another configured provider/model can handle the request, backend exposes that as a fallback option.
  • Regression safety — Backend/unit/integration tests cover rate-limit responses with and without retry metadata, plus OpenHuman budget-limit behavior.
  • Diff coverage ≥ 80% — the fix PR meets the changed-lines coverage gate (Vitest + cargo-llvm-cov, enforced by .github/workflows/coverage.yml).
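The fallback criterion above could be checked along the following lines. This is only a sketch under stated assumptions: `ProviderRoute`, its fields (including `cooldownUntilMs`), and `fallbackSignal` are hypothetical, and the real routing configuration is not described in this issue.

```typescript
// Hypothetical view of one configured provider/model route.
interface ProviderRoute {
  provider: string;
  model: string;
  enabled: boolean;
  cooldownUntilMs: number | null; // set if this route was itself recently throttled
}

// Reports whether some other enabled route, not in cooldown, could take the request.
function fallbackSignal(routes: ProviderRoute[], throttledProvider: string, nowMs: number) {
  const candidate = routes.find(
    (r) =>
      r.provider !== throttledProvider &&
      r.enabled &&
      (r.cooldownUntilMs === null || r.cooldownUntilMs <= nowMs),
  );
  return {
    fallback_available: candidate !== undefined,
    fallback: candidate ? { provider: candidate.provider, model: candidate.model } : null,
  };
}
```

Excluding routes that are still cooling down avoids advertising a fallback that would immediately hit its own limit.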
