Backend should return structured retry metadata for rate limits #2606

## Summary

The backend should debug and fix the rate-limit response path so that chat receives structured retry metadata and can show a clear recovery action instead of a vague “brief wait” message.

## Problem

When an upstream AI provider rate-limits a request, the app currently shows:

> “Your AI provider is rate-limiting requests. This is a transient upstream limit, not a thread-level block — you can retry in this thread.”

and:

> “Rate limit exceeded. Please retry after a brief wait.”

Expected behavior: the backend should classify the rate-limit source and return structured metadata the frontend can use, such as provider/source, whether the error is retryable, retry-after timing when available, and whether fallback provider/model routing is possible.
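To make the expected shape concrete, here is a minimal sketch of what such a payload could look like. Rust is used only because the coverage gate mentions cargo-llvm-cov. Every type, field, and variant name below (`RateLimitError`, `LimitSource`, `recovery_hint`, and so on) is a hypothetical illustration, not the existing backend API.

```rust
/// Where the limit was enforced. Variant names are assumptions for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LimitSource {
    /// The upstream AI provider throttled the request.
    UpstreamProvider,
    /// An OpenHuman-side budget or rate limit was hit.
    OpenHumanBudget,
}

/// Hypothetical structured payload for a rate-limit error.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RateLimitError {
    pub source: LimitSource,
    /// Provider identifier when known.
    pub provider: Option<String>,
    /// Whether the same thread may retry the request.
    pub retryable: bool,
    /// Cooldown in milliseconds; `None` when upstream sent no timing.
    pub retry_after_ms: Option<u64>,
    /// `None` means "unknown", which is different from "no fallback".
    pub fallback_available: Option<bool>,
}

impl RateLimitError {
    /// Short user-facing hint derived only from the structured fields.
    pub fn recovery_hint(&self) -> String {
        match (self.retryable, self.retry_after_ms) {
            // Round up so the countdown never under-reports the wait.
            (true, Some(ms)) => format!("Retry in {} s", (ms + 999) / 1000),
            (true, None) => "Retry shortly".to_string(),
            (false, _) => "Retry is not available for this error".to_string(),
        }
    }
}
```

Keeping `retry_after_ms` and `fallback_available` optional lets the backend say "unknown" explicitly rather than guessing, which is what the frontend needs in order to choose between a countdown and a plain retry button.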

Actual behavior: the user only gets generic copy. There is no concrete retry time, no clear provider/source, no fallback instruction, and no structured recovery action.

Impact: users may think the thread is broken, retry too quickly, or abandon the conversation. It also makes it hard for the frontend to build a proper retry/countdown UI, because the backend does not appear to expose enough structured detail.

Steps to reproduce:
1. Trigger an upstream model/provider rate-limit response during chat.
2. Observe the backend/provider error mapping.
3. Confirm the chat receives a generic rate-limit message instead of structured retry metadata.
4. Confirm the UI cannot show a countdown, retry button state, provider/source, or fallback option.

Version / platform: desktop app, screenshot captured May 25, 2026. Exact app version unknown.

## Scope (backend)

The backend developer should:
- Trace which provider/backend layer produces this rate-limit response.
- Preserve upstream `Retry-After` or equivalent cooldown metadata when present.
- Normalize rate-limit errors into a typed response shape for chat/tool execution.
- Distinguish upstream provider throttling from OpenHuman budget/rate limits.
- Indicate whether the same thread can retry and whether fallback routing is available.
- Add backend logs with provider, model/workload, retry metadata, and request correlation ID, without logging secrets or prompt contents.
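
The scope items above could be sketched roughly as follows, using only the Rust standard library. All names here (`RawLimit`, `NormalizedRateLimit`, `normalize`, `log_line`) are hypothetical. The sketch also makes three assumptions that the real code would need to confirm: an upstream HTTP 429 is what marks provider throttling, an OpenHuman budget limit is not retryable in the same way, and fallback is possible whenever at least one other provider is configured. Only the delta-seconds form of `Retry-After` is parsed; the HTTP-date form is also valid but is left out.

```rust
use std::time::Duration;

/// Parses the delta-seconds form of `Retry-After` (for example "30").
/// The HTTP-date form is not handled in this sketch and yields `None`.
pub fn parse_retry_after(value: &str) -> Option<Duration> {
    value.trim().parse::<u64>().ok().map(Duration::from_secs)
}

#[derive(Debug, PartialEq, Eq)]
pub enum LimitSource {
    UpstreamProvider,
    OpenHumanBudget,
}

/// Hypothetical normalized shape handed to chat/tool execution.
#[derive(Debug, PartialEq, Eq)]
pub struct NormalizedRateLimit {
    pub source: LimitSource,
    pub provider: Option<String>,
    pub retryable: bool,
    pub retry_after_ms: Option<u64>,
    pub fallback_available: bool,
}

/// Hypothetical inputs the error-mapping layer is assumed to have at hand.
pub struct RawLimit<'a> {
    /// `Some(429)` when the upstream provider throttled the request.
    pub upstream_status: Option<u16>,
    pub retry_after_header: Option<&'a str>,
    pub provider: Option<&'a str>,
    pub other_providers_configured: usize,
}

pub fn normalize(raw: &RawLimit<'_>) -> NormalizedRateLimit {
    // Assumption: anything that is not an upstream 429 is an OpenHuman limit.
    let upstream = raw.upstream_status == Some(429);
    NormalizedRateLimit {
        source: if upstream {
            LimitSource::UpstreamProvider
        } else {
            LimitSource::OpenHumanBudget
        },
        provider: raw.provider.map(str::to_string),
        // Assumption: only transient upstream throttling is retryable in-thread.
        retryable: upstream,
        // Preserve upstream timing instead of collapsing it into "brief wait".
        retry_after_ms: raw
            .retry_after_header
            .and_then(parse_retry_after)
            .map(|d| d.as_millis() as u64),
        fallback_available: raw.other_providers_configured > 0,
    }
}

/// Builds a log line from metadata only; prompt contents and secrets are never passed in.
pub fn log_line(n: &NormalizedRateLimit, model: &str, correlation_id: &str) -> String {
    let retry = n.retry_after_ms.map_or("none".to_string(), |ms| ms.to_string());
    format!(
        "rate_limit source={:?} provider={} model={} retry_after_ms={} correlation_id={}",
        n.source,
        n.provider.as_deref().unwrap_or("unknown"),
        model,
        retry,
        correlation_id
    )
}
```

Because `log_line` takes only the normalized struct, a model name, and a correlation ID, prompt text cannot leak into logs through this path.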

Frontend follow-up can then use the structured response to render countdown/retry/fallback UI.

## Acceptance criteria

- [ ] **Backend source identified** — The PR documents where the rate-limit response originates and which provider/backend layer produces it.
- [ ] **Structured error returned** — Rate-limit responses include typed metadata such as `retryable`, `source`, `provider`, `retry_after_ms`/equivalent, and fallback availability when known.
- [ ] **Retry-After preserved** — Upstream retry timing is preserved when available instead of being collapsed into “brief wait.”
- [ ] **Limit type distinguished** — Backend distinguishes upstream provider throttling from OpenHuman budget/rate limits.
- [ ] **Same-thread retry supported** — Backend does not mark the thread as permanently blocked when the error is transient.
- [ ] **Fallback signal added** — If another configured provider/model can handle the request, backend exposes that as a fallback option.
- [ ] **Regression safety** — Backend unit and integration tests cover rate-limit responses with and without retry metadata, plus OpenHuman budget-limit behavior.
- [ ] **Diff coverage ≥ 80%** — the fix PR meets the changed-lines coverage gate (Vitest + cargo-llvm-cov, enforced by [`.github/workflows/coverage.yml`](../../.github/workflows/coverage.yml)).
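
As a rough illustration of the regression-safety criterion, the three cases it names (upstream limit with retry metadata, upstream limit without it, and an OpenHuman budget limit) could be covered by unit tests along these lines. `classify` is a hypothetical stand-in for whatever mapping function the fix introduces, and treating "no upstream status" as an OpenHuman-side limit is an assumption made for this sketch.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum LimitSource {
    UpstreamProvider,
    OpenHumanBudget,
}

/// Hypothetical stand-in for the real mapping function under test.
/// Returns the limit source and the cooldown in milliseconds, if any.
pub fn classify(
    upstream_status: Option<u16>,
    retry_after_secs: Option<u64>,
) -> (LimitSource, Option<u64>) {
    let source = match upstream_status {
        Some(429) => LimitSource::UpstreamProvider,
        // Assumption: the request was rejected before reaching the provider.
        _ => LimitSource::OpenHumanBudget,
    };
    (source, retry_after_secs.map(|s| s.saturating_mul(1000)))
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn upstream_limit_with_retry_metadata() {
        assert_eq!(
            classify(Some(429), Some(12)),
            (LimitSource::UpstreamProvider, Some(12_000))
        );
    }

    #[test]
    fn upstream_limit_without_retry_metadata() {
        assert_eq!(
            classify(Some(429), None),
            (LimitSource::UpstreamProvider, None)
        );
    }

    #[test]
    fn openhuman_budget_limit() {
        assert_eq!(classify(None, None), (LimitSource::OpenHumanBudget, None));
    }
}
```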

## Related

- Screenshot: `Screenshot 2026-05-25 at 11.16.05 AM.png`
- Related frontend/user-visible issue: #2364
