Codex runtime: surface precise unhealthy reason to users (rate limit / quota / auth) #467

@zylos01

Summary

When activity-monitor marks the Codex runtime unhealthy, the channel reply to the user should explain the concrete reason instead of a generic unavailable/unhealthy message.

The Claude runtime already has dedicated handling for usage-limit-style failures. Codex should get an equivalent runtime-specific path for OpenAI/Codex CLI failures.

Goal

Improve Codex unhealthy-state user messaging so channel replies can distinguish at least:

  • rate limiting (TPM/RPM-style throttling)
  • exhausted credits (insufficient_quota)
  • auth/login failures
  • other API/network failures that are known but not quota-related

Scope

  • Inspect existing Claude-side behavior and reuse the same activity-monitor / heartbeat surface where possible.
  • Add Codex-specific failure pattern detection/classification.
  • Persist enough reason metadata for the channel-side unhealthy reply to mention the real cause.
  • Keep wording user-facing and actionable.
  • Add tests for the new pattern classification and reply wording.
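One possible shape for the persisted reason metadata and the reply wording is sketched below. It is only an illustration: the `UnhealthyReason` record, its field names, the category strings, and the reply texts are all hypothetical and are not taken from the existing activity-monitor or heartbeat code. Python is used purely for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class UnhealthyReason:
    """Hypothetical record stored alongside the unhealthy state."""
    runtime: str                  # e.g. "codex"
    category: str                 # one of the keys in REPLIES below
    detail: Optional[str] = None  # short excerpt of the raw error, for logs


# Hypothetical user-facing wording: say what happened and what to do next.
REPLIES = {
    "rate_limit": "Codex is being rate limited by OpenAI. Please retry in a minute.",
    "insufficient_quota": "The OpenAI account has run out of credits. Please check billing.",
    "auth": "Codex is not authenticated. Please log in to the Codex CLI again.",
    "api_or_network": "Codex cannot reach the OpenAI API right now. Please retry shortly.",
    "unknown": "Codex is currently unavailable. Please try again later.",
}


def build_reply(reason: UnhealthyReason) -> str:
    """Pick the reply for a reason, falling back to the generic message."""
    return REPLIES.get(reason.category, REPLIES["unknown"])


def to_json(reason: UnhealthyReason) -> str:
    """Serialize the reason so the channel side can read it later."""
    return json.dumps(asdict(reason))


def from_json(raw: str) -> UnhealthyReason:
    """Restore a reason written by to_json."""
    return UnhealthyReason(**json.loads(raw))
```

Keeping the category as a plain string means the channel side only needs a lookup table, and an unrecognized category degrades to the existing generic message rather than failing.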

Notes

  • We can start with pattern matching based on known Codex CLI / OpenAI error strings and refine after collecting real-world traces.
  • Expected external examples include OpenAI/Codex CLI errors such as "Rate limit reached ... on tokens per min (TPM)", "insufficient_quota", and login/token exchange failures.
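A minimal sketch of such a first-pass pattern classifier follows. Only the "Rate limit reached ... tokens per min", "insufficient_quota", and token exchange strings come from this issue; every other pattern, the `CodexFailure` enum, and `classify_codex_failure` are assumptions to be revised once real traces are collected. Python is used purely for illustration.

```python
import re
from enum import Enum


class CodexFailure(Enum):
    RATE_LIMIT = "rate_limit"
    QUOTA = "insufficient_quota"
    AUTH = "auth"
    API_OR_NETWORK = "api_or_network"
    UNKNOWN = "unknown"


# Checked in order. Quota comes first on the assumption that a quota error
# may also mention rate limiting, and quota is the more actionable reason.
# Patterns beyond the strings quoted in the issue are guesses.
PATTERNS = [
    (CodexFailure.QUOTA,
     re.compile(r"insufficient_quota|exceeded your current quota", re.I)),
    (CodexFailure.RATE_LIMIT,
     re.compile(r"rate limit reached|rate_limit_exceeded|tokens per min|requests per min", re.I)),
    (CodexFailure.AUTH,
     re.compile(r"\b401\b|unauthorized|invalid_api_key|token exchange|not logged in", re.I)),
    (CodexFailure.API_OR_NETWORK,
     re.compile(r"\b50[0234]\b|ECONNRESET|ETIMEDOUT|ENOTFOUND|connection (refused|reset)|timed out", re.I)),
]


def classify_codex_failure(text: str) -> CodexFailure:
    """Return the first matching category for captured CLI output."""
    for kind, pattern in PATTERNS:
        if pattern.search(text):
            return kind
    return CodexFailure.UNKNOWN
```

An ordered list of (category, pattern) pairs keeps precedence explicit and makes it cheap to add or reorder patterns as real-world traces come in.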
