Summary
When activity-monitor marks the Codex runtime unhealthy, the channel reply to the user should explain the concrete reason instead of a generic unavailable/unhealthy message.
The Claude runtime already has dedicated handling for usage-limit-style failures. Codex should get an equivalent runtime-specific path for OpenAI/Codex CLI failures.
Goal
Improve Codex unhealthy-state user messaging so channel replies can distinguish at least:
- rate limit / TPM-RPM style throttling
- exhausted credits / insufficient_quota
- auth/login failures
- other API/network failures that are known but not quota-related
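The four categories above could be detected with an ordered list of regular expressions over the captured Codex CLI output. The following is a minimal sketch, not the existing implementation: the names `CodexFailure` and `classify_codex_failure` are hypothetical, and every pattern is an assumption that should be checked against real traces. The quota pattern is tried before the rate-limit pattern on the assumption that both kinds of error can carry an HTTP 429 status.

```python
import re
from enum import Enum


class CodexFailure(Enum):
    """Hypothetical failure categories; names are illustrative only."""
    RATE_LIMIT = "rate_limit"
    INSUFFICIENT_QUOTA = "insufficient_quota"
    AUTH = "auth"
    API_OR_NETWORK = "api_or_network"
    UNKNOWN = "unknown"


# Assumed patterns, checked in order; the first match wins.
# Quota comes first so a shared 429 status is not mistaken for throttling.
_PATTERNS = [
    (CodexFailure.INSUFFICIENT_QUOTA,
     re.compile(r"insufficient_quota|exceeded your current quota", re.I)),
    (CodexFailure.RATE_LIMIT,
     re.compile(r"rate limit reached|tokens per min|requests per min|\b429\b", re.I)),
    (CodexFailure.AUTH,
     re.compile(r"token exchange|not logged in|invalid api key|unauthorized|\b401\b", re.I)),
    (CodexFailure.API_OR_NETWORK,
     re.compile(r"econnreset|etimedout|timed out|connection refused|\b50[0-4]\b", re.I)),
]


def classify_codex_failure(output: str) -> CodexFailure:
    """Return the first matching category, or UNKNOWN if nothing matches."""
    for kind, pattern in _PATTERNS:
        if pattern.search(output):
            return kind
    return CodexFailure.UNKNOWN
```

Returning `UNKNOWN` rather than guessing lets the channel reply fall back to the current generic message when an error string has not been seen before.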
Scope
- Inspect existing Claude-side behavior and reuse the same activity-monitor / heartbeat surface where possible.
- Add Codex-specific failure pattern detection/classification.
- Persist enough reason metadata for the channel-side unhealthy reply to mention the real cause.
- Keep wording user-facing and actionable.
- Add tests for the new pattern classification and reply wording.
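One possible shape for the persisted reason metadata and the channel-side reply is sketched below. It assumes the reason is stored as a small JSON record next to the health state; the `UnhealthyReason` type, its field names, the category keys, and all reply wording are hypothetical placeholders, not the activity-monitor's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class UnhealthyReason:
    """Hypothetical record persisted when a runtime is marked unhealthy."""
    runtime: str  # e.g. "codex"
    kind: str     # assumed category key, e.g. "rate_limit"
    detail: str   # trimmed raw error line, kept for debugging


# Assumed user-facing wording; each message says what happened and what to do.
REPLIES = {
    "rate_limit": "Codex is being rate limited by OpenAI. Please retry in a minute or two.",
    "insufficient_quota": "The OpenAI account used by Codex is out of credits or quota. "
                          "Check the plan and billing settings, then retry.",
    "auth": "Codex could not authenticate with OpenAI. Please log in to the Codex CLI again.",
    "api_or_network": "Codex could not reach the OpenAI API. Please retry shortly.",
}
GENERIC_REPLY = "Codex is currently unavailable. Please try again later."


def dump_reason(reason: UnhealthyReason) -> str:
    """Serialize the reason for storage alongside the health state."""
    return json.dumps(asdict(reason))


def load_reason(raw: Optional[str]) -> Optional[UnhealthyReason]:
    """Parse a stored reason; return None if it is missing or malformed."""
    if not raw:
        return None
    try:
        return UnhealthyReason(**json.loads(raw))
    except (ValueError, TypeError):
        return None


def unhealthy_reply(reason: Optional[UnhealthyReason]) -> str:
    """Pick the reply for a reason, falling back to the generic message."""
    if reason is None:
        return GENERIC_REPLY
    return REPLIES.get(reason.kind, GENERIC_REPLY)
```

Keeping the wording in a single table makes the reply-wording tests straightforward: each category key maps to exactly one string, and anything unrecognized falls back to the generic message.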
Notes
- We can start with pattern matching based on known Codex CLI / OpenAI error strings and refine after collecting real-world traces.
- Expected external examples include OpenAI/Codex CLI errors around "Rate limit reached ... on tokens per min (TPM)", insufficient_quota, and login/token exchange failures.