Problem
Claude Code built-in tools (WebFetch, WebSearch) have no timeout mechanism. When a fetch request hangs indefinitely, the entire Claude Code turn is blocked — no messages can be processed until the hung tool call resolves or is manually interrupted.
Incidents
- the9bit_cocobot (2026-04-08): An indefinite WebSearch wait left the bot unresponsive for hours. The heartbeat was disabled, the guardian monitored only process death (not functional liveness), and ProcSampler was blind to Codex internals. [Post-mortem R5]
- wlnhxb (2026-04-08~09): Fetch(https://webcache.googleusercontent.com/...) hung with no completion status (Fetching… shown indefinitely). The Ideating phase showed 4h 23m of wall time, and 6 messages were queued but unprocessed. Resolved by manually pressing Escape in tmux.
Root Cause
WebFetch and WebSearch are Claude Code built-in tools — we cannot add timeouts to them directly.
- The Claude Code process remains alive (it has not crashed), so process-death monitoring (activity-monitor) does not trigger a restart.
- The activity-monitor detects IDLE/BUSY state transitions, but a stuck tool call freezes the state, leaving no transition to detect.
Proposed Solution
Add a max-BUSY-duration timeout (hook or activity-monitor enhancement):
- Monitor continuous BUSY duration. If Claude Code stays BUSY longer than a configurable threshold (e.g., 20-30 minutes), trigger recovery.
- Recovery action: Send Escape to the tmux session to interrupt the hung tool call, followed by /clear if needed.
- Configuration: The timeout should be configurable (default 30 min). Some tasks legitimately take a long time, so the threshold should be generous.
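A minimal sketch of the max-BUSY-duration check, assuming activity-monitor can sample the session's IDLE/BUSY state periodically. Because it measures how long BUSY has lasted instead of waiting for a transition, a frozen state still trips it. `BusyWatchdog`, `send_escape`, `watch`, and the `read_state` callable are hypothetical names for illustration; only `tmux send-keys -t <target> Escape` is a real tmux command.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Default taken from the proposal above (30 min); hypothetical constant name.
DEFAULT_MAX_BUSY_SECONDS = 30 * 60


@dataclass
class BusyWatchdog:
    """Flags a session that has stayed BUSY longer than max_busy_seconds."""

    max_busy_seconds: float = DEFAULT_MAX_BUSY_SECONDS
    busy_since: Optional[float] = None
    fired: bool = False

    def observe(self, state: str, now: float) -> bool:
        """Feed one IDLE/BUSY sample; return True once per over-long BUSY stretch."""
        if state != "BUSY":
            # Leaving BUSY resets the clock and re-arms the watchdog.
            self.busy_since = None
            self.fired = False
            return False
        if self.busy_since is None:
            # First BUSY sample of a new stretch.
            self.busy_since = now
        if not self.fired and now - self.busy_since >= self.max_busy_seconds:
            self.fired = True
            return True
        return False


def send_escape(tmux_target: str) -> None:
    """Interrupt the in-flight tool call by sending Escape to the tmux pane."""
    subprocess.run(["tmux", "send-keys", "-t", tmux_target, "Escape"], check=True)


def watch(read_state: Callable[[], str], tmux_target: str, poll_seconds: float = 30.0) -> None:
    """Hypothetical polling loop; read_state is assumed to return "IDLE" or "BUSY"."""
    dog = BusyWatchdog()
    while True:
        if dog.observe(read_state(), time.monotonic()):
            send_escape(tmux_target)
        time.sleep(poll_seconds)
```

The `fired` flag sends Escape once per stuck stretch rather than on every poll. Escalating to /clear when the session is still BUSY after the Escape is left out of this sketch.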
Implementation Options
- Option A (hook-based — recommended by voya.luo): Add a hook that monitors tool-call duration. If a single tool call exceeds the timeout, send Escape to tmux.
- Option B (activity-monitor enhancement): Extend BUSY-state tracking to detect prolonged BUSY without state transitions and trigger Escape.
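For Option A, the per-tool-call bookkeeping might look like the sketch below. It assumes start/finish events would come from Claude Code's PreToolUse and PostToolUse hook events, and that some long-lived process polls `overdue()`, since a hook returns before the tool itself runs. How the call id and tool name are read from the hook input is not settled here; `ToolCallTracker` and `WATCHED_TOOLS` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Only the built-in tools named in this issue are watched (assumption).
WATCHED_TOOLS = frozenset({"WebFetch", "WebSearch"})


@dataclass
class ToolCallTracker:
    """Tracks in-flight tool calls and reports those past their timeout."""

    timeout_seconds: float = 30 * 60
    watched: frozenset = WATCHED_TOOLS
    # call_id -> (tool_name, start_time); not a constructor argument.
    _started: Dict[str, Tuple[str, float]] = field(default_factory=dict, init=False)

    def start(self, call_id: str, tool_name: str, now: float) -> None:
        """Record a call start (would be driven by a PreToolUse hook)."""
        if tool_name in self.watched:
            self._started[call_id] = (tool_name, now)

    def finish(self, call_id: str) -> None:
        """Record a call end (would be driven by a PostToolUse hook); unknown ids are ignored."""
        self._started.pop(call_id, None)

    def overdue(self, now: float) -> List[str]:
        """Return ids of watched calls that have run for at least timeout_seconds."""
        return [
            call_id
            for call_id, (_, started_at) in self._started.items()
            if now - started_at >= self.timeout_seconds
        ]
```

Any id returned by `overdue()` would then trigger the same tmux Escape as the recovery action above. Tracking individual calls lets a stuck WebFetch be caught without counting legitimately long non-web work against the limit.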
Affected Components
- activity-monitor: needs max-BUSY-duration detection
- Potentially the hooks system, if implementing as a hook
Priority
Medium-High — affects all Claude-runtime VMs. Two incidents in one day across different customers.
References
- the9bit_cocobot post-mortem: R2 (max-BUSY-duration kill), R5 (web search timeout)
- wlnhxb tmux evidence:
* Ideating… (4h 23m 7s · ↓ 138 tokens · thought for 2s)