Problem
Claude Code's Whirlpool (context compaction) can hang indefinitely, leaving the agent completely unresponsive. The current activity monitor does not detect this state: the Claude process is still running and the tmux session still exists, so all liveness checks pass, yet the agent processes zero messages.
Incident: zylos305 (2026-04-14 ~13:16 UTC)
Timeline:
- Agent was actively working (code edits, test runs, GitHub issue creation on coco-dashboard)
- Context reached 82% (control_queue warnings at 71% and 82%)
- Claude entered Whirlpool compaction at ~13:16 UTC
- Whirlpool hung with no progress for about 14 minutes (status line at capture: Whirlpooling… (13m 57s · ↓ 2.6k tokens · thought for 2s))
- 3 messages queued (2 scheduled tasks + 1 Lark group message), none processed
- Agent remained in this state until manual intervention
Environment snapshot:
- Claude Code 2.1.107
- Claude process PID 2298112, RSS 634MB, running for 8h16m
- VM: cocoai-zylos305-1b (7.8Gi RAM, 73% disk)
- Last clean exit was 2026-04-04 (10 days of continuous operation with /clear-based session resets)
- 14 IN vs 5 OUT messages in the last hour at the time of detection
- state.md was large, with extensive test-environment context
Tmux pane at time of detection:
● Bash(cd ~/zylos/workspace/coco-dashboard && gh issue create ...)
⎿ https://github.com/coco-xyz/coco-dashboard/issues/1329
✶ Whirlpooling… (13m 57s · ↓ 2.6k tokens · thought for 2s)
⎿ Tip: Use /btw to ask a quick side question...
❯ [Scheduled Task: task-mnv2xy40-p20fiz] ...
❯ [Lark GROUP:coco-dashboard研发] 3deca688 said: ...
❯ [Scheduled Task: task-mn31ohyb-hiq20f] ...
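The pane's own status line already reports how long compaction has been running. A minimal Python sketch that extracts that elapsed time from a captured pane, assuming the spinner format seen above stays stable; it is observed output, not a documented Claude Code interface, and the Compacting variant is assumed to use the same layout. The function name is hypothetical.

```python
import re
from typing import Optional

# Assumed spinner format, taken from one pane capture:
#   "✶ Whirlpooling… (13m 57s · ↓ 2.6k tokens · thought for 2s)"
# Hours are optional; minutes are optional; seconds are required.
_STATUS_RE = re.compile(
    r"(Whirlpooling|Compacting)\S*\s*\((?:(\d+)h\s*)?(?:(\d+)m\s*)?(\d+)s\b"
)


def compaction_elapsed_seconds(pane_text: str) -> Optional[int]:
    """Return seconds reported by the last compaction spinner line, or None if absent."""
    elapsed = None
    for line in pane_text.splitlines():
        match = _STATUS_RE.search(line)
        if match:
            hours, minutes, seconds = (int(g) if g else 0 for g in match.group(2, 3, 4))
            elapsed = hours * 3600 + minutes * 60 + seconds
    return elapsed
```

Reading the reported duration avoids keeping state between polls, at the cost of depending on the exact spinner text.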
Expected Behavior
The activity monitor should detect that Claude is stuck (producing no output despite queued input) and recover automatically, either by aborting the compaction or by killing and restarting the Claude process.
Proposed Detection Approaches
Option A: Tmux screen scrape (targeted)
- Periodically capture tmux pane text
- Detect the Whirlpooling or Compacting keywords
- If the text persists for >5 minutes, consider it hung
- Recovery: kill Claude process → activity monitor auto-restarts
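A minimal Python sketch of Option A, assuming the pane is read with `tmux capture-pane -p -t <target>`, which prints the visible pane to stdout. `capture_pane`, `CompactionHangDetector`, and the constants are hypothetical names; the detector records when a keyword first appears and reports a hang once it has stayed on screen past the 5-minute threshold.

```python
import subprocess
from typing import Optional

# Status keywords named in Option A; matching is a plain substring check.
HANG_KEYWORDS = ("Whirlpooling", "Compacting")
HANG_THRESHOLD_S = 5 * 60  # 5-minute fast-path threshold from Option A


def capture_pane(target: str) -> str:
    """Return the visible text of a tmux pane; capture-pane -p prints it to stdout."""
    result = subprocess.run(
        ["tmux", "capture-pane", "-p", "-t", target],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


class CompactionHangDetector:
    """Flags a hang once a compaction keyword has stayed on screen past a threshold."""

    def __init__(self, threshold_s: float = HANG_THRESHOLD_S) -> None:
        self.threshold_s = threshold_s
        self.first_seen: Optional[float] = None  # time the keyword was first observed

    def observe(self, pane_text: str, now: float) -> bool:
        """Feed one pane capture taken at time `now` (seconds); True means hung."""
        if any(keyword in pane_text for keyword in HANG_KEYWORDS):
            if self.first_seen is None:
                self.first_seen = now
            return now - self.first_seen > self.threshold_s
        # Keyword gone: compaction finished (or never started), so reset the timer.
        self.first_seen = None
        return False
```

A monitor loop would call `detector.observe(capture_pane(target), time.monotonic())` every 30 seconds or so and run the kill-and-restart recovery on `True`. A plain substring match would also fire on a queued message that merely mentions "Compacting", so a stricter pattern (keyword followed by an elapsed-time parenthesis) may be worth using.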
Option B: Message flow monitoring (general)
- Monitor the conversations table in c4.db
- If last IN is newer than last OUT by >N minutes while Claude process is running, flag as stuck
- This would catch whirlpool hangs AND other hang scenarios (API timeouts, etc.)
- More robust but may need tuning to avoid false positives during legitimate long-running tasks
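A minimal sketch of Option B using Python's built-in sqlite3 module. The schema is an assumption, since the real c4.db column names are not given here: a hypothetical `conversations(direction TEXT, timestamp REAL)` table with `direction` set to `'in'` or `'out'` and Unix-epoch timestamps. Rather than comparing the latest IN with the latest OUT, it measures how long the oldest unanswered IN has been waiting, a small variant that avoids flagging an agent that sat idle for a long time and has only just received a message.

```python
import sqlite3
from typing import Optional


def unanswered_input_age(conn: sqlite3.Connection, now: float) -> Optional[float]:
    """Seconds the oldest IN message newer than the latest OUT message has waited.

    Assumes a hypothetical table conversations(direction TEXT, timestamp REAL)
    with direction 'in' or 'out' and Unix-epoch timestamps. Returns None when
    no IN message is waiting for a reply.
    """
    (last_out,) = conn.execute(
        "SELECT MAX(timestamp) FROM conversations WHERE direction = 'out'"
    ).fetchone()
    if last_out is None:
        # No reply has ever been sent: every IN message is pending.
        (oldest_pending,) = conn.execute(
            "SELECT MIN(timestamp) FROM conversations WHERE direction = 'in'"
        ).fetchone()
    else:
        (oldest_pending,) = conn.execute(
            "SELECT MIN(timestamp) FROM conversations "
            "WHERE direction = 'in' AND timestamp > ?",
            (last_out,),
        ).fetchone()
    return None if oldest_pending is None else now - oldest_pending


def looks_stuck(
    conn: sqlite3.Connection, now: float, threshold_s: float, process_running: bool
) -> bool:
    """Option B rule: input has waited past the threshold while Claude is still running."""
    age = unanswered_input_age(conn, now)
    return process_running and age is not None and age > threshold_s
```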
Option C: Combine both
- Use Option B as the general detector with a longer threshold (e.g., 15-20 min)
- Use Option A as a fast-path for known hang patterns (e.g., 5 min for whirlpool)
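Option C can be sketched as a single decision function fed by both detectors. The inputs (how long a compaction keyword has been on screen, and how long the oldest unanswered IN has waited) and the returned labels are hypothetical; the thresholds are the 5-minute fast path and the lower end of the 15-20 minute general window proposed above.

```python
from typing import Optional

FAST_PATH_S = 5 * 60  # known hang pattern: compaction keyword still on screen
GENERAL_S = 15 * 60   # lower end of the proposed 15-20 minute general threshold


def should_restart(
    compaction_on_screen_s: Optional[float],
    oldest_unanswered_in_s: Optional[float],
    process_running: bool,
) -> Optional[str]:
    """Return a reason label if the agent should be restarted, else None."""
    if not process_running:
        return None  # already dead; the existing liveness path handles restarts
    if compaction_on_screen_s is not None and compaction_on_screen_s > FAST_PATH_S:
        return "compaction-hang"
    if oldest_unanswered_in_s is not None and oldest_unanswered_in_s > GENERAL_S:
        return "message-flow-stall"
    return None
```

With the incident's numbers, the 5-minute fast path would have fired roughly nine minutes before the 13m 57s capture.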
Impact
- Agent becomes completely unresponsive during a whirlpool hang
- All queued messages (user messages + scheduled tasks) pile up
- No automatic recovery — requires manual SSH intervention
- Users perceive the bot as "dead" with no explanation
Workaround
Manual: SSH into VM → kill Claude PID → activity monitor auto-restarts → queued messages redelivered in new session.
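The manual workaround can be automated with a short escalation: SIGTERM first, then SIGKILL if the process outlives a grace period. A minimal POSIX-only Python sketch; `terminate_process` and its parameters are hypothetical, and resolving the Claude PID is left out. It relies on the activity monitor's existing auto-restart, as the manual procedure does.

```python
import os
import signal
import time


def _alive(pid: int) -> bool:
    """True if a process with this PID exists (signal 0 only checks existence)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists but is owned by another user
    return True


def terminate_process(pid: int, grace_s: float = 10.0, poll_s: float = 0.2) -> None:
    """Send SIGTERM, then SIGKILL if the process is still present after grace_s."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return  # already exited
    deadline = time.monotonic() + grace_s
    while time.monotonic() < deadline:
        if not _alive(pid):
            return
        time.sleep(poll_s)
    try:
        os.kill(pid, signal.SIGKILL)  # escalate: the process ignored SIGTERM
    except ProcessLookupError:
        pass
```

Note that `_alive` still reports an exited but unreaped child of the caller as present; for a Claude process started under tmux the monitor is not the parent, so this does not apply there.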