Commit bf75a88

fix: bound reliability timeout paths (#6)
1 parent 6d1448d commit bf75a88

15 files changed

Lines changed: 604 additions & 40 deletions

README.md

Lines changed: 5 additions & 1 deletion
@@ -40,6 +40,7 @@ The result is not a magical mind meld. It is a practical workflow: the agents on
 - Stale Claude frontend attachments are probed and evicted instead of blocking a new Claude session indefinitely.
 - Codex turns have a watchdog fallback, and the viewer/status model now distinguishes idle, busy, stale, and offline agent states.
 - Codex resume/fork argument handling, active thread tracking, and app-server port cleanup are more robust, including LISTEN-only and Windows-aware process checks.
+- Reliability guards now bound backup-agent timeout cleanup, daemon shutdown steps, and long-running Codex turns with visible ledger/status telemetry.
 - The browser viewer remains the **Command Deck**: a read-only, color-coded, latest-first dashboard with clear history controls, task lanes, artifacts, policy state, and connection health.

 ## What It Helps With
@@ -724,7 +725,7 @@ Business value: the viewer and `ctxrelay status` no longer make a completed or s

 Technical shape:

-- Codex exposes `idle`, `busy`, `stale`, and `offline` state. A watchdog force-clears a Codex turn only after a configurable silence window, emits a visible forced-completion marker, and lets Claude send again.
+- Codex exposes `idle`, `busy`, `stale`, and `offline` state. Watchdogs force-clear a Codex turn after either a configurable silence window or a configurable per-turn wall-clock budget, emit visible markers, and let Claude send again.
 - Claude exposes `idle`, `expected`, `stale`, and `offline` state. Claude-owned lanes become stale after a configurable response timeout.
 - A new Claude frontend does not blindly replace a live session. The daemon probes the incumbent frontend first; if it responds, the newcomer is rejected, and if it does not, the stale frontend is evicted with close code `4002`.
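
The two Codex watchdog conditions in the hunk above (a silence window and a per-turn wall-clock budget) can be sketched as follows. This is a minimal illustration and not the daemon's actual code: the `TurnWatchdog` class, its method names, and the `reason` strings are hypothetical. Treating `0` as "disabled" for the silence window is also an assumption, since the README only documents that behavior for `CONTEXTRELAY_TURN_MAX_MS`.

```javascript
// Hypothetical sketch: TurnWatchdog, its methods, and the reason strings are
// illustrative names, not identifiers from the ContextRelay source.
class TurnWatchdog {
  constructor({ idleTimeoutMs, maxTurnMs, now = Date.now }) {
    this.idleTimeoutMs = idleTimeoutMs; // silence window; 0 disables (assumed)
    this.maxTurnMs = maxTurnMs; // wall-clock budget; 0 disables
    this.now = now; // injectable clock so the logic can be tested without timers
    this.turns = new Map(); // turnId -> { startedAt, lastActivityAt }
  }

  start(turnId) {
    const t = this.now();
    this.turns.set(turnId, { startedAt: t, lastActivityAt: t });
  }

  // Call whenever the turn produces output, so the silence window restarts.
  touch(turnId) {
    const turn = this.turns.get(turnId);
    if (turn) turn.lastActivityAt = this.now();
  }

  finish(turnId) {
    this.turns.delete(turnId);
  }

  // Force-clears expired turns and returns one marker per cleared turn.
  sweep() {
    const t = this.now();
    const cleared = [];
    for (const [turnId, turn] of this.turns) {
      const durationMs = t - turn.startedAt;
      let reason;
      if (this.maxTurnMs > 0 && durationMs >= this.maxTurnMs) {
        reason = "turn_budget_exceeded";
      } else if (this.idleTimeoutMs > 0 && t - turn.lastActivityAt >= this.idleTimeoutMs) {
        reason = "idle_timeout";
      }
      if (reason) {
        this.turns.delete(turnId);
        cleared.push({ turnId, reason, durationMs });
      }
    }
    return cleared;
  }

  // Simplified two-state view; the real status model also has stale and offline.
  state() {
    return this.turns.size > 0 ? "busy" : "idle";
  }
}
```

A caller would run `sweep()` on an interval and publish each returned marker to the ledger or status view. The wall-clock check comes first so that a turn that keeps emitting output still cannot exceed its budget.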

@@ -911,8 +912,11 @@ The exact runtime state path can be changed with `CONTEXTRELAY_STATE_DIR`. Norma
 | `CONTEXTRELAY_CLAUDE_PROBE_TIMEOUT_MS` | `3000` | How long the daemon waits for an attached Claude frontend to answer a liveness probe before evicting it as stale. Set to `0` to disable probe eviction and keep the older reject-only behavior. |
 | `CONTEXTRELAY_CLAUDE_RESPONSE_TIMEOUT_MS` | `300000` | How long a Claude-owned active task lane can remain unanswered before the task board marks it stale. |
 | `CONTEXTRELAY_CODEX_TURN_IDLE_TIMEOUT_MS` | `300000` | Silence window before a stuck Codex turn is force-cleared and reported as a forced completion. |
+| `CONTEXTRELAY_TURN_MAX_MS` | `300000` | Wall-clock budget for a single Codex turn before that turn is cleared from the busy set and reported as a turn watchdog event. Set to `0` to disable this guard. |
 | `CONTEXTRELAY_MAX_DEPTH` | `3` | Maximum relay recursion depth. |
 | `CONTEXTRELAY_BACKUP_THROTTLE_MS` | `60000` | Minimum delay between backup starts for the same target. |
+| `CONTEXTRELAY_BACKUP_KILL_GRACE_MS` | `2000` | Grace period between backup timeout SIGTERM and SIGKILL escalation. |
+| `CONTEXTRELAY_DAEMON_SHUTDOWN_STEP_TIMEOUT_MS` | `1500` | Per-step deadline for daemon shutdown cleanup before recording the step as timed out and continuing shutdown. |
 | `CONTEXTRELAY_MAX_CONTROL_MESSAGE_BYTES` | `1000000` | Maximum accepted control WebSocket message size. |
 | `CONTEXTRELAY_MAX_CONTROL_MESSAGES_PER_MINUTE` | `120` | Per-control-connection rate limit. |
 | `CONTEXTRELAY_DAEMON_ENTRY` | bundled daemon | Plugin daemon entry. Overrides require `CONTEXTRELAY_ALLOW_DAEMON_ENTRY_OVERRIDE=1`. |
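
To illustrate how a millisecond setting such as `CONTEXTRELAY_DAEMON_SHUTDOWN_STEP_TIMEOUT_MS` might be read and applied, here is a rough sketch of a per-step shutdown deadline. The helper names `readMsEnv` and `runShutdownSteps`, the step shape `{ name, run }`, and the status strings are all assumptions made for this example. The daemon's real implementation is not shown in this diff and may differ.

```javascript
// Hypothetical helpers: readMsEnv and runShutdownSteps are illustrative names,
// not functions from the ContextRelay daemon.
function readMsEnv(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  // Reject NaN, negatives, and fractions; fall back to the documented default.
  return Number.isInteger(value) && value >= 0 ? value : fallback;
}

// Runs each cleanup step with its own deadline. A step that overruns is
// recorded as timed out, and shutdown continues with the next step.
async function runShutdownSteps(steps, stepTimeoutMs) {
  const results = [];
  for (const { name, run } of steps) {
    const startedAt = Date.now();
    let timer;
    const deadline = new Promise((resolve) => {
      timer = setTimeout(() => resolve("timed_out"), stepTimeoutMs);
    });
    let status;
    try {
      status = await Promise.race([
        // Wrapping in a promise chain also catches synchronous throws from run().
        Promise.resolve().then(run).then(() => "ok"),
        deadline,
      ]);
    } catch (error) {
      status = "failed";
    } finally {
      clearTimeout(timer);
    }
    results.push({ name, status, durationMs: Date.now() - startedAt });
  }
  return results;
}
```

A caller might obtain the deadline with `readMsEnv(process.env, "CONTEXTRELAY_DAEMON_SHUTDOWN_STEP_TIMEOUT_MS", 1500)` and log each result. Note that `Promise.race` only stops waiting for a slow step; it does not cancel the underlying work.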

dist/cli.js

Lines changed: 3 additions & 0 deletions
@@ -2867,6 +2867,9 @@ function formatRuntimeEvent(event) {
     event.path ? `path: ${event.path}` : undefined,
     event.method ? `method: ${event.method}` : undefined,
     typeof event.exitCode === "number" ? `exit_code: ${event.exitCode}` : undefined,
+    typeof event.durationMs === "number" ? `duration_ms: ${event.durationMs}` : undefined,
+    event.requestId ? `request_id: ${event.requestId}` : undefined,
+    event.backupTarget ? `backup_target: ${event.backupTarget}` : undefined,
     event.detail
   ].filter(Boolean);
   return details.join(`
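
The hunk above is cut off mid-statement, so the following standalone sketch shows only how the three new fields would render next to the existing ones. `formatEventDetails` is a hypothetical stand-in for the detail-building part of `formatRuntimeEvent`. The `"\n"` separator is an assumption, because the original `join` argument is truncated, and the sample values in the usage note are invented.

```javascript
// Hypothetical stand-in for the detail-building part of formatRuntimeEvent.
// Only fields visible in the diff are handled; the "\n" separator is assumed.
function formatEventDetails(event) {
  const details = [
    event.path ? `path: ${event.path}` : undefined,
    event.method ? `method: ${event.method}` : undefined,
    // typeof checks keep legitimate zero values such as exit code 0.
    typeof event.exitCode === "number" ? `exit_code: ${event.exitCode}` : undefined,
    typeof event.durationMs === "number" ? `duration_ms: ${event.durationMs}` : undefined,
    // Truthiness checks drop missing or empty strings.
    event.requestId ? `request_id: ${event.requestId}` : undefined,
    event.backupTarget ? `backup_target: ${event.backupTarget}` : undefined,
    event.detail,
  ].filter(Boolean);
  return details.join("\n");
}
```

For an event such as `{ exitCode: 0, durationMs: 0, requestId: "req-1" }`, this sketch yields three lines: `exit_code: 0`, `duration_ms: 0`, and `request_id: req-1`. The numeric fields are tested with `typeof` and not by truthiness, so a zero duration or exit code still appears in the output.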
