fix(acp): clean up pending + cancel agent on abandoned prompts (#732)
The flat 600s recv_timeout in adapter.rs:386 fires "Agent stopped responding"
without removing pending[id] or sending session/cancel. The agent keeps
running the abandoned prompt and eventually emits its final response with
the original id. The reader at connection.rs:284 looks up pending[id], sees
the now-stale entry, and forwards the message to the *current* notify_tx
subscriber — which belongs to the next prompt. The next prompt's loop sees
notification.id.is_some() and breaks immediately with empty text_buf,
returning "(no response)". Each new prompt sent before the agent drains its
backlog inherits the previous prompt's stale id and the cascade persists.
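The failure mode above can be replayed with a toy model. This is a minimal sketch, not the code in connection.rs: `Router`, `send_prompt`, `on_response`, and `replay_stale_response` are hypothetical names, and the assumption that the reader removes the pending entry when it forwards a response is mine.

```rust
use std::collections::HashSet;

/// Toy stand-in for the reader's routing state (hypothetical, not AcpConnection).
struct Router {
    /// Request ids the broker still believes are in flight.
    pending: HashSet<u64>,
    /// Final-response ids seen by whichever prompt is currently subscribed.
    current_inbox: Vec<u64>,
}

impl Router {
    fn new() -> Self {
        Router { pending: HashSet::new(), current_inbox: Vec::new() }
    }

    /// A new prompt registers its id and replaces the previous subscriber.
    fn send_prompt(&mut self, id: u64) {
        self.pending.insert(id);
        self.current_inbox.clear();
    }

    /// Reader path: a final response carrying `id` arrives from the agent.
    /// It is forwarded to the current subscriber only if `pending` knows the id.
    fn on_response(&mut self, id: u64) -> bool {
        if self.pending.remove(&id) {
            self.current_inbox.push(id);
            true
        } else {
            false
        }
    }
}

/// Prompt 1 times out without cleanup, prompt 2 starts, then the late
/// response for prompt 1 arrives and lands in prompt 2's inbox.
fn replay_stale_response() -> Vec<u64> {
    let mut router = Router::new();
    router.send_prompt(1);
    // The timeout fires here, but pending[1] is left in place (the bug).
    router.send_prompt(2);
    router.on_response(1);
    router.current_inbox
}
```

In this model the subscriber for prompt 2 receives a final response with id 1. That is the situation in which the real loop sees notification.id.is_some() and breaks with an empty text_buf.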
Fix follows the issue's recommended A+B+C:
(A) Replace flat 600s timeout with a tokio::select! loop in stream_prompt_blocks.
Recv arm + 30s liveness arm. Liveness arm checks conn.alive() (cheap,
just !reader_handle.is_finished()) and a configurable hard ceiling.
Default ceiling is 30 min via pool.prompt_hard_timeout_secs. Long-running
tools no longer trip the timeout; only a dead reader task or the hard
ceiling causes the prompt to be abandoned.
(B) Add AcpConnection::abandon_request(request_id) called on every abandon
path: drops pending[request_id] so a late response cannot route to a
future subscriber, and best-effort writes session/cancel so the agent
stops working on a request the broker has given up on.
(C) Capture request_id from session_prompt() (was discarded as `_`) and
skip notifications whose id doesn't match. Defense-in-depth at the
routing layer; complements (B)'s cleanup if any future abandon path
forgets to call abandon_request.
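How (A), (B), and (C) fit together can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in adapter.rs: it swaps tokio::select! for std::sync::mpsc::recv_timeout, records cancelled ids in a Vec instead of writing session/cancel, and the `Notification`, `Outcome`, and `stream_prompt` names and signatures are hypothetical.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical notification shape: `id` is set only on a final response.
struct Notification {
    id: Option<u64>,
    text: String,
}

/// Why the loop stopped.
#[derive(Debug, PartialEq)]
enum Outcome {
    Completed(String),
    Abandoned(&'static str),
}

/// (B) Drop the pending entry so a late response has nowhere to route.
/// The real method also writes session/cancel; this sketch only records the id.
fn abandon_request(pending: &mut HashSet<u64>, cancelled: &mut Vec<u64>, request_id: u64) {
    pending.remove(&request_id);
    cancelled.push(request_id);
}

fn stream_prompt(
    rx: &Receiver<Notification>,
    pending: &mut HashSet<u64>,
    cancelled: &mut Vec<u64>,
    request_id: u64,
    liveness_tick: Duration,
    hard_ceiling: Duration,
    alive: impl Fn() -> bool,
) -> Outcome {
    let started = Instant::now();
    let mut text_buf = String::new();
    loop {
        match rx.recv_timeout(liveness_tick) {
            Ok(n) => match n.id {
                // (C) A final response for some other request: skip it.
                Some(id) if id != request_id => continue,
                // Final response for this request: finish normally.
                Some(_) => {
                    pending.remove(&request_id);
                    return Outcome::Completed(text_buf);
                }
                // Streaming chunk: accumulate text.
                None => text_buf.push_str(&n.text),
            },
            // (A) Idle tick: abandon only on a dead reader or the hard ceiling.
            // In this sketch the ceiling is checked on idle ticks only.
            Err(RecvTimeoutError::Timeout) => {
                if !alive() {
                    abandon_request(pending, cancelled, request_id);
                    return Outcome::Abandoned("reader task ended");
                }
                if started.elapsed() >= hard_ceiling {
                    abandon_request(pending, cancelled, request_id);
                    return Outcome::Abandoned("hard ceiling reached");
                }
            }
            Err(RecvTimeoutError::Disconnected) => {
                abandon_request(pending, cancelled, request_id);
                return Outcome::Abandoned("channel closed");
            }
        }
    }
}
```

Every exit that gives up on the prompt goes through abandon_request, so the pending entry cannot outlive the loop. The id check in the receive arm still protects the subscriber if a future exit path skips that call.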
No unit test for abandon_request — the connection has no test seam without
spawning a real subprocess. Behavior is exercised end-to-end via the
adapter loop on real ACP backends.
Refs:
- #76 (Assumption 2: prompts always complete)
- #307 (sibling: same family, different visible symptom)
- #470 (added the 600s recv timeout this issue exposes)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>