fix(acp): clean up pending + cancel agent on abandoned prompts by brettchien · Pull Request #760 · openabdev/openab

brettchien · 2026-05-06T14:55:06Z

Closes #732.

Problem

Originally reported on Discord: https://discord.com/channels/1491295327620169908/1491365158868619404/1500930306620920040

Prompts running longer than 600 s triggered recv_timeout in adapter.rs:386. The broker printed "Agent stopped responding" and broke out of the recv loop, but:

pending[request_id] was not removed
No session/cancel was sent — the agent kept running
When the agent eventually finished, it emitted a response with the original id. connection.rs:286 found the stale pending entry and forwarded it to the current prompt's notify_tx — causing that prompt to break immediately with empty text_buf → (no response)
The cascade repeated for every subsequent prompt until the backlog drained
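
The failure sequence above can be replayed in a small model. This is only a sketch: `Broker`, `on_response`, and `current_subscriber` are hypothetical stand-ins for the reader state in connection.rs, not the project's actual types.

```rust
use std::collections::HashSet;

/// Hypothetical model of the reader's routing state before the fix.
struct Broker {
    /// Request ids the reader still treats as in flight.
    pending: HashSet<u64>,
    /// Id of the prompt that currently owns the notify channel.
    current_subscriber: u64,
}

impl Broker {
    /// Returns the prompt that receives a final response carrying `id`,
    /// or `None` if the response is dropped.
    /// Pre-fix behaviour: any id still in `pending` is forwarded to
    /// whichever prompt is subscribed now, with no id comparison.
    fn on_response(&mut self, id: u64) -> Option<u64> {
        if self.pending.remove(&id) {
            Some(self.current_subscriber)
        } else {
            None
        }
    }
}

/// Prompt 1 times out without cleanup, prompt 2 starts, then the agent
/// finally answers prompt 1. Returns which prompt received that answer.
fn replay_timeout_without_cleanup() -> Option<u64> {
    let mut broker = Broker { pending: HashSet::new(), current_subscriber: 1 };
    broker.pending.insert(1); // prompt 1 sent
    // 600 s pass, the recv loop gives up, but pending[1] is never removed.
    broker.current_subscriber = 2; // prompt 2 takes over the channel
    broker.pending.insert(2);
    broker.on_response(1) // late response for prompt 1
}
```

In this model the late response for id 1 is delivered to prompt 2, which matches the reported symptom: the newer prompt sees an id-bearing message before any text and returns (no response).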

Fix (A + B + C)

(A) Replace flat 600 s timeout with a tokio::select! loop:

recv arm — normal path
30 s liveness arm — checks conn.alive() + configurable hard ceiling
Long-running tools no longer trip the timeout; only a dead reader or the hard ceiling abandons the prompt

(B) AcpConnection::abandon_request(request_id) — drops pending[request_id] and best-effort sends session/cancel on every abandon path

(C) Capture request_id from session_prompt() (was discarded as _); skip notifications whose id doesn't match — defense-in-depth if any future abandon path misses abandon_request
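
The control flow of (A) can be sketched without an async runtime. This sketch swaps `tokio::select!` for `std::sync::mpsc::Receiver::recv_timeout`, so the timeout branch plays the role of the liveness arm. `Msg`, `Outcome`, and `recv_with_liveness` are hypothetical names, and the `alive` closure stands in for `conn.alive()`.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical message type: text chunks followed by a final marker.
enum Msg {
    Chunk(String),
    Done,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Completed(String),
    AgentDied,
    HardTimeout,
    ChannelClosed,
}

/// Receives until the prompt completes. A quiet period alone never abandons
/// the prompt; only a dead agent or the hard ceiling does.
fn recv_with_liveness(
    rx: &Receiver<Msg>,
    alive: impl Fn() -> bool,
    liveness: Duration,
    hard_ceiling: Duration,
) -> Outcome {
    let start = Instant::now();
    let mut text = String::new();
    loop {
        match rx.recv_timeout(liveness) {
            // recv arm: the normal path.
            Ok(Msg::Chunk(chunk)) => text.push_str(&chunk),
            Ok(Msg::Done) => return Outcome::Completed(text),
            // Liveness arm: runs only after `liveness` of silence.
            Err(RecvTimeoutError::Timeout) => {
                if !alive() {
                    return Outcome::AgentDied;
                }
                if start.elapsed() >= hard_ceiling {
                    return Outcome::HardTimeout;
                }
            }
            Err(RecvTimeoutError::Disconnected) => return Outcome::ChannelClosed,
        }
    }
}
```

As in the description, the hard ceiling is only examined in the liveness arm, so it is evaluated at tick granularity rather than continuously.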

Config

New pool.prompt_hard_timeout_secs (default 1800 s = 30 min). Acts as a safety net against runaway sessions while the 30 s liveness check covers the typical case.

Testing

238 existing tests pass (cargo test --bin openab)
cargo clippy -- -D warnings clean
Manual repro: run bash -c 'sleep 700 && echo done', wait past 600 s, then send a follow-up; it gets a real reply, with no "(no response)" cascade
Manual: kill the agent mid-prompt; the recv loop bails within ~30 s with "Agent process died", and the next prompt works cleanly

No unit test for abandon_request — no test seam without a real subprocess; covered end-to-end via real ACP backends. session/cancel shape matches existing cancel_session / reset_session in pool.rs (prod-verified).

Refs

fix: notification loop assumes ordered events, bounded prompts, and managed session lifecycle — none hold in production #76 (Assumption 2: prompts always complete)
New @mention in main channel ignored while existing session is waiting for ACP response #307 (sibling symptom)
fix(acp): close notify channel on EOF to prevent stream hang #470 (introduced the 600 s timeout)

🤖 Generated with Claude Code

shaun-agent · 2026-05-06T15:31:29Z

OpenAB PR Screening

This is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Click 👍 if you find this useful. Human review will be done within 24 hours. We appreciate your support and contribution 🙏

Title: fix(acp): clean up pending + cancel agent on abandoned prompts
Source: fix(acp): clean up pending + cancel agent on abandoned prompts #760
Status: moved to PR-Screening
Generated at: 2026-05-06T15:31:29.327Z
Discord thread: https://discord.com/channels/1488041051187974246/1501607147782279412

Screening report

Intent

PR #760 is trying to fix an ACP prompt lifecycle bug where long-running prompts were incorrectly treated as dead after a flat 600-second receive timeout.

The operator-visible problem is that OpenAB reports Agent stopped responding, but does not fully clean up the abandoned request. The stale pending request can later receive an old agent response and incorrectly deliver it into a newer prompt, causing (no response) cascades for subsequent prompts.

Feat

This is a bug fix for ACP request/session handling.

It changes prompt execution behavior so long-running tools are not abandoned just because they exceed 600 seconds. Instead, OpenAB uses a liveness-check loop, abandons requests explicitly when needed, removes stale pending entries, best-effort sends session/cancel, and ignores mismatched response IDs as defense in depth.

It also adds a configurable hard ceiling: pool.prompt_hard_timeout_secs, defaulting to 1800 seconds.

Who It Serves

Primary beneficiaries:

Agent runtime operators
Discord and Slack end users whose prompts depend on ACP agents
Maintainers debugging long-running or stuck ACP sessions
Reviewers responsible for reliability of prompt dispatch and session cleanup

Rewritten Prompt

Fix ACP prompt abandonment so stale responses from timed-out or dead prompts cannot corrupt later prompts.

Replace the flat 600-second receive timeout with a loop that separately handles normal responses, periodic agent liveness checks, and a configurable hard timeout. When a prompt is abandoned for timeout, dead process, dropped channel, or similar failure, remove its pending request entry and best-effort send session/cancel. Preserve the request ID returned by session_prompt() and ensure notifications are only accepted when their response ID matches the active request.

Add configuration for the hard timeout with a safe default, update example config, and verify existing tests and clippy. Add targeted tests if practical; otherwise document why subprocess-backed ACP behavior requires manual or integration coverage.
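
The response-ID check described above can be sketched as a small filter. The `Notification` shape here is a simplified assumption (text chunks carry no id, final responses carry one), and `classify` and `collect_reply` are hypothetical helper names rather than functions from the PR.

```rust
/// Simplified, hypothetical notification: `id` is set only on final responses.
struct Notification {
    id: Option<u64>,
    text: String,
}

enum Step {
    Append,
    Finish,
    SkipStale,
}

/// Decides what the recv loop does with one notification.
fn classify(n: &Notification, active_request_id: u64) -> Step {
    match n.id {
        None => Step::Append,
        Some(id) if id == active_request_id => Step::Finish,
        // A response for an older, abandoned request: ignore it.
        Some(_) => Step::SkipStale,
    }
}

/// Collects the reply text for the active request, ignoring stale responses.
fn collect_reply(stream: &[Notification], active_request_id: u64) -> String {
    let mut buf = String::new();
    for n in stream {
        match classify(n, active_request_id) {
            Step::Append => buf.push_str(&n.text),
            Step::Finish => break,
            Step::SkipStale => continue,
        }
    }
    buf
}
```

Without the `SkipStale` branch, a stale response arriving first would end the loop with an empty buffer, which is the (no response) symptom.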

Merge Pitch

This is worth advancing because it fixes a concrete reliability failure: one abandoned ACP prompt can poison later prompts until the stale backlog clears. The fix addresses both the immediate timeout behavior and the deeper cleanup issue around pending request ownership.

Risk profile is moderate. The change touches prompt dispatch, ACP connection state, config parsing, and process/session cancellation behavior. The main reviewer concern should be whether all abandon paths consistently call the new cleanup path, and whether session/cancel is safe to send for every abandoned request.

Best-Practice Comparison

Relevant OpenClaw principles:

Gateway-owned scheduling: partially relevant. This PR improves gateway-side ownership of prompt lifecycle decisions.
Durable job persistence: not directly relevant. The bug is in live ACP request routing, not persisted jobs.
Isolated executions: relevant in spirit. Stale responses should not affect later prompts.
Explicit delivery routing: highly relevant. Matching response IDs to active request IDs prevents cross-prompt delivery.
Retry/backoff and run logs: not directly addressed. Better logs around abandon/cancel paths may be useful follow-up.

Relevant Hermes Agent principles:

Gateway daemon tick model: partially relevant. The new 30-second liveness check resembles periodic supervision.
File locking to prevent overlap: not relevant unless ACP sessions are persisted or shared across processes.
Atomic writes for persisted state: not relevant.
Fresh session per scheduled run: conceptually relevant. The fix prevents stale session output from contaminating later work, though it does not create fresh sessions per prompt.
Self-contained prompts for scheduled tasks: not relevant to this specific PR.

Overall, the PR aligns most strongly with explicit delivery routing, isolated execution behavior, and gateway-owned lifecycle supervision.

Implementation Options

Conservative option: minimal cleanup fix
Keep the existing timeout shape but ensure every timeout path removes pending[request_id], sends best-effort session/cancel, and drops mismatched response IDs. This is smaller but still leaves long-running valid tools vulnerable to the flat timeout.

Balanced option: accept this PR’s direction
Replace the flat timeout with recv/liveness/hard-timeout supervision, add abandon_request, preserve request IDs, and reject mismatched notifications. This fixes the known cascade while keeping the design local to ACP prompt handling.

Ambitious option: formal ACP request lifecycle manager
Introduce an explicit request state machine with states like pending, active, abandoned, cancel_sent, completed, and failed. Add structured run logs, metrics, integration tests with a fake ACP subprocess, and centralized cancellation semantics across prompt, reset, and session shutdown paths.
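
To make the ambitious option more concrete, here is one possible shape for such a state machine. The state names come from the list above; the `Event` type and the transition table are purely illustrative assumptions, not a design taken from the PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum RequestState {
    Pending,
    Active,
    Abandoned,
    CancelSent,
    Completed,
    Failed,
}

/// Hypothetical events that could drive the lifecycle.
#[derive(Debug, Clone, Copy)]
enum Event {
    FirstNotification,
    FinalResponse,
    Timeout,
    AgentDied,
    CancelWritten,
}

/// Returns the next state, or `None` when the event is not valid in `state`
/// (for example a late final response after the request was abandoned).
fn next(state: RequestState, event: Event) -> Option<RequestState> {
    use RequestState::*;
    match (state, event) {
        (Pending, Event::FirstNotification) => Some(Active),
        (Pending | Active, Event::FinalResponse) => Some(Completed),
        (Pending | Active, Event::Timeout) => Some(Abandoned),
        (Pending | Active, Event::AgentDied) => Some(Failed),
        (Abandoned, Event::CancelWritten) => Some(CancelSent),
        _ => None,
    }
}
```

A table like this would make the stale-response case an explicit rejected transition instead of an implicit fall-through.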

Comparison Table

Option	Speed to ship	Complexity	Reliability	Maintainability	User impact	Fit for OpenAB right now
Conservative cleanup only	High	Low	Medium	Medium	Medium	Decent, but incomplete
Balanced PR approach	Medium	Medium	High	Good	High	Best fit
Full lifecycle manager	Low	High	Very high	High if done well	High	Better as follow-up

Recommendation

Advance the balanced approach from this PR, with reviewer focus on abandonment coverage, cancellation safety, and response-ID routing.

This is the right next step because it fixes the observed production failure without forcing a larger ACP lifecycle redesign into the same merge. A good follow-up would be a separate PR adding a fake ACP backend or subprocess test harness so timeout, cancel, stale-response, and dead-agent cases can be covered automatically.

chaodu-agent · 2026-05-06T16:05:48Z

🟢 Four-monk review — no blocking issues

Verdict: Approve. The 3-layer fix (select loop + abandon_request + stale id filtering) is well-structured and addresses the root cause of #732 cleanly.

Highlights

tokio::select! replaces the flat 600s timeout — long-running tools no longer get killed; only a dead reader or the hard ceiling triggers abandon
abandon_request() removes pending[request_id] and best-effort sends session/cancel (confirmed: ACP spec defines this as a client notification, no id, no response expected)
Stale id filtering provides defense-in-depth against any future missed abandon path
None => break path is safe — reader loop already drains all pending entries on EOF
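
The abandon path in the highlights can be sketched as follows. This is a minimal model, not the PR's code: `Connection` is a hypothetical struct, `pending` is reduced to a set of ids (the real map holds response senders), the `sessionId` parameter name is an assumption about the cancel payload, and the session id is assumed to need no JSON escaping.

```rust
use std::collections::HashSet;
use std::io::Write;

/// Hypothetical, simplified connection: ids in flight plus the agent's stdin.
struct Connection<W: Write> {
    pending: HashSet<u64>,
    session_id: String,
    writer: W,
}

impl<W: Write> Connection<W> {
    /// Drops the pending entry first, then best-effort writes a
    /// notification-style `session/cancel` (no `id`, no reply expected).
    /// Returns whether a pending entry existed.
    fn abandon_request(&mut self, request_id: u64) -> bool {
        let removed = self.pending.remove(&request_id);
        let mut cancel = format!(
            r#"{{"jsonrpc":"2.0","method":"session/cancel","params":{{"sessionId":"{}"}}}}"#,
            self.session_id
        );
        cancel.push('\n');
        // Best effort: a write failure must not undo the cleanup above.
        let _ = self.writer.write_all(cancel.as_bytes());
        removed
    }
}
```

Removing the entry before writing means a failed or slow write can never leave a stale pending entry behind.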

Non-blocking NITs

#	Note	Suggestion
1	`LIVENESS_CHECK_INTERVAL` hardcoded at 30s	Could be configurable for operators; fine as follow-up
2	`session/cancel` response silently dropped if agent replies	Add `trace!` log when id-bearing response has no pending entry — improves observability
3	Hard timeout format shows `0m` if < 60s	Edge case; consider `as_secs()` display or min-value guard in config

Reviewer breakdown

Reviewer	Verdict	Extra observations
超渡法師 (Claude)	🟢	Baseline check confirmed leak path; verified `tokio::select!` race is safe for UnboundedReceiver
普渡法師 (Claude)	🟢	Verified `None => break` assumption (reader drains pending on EOF); flagged format edge case
擺渡法師 (Codex)	🟢	Confirmed `session/cancel` is spec-compliant notification; GitHub checks passing
覺渡法師 (Gemini)	🟢	Praised select loop design; agreed on trace log suggestion

- pool.liveness_check_secs: hoist the recv-loop poll cadence out of a hard-coded const onto PoolConfig so deployments can tune it. Default remains 30s.
- adapter: change hard-timeout error message from ({}m) to ({}s) so non-multiple-of-60 ceilings render correctly (e.g. 90s → "(90s)").
- acp/connection: emit a tracing::trace! line when an id-bearing message arrives whose pending entry was already abandoned. Behaviour is unchanged: the adapter recv loop still filters by request_id; this just makes the stale-response path observable at trace level.

cargo check + cargo clippy -- -D warnings + cargo test --bin openab all clean (238 passed).

brettchien · 2026-05-06T17:17:46Z

Follow-up — applied all 3 NITs

Thanks for the review. All three NITs addressed in f323bb0.

#	NIT	Resolution
1	`LIVENESS_CHECK_INTERVAL` hardcoded at 30s	Hoisted onto `PoolConfig` as `pool.liveness_check_secs` (default 30, documented in `config.toml.example`). Wired through `AdapterRouter::new`.
2	id-bearing response has no pending entry (stale-id observability)	Added `trace!(request_id = id, "stale id-bearing message after abandon")` at the fall-through path in `acp/connection.rs`. Behaviour unchanged — the message still falls through to subscriber forwarding; the adapter recv loop's `request_id` filter remains the actual safety net. The trace just makes the stale path observable.
3	Hard timeout format shows `0m` if < 60s	Format changed from `({}m)` / `as_secs() / 60` to `({}s)` / `as_secs()`. Renders correctly for any positive ceiling (e.g. `90s` instead of `1m`).

cargo check && cargo clippy -- -D warnings && cargo test --bin openab clean (238 passed).
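
The arithmetic behind NIT 3 is integer division by 60. A minimal sketch of the two formats follows; the function names are hypothetical, and only the `({}m)` / `({}s)` patterns come from the table above.

```rust
use std::time::Duration;

/// Old format: whole minutes, so anything under 60 s renders as 0m.
fn format_ceiling_minutes(ceiling: Duration) -> String {
    format!("({}m)", ceiling.as_secs() / 60)
}

/// New format: plain seconds, exact for any ceiling.
fn format_ceiling_seconds(ceiling: Duration) -> String {
    format!("({}s)", ceiling.as_secs())
}
```

For a 90 s ceiling the old format truncates to "(1m)" while the new one prints "(90s)"; for 45 s the old format prints "(0m)".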

chaodu-agent

✅ Approve — all NITs addressed

Verified commit f323bb0 resolves the three NITs from my earlier review:

✅ liveness_check_secs now configurable via PoolConfig (was hardcoded 30s)
✅ trace!(request_id = id, "stale id-bearing message after abandon") added for observability
✅ Hard timeout format uses as_secs() — no more 0m edge case

The 3-layer fix (select loop + abandon_request + stale id filtering) is solid. cargo test + clippy clean. Ready to merge.

…bdev#732)

The flat 600s recv_timeout in adapter.rs:386 fires "Agent stopped responding" without removing pending[id] or sending session/cancel. The agent keeps running the abandoned prompt and eventually emits its final response with the original id. The reader at connection.rs:284 looks up pending[id], sees the now-stale entry, and forwards the message to the *current* notify_tx subscriber, which belongs to the next prompt. The next prompt's loop sees notification.id.is_some() and breaks immediately with empty text_buf, returning "(no response)". Each new prompt sent before the agent drains its backlog inherits the previous prompt's stale id and the cascade persists.

Fix follows the issue's recommended A+B+C:

(A) Replace flat 600s timeout with a tokio::select! loop in stream_prompt_blocks. Recv arm + 30s liveness arm. Liveness arm checks conn.alive() (cheap, just !reader_handle.is_finished()) and a configurable hard ceiling. Default ceiling is 30 min via pool.prompt_hard_timeout_secs. Long-running tools no longer trip the timeout; only a dead reader task or the hard ceiling abandons the prompt.

(B) Add AcpConnection::abandon_request(request_id) called on every abandon path: drops pending[request_id] so a late response cannot route to a future subscriber, and best-effort writes session/cancel so the agent stops working on a request the broker has given up on.

(C) Capture request_id from session_prompt() (was discarded as `_`) and skip notifications whose id doesn't match. Defense-in-depth at the routing layer; complements (B)'s cleanup if any future abandon path forgets to call abandon_request.

No unit test for abandon_request: the connection has no test seam without spawning a real subprocess. Behavior is exercised end-to-end via the adapter loop on real ACP backends.

Refs:
- openabdev#76 (Assumption 2: prompts always complete)
- openabdev#307 (sibling: same family, different visible symptom)
- openabdev#470 (added the 600s recv timeout this issue exposes)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Add ±liveness_check_secs precision note to prompt_hard_timeout_secs doc
- Add JSON-RPC id field to session/cancel in abandon_request

Co-authored-by: 超渡法師 <chaodu@openab.dev>

chaodu-agent · 2026-05-06T20:34:35Z

🟢 Review: fix(acp): clean up pending + cancel agent on abandoned prompts

Verdict: LGTM — The fix is correct, well-structured, and addresses the root cause comprehensively.

What this PR solves

The flat 600s recv_timeout fired "Agent stopped responding" without removing pending[request_id] or sending session/cancel. Late responses leaked into subsequent prompts causing (no response) cascades. The three-layer fix (A: liveness loop, B: abandon cleanup, C: stale ID filter) is robust.

🟢 INFO — Done well

Root cause correctly identified and all three vectors closed
tokio::select! loop is idiomatic — no spurious wakeups
abandon_request is fire-and-forget safe (errors swallowed, pending removed first)
Config defaults reasonable (30s liveness, 30min hard ceiling)
Backward-compatible — #[serde(default)] means existing configs work unchanged
Trace-level log for stale messages — good observability without spam

🟡 NIT — Non-blocking

Liveness vs hard timeout clamp: If liveness_check_secs > prompt_hard_timeout_secs, the hard timeout effectively never fires. Consider a startup warning or clamp.
Cancel response noise: abandon_request allocates a new request ID for session/cancel. The response will hit the "stale id" trace log. Harmless but a comment noting this is intentional would help future readers.
std::time::Instant vs tokio::time::Instant: prompt_start uses std::time::Instant inside async context. Fine for elapsed checks, but tokio::time::Instant is preferred for consistency (especially under tokio::time::pause() in tests).
No unit test for stale-ID filter: Acceptable given no test seam without a subprocess. A code comment noting it is covered by manual repro would help.
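
The interaction in the first NIT can be worked through numerically. The sketch below assumes a silent agent, liveness ticks landing exactly every liveness_check_secs from prompt start, and a ceiling check of the form elapsed >= ceiling; under those assumptions the ceiling takes effect at the first tick at or after it. `effective_abandon_secs` is a hypothetical helper, not part of the PR.

```rust
/// Seconds after prompt start at which the hard ceiling would actually be
/// noticed, given that it is only checked on liveness ticks.
/// Returns `None` for a zero interval, which has no well-defined tick.
fn effective_abandon_secs(liveness_check_secs: u64, prompt_hard_timeout_secs: u64) -> Option<u64> {
    if liveness_check_secs == 0 {
        return None;
    }
    // Round the ceiling up to a whole number of ticks (at least one tick).
    let ticks = (prompt_hard_timeout_secs + liveness_check_secs - 1) / liveness_check_secs;
    Some(ticks.max(1) * liveness_check_secs)
}
```

With the defaults (30 s, 1800 s) the ceiling is noticed at 1800 s. With a 3600 s interval and an 1800 s ceiling it is noticed only at 3600 s, which is the misconfiguration the NIT points at.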

Baseline verification

Verified against main:

adapter.rs:386 — confirmed flat 600s recv_timeout with no cleanup
connection.rs — confirmed session_prompt() returns (rx, request_id) but caller discarded ID as _
connection.rs:319 — confirmed pending lookup forwards to current subscriber without ID validation

All three gaps are closed by this PR.

Reviewed by 超渡法師 🪬

- adapter::AdapterRouter::new: emit a tracing::warn! when liveness_check_secs >= prompt_hard_timeout_secs, since in that case the hard ceiling can only fire on the next liveness tick and may be effectively bypassed. Operator-visible warning, not a silent clamp.
- adapter: switch prompt_start from std::time::Instant to tokio::time::Instant so the timer shares tokio's clock with the tokio::time::sleep in the same select! arm (cohesive with future tokio::time::pause()-based tests).
- adapter + acp/connection: extend the stale-id filter / fall-through comments to note that the path is only exercised against a live subprocess and is covered by manual repro, not a unit test.

Note: chaodu-agent NIT 2 (cancel response noise) requires no code change. abandon_request emits a JSON-RPC notification (no id field) per the ACP spec, so a spec-compliant agent must not respond, and even a non-compliant reply with no id would not hit the stale-id trace path. PR comment to follow.

cargo check + cargo clippy -- -D warnings + cargo test --bin openab all clean (238 passed).

Extract the reader-loop body in AcpConnection::spawn into a free generic function `run_reader_loop<R, W>` so tests can drive it with `tokio::io::duplex()` halves instead of a real subprocess. Production path is unchanged: spawn() now calls `tokio::spawn(run_reader_loop(...))` with the same args.

Two new tests cover:
- stale-id response forwarded to subscriber when `pending` is empty (the openabdev#732 fall-through path that the adapter recv loop filters by request_id)
- matched-id response resolves the pending oneshot AND forwards a copy to the subscriber (regression guard for the dual branch)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
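
The test-seam idea can be illustrated with a synchronous analogue. This sketch is generic over `std::io::BufRead` and driven by `std::io::Cursor`, in place of the async `run_reader_loop<R, W>` and `tokio::io::duplex()` named in the commit. The id extraction is deliberately naive (it is not a JSON parser), and the signature here is an assumption for illustration only.

```rust
use std::collections::HashMap;
use std::io::BufRead;
use std::sync::mpsc::Sender;

/// Naive extraction of a numeric `"id":<digits>` field from one JSON line.
fn extract_id(line: &str) -> Option<u64> {
    let rest = &line[line.find("\"id\":")? + 5..];
    let digits: String = rest
        .trim_start()
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

/// Reads lines until EOF. A matched id resolves its pending waiter and is
/// also forwarded; an id with no pending entry is only forwarded and counted
/// as stale. Pending entries are drained at EOF. Returns the stale count.
fn run_reader_loop<R: BufRead>(
    reader: R,
    pending: &mut HashMap<u64, Sender<String>>,
    subscriber: &Sender<String>,
) -> usize {
    let mut stale = 0;
    for line in reader.lines().map_while(Result::ok) {
        if let Some(id) = extract_id(&line) {
            match pending.remove(&id) {
                Some(waiter) => {
                    let _ = waiter.send(line.clone());
                }
                // The real code would log at trace level here.
                None => stale += 1,
            }
        }
        let _ = subscriber.send(line);
    }
    pending.clear();
    stale
}
```

Because the function only needs something readable, a test can feed it canned lines from memory and assert on both the waiter and the subscriber, with no subprocess involved.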

session/cancel carries a fresh JSON-RPC id but is intentionally not registered in `pending`, so the agent's reply lands in the stale-id branch of run_reader_loop and only emits a trace! log. We never wait on the cancel response; the adapter recv loop's request_id filter is the actual safety net against leakage into the next prompt. Doc-only, no behavioural change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

brettchien · 2026-05-07T01:15:07Z

Follow-up — applied all 4 round-2 NITs

Thanks for the second pass. All four NITs addressed across 1a65244 (NITs 1+3), ccef788 (NIT 4 + Option A refactor), and 5183143 (NIT 2 doc clarification).

#	NIT	Resolution	Commit
1	Liveness vs hard timeout clamp	Added `warn!` at startup in `AdapterRouter::new` when `liveness_check_secs >= prompt_hard_timeout_secs` — the user is told the hard ceiling will only fire after the next liveness tick. Not clamped (kept the values authoritative; warning is enough to surface the misconfig).	`1a65244`
2	Cancel response hits stale-id trace — comment that this is intentional	Expanded `abandon_request` doc comment: `session/cancel` carries a fresh JSON-RPC id but is intentionally not registered in `pending`, so the agent's reply lands in the stale-id branch of `run_reader_loop` and only emits `trace!`. We never wait on the cancel response; the adapter recv loop's `request_id` filter is the actual safety net against leakage. No behavioural change.	`5183143`
3	`std::time::Instant` → `tokio::time::Instant`	`prompt_start` in `AdapterRouter::dispatch_loop` switched to `tokio::time::Instant::now()`. Plays well with `tokio::time::pause()` for future tests.	`1a65244`
4	No unit test for stale-id filter	Refactored `AcpConnection::spawn` reader loop body into a free generic `pub(crate) async fn run_reader_loop<R, W>` (bounds `AsyncRead/AsyncWrite + Unpin + Send + 'static`). Production path unchanged — `spawn()` is a single `tokio::spawn(run_reader_loop(...))` call. Added two `tokio::io::duplex()`-driven tests in `reader_loop_tests`: • `stale_id_response_is_forwarded_without_pending_entry` — directly covers the #732 fall-through path • `matched_id_response_resolves_pending_and_forwards` — regression guard for the resolve+forward dual branch	`ccef788`

cargo check && cargo clippy --bin openab -- -D warnings && cargo test --bin openab clean (252 passed, +2 new).

chaodu-agent

✅ Re-approve — all round-2 NITs addressed

Verified final state (commits 1a65244 through 5183143):

✅ Startup warning when liveness_check_secs >= prompt_hard_timeout_secs — operator-visible, not a silent clamp
✅ Cancel stale-id intent documented — abandon_request doc comment clarifies the intentional non-registration in pending
✅ tokio::time::Instant — consistent with the tokio::time::sleep in the same select! arm
✅ Reader loop unit tests — run_reader_loop<R, W> extracted as generic free function; 2 duplex-driven tests cover stale-id forwarding and matched-id resolve+forward

CI green (all 9 checks pass). 252 tests pass. The 3-layer fix is solid and production-ready.

Reviewed by 超渡法師 🪬

chaodu-agent

Review pass from 擺渡法師.

Requesting changes. I found two issues in the latest follow-up commits.

Blocking: abandon_request() now sends session/cancel with a JSON-RPC id (src/acp/connection.rs:593-598). ACP defines session/cancel as a client notification: no response is expected, and the protocol examples omit id. Existing OpenAB /cancel and reset paths in src/acp/pool.rs also omit id. With the new id, this is a request-shaped message, so spec-compliant behavior is no longer guaranteed and agents may send an extra response that the PR then has to route as stale. Please remove the id and update the doc comment back to notification semantics.

Source: https://agentclientprotocol.com/protocol/draft/schema#sessioncancel

Blocking/config hardening: pool.liveness_check_secs deserializes as a raw u64 and is converted directly into a Duration used by tokio::time::sleep inside the recv loop (src/config.rs:323-329, src/adapter.rs:172-188, src/adapter.rs:423-433). If an operator sets liveness_check_secs = 0, the sleep arm is immediately ready and the loop can spin continuously while the prompt is still under the hard timeout. Please reject zero or clamp it to a sane minimum before constructing the router.

I could not run local Rust checks here because this container does not have cargo installed (cargo: command not found). GitHub CI for check and smoke tests is green, aside from the unrelated check-pending automation failure.

NIT 1: `abandon_request` was sending `session/cancel` with a JSON-RPC id, making it request-shaped. Per ACP spec, `session/cancel` is a client notification (no id, no response expected). Pool-side `cancel_session` and `reset_session` were already notification-style; this aligns `abandon_request` with both spec and existing convention. Doc comment reverted to notification semantics.

NIT 2: Reject `pool.liveness_check_secs = 0` in `parse_config`. Zero would make the `tokio::time::sleep` arm in the recv `select!` loop immediately ready, spinning the loop while the prompt is still under the hard timeout.

brettchien · 2026-05-07T06:29:02Z

Follow-up — applied both round-3 blocking NITs

Thanks for the third pass. Both NITs addressed in a162061.

NIT 1 — `abandon_request` sending request-shaped `session/cancel`

Removed the JSON-RPC id field from the session/cancel payload in abandon_request. The call is now notification-style, matching the ACP spec (protocol/draft/schema#sessioncancel) and aligning with the existing pool.rs::cancel_session / reset_session convention.

The doc comment was reverted from the round-2 stale-id-rationale to plain notification semantics. The stale-id forwarding branch in run_reader_loop and the request_id filter in the adapter recv loop are kept as defense-in-depth — they're still useful for any other source of late id-bearing messages, just no longer triggered by abandon_request itself.

NIT 2 — `liveness_check_secs = 0` could spin the recv loop

Added anyhow::ensure!(config.pool.liveness_check_secs > 0, ...) in parse_config (src/config.rs), grouped with the existing max_buffered_messages > 0 / max_batch_tokens > 0 validations. Misconfigured zero now fails fast at startup with a clear message; no clamp / silent rewrite.
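
The two config guards from the review rounds can be sketched together. This version uses plain `Result` instead of `anyhow::ensure!` and returns the warning text instead of calling `tracing::warn!`, so it stays dependency-free; `validate` and the message wording are hypothetical, and in the PR the zero check lives in parse_config while the warning is emitted in AdapterRouter::new.

```rust
/// Hypothetical subset of the pool configuration, with the documented defaults.
struct PoolConfig {
    liveness_check_secs: u64,
    prompt_hard_timeout_secs: u64,
}

impl Default for PoolConfig {
    fn default() -> Self {
        PoolConfig { liveness_check_secs: 30, prompt_hard_timeout_secs: 1800 }
    }
}

/// `Err` rejects the config at startup; `Ok(Some(_))` carries a warning that
/// the caller should surface to the operator; `Ok(None)` means no issue.
fn validate(cfg: &PoolConfig) -> Result<Option<String>, String> {
    if cfg.liveness_check_secs == 0 {
        // A zero interval would make the liveness arm ready immediately.
        return Err("pool.liveness_check_secs must be > 0".to_string());
    }
    if cfg.liveness_check_secs >= cfg.prompt_hard_timeout_secs {
        return Ok(Some(format!(
            "liveness_check_secs ({}) >= prompt_hard_timeout_secs ({}): \
             the hard ceiling only fires on the next liveness tick",
            cfg.liveness_check_secs, cfg.prompt_hard_timeout_secs
        )));
    }
    Ok(None)
}
```

Failing fast on zero while only warning on the ordering issue mirrors the choice in the thread to keep operator-supplied values authoritative rather than silently clamping them.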

cargo check && cargo clippy --bin openab -- -D warnings && cargo test --bin openab clean (252 passed).

chaodu-agent

🟢 Approved

All three layers of the fix are correct and well-tested:

(A) tokio::select! liveness loop replaces the flat 600s timeout — long-running tools survive; dead agents are caught within ~30s
(B) abandon_request() removes pending[request_id] + best-effort session/cancel (notification-style, per ACP spec)
(C) Stale ID filter in the adapter recv loop — defense-in-depth against any future missed abandon path

Contributor addressed all NITs across 3 rounds:

Configurable liveness_check_secs + prompt_hard_timeout_secs
Startup warning when liveness >= hard timeout
tokio::time::Instant for test compatibility
Zero-value validation in parse_config
session/cancel corrected to notification (no id)
Reader loop extracted as generic run_reader_loop with 2 unit tests covering stale-id and matched-id paths

CI green (252 tests + all smoke tests pass). Ship it. 🚀

Reviewed by 超渡法師 🪬

brettchien requested a review from thepagent as a code owner May 6, 2026 14:55

github-actions Bot added closing-soon (PR missing Discord Discussion URL; will auto-close in 24 hours) and pending-screening, and removed closing-soon labels May 6, 2026

github-actions Bot added the pending-maintainer label May 6, 2026

chaodu-agent added pending-contributor and removed pending-maintainer labels May 6, 2026

github-actions Bot added pending-maintainer and removed pending-contributor labels May 6, 2026

chaodu-agent previously approved these changes May 6, 2026

chaodu-agent removed the pending-maintainer label May 6, 2026

brettchien and others added 2 commits May 6, 2026 20:09

chaodu-agent dismissed their stale review via c19371a May 6, 2026 20:10

chaodu-agent force-pushed the fix/issue-732-recv-timeout-cleanup branch from f323bb0 to c19371a Compare May 6, 2026 20:10

fix(acp): add precision doc + id to session/cancel

0a8fc0a

github-actions Bot added the pending-maintainer label May 6, 2026

chaodu-agent added pending-contributor and removed pending-maintainer pending-screening pending-contributor labels May 6, 2026

brettchien and others added 3 commits May 7, 2026 00:18

github-actions Bot added the pending-maintainer label May 7, 2026

chaodu-agent approved these changes May 7, 2026

chaodu-agent added pending-contributor and removed pending-maintainer labels May 7, 2026

chaodu-agent requested changes May 7, 2026

github-actions Bot added pending-maintainer and removed pending-contributor labels May 7, 2026

chaodu-agent approved these changes May 7, 2026

chaodu-agent added pending-contributor and removed pending-maintainer labels May 7, 2026

thepagent approved these changes May 7, 2026

thepagent merged commit 205aa8f into openabdev:main May 7, 2026
10 of 11 checks passed

CHC-Agent mentioned this pull request May 13, 2026

Title: Agent self-hang via foreground until-loop blocks dispatch indefinitely (recurring) #807

Closed

CHC-Agent mentioned this pull request May 24, 2026

fix: notification loop assumes ordered events, bounded prompts, and managed session lifecycle — none hold in production #76

Closed

CHC-Agent mentioned this pull request Jun 4, 2026

Bot silently hangs after tool call when codex-acp LLM call produces no response #997

Open

4 participants