feat(review): failures and live multi-agent progress more reliable by peyton-alt · Pull Request #1167 · entireio/cli

peyton-alt · 2026-05-08T21:20:10Z

https://entire.io/gh/entireio/cli/trails/341

Summary

This PR makes entire review failures and live multi-agent progress more reliable and legible:

Surface real process failures live - run.go and run_multi.go emit a synthetic RunError when proc.Wait() returns a real process failure, so the dashboard updates immediately instead of waiting for final RunFinished.
Do not treat cancellation as failure - synthetic RunError emission is skipped when the context is cancelled or the wait error is cancellation/deadline related, so Ctrl+C does not leave rows marked failed.
Improve failed-row previews - failed dashboard rows show actionable error text in PREVIEW, using the first stderr line for *ProcessError and generic error text otherwise.
Improve post-run failure output - captured stderr is rendered as a fenced code block, and the synthetic RunError is not double-printed in the dump.
Fix live dashboard rendering - the review TUI uses alt screen for the live dashboard, measures the terminal before resize messages arrive, clamps dashboard rows to the current width, and stops spinner/tick redraws once the run is finished.
Restore token visibility - review summaries and per-agent completion events are enriched from review session state/transcripts so token totals appear during multi-agent runs and in the final dashboard.

Background

User-reported issues:

A fast-failing agent could show ✓ done or ✗ failed with an empty PREVIEW while other agents continued running.
Multi-line stderr was hard to read after dismissal because it was jammed into an inline-code span.
Narrowing the terminal during live review corrupted the inline dashboard in scrollback.
Multi-agent token totals were sometimes blank even though token data existed in the review sessions.

This change keeps failure information visible during the run, moves the live dashboard back to alt screen to avoid scrollback reflow corruption, and hydrates token totals from the persisted review session state when the agent stream itself did not emit tokens.

Coverage

Failure handling now covers non-zero process exits without misclassifying cancellation:

Case	Behavior
Auth/API key invalid	Live `✗ failed` with stderr headline in PREVIEW
Network timeout	Live `✗ failed` with error preview
Rate limit/quota/model errors	Live `✗ failed` with error preview
Multi-line stderr	Full stderr in fenced block post-run
Parser-emitted `RunError`	Still treated as failed
Ctrl+C / context cancellation	Cancelled, not synthetic failed
Clean exit with no output	Unchanged
Process hang	Unchanged cancellation path

What this PR does NOT do

Does not add full wrapping/scrolling for Ctrl+O drill-in events; dashboard preview is improved, but detail event lines still use existing width behavior.
Does not detect clean-exit silent success where an agent exits 0 with no narrative.
Does not change review scope/base-branch detection.

Test plan

go test ./cmd/entire/cli/review -run 'Test(DumpSink|TUIModel|Run_|RunMulti|HydrateReview|ReviewSummaryTokenEnricher|WarnManifest|WritePostReviewManifest|BuildLocalReviewManifest)' -count=1
go test ./cmd/entire/cli/review ./cmd/entire/cli/agent/codex ./cmd/entire/cli/agent/claudecode ./cmd/entire/cli/agent/geminicli -run 'Test(DumpSink|TUIModel|Run_|RunMulti|HydrateReview|ReviewSummaryTokenEnricher|WarnManifest|WritePostReviewManifest|BuildLocalReviewManifest|Codex|Claude|Gemini|Discover|Token|Parser|Parse)' -count=1
mise run lint
Manual smoke: live review dashboard uses alt screen; narrowing the terminal no longer leaves repeated AGENT STATUS... fragments in normal scrollback.
Manual smoke: multi-agent token totals appear when review session token data exists.

Tests added / updated

Failure preview and dump formatting tests for ProcessError stderr.
Synthetic RunError tests for single and multi-agent process failures.
Cancellation guard tests to ensure cancelled runs do not emit synthetic failure events.
TUI alt-screen, measured-width, finished-redraw, and final-token-sync tests.
Token hydration tests from review session state and transcript fallback.

Note

Medium Risk
Changes review orchestrator event emission, live TUI rendering behavior, and token enrichment. The behavior is localized to entire review, with focused coverage for failure, cancellation, width, alt-screen, and token hydration paths.

Three coordinated changes that make agent failures legible to the user instead of silently displayed as "✓ done" or buried inside an inline-code span: 1. run.go and run_multi.go: when proc.Wait() returns a non-nil error, emit a synthetic RunError event into the live sink stream. Previously the wait error only reached sinks via RunFinished — which fires once at the end of all agents. In a multi-agent run, an agent that exits non-zero in the first second would render as "✓ done" (from the parser's typical clean-EOF Finished{Success:true}) for the entire duration of any other still-running agent. The synthetic event flips the row to "✗ failed" within milliseconds of the failure, while other agents continue normally. 2. tui_model.go: when an agent has failed and row.err is set, render the error in the dashboard PREVIEW column instead of leaving it blank. The new formatErrorPreview helper strips wrapper noise: for *ProcessError it returns the first non-empty stderr line (the agent CLI's own headline message), since the agent name is already shown in column 1 and "✗ failed" in column 2 — no need to repeat them. For other error types it returns err.Error() verbatim, since they don't have wrappers worth stripping. 3. dump.go: when run.Err is a *ProcessError carrying captured stderr, render the failure block as a header line ("**Failed:** `agent` exited (`exit status N`). Stderr:") followed by the full stderr in a fenced code block on its own. Multi-line stderr (auth errors, stack traces, retry hints) reads cleanly instead of getting collapsed inside an inline-code span. Generic errors keep the inline rendering. Synthetic RunError events whose Err matches run.Err are skipped from the blockquote loop to avoid double- printing the same error in adjacent output blocks. Captures any non-zero process exit, not just one error class: auth failures, network timeouts, rate limits, license errors, segfaults, permission errors, OOM kills, etc. — anything where the agent CLI exits non-zero gets the same treatment. Limitations (worth follow-up tickets): - Live preview is still subject to terminal-width truncation by truncateDisplayWidth (with `…` affordance). Full text always available in the post-run dump. - Ctrl+O drill-in still renders each event as a single truncated line via eventLine — the new error preview is more readable in the main dashboard but the detail view limitation is unchanged. Separate ticket for proper viewport scrolling/wrapping in tui_detail.go. - "Silent success" (agent exits 0 with empty narrative) is not caught by this change — the trigger is non-zero exit. Separate detection logic would be needed to flag empty-output successes. Tests added (TDD, every test failed before implementation): - TestDumpSink_FailedAgentWithProcessErrorRendersStderrAsCodeFence - TestDumpSink_DoesNotDoublePrintSyntheticRunErrorMatchingRunErr - TestTUIModel_DashboardShowsErrorPreviewForFailedAgent - TestTUIModel_DashboardErrorPreviewYieldsToAssistantTextBeforeFailure - TestTUIModel_DashboardErrorPreviewStripsProcessErrorWrapper - TestTUIModel_DashboardErrorPreviewFallsBackToErrStringForNonProcessError - TestRun_EmitsSyntheticRunErrorWhenWaitErrIsNonNil - TestRun_DoesNotEmitSyntheticRunErrorOnCleanExit - TestRunMulti_EmitsSyntheticRunErrorWhenAgentWaitErrIsNonNil One existing test updated to reflect the new event-count contract: - TestRunMulti_OneSucceedsOneFails: failing agent now produces one extra event (the synthetic RunError) compared to before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 8ffa23cd3901

Copilot

Pull request overview

This PR improves how agent failures are surfaced during entire review runs by emitting failure events into the live stream (so the TUI updates immediately) and by rendering clearer failure previews and post-run dumps, especially for multi-line stderr.

Changes:

Emit a synthetic RunError event when proc.Wait() returns an error (single- and multi-agent orchestrators) so live sinks (TUI/dump) see failures immediately.
Show a failure’s error message in the TUI dashboard PREVIEW column (with special formatting for *ProcessError to prefer the first stderr line).
Render *ProcessError stderr as a fenced code block in the post-run dump, and avoid double-printing the synthetic RunError.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
cmd/entire/cli/review/tui_model.go	Shows a concise error preview in the dashboard for failed rows; adds `formatErrorPreview`.
cmd/entire/cli/review/tui_model_test.go	Adds tests covering dashboard error preview behavior and formatting.
cmd/entire/cli/review/run.go	Emits a synthetic `RunError` after `Wait()` to surface process-exit failures in the live stream.
cmd/entire/cli/review/run_test.go	Adds tests asserting synthetic `RunError` emission on `Wait()` error and absence on clean exit.
cmd/entire/cli/review/run_multi.go	Emits a synthetic `RunError` per agent after `Wait()` in multi-agent runs.
cmd/entire/cli/review/run_multi_test.go	Updates event-count expectations and adds a test for multi-agent synthetic `RunError` emission.
cmd/entire/cli/review/dump.go	Improves failed-run dump formatting (fenced stderr) and skips duplicate synthetic errors.
cmd/entire/cli/review/dump_test.go	Adds tests for fenced stderr rendering and duplicate suppression.

Entire-Checkpoint: 103a54950db4

…gent-errors

A panic in EnrichAgentRun previously crashed the per-agent forwarding goroutine, leaking goroutines and aborting the run after agents had already done their work. callEnrichAgentRun now wraps the call with defer/recover; on panic, no synthetic Tokens event is emitted and the run continues to RunFinished. Documents the EnrichSummary/EnrichAgentRun contract on RunConfig so future implementers can find: which goroutine, when called, what fields are actually consumed (Tokens-only for EnrichAgentRun), and the must-not-block / must-not-panic rules. Without these docs, callers hit footguns: setting Err on the AgentRun and wondering why it's silently dropped, or assuming EnrichSummary has the same panic safety as EnrichAgentRun (it does not — caller responsibility, by design). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 76636eb4103a

Two operational-visibility fixes for the silent-failure paths around the new failure-surfacing work in this PR: 1. reviewTokenUsageForSession returned nil on three distinct error paths (agent registry lookup, transcript read) with no log. A user seeing a blank TOKENS column had no way to triage. Adds logging.Debug at each silent-nil branch with session id and the underlying error — keeps the user-visible behavior identical (still nil) but makes the cause findable with ENTIRE_LOG_LEVEL=debug. 2. formatErrorPreview picked the "first non-empty stderr line" before stripping ANSI. Agents like codex and claude-code emit colored stderr banners whose first line is escape codes only; TrimSpace doesn't drop those, so we'd surface the chrome ("\x1b[31m\x1b[0m") and hide the actual error message on subsequent lines. Strips ANSI per-line before the empty check using the existing review-package stripANSI helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 1bdcc2e089e2

Copilot AI review requested due to automatic review settings May 8, 2026 21:20

peyton-alt requested a review from a team as a code owner May 8, 2026 21:20

Copilot started reviewing on behalf of peyton-alt May 8, 2026 21:20 View session

Copilot AI reviewed May 8, 2026

View reviewed changes

Comment thread cmd/entire/cli/review/run.go Outdated

Comment thread cmd/entire/cli/review/run_multi.go Outdated

peyton-alt added 4 commits May 8, 2026 18:00

Fix review live dashboard rendering

60c3630

Entire-Checkpoint: 103a54950db4

Fix review TUI start error preview

e7461e6

Address review dashboard feedback

596f9f8

Merge remote-tracking branch 'origin/main' into feat/review-surface-a…

22a03e7

…gent-errors

peyton-alt mentioned this pull request May 12, 2026

feat(review): improve drill-in scrolling and post-run access #1184

Open

4 tasks

peyton-alt and others added 3 commits May 12, 2026 00:18

Merge remote-tracking branch 'origin/main' into pr-1167-review

eac9c9b

peyton-alt changed the title ~~feat(review): surface agent failures in live dashboard and post-run dump~~ feat(review): failures and live multi-agent progress more reliable May 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(review): failures and live multi-agent progress more reliable#1167

feat(review): failures and live multi-agent progress more reliable#1167
peyton-alt wants to merge 8 commits into
mainfrom
feat/review-surface-agent-errors

peyton-alt commented May 8, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Conversation

peyton-alt commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Background

Coverage

What this PR does NOT do

Test plan

Tests added / updated

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

peyton-alt commented May 8, 2026 •

edited

Loading