Skip to content

feat(review): failures and live multi-agent progress more reliable#1167

Open
peyton-alt wants to merge 8 commits into
mainfrom
feat/review-surface-agent-errors
Open

feat(review): failures and live multi-agent progress more reliable#1167
peyton-alt wants to merge 8 commits into
mainfrom
feat/review-surface-agent-errors

Conversation

@peyton-alt
Copy link
Copy Markdown
Contributor

@peyton-alt peyton-alt commented May 8, 2026

https://entire.io/gh/entireio/cli/trails/341

Summary

This PR makes entire review failures and live multi-agent progress more reliable and legible:

  1. Surface real process failures live - run.go and run_multi.go emit a synthetic RunError when proc.Wait() returns a real process failure, so the dashboard updates immediately instead of waiting for final RunFinished.
  2. Do not treat cancellation as failure - synthetic RunError emission is skipped when the context is cancelled or the wait error is cancellation/deadline related, so Ctrl+C does not leave rows marked failed.
  3. Improve failed-row previews - failed dashboard rows show actionable error text in PREVIEW, using the first stderr line for *ProcessError and generic error text otherwise.
  4. Improve post-run failure output - captured stderr is rendered as a fenced code block, and the synthetic RunError is not double-printed in the dump.
  5. Fix live dashboard rendering - the review TUI uses alt screen for the live dashboard, measures the terminal before resize messages arrive, clamps dashboard rows to the current width, and stops spinner/tick redraws once the run is finished.
  6. Restore token visibility - review summaries and per-agent completion events are enriched from review session state/transcripts so token totals appear during multi-agent runs and in the final dashboard.

Background

User-reported issues:

  • A fast-failing agent could show ✓ done or ✗ failed with an empty PREVIEW while other agents continued running.
  • Multi-line stderr was hard to read after dismissal because it was jammed into an inline-code span.
  • Narrowing the terminal during live review corrupted the inline dashboard in scrollback.
  • Multi-agent token totals were sometimes blank even though token data existed in the review sessions.

This change keeps failure information visible during the run, moves the live dashboard back to alt screen to avoid scrollback reflow corruption, and hydrates token totals from the persisted review session state when the agent stream itself did not emit tokens.

Coverage

Failure handling now covers non-zero process exits without misclassifying cancellation:

Case Behavior
Auth/API key invalid Live ✗ failed with stderr headline in PREVIEW
Network timeout Live ✗ failed with error preview
Rate limit/quota/model errors Live ✗ failed with error preview
Multi-line stderr Full stderr in fenced block post-run
Parser-emitted RunError Still treated as failed
Ctrl+C / context cancellation Cancelled, not synthetic failed
Clean exit with no output Unchanged
Process hang Unchanged cancellation path

What this PR does NOT do

  • Does not add full wrapping/scrolling for Ctrl+O drill-in events; dashboard preview is improved, but detail event lines still use existing width behavior.
  • Does not detect clean-exit silent success where an agent exits 0 with no narrative.
  • Does not change review scope/base-branch detection.

Test plan

  • go test ./cmd/entire/cli/review -run 'Test(DumpSink|TUIModel|Run_|RunMulti|HydrateReview|ReviewSummaryTokenEnricher|WarnManifest|WritePostReviewManifest|BuildLocalReviewManifest)' -count=1
  • go test ./cmd/entire/cli/review ./cmd/entire/cli/agent/codex ./cmd/entire/cli/agent/claudecode ./cmd/entire/cli/agent/geminicli -run 'Test(DumpSink|TUIModel|Run_|RunMulti|HydrateReview|ReviewSummaryTokenEnricher|WarnManifest|WritePostReviewManifest|BuildLocalReviewManifest|Codex|Claude|Gemini|Discover|Token|Parser|Parse)' -count=1
  • mise run lint
  • Manual smoke: live review dashboard uses alt screen; narrowing the terminal no longer leaves repeated AGENT STATUS... fragments in normal scrollback.
  • Manual smoke: multi-agent token totals appear when review session token data exists.

Tests added / updated

  • Failure preview and dump formatting tests for ProcessError stderr.
  • Synthetic RunError tests for single and multi-agent process failures.
  • Cancellation guard tests to ensure cancelled runs do not emit synthetic failure events.
  • TUI alt-screen, measured-width, finished-redraw, and final-token-sync tests.
  • Token hydration tests from review session state and transcript fallback.

Note

Medium Risk
Changes review orchestrator event emission, live TUI rendering behavior, and token enrichment. The behavior is localized to entire review, with focused coverage for failure, cancellation, width, alt-screen, and token hydration paths.

Three coordinated changes that make agent failures legible to the user
instead of silently displayed as "✓ done" or buried inside an inline-code
span:

1. run.go and run_multi.go: when proc.Wait() returns a non-nil error,
   emit a synthetic RunError event into the live sink stream. Previously
   the wait error only reached sinks via RunFinished — which fires once
   at the end of all agents. In a multi-agent run, an agent that exits
   non-zero in the first second would render as "✓ done" (from the
   parser's typical clean-EOF Finished{Success:true}) for the entire
   duration of any other still-running agent. The synthetic event flips
   the row to "✗ failed" within milliseconds of the failure, while
   other agents continue normally.

2. tui_model.go: when an agent has failed and row.err is set, render
   the error in the dashboard PREVIEW column instead of leaving it
   blank. The new formatErrorPreview helper strips wrapper noise:
   for *ProcessError it returns the first non-empty stderr line
   (the agent CLI's own headline message), since the agent name is
   already shown in column 1 and "✗ failed" in column 2 — no need
   to repeat them. For other error types it returns err.Error()
   verbatim, since they don't have wrappers worth stripping.

3. dump.go: when run.Err is a *ProcessError carrying captured stderr,
   render the failure block as a header line ("**Failed:** `agent`
   exited (`exit status N`). Stderr:") followed by the full stderr
   in a fenced code block on its own. Multi-line stderr (auth errors,
   stack traces, retry hints) reads cleanly instead of getting
   collapsed inside an inline-code span. Generic errors keep the
   inline rendering. Synthetic RunError events whose Err matches
   run.Err are skipped from the blockquote loop to avoid double-
   printing the same error in adjacent output blocks.

Captures any non-zero process exit, not just one error class: auth
failures, network timeouts, rate limits, license errors, segfaults,
permission errors, OOM kills, etc. — anything where the agent CLI
exits non-zero gets the same treatment.

Limitations (worth follow-up tickets):
- Live preview is still subject to terminal-width truncation by
  truncateDisplayWidth (with `…` affordance). Full text always
  available in the post-run dump.
- Ctrl+O drill-in still renders each event as a single truncated line
  via eventLine — the new error preview is more readable in the main
  dashboard but the detail view limitation is unchanged. Separate
  ticket for proper viewport scrolling/wrapping in tui_detail.go.
- "Silent success" (agent exits 0 with empty narrative) is not
  caught by this change — the trigger is non-zero exit. Separate
  detection logic would be needed to flag empty-output successes.

Tests added (TDD, every test failed before implementation):
- TestDumpSink_FailedAgentWithProcessErrorRendersStderrAsCodeFence
- TestDumpSink_DoesNotDoublePrintSyntheticRunErrorMatchingRunErr
- TestTUIModel_DashboardShowsErrorPreviewForFailedAgent
- TestTUIModel_DashboardErrorPreviewYieldsToAssistantTextBeforeFailure
- TestTUIModel_DashboardErrorPreviewStripsProcessErrorWrapper
- TestTUIModel_DashboardErrorPreviewFallsBackToErrStringForNonProcessError
- TestRun_EmitsSyntheticRunErrorWhenWaitErrIsNonNil
- TestRun_DoesNotEmitSyntheticRunErrorOnCleanExit
- TestRunMulti_EmitsSyntheticRunErrorWhenAgentWaitErrIsNonNil

One existing test updated to reflect the new event-count contract:
- TestRunMulti_OneSucceedsOneFails: failing agent now produces one
  extra event (the synthetic RunError) compared to before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 8ffa23cd3901
Copilot AI review requested due to automatic review settings May 8, 2026 21:20
@peyton-alt peyton-alt requested a review from a team as a code owner May 8, 2026 21:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves how agent failures are surfaced during entire review runs by emitting failure events into the live stream (so the TUI updates immediately) and by rendering clearer failure previews and post-run dumps, especially for multi-line stderr.

Changes:

  • Emit a synthetic RunError event when proc.Wait() returns an error (single- and multi-agent orchestrators) so live sinks (TUI/dump) see failures immediately.
  • Show a failure’s error message in the TUI dashboard PREVIEW column (with special formatting for *ProcessError to prefer the first stderr line).
  • Render *ProcessError stderr as a fenced code block in the post-run dump, and avoid double-printing the synthetic RunError.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
cmd/entire/cli/review/tui_model.go Shows a concise error preview in the dashboard for failed rows; adds formatErrorPreview.
cmd/entire/cli/review/tui_model_test.go Adds tests covering dashboard error preview behavior and formatting.
cmd/entire/cli/review/run.go Emits a synthetic RunError after Wait() to surface process-exit failures in the live stream.
cmd/entire/cli/review/run_test.go Adds tests asserting synthetic RunError emission on Wait() error and absence on clean exit.
cmd/entire/cli/review/run_multi.go Emits a synthetic RunError per agent after Wait() in multi-agent runs.
cmd/entire/cli/review/run_multi_test.go Updates event-count expectations and adds a test for multi-agent synthetic RunError emission.
cmd/entire/cli/review/dump.go Improves failed-run dump formatting (fenced stderr) and skips duplicate synthetic errors.
cmd/entire/cli/review/dump_test.go Adds tests for fenced stderr rendering and duplicate suppression.

Comment thread cmd/entire/cli/review/run.go Outdated
Comment thread cmd/entire/cli/review/run_multi.go Outdated
peyton-alt and others added 3 commits May 12, 2026 00:18
A panic in EnrichAgentRun previously crashed the per-agent forwarding
goroutine, leaking goroutines and aborting the run after agents had
already done their work. callEnrichAgentRun now wraps the call with
defer/recover; on panic, no synthetic Tokens event is emitted and the
run continues to RunFinished.

Documents the EnrichSummary/EnrichAgentRun contract on RunConfig so
future implementers can find: which goroutine, when called, what fields
are actually consumed (Tokens-only for EnrichAgentRun), and the
must-not-block / must-not-panic rules. Without these docs, callers
hit footguns: setting Err on the AgentRun and wondering why it's
silently dropped, or assuming EnrichSummary has the same panic safety
as EnrichAgentRun (it does not — caller responsibility, by design).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 76636eb4103a
Two operational-visibility fixes for the silent-failure paths around
the new failure-surfacing work in this PR:

1. reviewTokenUsageForSession returned nil on three distinct error
   paths (agent registry lookup, transcript read) with no log. A user
   seeing a blank TOKENS column had no way to triage. Adds
   logging.Debug at each silent-nil branch with session id and the
   underlying error — keeps the user-visible behavior identical
   (still nil) but makes the cause findable with ENTIRE_LOG_LEVEL=debug.

2. formatErrorPreview picked the "first non-empty stderr line" before
   stripping ANSI. Agents like codex and claude-code emit colored
   stderr banners whose first line is escape codes only; TrimSpace
   doesn't drop those, so we'd surface the chrome ("\x1b[31m\x1b[0m")
   and hide the actual error message on subsequent lines. Strips ANSI
   per-line before the empty check using the existing review-package
   stripANSI helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 1bdcc2e089e2
@peyton-alt peyton-alt changed the title feat(review): surface agent failures in live dashboard and post-run dump feat(review): failures and live multi-agent progress more reliable May 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants