feat(review): make failures and live multi-agent progress more reliable #1167
Open
peyton-alt wants to merge 8 commits into
Three coordinated changes that make agent failures legible to the user
instead of being silently displayed as "✓ done" or buried inside an
inline-code span:
1. run.go and run_multi.go: when proc.Wait() returns a non-nil error,
emit a synthetic RunError event into the live sink stream. Previously
the wait error only reached sinks via RunFinished — which fires once
at the end of all agents. In a multi-agent run, an agent that exits
non-zero in the first second would render as "✓ done" (from the
parser's typical clean-EOF Finished{Success:true}) for the entire
duration of any other still-running agent. The synthetic event flips
the row to "✗ failed" within milliseconds of the failure, while
other agents continue normally.
2. tui_model.go: when an agent has failed and row.err is set, render
the error in the dashboard PREVIEW column instead of leaving it
blank. The new formatErrorPreview helper strips wrapper noise:
for *ProcessError it returns the first non-empty stderr line
(the agent CLI's own headline message), since the agent name is
already shown in column 1 and "✗ failed" in column 2 — no need
to repeat them. For other error types it returns err.Error()
verbatim, since they don't have wrappers worth stripping.
3. dump.go: when run.Err is a *ProcessError carrying captured stderr,
render the failure block as a header line ("**Failed:** `agent`
exited (`exit status N`). Stderr:") followed by the full stderr
in a fenced code block on its own. Multi-line stderr (auth errors,
stack traces, retry hints) reads cleanly instead of getting
collapsed inside an inline-code span. Generic errors keep the
inline rendering. Synthetic RunError events whose Err matches
run.Err are skipped from the blockquote loop to avoid double-
printing the same error in adjacent output blocks.
Captures any non-zero process exit, not just one error class: auth
failures, network timeouts, rate limits, license errors, segfaults,
permission errors, OOM kills, etc. — anything where the agent CLI
exits non-zero gets the same treatment.
Limitations (worth follow-up tickets):
- Live preview is still subject to terminal-width truncation by
truncateDisplayWidth (with `…` affordance). The full text is always
available in the post-run dump.
- Ctrl+O drill-in still renders each event as a single truncated line
via eventLine — the new error preview is more readable in the main
dashboard but the detail view limitation is unchanged. Separate
ticket for proper viewport scrolling/wrapping in tui_detail.go.
- "Silent success" (agent exits 0 with empty narrative) is not
caught by this change — the trigger is non-zero exit. Separate
detection logic would be needed to flag empty-output successes.
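For context on the first limitation, this is roughly what width-based truncation with a `…` affordance looks like. `truncateToWidth` is a hypothetical simplification that counts runes; `truncateDisplayWidth` presumably measures terminal columns, which differ for wide characters, so treat this as a sketch only.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateToWidth is a hypothetical simplification of the dashboard's
// truncation: it assumes every rune occupies one terminal column, which is
// not true for wide (e.g. CJK) characters or emoji.
func truncateToWidth(s string, width int) string {
	if width <= 0 {
		return ""
	}
	if utf8.RuneCountInString(s) <= width {
		return s
	}
	runes := []rune(s)
	// Reserve the last column for the "…" affordance.
	return string(runes[:width-1]) + "…"
}

func main() {
	fmt.Println(truncateToWidth("Error: authentication required", 12)) // Error: auth…
	fmt.Println(truncateToWidth("short", 12))                          // short
}
```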
Tests added (TDD, every test failed before implementation):
- TestDumpSink_FailedAgentWithProcessErrorRendersStderrAsCodeFence
- TestDumpSink_DoesNotDoublePrintSyntheticRunErrorMatchingRunErr
- TestTUIModel_DashboardShowsErrorPreviewForFailedAgent
- TestTUIModel_DashboardErrorPreviewYieldsToAssistantTextBeforeFailure
- TestTUIModel_DashboardErrorPreviewStripsProcessErrorWrapper
- TestTUIModel_DashboardErrorPreviewFallsBackToErrStringForNonProcessError
- TestRun_EmitsSyntheticRunErrorWhenWaitErrIsNonNil
- TestRun_DoesNotEmitSyntheticRunErrorOnCleanExit
- TestRunMulti_EmitsSyntheticRunErrorWhenAgentWaitErrIsNonNil
One existing test updated to reflect the new event-count contract:
- TestRunMulti_OneSucceedsOneFails: failing agent now produces one
extra event (the synthetic RunError) compared to before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 8ffa23cd3901
Pull request overview
This PR improves how agent failures are surfaced during `entire review` runs by emitting failure events into the live stream (so the TUI updates immediately) and by rendering clearer failure previews and post-run dumps, especially for multi-line stderr.
Changes:
- Emit a synthetic `RunError` event when `proc.Wait()` returns an error (single- and multi-agent orchestrators) so live sinks (TUI/dump) see failures immediately.
- Show a failure's error message in the TUI dashboard PREVIEW column (with special formatting for `*ProcessError` to prefer the first stderr line).
- Render `*ProcessError` stderr as a fenced code block in the post-run dump, and avoid double-printing the synthetic `RunError`.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| cmd/entire/cli/review/tui_model.go | Shows a concise error preview in the dashboard for failed rows; adds formatErrorPreview. |
| cmd/entire/cli/review/tui_model_test.go | Adds tests covering dashboard error preview behavior and formatting. |
| cmd/entire/cli/review/run.go | Emits a synthetic RunError after Wait() to surface process-exit failures in the live stream. |
| cmd/entire/cli/review/run_test.go | Adds tests asserting synthetic RunError emission on Wait() error and absence on clean exit. |
| cmd/entire/cli/review/run_multi.go | Emits a synthetic RunError per agent after Wait() in multi-agent runs. |
| cmd/entire/cli/review/run_multi_test.go | Updates event-count expectations and adds a test for multi-agent synthetic RunError emission. |
| cmd/entire/cli/review/dump.go | Improves failed-run dump formatting (fenced stderr) and skips duplicate synthetic errors. |
| cmd/entire/cli/review/dump_test.go | Adds tests for fenced stderr rendering and duplicate suppression. |
Entire-Checkpoint: 103a54950db4
A panic in EnrichAgentRun previously crashed the per-agent forwarding goroutine, leaking goroutines and aborting the run after agents had already done their work. callEnrichAgentRun now wraps the call with defer/recover; on panic, no synthetic Tokens event is emitted and the run continues to RunFinished.

Documents the EnrichSummary/EnrichAgentRun contract on RunConfig so future implementers can find which goroutine calls each hook, when it is called, which fields are actually consumed (only Tokens, for EnrichAgentRun), and the must-not-block / must-not-panic rules. Without these docs, callers hit footguns: setting Err on the AgentRun and wondering why it's silently dropped, or assuming EnrichSummary has the same panic safety as EnrichAgentRun (it does not; that is the caller's responsibility, by design).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 76636eb4103a
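A sketch of the recover wrapper described above. The hook signature (`func(AgentRun) AgentRun`) and the `AgentRun` shape are assumptions for illustration, not the real `RunConfig` types; only the pattern (a deferred `recover` that overwrites named results) is the point.

```go
package main

import "fmt"

// AgentRun is a hypothetical stand-in; per the documented contract, only
// Tokens is consumed from what EnrichAgentRun returns.
type AgentRun struct {
	Agent  string
	Tokens int
}

// callEnrichAgentRun invokes a caller-supplied hook and turns a panic into
// "no result", so the forwarding goroutine keeps running and the run can
// still reach RunFinished.
func callEnrichAgentRun(enrich func(AgentRun) AgentRun, run AgentRun) (out AgentRun, ok bool) {
	defer func() {
		if r := recover(); r != nil {
			out, ok = AgentRun{}, false // on panic: emit no synthetic Tokens event
		}
	}()
	return enrich(run), true
}

func main() {
	good := func(r AgentRun) AgentRun { r.Tokens = 1200; return r }
	bad := func(AgentRun) AgentRun { panic("enricher bug") }

	if run, ok := callEnrichAgentRun(good, AgentRun{Agent: "codex"}); ok {
		fmt.Println("emit Tokens event:", run.Tokens)
	}
	if _, ok := callEnrichAgentRun(bad, AgentRun{Agent: "gemini"}); !ok {
		fmt.Println("enricher panicked; skipping Tokens event")
	}
}
```

`recover` only stops a panic raised on the same goroutine, which is why the wrapper has to sit directly around the hook call on the forwarding goroutine rather than somewhere higher up.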
Two operational-visibility fixes for the silent-failure paths around
the new failure-surfacing work in this PR:
1. reviewTokenUsageForSession returned nil on three distinct error
paths (agent registry lookup, transcript read) with no log. A user
seeing a blank TOKENS column had no way to triage. Adds
logging.Debug at each silent-nil branch with session id and the
underlying error — keeps the user-visible behavior identical
(still nil) but makes the cause findable with ENTIRE_LOG_LEVEL=debug.
2. formatErrorPreview picked the "first non-empty stderr line" before
stripping ANSI. Agents like codex and claude-code emit colored
stderr banners whose first line is escape codes only; TrimSpace
doesn't drop those, so we'd surface the chrome ("\x1b[31m\x1b[0m")
and hide the actual error message on subsequent lines. Strips ANSI
per-line before the empty check using the existing review-package
stripANSI helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 1bdcc2e089e2
https://entire.io/gh/entireio/cli/trails/341
Summary
This PR makes `entire review` failures and live multi-agent progress more reliable and legible:
- `run.go` and `run_multi.go` emit a synthetic `RunError` when `proc.Wait()` returns a real process failure, so the dashboard updates immediately instead of waiting for the final `RunFinished`.
- `RunError` emission is skipped when the context is cancelled or the wait error is cancellation/deadline related, so Ctrl+C does not leave rows marked failed.
- … `*ProcessError` and generic error text otherwise.
- … `RunError` is not double-printed in the dump.
Background
User-reported issues:
- … `✓ done` or `✗ failed` with an empty PREVIEW while other agents continued running.

This change keeps failure information visible during the run, moves the live dashboard back to alt screen to avoid scrollback reflow corruption, and hydrates token totals from the persisted review session state when the agent stream itself did not emit tokens.
Coverage
Failure handling now covers non-zero process exits without misclassifying cancellation:
- … `✗ failed` with stderr headline in PREVIEW
- … `✗ failed` with error preview
- … `✗ failed` with error preview
- … `RunError`
What this PR does NOT do
Test plan
- `go test ./cmd/entire/cli/review -run 'Test(DumpSink|TUIModel|Run_|RunMulti|HydrateReview|ReviewSummaryTokenEnricher|WarnManifest|WritePostReviewManifest|BuildLocalReviewManifest)' -count=1`
- `go test ./cmd/entire/cli/review ./cmd/entire/cli/agent/codex ./cmd/entire/cli/agent/claudecode ./cmd/entire/cli/agent/geminicli -run 'Test(DumpSink|TUIModel|Run_|RunMulti|HydrateReview|ReviewSummaryTokenEnricher|WarnManifest|WritePostReviewManifest|BuildLocalReviewManifest|Codex|Claude|Gemini|Discover|Token|Parser|Parse)' -count=1`
- `mise run lint`
- … `AGENT STATUS...` fragments in normal scrollback.
Tests added / updated
- … `ProcessError` stderr.
- … `RunError` tests for single and multi-agent process failures.
Note
Medium Risk
Changes review orchestrator event emission, live TUI rendering behavior, and token enrichment. The behavior is localized to `entire review`, with focused coverage for failure, cancellation, width, alt-screen, and token hydration paths.