Stream review agent events live via JSONL output modes by peyton-alt · Pull Request #1192 · entireio/cli

peyton-alt · 2026-05-12T17:17:46Z

https://entire.io/gh/entireio/cli/trails/363

Summary

The entire review multi-agent TUI's Ctrl+O drill-in stayed at [started] and the dashboard preview column stayed empty for the entire run, because both claude-code (claude -p) and codex (codex exec -) buffer stdout under their default plain-text modes — events only arrived after the agent exited.
This PR switches each agent's parser to that agent's native JSONL streaming format: claude uses --output-format stream-json --verbose, codex uses exec --json. Each stdout line is one JSON envelope; the parsers dispatch one Event per envelope (assistant text, tool use, tokens, finished).
Net: ~290 lines added, ~930 lines deleted (codex's 580-line chrome-filter state machine is gone). No PTY, no AgentReviewer interface changes, no TUI changes. The "extend parsers in place" boundary from the brief is preserved.

Test Plan

mise run check passes (5158 unit tests + 340 integration tests + canary E2E)
Unit-test parsers against captured JSONL fixtures (testdata/stream_session.jsonl, testdata/json_session.jsonl)
TestParseCodexOutput_StreamsEventsBeforeEOF guards the actual streaming property (drips NDJSON through io.Pipe, asserts events arrive before the pipe closes)
Real entire review --agent claude-code: drill-in shows tool calls + assistant text streaming during the run
Real entire review --agent codex: drill-in shows [tool: exec] rows + assistant text streaming during the run (codex has a ~30s startup before first event — model warmup, not a parser issue)
Real multi-agent entire review: both agents stream side-by-side; dashboard preview column updates mid-run for both

Notes

gemini-cli is not changed in this PR — its parser stays on plain text. If gemini's behavior shows the same buffering, a follow-up can switch it using the same pattern.
PR feat(review): improve drill-in scrolling and post-run access #1184 (drill-in viewport, in-flight) composes cleanly: long ToolCall args truncate with … via truncateDisplayWidth; each event is exactly one row in the body.
Two follow-ups deferred from final code review: (a) add a TestParseClaudeOutput_StreamsEventsBeforeEOF for symmetry with codex; (b) per-tool argument summarization in eventLine (e.g., extract file_path for Read instead of dumping the full JSON input).

🤖 Generated with Claude Code

Note

Medium Risk
Switches the claude-code and codex review runners to new JSONL output modes and rewrites their parsers, which could affect live event streaming, tool-call rendering, and success/tokens detection across agent versions.

Overview
entire review now streams live events for claude-code and codex by switching both to their native JSONL stdout modes (claude --output-format stream-json --verbose, codex exec --json) and parsing each envelope into Events (AssistantText, ToolCall, Tokens, Finished).

Codex’s prior stdout “chrome” filtering/state machine is removed entirely, replaced with direct JSON envelope decoding; both parsers now treat missing terminal envelopes (result / turn.completed) as Finished{Success:false} and tolerate garbled lines by emitting RunError while continuing.

Tests and fixtures are updated/added to assert argv shapes, JSONL decoding, and true streaming before EOF via io.Pipe; integration/smoke stubs now emit minimal result JSON to satisfy the new Claude parser, and claudecode/transcript.go shares the assistant envelope type constant.

^{Reviewed by Cursor Bugbot for commit 718f92e. Configure here.}

claude -p plain-text mode buffers stdout until the model has finished generating, so the review drill-in (Ctrl+O) stayed at [started] for the whole run and the dashboard preview only populated at end. Switching to stream-json --verbose emits one JSON envelope per agent event (assistant messages, tool use, result), giving the parser per-message granularity and making Ctrl+O surface live progress. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: e3d911c520d2

codex exec plain-text mode buffers chrome + assistant output until the model finishes, so the review drill-in stayed at [started] for codex agents. Switching to exec --json emits one JSON envelope per agent-side event (item.started for tool calls, item.completed for agent messages, turn.completed for the terminal usage block), giving the parser per-event granularity. The old chrome-filter state machine is no longer needed and has been removed. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: e2625d3943af

Address code-review feedback on the codex --json parser: - Match claudecode's post-loop Tokens+Finished emission pattern instead of emitting inline inside the turn.completed branch. Prevents double emit if codex ever sends multiple turn.completed envelopes and keeps the two agent parsers structurally parallel. - Re-add TestParseCodexOutput_StreamsEventsBeforeEOF (deleted in the chunk 2 rewrite) using NDJSON envelopes through io.Pipe. Guards against future regressions that re-introduce batching. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: 6118267e4e52

The codex chrome filter (output_filter.go) was removed in f0b6eca / a3d29fe when codex switched to exec --json. Two CLAUDE.md prose lines still mentioned it: the reviewer.go inventory said "codex with chrome filter", and the anti-features list still referred to filterCodexOutput specifically. Tidy both to reflect the post-switch shape, with the anti-feature now stated as a general principle (per-agent parsers own their format; shared code only sees Event variants) so the design boundary is still documented. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: fb4673c45488

TestReviewCommand_PassesReviewEnvToSpawnedAgentHook uses a fake claude shell script whose stdout was empty (the script pipes its content to a hook's stdin). With chunk 1's switch to --output-format stream-json the new parser interpreted the empty stream as "no result envelope" and classified the agent run as Failed, breaking the test. Append a minimal valid stream-json result envelope to the fake script's stdout so the parser sees a clean session end. The hook-firing path the test actually exercises is unchanged. Apply the same fix to TestReviewCommandSmoke_IncludesCheckpointContextInPrompt whose stub printed plain text 'smoke review ok' that the new parser treated as malformed JSON. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: 6066c1f3cd20

The comments referred to Strip's internal scanner, which lived in output_filter.go and was deleted when codex switched to exec --json. parseCodexOutput now reads io.Reader directly with its own 16MB scanner buffer; the comments now reflect that. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: 7df8d0da6d9e

Copilot

Pull request overview

This PR fixes live event streaming in entire review by switching claude-code and codex adapters from buffered plain-text parsing to each agent’s native JSONL streaming output, so the TUI drill-in and preview column update during the run instead of only after process exit.

Changes:

Update claude-code reviewer to run claude -p ... --output-format stream-json --verbose and parse JSONL envelopes into Started/AssistantText/ToolCall/Tokens/Finished events.
Update codex reviewer to run codex exec --skip-git-repo-check --json - and replace the prior “chrome filter + state machine” with direct JSON envelope decoding.
Refresh fixtures/tests and integration stubs to emit JSONL “result” envelopes; remove obsolete codex chrome-filter code and fixtures; update docs.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
cmd/entire/cli/review_context_test.go	Updates smoke stub output to emit a JSONL `result` envelope compatible with stream-json parsing.
cmd/entire/cli/integration_test/review_test.go	Updates fake agent to emit a JSONL `result` envelope so `entire review` can complete under the new parser.
cmd/entire/cli/agent/codex/reviewer.go	Switches codex argv to `--json` and replaces text/chrome parsing with JSONL envelope decoding into events.
cmd/entire/cli/agent/codex/reviewer_test.go	Updates tests to use JSONL fixtures and adds a streaming-before-EOF regression test.
cmd/entire/cli/agent/codex/testdata/json_session.jsonl	Adds captured codex JSONL fixture for parser contract tests.
cmd/entire/cli/agent/codex/testdata/canned_exec.txt	Removes obsolete plain-text codex fixture (no longer used).
cmd/entire/cli/agent/codex/output_filter.go	Removes codex chrome filtering implementation (superseded by JSONL).
cmd/entire/cli/agent/codex/output_filter_test.go	Removes tests for deleted chrome filter.
cmd/entire/cli/agent/claudecode/reviewer.go	Switches claude argv to stream-json + verbose and parses JSONL envelopes into events (including tool_use → ToolCall).
cmd/entire/cli/agent/claudecode/reviewer_test.go	Updates tests to validate stream-json fixture decoding and argv shape.
cmd/entire/cli/agent/claudecode/testdata/stream_session.jsonl	Adds captured claude stream-json fixture for parser contract tests.
cmd/entire/cli/agent/claudecode/testdata/canned_session.txt	Removes obsolete plain-text claude fixture (no longer used).
cmd/entire/cli/agent/claudecode/transcript.go	Reuses a shared `assistant` envelope type constant.
CLAUDE.md	Updates documentation to reflect per-agent parsing ownership and removal of shared codex filtering concerns.

Address Tier A items from PR #1192 review: - Add TestParseClaudeOutput_StreamsEventsBeforeEOF mirroring codex's io.Pipe + timeout-gated streaming guard. Also exercises the previously-uncovered tool_use content block path. - Add TestParse{Claude,Codex}Output_NoTerminalEnvelopeMeansFailed for the truncated-stream → Finished{Success:false} branch. - Add TestParse{Claude,Codex}Output_GarbledLineEmitsRunErrorAndContinues to lock the recover-and-continue contract for per-line JSON errors. - Document codex's hard-coded Finished{Success:true} so a future schema addition that exposes turn-level errors gets wired through. - Soften the "codex 0.130.0" version-pin comment to read as a contract. - Replace the stale "no chrome filtering needed" comment in geminicli/reviewer.go with a description that doesn't reference deleted code. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: 4ab5685eff94

- Reuse messageUsage (claudecode/types.go) for the stream-json result envelope's Usage field instead of a duplicate claudeUsage struct, so the two consumers of Claude's API usage shape stay in lockstep (bugbot feedback on PR #1192). - Streaming-before-EOF tests for both agents: capture the expected event into a separate variable before type-asserting, so a failure message reflects the actual event type rather than the zero-valued asserted type (Copilot feedback on PR #1192). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: f516fe93f468

peyton-alt · 2026-05-12T17:33:21Z

@BugBot review

cursor

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

^{Reviewed by Cursor Bugbot for commit 718f92e. Configure here.}

Three follow-ups from review of this PR: 1. Bump the bufio.Scanner max line length from 16MB to 64MB in both parsers. Codex packs the entire stdout of command_execution tools into the aggregated_output field on item.completed envelopes inline, so a chatty grep/cat/find over a moderately-sized repo can put many MB into one envelope and hit the prior cap — surfacing as "review failed" with no clue what tipped it over. Claude shares the cap for parity; one buffer per active review, so memory cost is modest. Comment on both sites explains the choice. 2. Default arm on codex's envelope-type switch logs unknown types at Debug. The parser today handles thread.started, turn.started, item.started, item.completed, turn.completed; anything else falls through silently. If codex evolves (new tool item types, intermediate envelope variants), drift is now triageable via ENTIRE_LOG_LEVEL=debug instead of needing source-dive. Adds an explicit no-op case for thread.started / turn.started so they remain documented swallows rather than absorbed by the default. 3. Doc-comment polish on both parser functions: - "Exposed for golden-file contract testing" → "Package-private; called directly from this package's tests" (the parsers are lowercase / unexported, "Exposed" misread on first glance). - Add a line noting Tokens are emitted only at the terminal envelope, not incrementally — the PR's headline "live events" benefit applies to AssistantText/ToolCall, not Tokens, and readers expecting per-message token streaming would be surprised. Note on review feedback #2 (missing codex garbled-line test): turned out to be incorrect — TestParseCodexOutput_GarbledLineEmitsRunError AndContinues already exists at codex/reviewer_test.go:346. Symmetry is intact; no test added. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: 3dd5f29e5289

…tream

peyton-alt and others added 6 commits May 12, 2026 11:36

Copilot AI review requested due to automatic review settings May 12, 2026 17:17

peyton-alt requested a review from a team as a code owner May 12, 2026 17:17

Copilot started reviewing on behalf of peyton-alt May 12, 2026 17:18 View session

Copilot AI reviewed May 12, 2026

View reviewed changes

Comment thread cmd/entire/cli/agent/codex/reviewer_test.go Outdated

Comment thread cmd/entire/cli/agent/claudecode/reviewer_test.go

cursor Bot reviewed May 12, 2026

View reviewed changes

Comment thread cmd/entire/cli/agent/claudecode/reviewer.go Outdated

peyton-alt and others added 2 commits May 12, 2026 13:26

cursor Bot reviewed May 12, 2026

View reviewed changes

peyton-alt and others added 2 commits May 12, 2026 22:35

Merge remote-tracking branch 'origin/main' into review-fix-progress-s…

6f3850f

…tream

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stream review agent events live via JSONL output modes#1192

Stream review agent events live via JSONL output modes#1192
peyton-alt wants to merge 10 commits into
mainfrom
review-fix-progress-stream

peyton-alt commented May 12, 2026 •

edited by cursor Bot

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

peyton-alt commented May 12, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Conversation

peyton-alt commented May 12, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

peyton-alt commented May 12, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

peyton-alt commented May 12, 2026 •

edited by cursor Bot

Loading