Stream review agent events live via JSONL output modes #1192

Open
peyton-alt wants to merge 10 commits into main from review-fix-progress-stream

Conversation

@peyton-alt
Contributor

@peyton-alt peyton-alt commented May 12, 2026

https://entire.io/gh/entireio/cli/trails/363

Summary

  • The entire review multi-agent TUI's Ctrl+O drill-in stayed at [started] and the dashboard preview column stayed empty for the entire run, because both claude-code (claude -p) and codex (codex exec -) buffer stdout under their default plain-text modes — events only arrived after the agent exited.
  • This PR switches each agent's parser to that agent's native JSONL streaming format: claude uses --output-format stream-json --verbose, codex uses exec --json. Each stdout line is one JSON envelope; the parsers dispatch one Event per envelope (assistant text, tool use, tokens, finished).
  • Net: ~290 lines added, ~930 lines deleted (codex's 580-line chrome-filter state machine is gone). No PTY, no AgentReviewer interface changes, no TUI changes. The "extend parsers in place" boundary from the brief is preserved.

Test Plan

  • mise run check passes (5158 unit tests + 340 integration tests + canary E2E)
  • Unit-test parsers against captured JSONL fixtures (testdata/stream_session.jsonl, testdata/json_session.jsonl)
  • TestParseCodexOutput_StreamsEventsBeforeEOF guards the actual streaming property (drips NDJSON through io.Pipe, asserts events arrive before the pipe closes)
  • Real entire review --agent claude-code: drill-in shows tool calls + assistant text streaming during the run
  • Real entire review --agent codex: drill-in shows [tool: exec] rows + assistant text streaming during the run (codex has a ~30s startup before first event — model warmup, not a parser issue)
  • Real multi-agent entire review: both agents stream side-by-side; dashboard preview column updates mid-run for both

Notes

  • gemini-cli is not changed in this PR — its parser stays on plain text. If gemini's behavior shows the same buffering, a follow-up can switch it using the same pattern.
  • PR feat(review): improve drill-in scrolling and post-run access #1184 (drill-in viewport, in-flight) composes cleanly: long ToolCall args truncate via truncateDisplayWidth; each event is exactly one row in the body.
  • Two follow-ups deferred from final code review: (a) add a TestParseClaudeOutput_StreamsEventsBeforeEOF for symmetry with codex; (b) per-tool argument summarization in eventLine (e.g., extract file_path for Read instead of dumping the full JSON input).

🤖 Generated with Claude Code


Note

Medium Risk
Switches the claude-code and codex review runners to new JSONL output modes and rewrites their parsers, which could affect live event streaming, tool-call rendering, and success/tokens detection across agent versions.

Overview
entire review now streams live events for claude-code and codex by switching both to their native JSONL stdout modes (claude --output-format stream-json --verbose, codex exec --json) and parsing each envelope into Events (AssistantText, ToolCall, Tokens, Finished).

Codex’s prior stdout “chrome” filtering/state machine is removed entirely, replaced with direct JSON envelope decoding; both parsers now treat missing terminal envelopes (result / turn.completed) as Finished{Success:false} and tolerate garbled lines by emitting RunError while continuing.

Tests and fixtures are updated/added to assert argv shapes, JSONL decoding, and true streaming before EOF via io.Pipe; integration/smoke stubs now emit minimal result JSON to satisfy the new Claude parser, and claudecode/transcript.go shares the assistant envelope type constant.

Reviewed by Cursor Bugbot for commit 718f92e. Configure here.

peyton-alt and others added 6 commits May 12, 2026 11:36
claude -p plain-text mode buffers stdout until the model has finished
generating, so the review drill-in (Ctrl+O) stayed at [started] for the
whole run and the dashboard preview only populated at end. Switching to
stream-json --verbose emits one JSON envelope per agent event (assistant
messages, tool use, result), giving the parser per-message granularity
and making Ctrl+O surface live progress.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Entire-Checkpoint: e3d911c520d2
codex exec plain-text mode buffers chrome + assistant output until the
model finishes, so the review drill-in stayed at [started] for codex
agents. Switching to exec --json emits one JSON envelope per agent-side
event (item.started for tool calls, item.completed for agent messages,
turn.completed for the terminal usage block), giving the parser
per-event granularity. The old chrome-filter state machine is no longer
needed and has been removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Entire-Checkpoint: e2625d3943af
Address code-review feedback on the codex --json parser:

- Match claudecode's post-loop Tokens+Finished emission pattern instead
  of emitting inline inside the turn.completed branch. Prevents double
  emit if codex ever sends multiple turn.completed envelopes and keeps
  the two agent parsers structurally parallel.
- Re-add TestParseCodexOutput_StreamsEventsBeforeEOF (deleted in the
  chunk 2 rewrite) using NDJSON envelopes through io.Pipe. Guards
  against future regressions that re-introduce batching.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Entire-Checkpoint: 6118267e4e52
The codex chrome filter (output_filter.go) was removed in
f0b6eca / a3d29fe when codex switched to exec --json. Two CLAUDE.md
prose lines still mentioned it: the reviewer.go inventory said
"codex with chrome filter", and the anti-features list still referred
to filterCodexOutput specifically. Tidy both to reflect the post-switch
shape, with the anti-feature now stated as a general principle
(per-agent parsers own their format; shared code only sees Event
variants) so the design boundary is still documented.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Entire-Checkpoint: fb4673c45488
TestReviewCommand_PassesReviewEnvToSpawnedAgentHook uses a fake claude
shell script whose stdout was empty (the script pipes its content to a
hook's stdin). With chunk 1's switch to --output-format stream-json the
new parser interpreted the empty stream as "no result envelope" and
classified the agent run as Failed, breaking the test.

Append a minimal valid stream-json result envelope to the fake script's
stdout so the parser sees a clean session end. The hook-firing path the
test actually exercises is unchanged.

Apply the same fix to TestReviewCommandSmoke_IncludesCheckpointContextInPrompt
whose stub printed plain text 'smoke review ok' that the new parser
treated as malformed JSON.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Entire-Checkpoint: 6066c1f3cd20
The comments referred to Strip's internal scanner, which lived in
output_filter.go and was deleted when codex switched to exec --json.
parseCodexOutput now reads io.Reader directly with its own 16MB scanner
buffer; the comments now reflect that.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Entire-Checkpoint: 7df8d0da6d9e
Copilot AI review requested due to automatic review settings May 12, 2026 17:17
@peyton-alt peyton-alt requested a review from a team as a code owner May 12, 2026 17:17
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes live event streaming in entire review by switching claude-code and codex adapters from buffered plain-text parsing to each agent’s native JSONL streaming output, so the TUI drill-in and preview column update during the run instead of only after process exit.

Changes:

  • Update claude-code reviewer to run claude -p ... --output-format stream-json --verbose and parse JSONL envelopes into Started/AssistantText/ToolCall/Tokens/Finished events.
  • Update codex reviewer to run codex exec --skip-git-repo-check --json - and replace the prior “chrome filter + state machine” with direct JSON envelope decoding.
  • Refresh fixtures/tests and integration stubs to emit JSONL “result” envelopes; remove obsolete codex chrome-filter code and fixtures; update docs.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| cmd/entire/cli/review_context_test.go | Updates smoke stub output to emit a JSONL result envelope compatible with stream-json parsing. |
| cmd/entire/cli/integration_test/review_test.go | Updates fake agent to emit a JSONL result envelope so entire review can complete under the new parser. |
| cmd/entire/cli/agent/codex/reviewer.go | Switches codex argv to --json and replaces text/chrome parsing with JSONL envelope decoding into events. |
| cmd/entire/cli/agent/codex/reviewer_test.go | Updates tests to use JSONL fixtures and adds a streaming-before-EOF regression test. |
| cmd/entire/cli/agent/codex/testdata/json_session.jsonl | Adds captured codex JSONL fixture for parser contract tests. |
| cmd/entire/cli/agent/codex/testdata/canned_exec.txt | Removes obsolete plain-text codex fixture (no longer used). |
| cmd/entire/cli/agent/codex/output_filter.go | Removes codex chrome filtering implementation (superseded by JSONL). |
| cmd/entire/cli/agent/codex/output_filter_test.go | Removes tests for deleted chrome filter. |
| cmd/entire/cli/agent/claudecode/reviewer.go | Switches claude argv to stream-json + verbose and parses JSONL envelopes into events (including tool_use → ToolCall). |
| cmd/entire/cli/agent/claudecode/reviewer_test.go | Updates tests to validate stream-json fixture decoding and argv shape. |
| cmd/entire/cli/agent/claudecode/testdata/stream_session.jsonl | Adds captured claude stream-json fixture for parser contract tests. |
| cmd/entire/cli/agent/claudecode/testdata/canned_session.txt | Removes obsolete plain-text claude fixture (no longer used). |
| cmd/entire/cli/agent/claudecode/transcript.go | Reuses a shared assistant envelope type constant. |
| CLAUDE.md | Updates documentation to reflect per-agent parsing ownership and removal of shared codex filtering concerns. |

Comment thread cmd/entire/cli/agent/codex/reviewer_test.go Outdated
Comment thread cmd/entire/cli/agent/claudecode/reviewer_test.go
Comment thread cmd/entire/cli/agent/claudecode/reviewer.go Outdated
peyton-alt and others added 2 commits May 12, 2026 13:26
Address Tier A items from PR #1192 review:

- Add TestParseClaudeOutput_StreamsEventsBeforeEOF mirroring codex's
  io.Pipe + timeout-gated streaming guard. Also exercises the
  previously-uncovered tool_use content block path.
- Add TestParse{Claude,Codex}Output_NoTerminalEnvelopeMeansFailed for
  the truncated-stream → Finished{Success:false} branch.
- Add TestParse{Claude,Codex}Output_GarbledLineEmitsRunErrorAndContinues
  to lock the recover-and-continue contract for per-line JSON errors.
- Document codex's hard-coded Finished{Success:true} so a future
  schema addition that exposes turn-level errors gets wired through.
- Soften the "codex 0.130.0" version-pin comment to read as a contract.
- Replace the stale "no chrome filtering needed" comment in
  geminicli/reviewer.go with a description that doesn't reference
  deleted code.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Entire-Checkpoint: 4ab5685eff94
- Reuse messageUsage (claudecode/types.go) for the stream-json result
  envelope's Usage field instead of a duplicate claudeUsage struct, so
  the two consumers of Claude's API usage shape stay in lockstep
  (bugbot feedback on PR #1192).
- Streaming-before-EOF tests for both agents: capture the expected event
  into a separate variable before type-asserting, so a failure message
  reflects the actual event type rather than the zero-valued asserted
  type (Copilot feedback on PR #1192).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Entire-Checkpoint: f516fe93f468
@peyton-alt
Contributor Author

@BugBot review


@cursor cursor Bot left a comment


✅ Bugbot reviewed your changes and found no new issues!


Reviewed by Cursor Bugbot for commit 718f92e. Configure here.

peyton-alt and others added 2 commits May 12, 2026 22:35
Three follow-ups from review of this PR:

1. Bump the bufio.Scanner max line length from 16MB to 64MB in both
   parsers. Codex packs the entire stdout of command_execution tools
   into the aggregated_output field on item.completed envelopes
   inline, so a chatty grep/cat/find over a moderately-sized repo can
   put many MB into one envelope and hit the prior cap — surfacing as
   "review failed" with no clue what tipped it over. Claude shares
   the cap for parity; one buffer per active review, so memory cost
   is modest. Comment on both sites explains the choice.

2. Default arm on codex's envelope-type switch logs unknown types at
   Debug. The parser today handles thread.started, turn.started,
   item.started, item.completed, turn.completed; anything else falls
   through silently. If codex evolves (new tool item types,
   intermediate envelope variants), drift is now triageable via
   ENTIRE_LOG_LEVEL=debug instead of needing source-dive. Adds an
   explicit no-op case for thread.started / turn.started so they
   remain documented swallows rather than absorbed by the default.

3. Doc-comment polish on both parser functions:
   - "Exposed for golden-file contract testing" → "Package-private;
     called directly from this package's tests" (the parsers are
     lowercase / unexported, "Exposed" misread on first glance).
   - Add a line noting Tokens are emitted only at the terminal
     envelope, not incrementally — the PR's headline "live events"
     benefit applies to AssistantText/ToolCall, not Tokens, and
     readers expecting per-message token streaming would be
     surprised.

Note on review feedback #2 (missing codex garbled-line test): turned
out to be incorrect. TestParseCodexOutput_GarbledLineEmitsRunErrorAndContinues
already exists at codex/reviewer_test.go:346. Symmetry is intact; no
test added.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Entire-Checkpoint: 3dd5f29e5289
