Stream review agent events live via JSONL output modes#1192
Open
peyton-alt wants to merge 10 commits into
Open
Stream review agent events live via JSONL output modes#1192peyton-alt wants to merge 10 commits into
peyton-alt wants to merge 10 commits into
Conversation
claude -p plain-text mode buffers stdout until the model has finished generating, so the review drill-in (Ctrl+O) stayed at [started] for the whole run and the dashboard preview only populated at end. Switching to stream-json --verbose emits one JSON envelope per agent event (assistant messages, tool use, result), giving the parser per-message granularity and making Ctrl+O surface live progress. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: e3d911c520d2
codex exec plain-text mode buffers chrome + assistant output until the model finishes, so the review drill-in stayed at [started] for codex agents. Switching to exec --json emits one JSON envelope per agent-side event (item.started for tool calls, item.completed for agent messages, turn.completed for the terminal usage block), giving the parser per-event granularity. The old chrome-filter state machine is no longer needed and has been removed. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: e2625d3943af
Address code-review feedback on the codex --json parser: - Match claudecode's post-loop Tokens+Finished emission pattern instead of emitting inline inside the turn.completed branch. Prevents double emit if codex ever sends multiple turn.completed envelopes and keeps the two agent parsers structurally parallel. - Re-add TestParseCodexOutput_StreamsEventsBeforeEOF (deleted in the chunk 2 rewrite) using NDJSON envelopes through io.Pipe. Guards against future regressions that re-introduce batching. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: 6118267e4e52
The codex chrome filter (output_filter.go) was removed in f0b6eca / a3d29fe when codex switched to exec --json. Two CLAUDE.md prose lines still mentioned it: the reviewer.go inventory said "codex with chrome filter", and the anti-features list still referred to filterCodexOutput specifically. Tidy both to reflect the post-switch shape, with the anti-feature now stated as a general principle (per-agent parsers own their format; shared code only sees Event variants) so the design boundary is still documented. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: fb4673c45488
TestReviewCommand_PassesReviewEnvToSpawnedAgentHook uses a fake claude shell script whose stdout was empty (the script pipes its content to a hook's stdin). With chunk 1's switch to --output-format stream-json the new parser interpreted the empty stream as "no result envelope" and classified the agent run as Failed, breaking the test. Append a minimal valid stream-json result envelope to the fake script's stdout so the parser sees a clean session end. The hook-firing path the test actually exercises is unchanged. Apply the same fix to TestReviewCommandSmoke_IncludesCheckpointContextInPrompt whose stub printed plain text 'smoke review ok' that the new parser treated as malformed JSON. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: 6066c1f3cd20
The comments referred to Strip's internal scanner, which lived in output_filter.go and was deleted when codex switched to exec --json. parseCodexOutput now reads io.Reader directly with its own 16MB scanner buffer; the comments now reflect that. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: 7df8d0da6d9e
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes live event streaming in entire review by switching claude-code and codex adapters from buffered plain-text parsing to each agent’s native JSONL streaming output, so the TUI drill-in and preview column update during the run instead of only after process exit.
Changes:
- Update claude-code reviewer to run
claude -p ... --output-format stream-json --verboseand parse JSONL envelopes intoStarted/AssistantText/ToolCall/Tokens/Finishedevents. - Update codex reviewer to run
codex exec --skip-git-repo-check --json -and replace the prior “chrome filter + state machine” with direct JSON envelope decoding. - Refresh fixtures/tests and integration stubs to emit JSONL “result” envelopes; remove obsolete codex chrome-filter code and fixtures; update docs.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| cmd/entire/cli/review_context_test.go | Updates smoke stub output to emit a JSONL result envelope compatible with stream-json parsing. |
| cmd/entire/cli/integration_test/review_test.go | Updates fake agent to emit a JSONL result envelope so entire review can complete under the new parser. |
| cmd/entire/cli/agent/codex/reviewer.go | Switches codex argv to --json and replaces text/chrome parsing with JSONL envelope decoding into events. |
| cmd/entire/cli/agent/codex/reviewer_test.go | Updates tests to use JSONL fixtures and adds a streaming-before-EOF regression test. |
| cmd/entire/cli/agent/codex/testdata/json_session.jsonl | Adds captured codex JSONL fixture for parser contract tests. |
| cmd/entire/cli/agent/codex/testdata/canned_exec.txt | Removes obsolete plain-text codex fixture (no longer used). |
| cmd/entire/cli/agent/codex/output_filter.go | Removes codex chrome filtering implementation (superseded by JSONL). |
| cmd/entire/cli/agent/codex/output_filter_test.go | Removes tests for deleted chrome filter. |
| cmd/entire/cli/agent/claudecode/reviewer.go | Switches claude argv to stream-json + verbose and parses JSONL envelopes into events (including tool_use → ToolCall). |
| cmd/entire/cli/agent/claudecode/reviewer_test.go | Updates tests to validate stream-json fixture decoding and argv shape. |
| cmd/entire/cli/agent/claudecode/testdata/stream_session.jsonl | Adds captured claude stream-json fixture for parser contract tests. |
| cmd/entire/cli/agent/claudecode/testdata/canned_session.txt | Removes obsolete plain-text claude fixture (no longer used). |
| cmd/entire/cli/agent/claudecode/transcript.go | Reuses a shared assistant envelope type constant. |
| CLAUDE.md | Updates documentation to reflect per-agent parsing ownership and removal of shared codex filtering concerns. |
Address Tier A items from PR #1192 review: - Add TestParseClaudeOutput_StreamsEventsBeforeEOF mirroring codex's io.Pipe + timeout-gated streaming guard. Also exercises the previously-uncovered tool_use content block path. - Add TestParse{Claude,Codex}Output_NoTerminalEnvelopeMeansFailed for the truncated-stream → Finished{Success:false} branch. - Add TestParse{Claude,Codex}Output_GarbledLineEmitsRunErrorAndContinues to lock the recover-and-continue contract for per-line JSON errors. - Document codex's hard-coded Finished{Success:true} so a future schema addition that exposes turn-level errors gets wired through. - Soften the "codex 0.130.0" version-pin comment to read as a contract. - Replace the stale "no chrome filtering needed" comment in geminicli/reviewer.go with a description that doesn't reference deleted code. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: 4ab5685eff94
- Reuse messageUsage (claudecode/types.go) for the stream-json result envelope's Usage field instead of a duplicate claudeUsage struct, so the two consumers of Claude's API usage shape stay in lockstep (bugbot feedback on PR #1192). - Streaming-before-EOF tests for both agents: capture the expected event into a separate variable before type-asserting, so a failure message reflects the actual event type rather than the zero-valued asserted type (Copilot feedback on PR #1192). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Entire-Checkpoint: f516fe93f468
Contributor
Author
|
@BugBot review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 718f92e. Configure here.
Three follow-ups from review of this PR:
1. Bump the bufio.Scanner max line length from 16MB to 64MB in both
parsers. Codex packs the entire stdout of command_execution tools
into the aggregated_output field on item.completed envelopes
inline, so a chatty grep/cat/find over a moderately-sized repo can
put many MB into one envelope and hit the prior cap — surfacing as
"review failed" with no clue what tipped it over. Claude shares
the cap for parity; one buffer per active review, so memory cost
is modest. Comment on both sites explains the choice.
2. Default arm on codex's envelope-type switch logs unknown types at
Debug. The parser today handles thread.started, turn.started,
item.started, item.completed, turn.completed; anything else falls
through silently. If codex evolves (new tool item types,
intermediate envelope variants), drift is now triageable via
ENTIRE_LOG_LEVEL=debug instead of needing source-dive. Adds an
explicit no-op case for thread.started / turn.started so they
remain documented swallows rather than absorbed by the default.
3. Doc-comment polish on both parser functions:
- "Exposed for golden-file contract testing" → "Package-private;
called directly from this package's tests" (the parsers are
lowercase / unexported, "Exposed" misread on first glance).
- Add a line noting Tokens are emitted only at the terminal
envelope, not incrementally — the PR's headline "live events"
benefit applies to AssistantText/ToolCall, not Tokens, and
readers expecting per-message token streaming would be
surprised.
Note on review feedback #2 (missing codex garbled-line test): turned
out to be incorrect — TestParseCodexOutput_GarbledLineEmitsRunError
AndContinues already exists at codex/reviewer_test.go:346. Symmetry
is intact; no test added.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Entire-Checkpoint: 3dd5f29e5289
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
https://entire.io/gh/entireio/cli/trails/363
Summary
entire reviewmulti-agent TUI's Ctrl+O drill-in stayed at[started]and the dashboard preview column stayed empty for the entire run, because both claude-code (claude -p) and codex (codex exec -) buffer stdout under their default plain-text modes — events only arrived after the agent exited.--output-format stream-json --verbose, codex usesexec --json. Each stdout line is one JSON envelope; the parsers dispatch oneEventper envelope (assistant text, tool use, tokens, finished).AgentReviewerinterface changes, no TUI changes. The "extend parsers in place" boundary from the brief is preserved.Test Plan
mise run checkpasses (5158 unit tests + 340 integration tests + canary E2E)testdata/stream_session.jsonl,testdata/json_session.jsonl)TestParseCodexOutput_StreamsEventsBeforeEOFguards the actual streaming property (drips NDJSON throughio.Pipe, asserts events arrive before the pipe closes)entire review --agent claude-code: drill-in shows tool calls + assistant text streaming during the runentire review --agent codex: drill-in shows[tool: exec]rows + assistant text streaming during the run (codex has a ~30s startup before first event — model warmup, not a parser issue)entire review: both agents stream side-by-side; dashboard preview column updates mid-run for bothNotes
…viatruncateDisplayWidth; each event is exactly one row in the body.TestParseClaudeOutput_StreamsEventsBeforeEOFfor symmetry with codex; (b) per-tool argument summarization ineventLine(e.g., extractfile_pathfor Read instead of dumping the full JSON input).🤖 Generated with Claude Code
Note
Medium Risk
Switches the
claude-codeandcodexreview runners to new JSONL output modes and rewrites their parsers, which could affect live event streaming, tool-call rendering, and success/tokens detection across agent versions.Overview
entire reviewnow streams live events for claude-code and codex by switching both to their native JSONL stdout modes (claude --output-format stream-json --verbose,codex exec --json) and parsing each envelope intoEvents (AssistantText,ToolCall,Tokens,Finished).Codex’s prior stdout “chrome” filtering/state machine is removed entirely, replaced with direct JSON envelope decoding; both parsers now treat missing terminal envelopes (
result/turn.completed) asFinished{Success:false}and tolerate garbled lines by emittingRunErrorwhile continuing.Tests and fixtures are updated/added to assert argv shapes, JSONL decoding, and true streaming before EOF via
io.Pipe; integration/smoke stubs now emit minimalresultJSON to satisfy the new Claude parser, andclaudecode/transcript.goshares the assistant envelope type constant.Reviewed by Cursor Bugbot for commit 718f92e. Configure here.