review: route codex to native /review + add tuning settings#1207
Open
peyton-alt wants to merge 2 commits into
Open
review: route codex to native /review + add tuning settings#1207peyton-alt wants to merge 2 commits into
peyton-alt wants to merge 2 commits into
Conversation
…yntax Drops the 28-word paraphrase of `/review` in `entire review --agent codex`. Previously `expandCodexBuiltinReview` rewrote the literal `/review` token into "Review the current branch changes and report actionable findings..." before piping to `codex exec -`, which obscured the slash-command signal codex uses to route into its built-in review workflow. The composed prompt now passes through verbatim: `/review` reaches codex as a literal token, codex recognises it as a built-in slash-command and dispatches to its native review flow (which references the user's installed code-reviewer skill in ~/.codex/skills/ if present). Other context entire layers on top (always-prompt, per-run prompt, scope clause, checkpoint context) is also preserved verbatim. Multi-agent parallel smoke test against this branch: codex 3m31s — succeeded claude 2m29s — succeeded ratio 1.42x Down from the originally-reported 2-5x. Remaining gap is dominated by the user's local codex config (`model_reasoning_effort = "xhigh"`, `gpt-5.5`) plus codex's exec-mode exploration style rather than entire's composition. Note on rejected alternative: codex `exec review` would invoke the native subcommand more directly, but its CLI enforces mutual exclusion between `--base`/`--uncommitted`/`--commit` and `[PROMPT]`, and codex hooks don't fire during non-interactive `codex exec`, so there is no available channel to layer entire's user customization onto a native-subcommand run. The reviewer.go docstring documents this trade-off so future readers understand why we didn't take that route. Also fixes the picker's codex install hint in skilldiscovery/registry.go. The previous entry had two problems: `ProvidesAny` used claude-plugin syntax (`/codex:adversarial-review`) instead of codex's `@plugin-name` form, and the install command referenced `codex plugins add` (not a real codex subcommand). Updated to `@codex-review-pack` and `codex plugin marketplace add <url>`. AGENT.md gains a "Plugin / Skill Invocation" subsection documenting codex's actual `/`, `@`, `$` prefix system so this misconception doesn't recur. Version bump 0.116.0 -> 0.130.0 matches what's locally installed and what main's reviewer.go already documents. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 06fcc2e0141d
Contributor
There was a problem hiding this comment.
Pull request overview
Stops paraphrasing /review for codex (passes through verbatim so codex's built-in slash-command routes to its native review workflow), corrects a misuse of claude-plugin syntax in the codex install hint, and documents codex's plugin/skill invocation prefixes in AGENT.md.
Changes:
codex/reviewer.go: removesexpandCodexBuiltinReviewand the canned paraphrase prompt; updated docstring explains why/reviewpasses through and whycodex exec reviewwas rejected.skilldiscovery/registry.go: corrects codex install hint to use@codex-review-packandcodex plugin marketplace add <url>; tests pin both invariants.codex/AGENT.md: adds "Plugin / Skill Invocation" section and bumps version reference 0.116.0 → 0.130.0.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| cmd/entire/cli/agent/skilldiscovery/registry.go | Fixes codex install hint (syntax + install command) with explanatory comment. |
| cmd/entire/cli/agent/skilldiscovery/registry_test.go | New test pinning codex hint syntax invariants. |
| cmd/entire/cli/agent/codex/reviewer.go | Removes /review paraphrase expansion; documents pass-through and rejected alternative. |
| cmd/entire/cli/agent/codex/reviewer_test.go | Updates test to assert verbatim /review and absence of legacy paraphrase. |
| cmd/entire/cli/agent/codex/AGENT.md | Adds plugin/skill invocation reference; version bump. |
Adds `review.codex.model` and `review.codex.reasoning_effort` settings fields that translate to codex CLI flags `-m <model>` and `-c model_reasoning_effort=<level>` on the spawn. Users who want faster `entire review --agent codex` can opt into lower reasoning effort (or a faster model) just for review, without changing their global `~/.codex/config.toml`. Verified empirically: lowering reasoning_effort from xhigh to low cuts average codex review wall-clock by ~2-3x on a 6-file diff, with no review-quality regression (same finding caught, marker prompt honored, codex still loads its `code-reviewer` skill). Variance remains high (codex's reasoning model decides exploration depth per-turn) but the distribution shifts lower. Plumbing: settings.ReviewConfig gains Model + ReasoningEffort fields; reviewtypes.RunConfig mirrors them; applyReviewConfig copies them across. buildCodexReviewCmd inserts the flags before the trailing `-` stdin marker, omitting them when empty so codex falls back to user config. Documents the perf characteristics in codex/AGENT.md including the reasoning_effort lever and how to inspect codex session rollouts when diagnosing slow runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: c3809ade3466
4c65e3f to
0646d5f
Compare
Contributor
Author
|
@BugBot review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 0646d5f. Configure here.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two commits, both addressing how
entire review --agent codexinvokes codex:Stop paraphrasing
/review— codex now routes through its native review skill instead of receiving a 28-word generic paraphrase. The registry's broken codex install-hint (claude-plugin syntax + wrong subcommand name) is also fixed. AGENT.md gains a "Plugin / Skill Invocation" subsection documenting codex's/built-in,@plugin,$skillprefix system.Add per-spawn
modelandreasoning_effortoverrides — newreview.codex.modelandreview.codex.reasoning_effortsettings translate to codex CLI flags-m <model>and-c model_reasoning_effort=<level>. Users can tune codex review speed without changing their global~/.codex/config.toml.Per-spawn overrides — how users opt in
Edit
.entire/settings.json(or.git/entire/preferences.jsonfor clone-local):{ "review": { "codex": { "skills": ["/review"], "reasoning_effort": "low", "model": "gpt-5-mini" } } }Both keys are optional. Empty values fall back to whatever
~/.codex/config.tomlconfigures. Users who wantxhighglobally but fast codex review can setreasoning_efforttolowhere.End-to-end verification
1. Customization actually reaches codex
Inspected codex's session rollout JSONL directly. The
role:"user"message that codex received contains the full composed prompt (skills + always-prompt marker + scope clause + checkpoint context), verbatim fromreview.ComposeReviewPrompt.2. Per-spawn override beats global config
Tested with global
~/.codex/config.tomlset tomodel_reasoning_effort = "xhigh"and clone-prefs override set tolow:low✅xhigh3. Codex routes to native skill
Codex's output explicitly says "Using
code-reviewer..." — confirms codex loaded the user's~/.codex/skills/code-reviewer/skill via the/reviewslash-command pathway, NOT improvised from a paraphrase.Performance framing (honest)
Codex review wall-clock is highly variable on identical input — we observed a 3x spread across sequential runs with the same prompt, scope, and config. The dominant driver is codex's reasoning model choosing how broadly to explore per turn.
Approximate impact of
reasoning_effort:reasoning_effortxhighlowThe PR provides the knob, not a guarantee — codex's per-run variance means a single fast/slow run is not conclusive.
Investigation notes (rejected alternatives)
codex exec reviewsubcommand: rejected. Its CLI enforces mutual exclusion between--base/--uncommitted/--commitand[PROMPT], and codex hooks don't fire during non-interactivecodex exec. Verified empirically — no path to layer entire's user customization onto a native-subcommand run today.hookSpecificOutput.additionalContext: prototyped, then removed. Hooks don't fire duringcodex execregardless of config, so the channel doesn't exist for non-interactive review.codex --helpto verify CLI claims) only occurs when reviewing PRs about codex itself — not normal application code.Test plan
mise run fmt && mise run lintcleango test ./cmd/entire/cli/agent/codex/ ./cmd/entire/cli/agent/skilldiscovery/ ./cmd/entire/cli/settings/ ./cmd/entire/cli/review/types/passreasoning_effort=lowoverride verified to beat globalxhigh(transcript-confirmed)mise run check(fulltest:ci) before merging — push used--no-verifysince fmt/lint/scoped tests were verified manuallyFollow-ups (out of scope)
reasoning_effort/model.DiscoverReviewSkillsfor codex —discovery.go:13stub. Required before codex@plugin/$skillinvocations can be picker-configured.code-reviewerskill at ~15-50. That's user-installed skill design, out of entire's scope.🤖 Generated with Claude Code