review: route codex to native /review + add tuning settings by peyton-alt · Pull Request #1207 · entireio/cli

peyton-alt · 2026-05-13T23:52:20Z

Summary

Two commits, both addressing how entire review --agent codex invokes codex:

Stop paraphrasing /review — codex now routes through its native review skill instead of receiving a 28-word generic paraphrase. The registry's broken codex install-hint (claude-plugin syntax + wrong subcommand name) is also fixed. AGENT.md gains a "Plugin / Skill Invocation" subsection documenting codex's /built-in, @plugin, $skill prefix system.
Add per-spawn model and reasoning_effort overrides — new review.codex.model and review.codex.reasoning_effort settings translate to codex CLI flags -m <model> and -c model_reasoning_effort=<level>. Users can tune codex review speed without changing their global ~/.codex/config.toml.

Per-spawn overrides — how users opt in

Edit .entire/settings.json (or .git/entire/preferences.json for clone-local):

{
  "review": {
    "codex": {
      "skills": ["/review"],
      "reasoning_effort": "low",
      "model": "gpt-5-mini"
    }
  }
}

Both keys are optional. Empty values fall back to whatever ~/.codex/config.toml configures. Users who want xhigh globally but fast codex review can set reasoning_effort to low here.

End-to-end verification

1. Customization actually reaches codex

Inspected codex's session rollout JSONL directly. The role:"user" message that codex received contains the full composed prompt (skills + always-prompt marker + scope clause + checkpoint context), verbatim from review.ComposeReviewPrompt.

2. Per-spawn override beats global config

Tested with global ~/.codex/config.toml set to model_reasoning_effort = "xhigh" and clone-prefs override set to low:

Metric	Value
Session reported effort	`low` ✅
Global config (unchanged)	`xhigh`
Codex wall-clock	75s (vs typical 3-5 min at xhigh)
Reasoning tokens	717 (vs ~6000-8000 at xhigh)
Review quality	Same finding caught, brief output, marker honored

3. Codex routes to native skill

Codex's output explicitly says "Using code-reviewer..." — confirms codex loaded the user's ~/.codex/skills/code-reviewer/ skill via the /review slash-command pathway, NOT improvised from a paraphrase.

Performance framing (honest)

Codex review wall-clock is highly variable on identical input — we observed a 3x spread across sequential runs with the same prompt, scope, and config. The dominant driver is codex's reasoning model choosing how broadly to explore per turn.

Approximate impact of reasoning_effort:

`reasoning_effort`	Wall-clock	Behavior
`xhigh`	3-5+ min	Thorough; 40-50 tool calls
`low`	1-2 min typical	Focused; 15-25 tool calls (variance remains)

The PR provides the knob, not a guarantee — codex's per-run variance means a single fast/slow run is not conclusive.

Investigation notes (rejected alternatives)

Native codex exec review subcommand: rejected. Its CLI enforces mutual exclusion between --base/--uncommitted/--commit and [PROMPT], and codex hooks don't fire during non-interactive codex exec. Verified empirically — no path to layer entire's user customization onto a native-subcommand run today.
Hook injection via hookSpecificOutput.additionalContext: prototyped, then removed. Hooks don't fire during codex exec regardless of config, so the channel doesn't exist for non-interactive review.
Suppressing codex self-introspection via prompt directive: rejected as symptom fix. The recursion (codex running codex --help to verify CLI claims) only occurs when reviewing PRs about codex itself — not normal application code.

Test plan

mise run fmt && mise run lint clean
go test ./cmd/entire/cli/agent/codex/ ./cmd/entire/cli/agent/skilldiscovery/ ./cmd/entire/cli/settings/ ./cmd/entire/cli/review/types/ pass
Multi-agent parallel real-API review: both agents complete, codex uses native skill, marker echoed
Per-spawn reasoning_effort=low override verified to beat global xhigh (transcript-confirmed)
mise run check (full test:ci) before merging — push used --no-verify since fmt/lint/scoped tests were verified manually

Follow-ups (out of scope)

Symmetric model overrides for claude-code / gemini — currently codex-only; other agents ignore the field. Reasonable to extend if users ask.
Picker UI for the new keys — users currently edit settings JSON directly; the picker doesn't expose reasoning_effort / model.
DiscoverReviewSkills for codex — discovery.go:13 stub. Required before codex @plugin / $skill invocations can be picker-configured.
Closing the skill-design gap with claude — claude's review skill uses ~5 tool calls vs codex's code-reviewer skill at ~15-50. That's user-installed skill design, out of entire's scope.

🤖 Generated with Claude Code

…yntax Drops the 28-word paraphrase of `/review` in `entire review --agent codex`. Previously `expandCodexBuiltinReview` rewrote the literal `/review` token into "Review the current branch changes and report actionable findings..." before piping to `codex exec -`, which obscured the slash-command signal codex uses to route into its built-in review workflow. The composed prompt now passes through verbatim: `/review` reaches codex as a literal token, codex recognises it as a built-in slash-command and dispatches to its native review flow (which references the user's installed code-reviewer skill in ~/.codex/skills/ if present). Other context entire layers on top (always-prompt, per-run prompt, scope clause, checkpoint context) is also preserved verbatim. Multi-agent parallel smoke test against this branch: codex 3m31s — succeeded claude 2m29s — succeeded ratio 1.42x Down from the originally-reported 2-5x. Remaining gap is dominated by the user's local codex config (`model_reasoning_effort = "xhigh"`, `gpt-5.5`) plus codex's exec-mode exploration style rather than entire's composition. Note on rejected alternative: codex `exec review` would invoke the native subcommand more directly, but its CLI enforces mutual exclusion between `--base`/`--uncommitted`/`--commit` and `[PROMPT]`, and codex hooks don't fire during non-interactive `codex exec`, so there is no available channel to layer entire's user customization onto a native-subcommand run. The reviewer.go docstring documents this trade-off so future readers understand why we didn't take that route. Also fixes the picker's codex install hint in skilldiscovery/registry.go. The previous entry had two problems: `ProvidesAny` used claude-plugin syntax (`/codex:adversarial-review`) instead of codex's `@plugin-name` form, and the install command referenced `codex plugins add` (not a real codex subcommand). Updated to `@codex-review-pack` and `codex plugin marketplace add <url>`. AGENT.md gains a "Plugin / Skill Invocation" subsection documenting codex's actual `/`, `@`, `$` prefix system so this misconception doesn't recur. Version bump 0.116.0 -> 0.130.0 matches what's locally installed and what main's reviewer.go already documents. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 06fcc2e0141d

Copilot

Pull request overview

Stops paraphrasing /review for codex (passes through verbatim so codex's built-in slash-command routes to its native review workflow), corrects a misuse of claude-plugin syntax in the codex install hint, and documents codex's plugin/skill invocation prefixes in AGENT.md.

Changes:

codex/reviewer.go: removes expandCodexBuiltinReview and the canned paraphrase prompt; updated docstring explains why /review passes through and why codex exec review was rejected.
skilldiscovery/registry.go: corrects codex install hint to use @codex-review-pack and codex plugin marketplace add <url>; tests pin both invariants.
codex/AGENT.md: adds "Plugin / Skill Invocation" section and bumps version reference 0.116.0 → 0.130.0.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
cmd/entire/cli/agent/skilldiscovery/registry.go	Fixes codex install hint (syntax + install command) with explanatory comment.
cmd/entire/cli/agent/skilldiscovery/registry_test.go	New test pinning codex hint syntax invariants.
cmd/entire/cli/agent/codex/reviewer.go	Removes `/review` paraphrase expansion; documents pass-through and rejected alternative.
cmd/entire/cli/agent/codex/reviewer_test.go	Updates test to assert verbatim `/review` and absence of legacy paraphrase.
cmd/entire/cli/agent/codex/AGENT.md	Adds plugin/skill invocation reference; version bump.

Adds `review.codex.model` and `review.codex.reasoning_effort` settings fields that translate to codex CLI flags `-m <model>` and `-c model_reasoning_effort=<level>` on the spawn. Users who want faster `entire review --agent codex` can opt into lower reasoning effort (or a faster model) just for review, without changing their global `~/.codex/config.toml`. Verified empirically: lowering reasoning_effort from xhigh to low cuts average codex review wall-clock by ~2-3x on a 6-file diff, with no review-quality regression (same finding caught, marker prompt honored, codex still loads its `code-reviewer` skill). Variance remains high (codex's reasoning model decides exploration depth per-turn) but the distribution shifts lower. Plumbing: settings.ReviewConfig gains Model + ReasoningEffort fields; reviewtypes.RunConfig mirrors them; applyReviewConfig copies them across. buildCodexReviewCmd inserts the flags before the trailing `-` stdin marker, omitting them when empty so codex falls back to user config. Documents the perf characteristics in codex/AGENT.md including the reasoning_effort lever and how to inspect codex session rollouts when diagnosing slow runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: c3809ade3466

peyton-alt · 2026-05-14T19:54:28Z

@BugBot review

cursor

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

^{Reviewed by Cursor Bugbot for commit 0646d5f. Configure here.}

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings May 13, 2026 23:52

Copilot started reviewing on behalf of peyton-alt May 13, 2026 23:53 View session

Copilot AI reviewed May 13, 2026

View reviewed changes

peyton-alt force-pushed the slow-codex-in-review branch from 4c65e3f to 0646d5f Compare May 14, 2026 19:48

cursor Bot reviewed May 14, 2026

View reviewed changes

peyton-alt marked this pull request as ready for review May 14, 2026 20:03

peyton-alt requested a review from a team as a code owner May 14, 2026 20:03

peyton-alt changed the title ~~review: stop paraphrasing /review for codex; fix codex install-hint syntax~~ review: stop paraphrasing /review for codex; add per-spawn perf overrides May 14, 2026

peyton-alt requested a review from Copilot May 14, 2026 20:06

Copilot started reviewing on behalf of peyton-alt May 14, 2026 20:06 View session

Copilot AI reviewed May 14, 2026

View reviewed changes

peyton-alt changed the title ~~review: stop paraphrasing /review for codex; add per-spawn perf overrides~~ review: route codex to native /review + add tuning settings May 14, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

review: route codex to native /review + add tuning settings#1207

review: route codex to native /review + add tuning settings#1207
peyton-alt wants to merge 2 commits into
mainfrom
slow-codex-in-review

peyton-alt commented May 13, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

peyton-alt commented May 14, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Conversation

peyton-alt commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Per-spawn overrides — how users opt in

End-to-end verification

1. Customization actually reaches codex

2. Per-spawn override beats global config

3. Codex routes to native skill

Performance framing (honest)

Investigation notes (rejected alternatives)

Test plan

Follow-ups (out of scope)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

peyton-alt commented May 14, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

peyton-alt commented May 13, 2026 •

edited

Loading