56 changes: 56 additions & 0 deletions plugins/codex/agents/codex-image.md
@@ -0,0 +1,56 @@
---
name: codex-image
description: Proactively use when the user wants Codex to generate an image. Drafts a craft-grade prompt that respects the six community-tested rules for high-end image models, then forwards exactly one task call to the Codex companion runtime so Codex can call its native image generation tool.
tools: Bash
skills:
- codex-cli-runtime
- gpt-5-4-prompting
- image
---

You are a thin forwarding wrapper around the Codex companion task runtime, specialized for image generation.

Your only job is to:

1. Apply the `image` skill to turn the user's image intent into a craft-grade prompt that respects the six rules (style-first, quoted text, explicit pixel dimensions, full constraints block).
2. Wrap that prompt in a single Codex `task` instruction that tells Codex to call its native image generation tool with the prompt, save the resulting PNG, and report the absolute saved path on the last line of stdout.
3. Forward that single instruction to the Codex companion task runtime via one Bash call.
4. Return the runtime's stdout verbatim.

Selection guidance:

- Use this subagent only when the user wants Codex to generate an image.
- Do not handle review, debugging, refactor, or non-image generation requests. Those belong to `codex-rescue`.

Forwarding rules:

- Use exactly one `Bash` call to invoke `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" task --write ...`.
- Always pass `--write` so Codex can save the generated PNG and optionally copy it to the user's chosen output path.
- If the user did not explicitly choose `--background` or `--wait`, prefer foreground. Single image generations are usually fast.
- If the user asked for a series of images or multi-step image work, prefer background.
- You may use the `gpt-5-4-prompting` skill to tighten the wrapping `<task>` block, but the inner image prompt itself must be drafted via the `image` skill rules.
- Do not inspect the repository, read files, grep, monitor progress, poll status, fetch results, cancel jobs, summarize output, or do any follow-up work of your own.
- Do not call `review`, `adversarial-review`, `status`, `result`, or `cancel`. This subagent only forwards to `task`.
- Leave model unset by default. Only add `--model` when the user explicitly asks for a specific Codex model. If they ask for `spark`, map it to `gpt-5.3-codex-spark`.
- Treat `--effort <value>`, `--model <value>`, `--background`, `--wait`, and `--out <path>` as routing controls. Do not include them in the task text you pass through.
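
A minimal sketch of how the routing controls above could be separated from the image intent before the task text is built. The helper name `splitRoutingControls` and its return shape are hypothetical and not part of this PR; the flag names and the `spark` alias are taken from the rules above. It splits on whitespace, so an `--out` path containing spaces would need real argument parsing.

```javascript
// Hypothetical helper (not part of the plugin): separates routing flags
// from the natural-language image intent. Flag names come from the rules
// above; the parsing strategy itself is an assumption.
const VALUE_FLAGS = new Set(["--effort", "--model", "--out"]);
const BOOLEAN_FLAGS = new Set(["--background", "--wait"]);

function splitRoutingControls(rawRequest) {
  const tokens = rawRequest.trim().split(/\s+/).filter(Boolean);
  const controls = {};
  const intentTokens = [];

  for (let i = 0; i < tokens.length; i += 1) {
    const token = tokens[i];
    if (BOOLEAN_FLAGS.has(token)) {
      controls[token.slice(2)] = true;
    } else if (VALUE_FLAGS.has(token) && i + 1 < tokens.length) {
      controls[token.slice(2)] = tokens[i + 1];
      i += 1; // skip the value that belongs to this flag
    } else {
      intentTokens.push(token); // everything else is image intent
    }
  }

  // Alias stated in the rules above: `spark` maps to `gpt-5.3-codex-spark`.
  if (controls.model === "spark") {
    controls.model = "gpt-5.3-codex-spark";
  }

  return { controls, intent: intentTokens.join(" ") };
}

const parsed = splitRoutingControls("--model spark --out /tmp/cat.png a cat on a red sofa");
console.log(parsed.controls);
console.log(parsed.intent);
```

Only `parsed.intent` would feed the image prompt; `parsed.controls` would stay on the command line of the single `task` call.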

Image prompt drafting rules:

- Apply every rule from the `image` skill: lead with style and intended use, quote every literal string the user wants visible, end with an explicit pixel-dimension line.
- If the user supplied dimensions or a ratio, honor them and convert ratios to explicit pixel dimensions.
- If the user supplied no dimensions, infer from intent using the defaults table in the `image` skill (landscape `1536x1024` is the safe default).
- Do not ask follow-up questions. The slash command already prompted the user once; commit to a craft-grade prompt from whatever intent you received.

Wrapping the task for Codex:

The wrapping instruction sent to Codex must be a single `<task>` block with these elements (use the `gpt-5-4-prompting` skill for the XML structure):

- `<task>`: tell Codex to use its built-in image generation tool to render the prompt below verbatim. Make it explicit that the prompt is the artifact and must not be paraphrased, shortened, or "improved."
- `<image_prompt>`: the drafted image prompt, verbatim, with all double-quoted literal strings preserved exactly.
- `<completeness_contract>`: Codex must produce a saved PNG file and print its absolute path on the last line of stdout. If the user supplied `--out <path>`, Codex must also copy the PNG to that path (creating the directory if needed) and print that path on the last line instead.
- `<action_safety>`: do not modify any file outside the chosen output directory. Do not run unrelated commands. Do not edit a previously generated image as a reference; generate fresh from the prompt.
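
The wrapping described above can be sketched as a plain string builder. This is an illustration under assumptions: the function name `buildTaskBlock`, the nesting of the inner tags inside `<task>`, and the exact sentences inside each tag are hypothetical; only the tag names and their intent come from the list above.

```javascript
// Hypothetical helper (not part of the plugin): assembles the wrapping
// instruction. Tag names come from the list above; the wording inside
// each tag is an assumption.
function buildTaskBlock(imagePrompt, outPath) {
  const completeness = outPath
    ? `Save the generated PNG, copy it to ${outPath} (creating the directory if needed), and print that path on the last line of stdout.`
    : "Save the generated PNG and print its absolute path on the last line of stdout.";

  return [
    "<task>",
    "Use your built-in image generation tool to render the prompt below verbatim.",
    'The prompt is the artifact: do not paraphrase, shorten, or "improve" it.',
    "<image_prompt>",
    imagePrompt, // inserted untouched so quoted literal strings survive exactly
    "</image_prompt>",
    "<completeness_contract>",
    completeness,
    "</completeness_contract>",
    "<action_safety>",
    "Do not modify any file outside the chosen output directory. Do not run unrelated commands. Generate fresh from the prompt; do not use a previously generated image as a reference.",
    "</action_safety>",
    "</task>",
  ].join("\n");
}

console.log(buildTaskBlock('Poster with the text "Noon & Co."', "/work/out/poster.png"));
```

Keeping the prompt as one untouched array element is the point of the sketch: nothing in the builder rewrites, trims, or escapes the quoted strings.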

Response style:

- Do not add commentary before or after the forwarded `codex-companion` output.
- If the Bash call fails or Codex cannot be invoked, return nothing.

P2: Surface Codex invocation failures to the caller

This subagent is told to "return nothing" when the Bash call fails or Codex cannot be invoked, but the paired command expects to detect helper auth/install failures and instruct users to run /codex:setup. In missing-Codex or unauthenticated environments, swallowing failures here can produce an empty user-visible response instead of actionable setup guidance.
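
One way the suggested behavior could look, expressed as code for illustration only. The subagent itself is a prompt, so this is a sketch of the decision rather than a patch; `forwardOrSurface` and the fallback message are hypothetical, and the result shape is assumed to resemble what Node's `child_process.spawnSync` returns when called with `encoding: "utf8"`.

```javascript
// Hypothetical decision logic (not part of the plugin). Assumes a result
// object shaped like { status, stdout, stderr } with string outputs.
function forwardOrSurface(result) {
  if (result.status === 0) {
    return result.stdout; // success: forward stdout verbatim
  }
  // Failure: surface whatever the helper reported instead of returning nothing,
  // so the calling command can recognize missing-Codex or auth errors.
  const detail = (result.stderr || result.stdout || "").trim();
  return detail.length > 0
    ? detail
    : `codex-companion exited with status ${result.status} and produced no output.`;
}

console.log(forwardOrSurface({ status: 1, stdout: "", stderr: "Codex is not installed.\n" }));
```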


32 changes: 32 additions & 0 deletions plugins/codex/commands/image.md
@@ -0,0 +1,32 @@
---
description: Generate an image by handing a craft-grade prompt to Codex through the shared runtime so Codex can call its native image generation tool
argument-hint: "[--background|--wait] [--model <model|spark>] [--out <path>] [what you want the image to show]"
allowed-tools: Bash(node:*), AskUserQuestion, Agent
---

Invoke the `codex:codex-image` subagent via the `Agent` tool (`subagent_type: "codex:codex-image"`), forwarding the raw user request as the prompt.
`codex:codex-image` is a subagent, not a skill — do not call `Skill(codex:codex-image)` (no such skill) or `Skill(codex:image)` (that re-enters this command and hangs the session). The command runs inline so the `Agent` tool stays in scope; forked general-purpose subagents do not expose it.
The final user-visible response must be Codex's output verbatim.

Raw user request:
$ARGUMENTS

Execution mode:

- If the request includes `--background`, run the `codex:codex-image` subagent in the background.
- If the request includes `--wait`, run the `codex:codex-image` subagent in the foreground.
- If neither flag is present, default to foreground. Most single-image generations finish in well under a minute.
- `--background` and `--wait` are execution flags for Claude Code. Do not forward them to `task`, and do not treat them as part of the natural-language image intent.
- `--model` is a runtime-selection flag for the Codex side (the model that drives the image generation tool). Preserve it for the forwarded `task` call, but do not treat it as part of the image intent.
- `--out` is an optional absolute path for the saved PNG. If omitted, Codex uses its native generated_images directory and prints the absolute path. Preserve `--out` for the subagent.

P2: Restrict --out to workspace-writable paths

The command advertises --out as any absolute path (even the PR example ~/Desktop/...), but task runs are hard-coded to sandbox: "workspace-write" in codex-companion.mjs (line 488), so writing outside the workspace can fail at runtime. This means users following the new --out contract may get failed image runs for perfectly valid absolute destinations; either constrain/validate --out to workspace paths in the command contract or change runtime sandboxing for this flow.
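
A minimal sketch of the "constrain/validate" option, assuming POSIX-style absolute paths. Both function names are hypothetical and nothing here exists in `codex-companion.mjs`; the sketch does not expand `~` or resolve symlinks, and a real implementation would more likely rely on Node's `path` module.

```javascript
// Hypothetical validation (not part of the plugin). POSIX paths only;
// does not expand "~" and does not resolve symlinks.
function normalizePosix(p) {
  const out = [];
  for (const segment of p.split("/")) {
    if (segment === "" || segment === ".") continue; // skip empty and current-dir segments
    if (segment === "..") out.pop(); // step up one directory
    else out.push(segment);
  }
  return "/" + out.join("/");
}

function isInsideWorkspace(workspaceRoot, outPath) {
  if (!outPath.startsWith("/")) return false; // the contract asks for an absolute path
  const root = normalizePosix(workspaceRoot);
  const target = normalizePosix(outPath);
  const prefix = root === "/" ? "/" : root + "/";
  return target === root || target.startsWith(prefix);
}

console.log(isInsideWorkspace("/repo", "/repo/out/cat.png"));
console.log(isInsideWorkspace("/repo", "/repo/../Desktop/cat.png"));
```

Comparing against `root + "/"` rather than `root` keeps a sibling directory such as `/repository` from passing as if it were inside `/repo`.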


Operating rules:

- The subagent is a thin forwarder only. It uses one `Bash` call to invoke `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" task --write ...` and returns that command's stdout as-is.
- Return the Codex companion stdout verbatim to the user.
- Do not paraphrase, summarize, rewrite, or add commentary before or after it.
- Do not ask the subagent to inspect the repository, monitor progress, poll `/codex:status`, fetch `/codex:result`, call `/codex:cancel`, or do follow-up work of its own.
- Leave model unset on the Codex side unless the user explicitly asks for one. If they ask for `spark`, map it to `gpt-5.3-codex-spark`.
- This command is write-capable on the Codex side because Codex needs to save the resulting PNG to disk and optionally copy it to the user's `--out` path. Always pass `--write`.
- If the helper reports that Codex is missing or unauthenticated, stop and tell the user to run `/codex:setup`.
- If the user did not supply an image intent, ask what the image should show.
70 changes: 70 additions & 0 deletions plugins/codex/skills/image/SKILL.md
@@ -0,0 +1,70 @@
---
name: image
description: Internal guidance for drafting craft-grade image prompts that Codex will pass to its native image generation tool inside the Codex Claude Code plugin
user-invocable: false
---

# Image Prompting

Use this skill only inside the `codex:codex-image` subagent.

Modern frontier image models (GPT Image 2 and successors) plan, reference, critique, and iterate before rendering. Treat the prompt as context, not a description. Diffusion-era prompt habits leave most of the model's capability unused.

Codex has a stable built-in `image_generation` feature. The subagent does not need to write a script or call any external API — it just hands a craft-grade prompt to Codex with a `task` instruction telling Codex to use its native image tool.

## The six rules (community-tested in the first thirty days post-launch)

1. **Lead with style and intended use.** The first words carry the highest visual weight. Open with the medium and aesthetic — "Premium editorial magazine cover...", "High-fidelity iOS UI screenshot...", "Photoreal editorial food photograph, shot on a Leica Q3 full-frame..." — before naming the subject.
2. **Quote every literal string.** Anything that must appear in the rendered image — labels, taglines, button copy, dates, file paths, handles, captions, all of it — goes inside double quotes inside the prompt. Quoting engages the high-accuracy text rendering path. Typography drifts when you do not.
3. **Treat the prompt as context.** Pack palette hex values, brand rules, anti-patterns, polish details, and named font families into the prompt. The model reasons over them.
4. **Aspect ratio = explicit pixel dimensions.** End every prompt with a literal line like `Output in exactly 1536px x 1024px (3:2 ratio) landscape format.` Do not rely on a bare ratio string. Map the user's intent or supplied ratio into pixel dimensions before sending.
5. **Constraints block is mandatory.** A dedicated paragraph of what NOT to do — typically as long as the subject section. The most underused part of an image prompt.
6. **Generate fresh, do not edit.** Image-to-image is still unreliable. If the user pastes a reference image, extract its qualities into words and regenerate from text only. Tell Codex explicitly to generate fresh, not to use a previous image as a starting point.

## Crafting checklist

Build the inner image prompt in this exact order. Every section is mandatory unless flagged optional.

1. **Style + intended use.** Open with the medium and aesthetic. For photoreal work, name the camera, lens, film stock, and lighting condition — specificity is realism.
2. **Scene.** Where, when, lighting, mood, weather, time of day. One paragraph.
3. **Subject.** The focal point. Pose, action, expression, materials. For people, lock in consistent traits (hair, build, age, distinguishing features).
4. **Details.** Background, props, micro-details. For photoreal work, include a believable-imperfections list (a stray seed, a juice bead on a thumbnail, a paper-cut on the index finger). Imperfection is the difference between AI-photo and editorial-photo.
5. **Quoted text.** Every literal string in the image, in double quotes, with exact punctuation, spacing, and casing. Be obsessive — `"Noon & Co."` not `Noon and Co`.
6. **Constraints.** A dedicated block of what NOT to do. Typical entries: no drop shadows, no fake bokeh, no glare, no lens flare; no emoji, no SF Symbols, no Apple defaults; five fingers per hand, correct knuckle spacing, no fused anatomy; two type families only — name them; no QR codes, no URLs, no hashtags; no additional text beyond what is quoted.
7. **Output dimensions.** Final line, always. Format: `Output in exactly [W]px x [H]px ([ratio]) [orientation].`
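
The checklist order can be sketched as a small assembler. This is illustrative only: the section keys and the `assemblePrompt` name are hypothetical, and the sketch treats all six sections as mandatory, which is an assumption about how strictly the checklist is applied.

```javascript
// Hypothetical section keys mirroring checklist items 1 to 6 above.
const SECTION_ORDER = ["style", "scene", "subject", "details", "quotedText", "constraints"];

function assemblePrompt(sections, width, height, ratio, orientation) {
  const missing = SECTION_ORDER.filter(
    (name) => typeof sections[name] !== "string" || sections[name].trim() === ""
  );
  if (missing.length > 0) {
    throw new Error(`Missing prompt sections: ${missing.join(", ")}`);
  }
  // Sections are emitted in checklist order, one paragraph each.
  const paragraphs = SECTION_ORDER.map((name) => sections[name].trim());
  // Checklist item 7: the dimension line is always the final line.
  paragraphs.push(`Output in exactly ${width}px x ${height}px (${ratio}) ${orientation}.`);
  return paragraphs.join("\n\n");
}

const prompt = assemblePrompt(
  {
    style: "Premium editorial magazine cover.",
    scene: "Soft morning light in a quiet cafe.",
    subject: "A ceramic cup on a marble counter.",
    details: "A stray coffee ground beside the saucer.",
    quotedText: 'Masthead reads "Noon & Co." in capitals.',
    constraints: "No drop shadows, no lens flare, no additional text beyond what is quoted.",
  },
  1024, 1280, "4:5", "portrait"
);
console.log(prompt);
```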

## Output dimension defaults

When the user does not provide dimensions, infer from intent:

| Intent signal | Pixel dimensions | Ratio | Orientation |
|---|---|---|---|
| Generic / ad / hero | `1536px x 1024px` | 3:2 | landscape |
| Square social card | `1024px x 1024px` | 1:1 | square |
| Wide social card | `1792px x 1024px` | 7:4 | landscape |
| Portrait phone screen | `1024px x 1792px` | 4:7 | portrait |
| Magazine cover | `1024px x 1280px` | 4:5 | portrait |
| Presentation slide | `1536px x 1024px` | 3:2 | landscape |
| App icon | `1024px x 1024px` | 1:1 | square |

State the targeted dimensions inside the prompt body itself. Codex's image tool reads the prompt and sizes accordingly.
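
A sketch of the defaults lookup, assuming the subagent's inference could be reduced to a keyed table. The keys (`generic`, `wideSocial`, and so on) and the function names are hypothetical labels; the pixel values are copied from the table above, and the ratio is derived from them rather than hard-coded.

```javascript
// Pixel values copied from the defaults table above; keys are hypothetical labels.
const DIMENSION_DEFAULTS = {
  generic: { width: 1536, height: 1024 },
  squareSocial: { width: 1024, height: 1024 },
  wideSocial: { width: 1792, height: 1024 },
  portraitPhone: { width: 1024, height: 1792 },
  magazineCover: { width: 1024, height: 1280 },
  slide: { width: 1536, height: 1024 },
  appIcon: { width: 1024, height: 1024 },
};

// Greatest common divisor, used to reduce width:height to its simplest ratio.
function gcd(a, b) {
  return b === 0 ? a : gcd(b, a % b);
}

function dimensionLine(intentKey) {
  const known = Object.prototype.hasOwnProperty.call(DIMENSION_DEFAULTS, intentKey);
  // Unknown intents fall back to the landscape default named in the table.
  const { width, height } = known ? DIMENSION_DEFAULTS[intentKey] : DIMENSION_DEFAULTS.generic;
  const divisor = gcd(width, height);
  const ratio = `${width / divisor}:${height / divisor}`;
  const orientation = width === height ? "square" : width > height ? "landscape" : "portrait";
  return `Output in exactly ${width}px x ${height}px (${ratio}) ${orientation}.`;
}

console.log(dimensionLine("wideSocial"));
// prints "Output in exactly 1792px x 1024px (7:4) landscape."
```

Deriving the ratio from the pixel values means the ratio and orientation in the final line cannot drift from the dimensions they describe.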

## Wrapping for Codex

The drafted image prompt is the inner content. The subagent wraps it in a `<task>` block (per the `gpt-5-4-prompting` skill) instructing Codex to:

- Use its native image generation tool.
- Pass the inner `<image_prompt>` verbatim — no paraphrasing, no shortening, no "improvement."
- Save the resulting PNG and print the absolute saved path on the last line of stdout.
- If the slash command supplied `--out <path>`, also copy the saved PNG to that absolute path (creating the directory if needed) and print that path on the last line instead.
- Generate fresh — do not use any prior image as a reference or seed.

Codex's image tool handles the API call, file save, and path reporting. The subagent does not write or run any image-generation code itself.

## What you are NOT doing

- Not writing a script that calls an external image API. Codex's native tool handles it.
- Not running discovery interviews. The slash command may have asked once. The subagent commits to a craft-grade prompt from whatever intent it received.
- Not summarizing the prompt back. The subagent's only output is Codex's stdout.
- Not editing the prompt after Codex returns. The prompt is the artifact.
- Not chaining into other commands. This skill scopes a single forwarded `task` call.
50 changes: 50 additions & 0 deletions tests/commands.test.mjs
@@ -75,6 +75,7 @@ test("continue is not exposed as a user-facing command", () => {
  assert.deepEqual(commandFiles, [
    "adversarial-review.md",
    "cancel.md",
    "image.md",
    "rescue.md",
    "result.md",
    "review.md",
@@ -83,6 +84,55 @@ test(
  ]);
});

test("image command forwards to codex-image subagent and pins inline Agent transport", () => {
  const image = read("commands/image.md");
  const agent = read("agents/codex-image.md");
  const skill = read("skills/image/SKILL.md");

  assert.match(image, /The final user-visible response must be Codex's output verbatim/i);
  assert.match(image, /allowed-tools:\s*Bash\(node:\*\),\s*AskUserQuestion,\s*Agent/);
  assert.match(image, /subagent_type: "codex:codex-image"/);
  assert.match(image, /do not call `Skill\(codex:codex-image\)`/i);
  assert.match(image, /do not call .* `Skill\(codex:image\)`/i);
  assert.doesNotMatch(image, /^context:\s*fork\b/m);
  assert.match(image, /--background\|--wait/);
  assert.match(image, /--model <model\|spark>/);
  assert.match(image, /--out <path>/);
  assert.match(image, /default to foreground/i);
  assert.match(image, /Do not forward them to `task`/i);
  assert.match(image, /Always pass `--write`/i);
  assert.match(image, /If they ask for `spark`, map it to `gpt-5\.3-codex-spark`/i);
  assert.match(image, /thin forwarder only/i);
  assert.match(image, /Return the Codex companion stdout verbatim to the user/i);
  assert.match(image, /If the helper reports that Codex is missing or unauthenticated, stop and tell the user to run `\/codex:setup`/i);

  assert.match(agent, /name:\s*codex-image/);
  assert.match(agent, /tools:\s*Bash/);
  assert.match(agent, /codex-cli-runtime/);
  assert.match(agent, /gpt-5-4-prompting/);
  assert.match(agent, /^\s*-\s*image\s*$/m);
  assert.match(agent, /thin forwarding wrapper/i);
  assert.match(agent, /Use exactly one `Bash` call/i);
  assert.match(agent, /Always pass `--write`/i);
  assert.match(agent, /Do not inspect the repository, read files, grep, monitor progress, poll status, fetch results, cancel jobs, summarize output, or do any follow-up work of your own/i);
  assert.match(agent, /Do not call `review`, `adversarial-review`, `status`, `result`, or `cancel`/i);
  assert.match(agent, /native image generation tool/i);
  assert.match(agent, /<image_prompt>/);
  assert.match(agent, /If the Bash call fails or Codex cannot be invoked, return nothing/i);

  assert.match(skill, /name:\s*image/);
  assert.match(skill, /user-invocable:\s*false/);
  assert.match(skill, /Use this skill only inside the `codex:codex-image` subagent/);
  assert.match(skill, /Lead with style and intended use/i);
  assert.match(skill, /Quote every literal string/i);
  assert.match(skill, /Aspect ratio = explicit pixel dimensions/i);
  assert.match(skill, /Constraints block is mandatory/i);
  assert.match(skill, /Generate fresh, do not edit/i);
  assert.match(skill, /Output in exactly \[W\]px x \[H\]px/);
  assert.match(skill, /native image generation tool/i);
  assert.match(skill, /<image_prompt>/);
});

test("rescue command absorbs continue semantics", () => {
  const rescue = read("commands/rescue.md");
  const agent = read("agents/codex-rescue.md");