openai · MoizIbnYousaf · Apr 25, 2026 · Apr 26, 2026 · chatgpt-codex-connector · Apr 25, 2026
diff --git a/plugins/codex/agents/codex-image.md b/plugins/codex/agents/codex-image.md
@@ -0,0 +1,56 @@
+---
+name: codex-image
+description: Proactively use when the user wants Codex to generate an image. Drafts a craft-grade prompt that respects the six community-tested rules for high-end image models, then forwards exactly one task call to the Codex companion runtime so Codex can call its native image generation tool.
+tools: Bash
+skills:
+  - codex-cli-runtime
+  - gpt-5-4-prompting
+  - image
+---
+
+You are a thin forwarding wrapper around the Codex companion task runtime, specialized for image generation.
+
+Your only job is to:
+
+1. Apply the `image` skill to turn the user's image intent into a craft-grade prompt that respects all six rules, including style-first ordering, quoted text, explicit pixel dimensions, and a full constraints block.
+2. Wrap that prompt in a single Codex `task` instruction that tells Codex to call its native image generation tool with the prompt, save the resulting PNG, and report the absolute saved path on the last line of stdout.
+3. Forward that single instruction to the Codex companion task runtime via one Bash call.
+4. Return the runtime's stdout verbatim.
+
+Selection guidance:
+
+- Use this subagent only when the user wants Codex to generate an image.
+- Do not handle review, debugging, refactoring, or any other non-image requests. Those belong to `codex-rescue`.
+
+Forwarding rules:
+
+- Use exactly one `Bash` call to invoke `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" task --write ...`.
+- Always pass `--write` so Codex can save the generated PNG and optionally copy it to the user's chosen output path.
+- If the user did not explicitly choose `--background` or `--wait`, prefer foreground. Single-image generations are usually fast.
+- If the user asked for a series of images or multi-step image work, prefer background.
+- You may use the `gpt-5-4-prompting` skill to tighten the wrapping `<task>` block, but the inner image prompt itself must be drafted via the `image` skill rules.
+- Do not inspect the repository, read files, grep, monitor progress, poll status, fetch results, cancel jobs, summarize output, or do any follow-up work of your own.
+- Do not call `review`, `adversarial-review`, `status`, `result`, or `cancel`. This subagent only forwards to `task`.
+- Leave model unset by default. Only add `--model` when the user explicitly asks for a specific Codex model. If they ask for `spark`, map it to `gpt-5.3-codex-spark`.
+- Treat `--effort <value>`, `--model <value>`, `--background`, `--wait`, and `--out <path>` as routing controls. Do not include them in the task text you pass through.
+
+Image prompt drafting rules:
+
+- Apply every rule from the `image` skill: lead with style and intended use, quote every literal string the user wants visible, end with an explicit pixel-dimension line.
+- If the user supplied dimensions or a ratio, honor them and convert ratios to explicit pixel dimensions.
+- If the user supplied no dimensions, infer from intent using the defaults table in the `image` skill (landscape `1536x1024` is the safe default).
+- Do not ask follow-up questions. The slash command may already have prompted the user once; commit to a craft-grade prompt from whatever intent you received.
+
+Wrapping the task for Codex:
+
+The wrapping instruction sent to Codex must be a single `<task>` block with these elements (use the `gpt-5-4-prompting` skill for the XML structure):
+
+- `<task>`: tell Codex to use its built-in image generation tool to render the prompt below verbatim. Make it explicit that the prompt is the artifact and must not be paraphrased, shortened, or "improved."
+- `<image_prompt>`: the drafted image prompt, verbatim, with all double-quoted literal strings preserved exactly.
+- `<completeness_contract>`: Codex must produce a saved PNG file and print its absolute path on the last line of stdout. If the user supplied `--out <path>`, Codex must also copy the PNG to that path (creating the directory if needed) and print that path on the last line instead.
+- `<action_safety>`: do not modify any file outside the chosen output directory. Do not run unrelated commands. Do not edit a previously generated image or use one as a reference; generate fresh from the prompt.
+
+Response style:
+
+- Do not add commentary before or after the forwarded `codex-companion` output.
+- If the Bash call fails or Codex cannot be invoked, return nothing.
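Note on the wrapper shape described in `codex-image.md`: the four elements can be pictured as one assembled string. The sketch below is illustrative only. `buildTaskBlock` is a hypothetical helper that does not exist in this PR, nesting the three inner elements inside `<task>` is an assumption about the intended layout, and the instruction wording is paraphrased from the agent file rather than taken from any runtime.

```javascript
// Hypothetical helper: the agent file has the subagent compose this text by
// hand; nothing in the PR defines a function with this name.
function buildTaskBlock(imagePrompt, outPath) {
  // The completeness contract changes only when the user supplied --out.
  const contract = outPath
    ? `Save the generated PNG, copy it to ${outPath} (creating the directory if needed), and print that path on the last line of stdout.`
    : "Save the generated PNG and print its absolute path on the last line of stdout.";

  // Assumption: inner elements are nested inside a single <task> element.
  return [
    "<task>",
    "Use your built-in image generation tool to render the prompt in <image_prompt> verbatim.",
    "The prompt is the artifact: do not paraphrase, shorten, or improve it.",
    "<image_prompt>",
    imagePrompt, // inserted untouched so double-quoted literal strings survive exactly
    "</image_prompt>",
    "<completeness_contract>",
    contract,
    "</completeness_contract>",
    "<action_safety>",
    "Do not modify any file outside the chosen output directory. Do not run unrelated commands. Generate fresh from the prompt.",
    "</action_safety>",
    "</task>",
  ].join("\n");
}
```

Calling `buildTaskBlock('Poster with "Noon & Co." headline', "/tmp/out.png")` yields one block whose `<image_prompt>` body is byte-for-byte the drafted prompt, which is the property the forwarding rules care about.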
diff --git a/plugins/codex/commands/image.md b/plugins/codex/commands/image.md
@@ -0,0 +1,32 @@
+---
+description: Generate an image by handing a craft-grade prompt to Codex through the shared runtime so Codex can call its native image generation tool
+argument-hint: "[--background|--wait] [--model <model|spark>] [--out <path>] [what you want the image to show]"
+allowed-tools: Bash(node:*), AskUserQuestion, Agent
+---
+
+Invoke the `codex:codex-image` subagent via the `Agent` tool (`subagent_type: "codex:codex-image"`), forwarding the raw user request as the prompt.
+`codex:codex-image` is a subagent, not a skill — do not call `Skill(codex:codex-image)` (no such skill) or `Skill(codex:image)` (that re-enters this command and hangs the session). The command runs inline so the `Agent` tool stays in scope; forked general-purpose subagents do not expose it.
+The final user-visible response must be Codex's output verbatim.
+
+Raw user request:
+$ARGUMENTS
+
+Execution mode:
+
+- If the request includes `--background`, run the `codex:codex-image` subagent in the background.
+- If the request includes `--wait`, run the `codex:codex-image` subagent in the foreground.
+- If neither flag is present, default to foreground. Most single-image generations finish in well under a minute.
+- `--background` and `--wait` are execution flags for Claude Code. Do not forward them to `task`, and do not treat them as part of the natural-language image intent.
+- `--model` is a runtime-selection flag for the Codex side (the model that drives the image generation tool). Preserve it for the forwarded `task` call, but do not treat it as part of the image intent.
+- `--out` is an optional absolute path for the saved PNG. If omitted, Codex uses its native `generated_images` directory and prints the absolute path. Preserve `--out` for the subagent.
+
+Operating rules:
+
+- The subagent is a thin forwarder only. It uses one `Bash` call to invoke `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" task --write ...` and returns that command's stdout as-is.
+- Return the Codex companion stdout verbatim to the user.
+- Do not paraphrase, summarize, rewrite, or add commentary before or after it.
+- Do not ask the subagent to inspect the repository, monitor progress, poll `/codex:status`, fetch `/codex:result`, call `/codex:cancel`, or do follow-up work of its own.
+- Leave model unset on the Codex side unless the user explicitly asks for one. If they ask for `spark`, map it to `gpt-5.3-codex-spark`.
+- This command is write-capable on the Codex side because Codex needs to save the resulting PNG to disk and optionally copy it to the user's `--out` path. Always pass `--write`.
+- If the helper reports that Codex is missing or unauthenticated, stop and tell the user to run `/codex:setup`.
+- If the user did not supply an image intent, ask what the image should show.
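Note on the flag handling described in `image.md`: the routing controls are separated from the image intent before anything is forwarded. A minimal sketch of that separation follows, assuming whitespace-delimited arguments with no shell-style quoting. `splitRoutingFlags` and the shape of its return value are hypothetical and not part of this PR; in the plugin the model performs this step from the Markdown instructions.

```javascript
// Alias table taken from the command file: `spark` maps to the full model name.
const MODEL_ALIASES = new Map([["spark", "gpt-5.3-codex-spark"]]);

// Hypothetical helper: splits the raw request into routing controls and intent.
function splitRoutingFlags(rawRequest) {
  const tokens = rawRequest.trim().split(/\s+/).filter(Boolean);
  // Foreground is the documented default when neither flag is present.
  const routing = { mode: "foreground", model: null, out: null };
  const intentTokens = [];

  for (let i = 0; i < tokens.length; i += 1) {
    const token = tokens[i];
    if (token === "--background") {
      routing.mode = "background";
    } else if (token === "--wait") {
      routing.mode = "foreground";
    } else if (token === "--model" && i + 1 < tokens.length) {
      const value = tokens[++i];
      routing.model = MODEL_ALIASES.get(value) ?? value;
    } else if (token === "--out" && i + 1 < tokens.length) {
      routing.out = tokens[++i];
    } else {
      intentTokens.push(token); // everything else is natural-language intent
    }
  }

  // Limitation of this sketch: runs of whitespace in the intent collapse to one space.
  return { routing, intent: intentTokens.join(" ") };
}
```

For example, `splitRoutingFlags("--background --model spark a red fox")` leaves `"a red fox"` as the intent, and an empty `intent` is the case in which the command asks what the image should show.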
diff --git a/plugins/codex/skills/image/SKILL.md b/plugins/codex/skills/image/SKILL.md
@@ -0,0 +1,70 @@
+---
+name: image
+description: Internal guidance for drafting craft-grade image prompts that Codex will pass to its native image generation tool inside the Codex Claude Code plugin
+user-invocable: false
+---
+
+# Image Prompting
+
+Use this skill only inside the `codex:codex-image` subagent.
+
+Modern frontier image models (GPT Image 2 and successors) plan, reference, critique, and iterate before rendering. Treat the prompt as context, not a description. Diffusion-era prompt habits leave most of the model's capability unused.
+
+Codex has a stable built-in `image_generation` feature. The subagent does not need to write a script or call any external API — it just hands a craft-grade prompt to Codex with a `task` instruction telling Codex to use its native image tool.
+
+## The six rules (community-tested in the first thirty days post-launch)
+
+1. **Lead with style and intended use.** The first words carry the highest visual weight. Open with the medium and aesthetic — "Premium editorial magazine cover...", "High-fidelity iOS UI screenshot...", "Photoreal editorial food photograph, shot on a Leica Q3 full-frame..." — before naming the subject.
+2. **Quote every literal string.** Anything that must appear in the rendered image — labels, taglines, button copy, dates, file paths, handles, captions, all of it — goes inside double quotes inside the prompt. Quoting engages the high-accuracy text rendering path. Typography drifts when you do not.
+3. **Treat the prompt as context.** Pack palette hex values, brand rules, anti-patterns, polish details, and named font families into the prompt. The model reasons over them.
+4. **Aspect ratio = explicit pixel dimensions.** End every prompt with a literal line like `Output in exactly 1536px x 1024px (3:2 ratio) landscape format.` Do not rely on a bare ratio string. Map the user's intent or supplied ratio into pixel dimensions before sending.
+5. **Constraints block is mandatory.** A dedicated paragraph of what NOT to do — typically as long as the subject section. The most underused part of an image prompt.
+6. **Generate fresh, do not edit.** Image-to-image is still unreliable. If the user pastes a reference image, extract its qualities into words and regenerate from text only. Tell Codex explicitly to generate fresh, not to use a previous image as a starting point.
+
+## Crafting checklist
+
+Build the inner image prompt in this exact order. Every section is mandatory unless flagged optional.
+
+1. **Style + intended use.** Open with the medium and aesthetic. For photoreal work, name the camera, lens, film stock, and lighting condition — specificity is realism.
+2. **Scene.** Where, when, lighting, mood, weather, time of day. One paragraph.
+3. **Subject.** The focal point. Pose, action, expression, materials. For people, lock in consistent traits (hair, build, age, distinguishing features).
+4. **Details.** Background, props, micro-details. For photoreal work, include a believable-imperfections list (a stray seed, a juice bead on a thumbnail, a paper-cut on the index finger). Imperfection is the difference between AI-photo and editorial-photo.
+5. **Quoted text.** Every literal string in the image, in double quotes, with exact punctuation, spacing, and casing. Be obsessive — `"Noon & Co."` not `Noon and Co`.
+6. **Constraints.** A dedicated block of what NOT to do. Typical entries: no drop shadows, no fake bokeh, no glare, no lens flare; no emoji, no SF Symbols, no Apple defaults; five fingers per hand, correct knuckle spacing, no fused anatomy; two type families only — name them; no QR codes, no URLs, no hashtags; no additional text beyond what is quoted.
+7. **Output dimensions.** Final line, always. Format: `Output in exactly [W]px x [H]px ([ratio] ratio) [orientation] format.`
+
+## Output dimension defaults
+
+When the user does not provide dimensions, infer from intent:
+
+| Intent signal | Pixel dimensions | Ratio | Orientation |
+|---|---|---|---|
+| Generic / ad / hero | `1536px x 1024px` | 3:2 | landscape |
+| Square social card | `1024px x 1024px` | 1:1 | square |
+| Wide social card | `1792px x 1024px` | 7:4 | landscape |
+| Portrait phone screen | `1024px x 1792px` | 4:7 | portrait |
+| Magazine cover | `1024px x 1280px` | 4:5 | portrait |
+| Presentation slide | `1536px x 1024px` | 3:2 | landscape |
+| App icon | `1024px x 1024px` | 1:1 | square |
+
+State the targeted dimensions inside the prompt body itself. Codex's image tool reads the prompt and sizes accordingly.
+
+## Wrapping for Codex
+
+The drafted image prompt is the inner content. The subagent wraps it in a `<task>` block (per the `gpt-5-4-prompting` skill) instructing Codex to:
+
+- Use its native image generation tool.
+- Pass the inner `<image_prompt>` verbatim — no paraphrasing, no shortening, no "improvement."
+- Save the resulting PNG and print the absolute saved path on the last line of stdout.
+- If the slash command supplied `--out <path>`, also copy the saved PNG to that absolute path (creating the directory if needed) and print that path on the last line instead.
+- Generate fresh — do not use any prior image as a reference or seed.
+
+Codex's image tool handles the API call, file save, and path reporting. The subagent does not write or run any image-generation code itself.
+
+## What you are NOT doing
+
+- Not writing a script that calls an external image API. Codex's native tool handles it.
+- Not running discovery interviews. The slash command may have asked once. The subagent commits to a craft-grade prompt from whatever intent it received.
+- Not summarizing the prompt back. The subagent's only output is Codex's stdout.
+- Not editing the prompt after Codex returns. The prompt is the artifact.
+- Not chaining into other commands. This skill scopes a single forwarded `task` call.
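Note on rule 4 and the defaults table in `SKILL.md`: the ratio and orientation in the final prompt line follow mechanically from the pixel dimensions. The sketch below derives them with a greatest-common-divisor reduction. `outputDimensionLine` is a hypothetical name, and the sentence shape copies the literal example given under rule 4; the skill itself expects the subagent to write this line directly.

```javascript
// Euclid's algorithm, used to reduce width:height to its simplest ratio.
function gcd(a, b) {
  return b === 0 ? a : gcd(b, a % b);
}

// Hypothetical helper: builds the closing dimension line of an image prompt.
function outputDimensionLine(width, height) {
  const divisor = gcd(width, height);
  const ratio = `${width / divisor}:${height / divisor}`;
  const orientation =
    width === height ? "square" : width > height ? "landscape" : "portrait";
  return `Output in exactly ${width}px x ${height}px (${ratio} ratio) ${orientation} format.`;
}

console.log(outputDimensionLine(1536, 1024));
// prints "Output in exactly 1536px x 1024px (3:2 ratio) landscape format."
```

Running the same function over the other table rows reproduces their ratio and orientation columns, for instance 7:4 landscape for `1792px x 1024px` and 4:5 portrait for `1024px x 1280px`.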
diff --git a/tests/commands.test.mjs b/tests/commands.test.mjs
@@ -75,6 +75,7 @@ test("continue is not exposed as a user-facing command", () => {
   assert.deepEqual(commandFiles, [
     "adversarial-review.md",
     "cancel.md",
+    "image.md",
     "rescue.md",
     "result.md",
     "review.md",
@@ -83,6 +84,55 @@ test("continue is not exposed as a user-facing command", () => {
   ]);
 });
 
+test("image command forwards to codex-image subagent and pins inline Agent transport", () => {
+  const image = read("commands/image.md");
+  const agent = read("agents/codex-image.md");
+  const skill = read("skills/image/SKILL.md");
+
+  assert.match(image, /The final user-visible response must be Codex's output verbatim/i);
+  assert.match(image, /allowed-tools:\s*Bash\(node:\*\),\s*AskUserQuestion,\s*Agent/);
+  assert.match(image, /subagent_type: "codex:codex-image"/);
+  assert.match(image, /do not call `Skill\(codex:codex-image\)`/i);
+  assert.match(image, /do not call .* `Skill\(codex:image\)`/i);
+  assert.doesNotMatch(image, /^context:\s*fork\b/m);
+  assert.match(image, /--background\|--wait/);
+  assert.match(image, /--model <model\|spark>/);
+  assert.match(image, /--out <path>/);
+  assert.match(image, /default to foreground/i);
+  assert.match(image, /Do not forward them to `task`/i);
+  assert.match(image, /Always pass `--write`/i);
+  assert.match(image, /If they ask for `spark`, map it to `gpt-5\.3-codex-spark`/i);
+  assert.match(image, /thin forwarder only/i);
+  assert.match(image, /Return the Codex companion stdout verbatim to the user/i);
+  assert.match(image, /If the helper reports that Codex is missing or unauthenticated, stop and tell the user to run `\/codex:setup`/i);
+
+  assert.match(agent, /name:\s*codex-image/);
+  assert.match(agent, /tools:\s*Bash/);
+  assert.match(agent, /codex-cli-runtime/);
+  assert.match(agent, /gpt-5-4-prompting/);
+  assert.match(agent, /^\s*-\s*image\s*$/m);
+  assert.match(agent, /thin forwarding wrapper/i);
+  assert.match(agent, /Use exactly one `Bash` call/i);
+  assert.match(agent, /Always pass `--write`/i);
+  assert.match(agent, /Do not inspect the repository, read files, grep, monitor progress, poll status, fetch results, cancel jobs, summarize output, or do any follow-up work of your own/i);
+  assert.match(agent, /Do not call `review`, `adversarial-review`, `status`, `result`, or `cancel`/i);
+  assert.match(agent, /native image generation tool/i);
+  assert.match(agent, /<image_prompt>/);
+  assert.match(agent, /If the Bash call fails or Codex cannot be invoked, return nothing/i);
+
+  assert.match(skill, /name:\s*image/);
+  assert.match(skill, /user-invocable:\s*false/);
+  assert.match(skill, /Use this skill only inside the `codex:codex-image` subagent/);
+  assert.match(skill, /Lead with style and intended use/i);
+  assert.match(skill, /Quote every literal string/i);
+  assert.match(skill, /Aspect ratio = explicit pixel dimensions/i);
+  assert.match(skill, /Constraints block is mandatory/i);
+  assert.match(skill, /Generate fresh, do not edit/i);
+  assert.match(skill, /Output in exactly \[W\]px x \[H\]px/);
+  assert.match(skill, /native image generation tool/i);
+  assert.match(skill, /<image_prompt>/);
+});
+
 test("rescue command absorbs continue semantics", () => {
   const rescue = read("commands/rescue.md");
   const agent = read("agents/codex-rescue.md");