Closed
35 changes: 13 additions & 22 deletions docs/specs/MEDIA.md
@@ -14,10 +14,9 @@ This is a spec/notes document only. It does not imply that inbound media support

 - Codex app-server already supports multimodal turn input via `UserInput`.
 - The supported image-shaped input items are remote/data URL images and local filesystem images.
-- This plugin now supports mixed text + image turn input and forwards inbound image media into Codex when OpenClaw provides a staged media path or URL.
+- This plugin currently sends text-only turn input to Codex.

Review comment (Contributor): Why are these changes here at all? Just bad base branch?

 - OpenClaw’s plugin SDK already supports outbound attachments from a plugin via `mediaUrl` and `mediaUrls`.
-- OpenClaw’s plugin SDK still does not model inbound attachments as a first-class typed field on command or `inbound_claim` events.
-- In practice, current `inbound_claim` hook metadata already carries `mediaPath` / `mediaType`, which is enough for this plugin to forward a staged inbound image.
+- OpenClaw’s plugin SDK does not currently expose inbound attachments or image files to plugin commands or `inbound_claim` hooks.
 - The cleanest future bridge is: OpenClaw stages inbound files locally, then this plugin maps image paths to Codex `localImage` items.

 ## Codex App-Server Input Model
@@ -158,27 +157,22 @@ Or, if only a URL/data URL is available:

 ## Current State In This Plugin

-This plugin now builds multimodal turn input when image media is available:
+Today this plugin builds text-only turn input:

 Source:
 - [`src/client.ts`](../../src/client.ts)

 ```ts
-function buildTurnInput(prompt: string, input?: readonly CodexTurnInputItem[]) {
-  if (input?.length) {
-    return input.map((item) => ({ ...item }));
-  }
+function buildTurnInput(prompt: string): Array<Record<string, unknown>> {
   return [{ type: "text", text: prompt }];
 }
 ```

 That means:

-- text-only turns still work as before
-- mixed text + image turns can be forwarded into Codex
-- image-only inbound turns can be forwarded into Codex
-- staged text attachments such as `.txt`, `.md`, `.json`, `.yaml`, and `.yml` can be read and forwarded as additional `text` items
-- unsupported binary non-image inbound media is still ignored for now
+- even though Codex app-server supports images
+- and even though OpenClaw can handle attachments elsewhere
+- this plugin currently does not forward inbound JPEG/PNG/etc. into Codex

 ## OpenClaw Plugin SDK: Outbound Media

@@ -283,9 +277,8 @@ export type PluginHookInboundClaimEvent = {
 So, from the plugin’s point of view today:

 - outbound attachments are supported
-- inbound attachments are still not modeled as first-class typed plugin input
-- `inbound_claim` metadata does already carry `mediaPath` / `mediaType`, so the plugin can use that best-effort bridge for inbound image forwarding
-- command handlers still cannot rely on a first-class structured image field from OpenClaw
+- inbound attachments are not modeled as first-class plugin input
+- a command handler cannot currently receive a JPEG as a structured image input from OpenClaw

 ## OpenClaw Gateway Already Has Attachment Logic

@@ -414,12 +407,10 @@ Within this repository, future media support would require at least:
 - local image path -> `localImage`
 - remote/data URL image -> `image`
 - mixed text + image turn input
-- text attachments read and forwarded as `text`
-- unsupported binary attachments ignored or downgraded to text references
+- non-image attachments ignored or downgraded to text references

-The remaining practical boundary is:
+Until then, the practical answer is:

-- Codex app-server already supports images plus ordinary text items
+- Codex app-server already supports images
 - OpenClaw already supports outbound attachments from plugins
-- this plugin can now turn staged inbound images into Codex image input and staged inbound text files into Codex text input
-- richer binary formats such as PDF, audio, and video still need preprocessing before they can be meaningfully sent to Codex
+- but this plugin cannot yet accept inbound JPEG/PNG/etc. from OpenClaw as Codex turn input because the current plugin boundary does not expose those attachments
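
Editor's note: the mapping listed in the last `MEDIA.md` hunk (local image path -> `localImage`, remote/data URL image -> `image`, non-image attachments ignored or downgraded to text references) could be sketched roughly as follows. This is not code from the PR. Only the `type` tags `text`, `localImage`, and `image` and the `mediaPath` / `mediaType` names appear in the diff; the `path` and `url` field names, the `StagedMedia` shape, its `mediaUrl` field, and the function name `buildTurnInputSketch` are assumptions made for illustration.

```typescript
// Hypothetical item shapes. Only the `type` tags come from the spec text;
// the `path` and `url` field names are assumptions.
type TurnInputItem =
  | { type: "text"; text: string }
  | { type: "localImage"; path: string }
  | { type: "image"; url: string };

// Hypothetical description of one staged inbound attachment.
interface StagedMedia {
  mediaPath?: string; // local staged file, if any
  mediaUrl?: string; // remote or data URL, if any (assumed field)
  mediaType?: string; // MIME type such as "image/png"
}

function buildTurnInputSketch(
  prompt: string,
  media: readonly StagedMedia[] = [],
): TurnInputItem[] {
  const items: TurnInputItem[] = [];

  // Image-only turns carry no text item.
  if (prompt.trim().length > 0) {
    items.push({ type: "text", text: prompt });
  }

  for (const m of media) {
    const isImage = m.mediaType?.startsWith("image/") ?? false;
    if (!isImage) {
      // Non-image attachments are downgraded to a text reference.
      const ref = m.mediaPath ?? m.mediaUrl;
      if (ref) {
        items.push({ type: "text", text: `[attachment not forwarded: ${ref}]` });
      }
      continue;
    }
    // Prefer the staged local file; fall back to a URL.
    if (m.mediaPath) {
      items.push({ type: "localImage", path: m.mediaPath });
    } else if (m.mediaUrl) {
      items.push({ type: "image", url: m.mediaUrl });
    }
  }
  return items;
}
```

With no media, the sketch returns the same single `text` item as the text-only `buildTurnInput` shown in the diff, so text-only callers would see no change.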
2 changes: 1 addition & 1 deletion index.test.ts
@@ -38,7 +38,7 @@ describe("plugin registration", () => {
     expect(api.registerInteractiveHandler).toHaveBeenCalledTimes(2);
     expect(api.registerCommand).toHaveBeenCalled();
     expect(api.registerCommand.mock.calls.map(([params]) => params.name)).toEqual(
-      COMMANDS.map(([name]) => name),
+      [...COMMANDS.map(([name]) => name), "cas_click"],
     );
   });

25 changes: 25 additions & 0 deletions index.ts
@@ -8,6 +8,12 @@ const plugin = {
   description: "Independent OpenClaw plugin for the Codex App Server protocol.",
   register(api: OpenClawPluginApi) {
     const controller = new CodexPluginController(api);
+    const hookApi = api as OpenClawPluginApi & {
+      on?: (
+        hookName: string,
+        handler: (event: Record<string, unknown>, ctx?: Record<string, unknown>) => Promise<unknown> | unknown,
+      ) => void;
+    };

     api.registerService(controller.createService());

@@ -26,6 +32,13 @@ const plugin = {
       return await controller.handleInboundClaim(event);
     });

+    hookApi.on?.("before_dispatch", async (event, ctx) => {
+      return await controller.handleBeforeDispatch(event, ctx);
+    });
+    (api as OpenClawPluginApi & { logger?: { warn?: (text: string) => void } }).logger?.warn?.(
+      "codex plugin registered before_dispatch hook",
+    );
+
     api.registerInteractiveHandler({
       channel: "telegram",
       namespace: INTERACTIVE_NAMESPACE,
@@ -54,6 +67,18 @@ const plugin = {
         },
       });
     }
+
+    // Internal Feishu card callback command.
+    // This must be registered so `/cas_click <token>` is routed to command handling
+    // instead of falling through to a normal LLM turn.
+    api.registerCommand({
+      name: "cas_click",
+      description: "Internal command for Feishu card callbacks.",
+      acceptsArgs: true,
+      handler: async (ctx) => {
+        return await controller.handleCommand("cas_click", ctx);
+      },
+    });
   },
 };
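
Editor's note: the `hookApi.on?.(...)` call in this diff uses optional chaining, so the hook registration silently does nothing on a host whose API object has no `on` method, while the `logger?.warn?.(...)` line that follows runs either way. A minimal, self-contained sketch of the same feature-detection pattern that also reports whether the hook was accepted is shown below. `HostApi` and `registerOptionalHook` are hypothetical names, not part of the OpenClaw plugin SDK or of this PR.

```typescript
type HookHandler = (
  event: Record<string, unknown>,
  ctx?: Record<string, unknown>,
) => Promise<unknown> | unknown;

// Hypothetical host API: `on` may be missing on hosts without hook support.
interface HostApi {
  on?: (hookName: string, handler: HookHandler) => void;
}

// Registers the hook when the host supports it.
// Returns true only when `on` exists and was called.
function registerOptionalHook(
  api: HostApi,
  hookName: string,
  handler: HookHandler,
): boolean {
  if (typeof api.on !== "function") {
    return false;
  }
  api.on(hookName, handler);
  return true;
}

// Usage: log only when registration actually happened.
const registered = registerOptionalHook({}, "before_dispatch", () => undefined);
if (registered) {
  console.warn("before_dispatch hook registered");
}
```

Returning a boolean is one way to make the "registered" log message conditional on the host actually supporting the hook; whether that matters here depends on how the log line is meant to be read.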

25 changes: 12 additions & 13 deletions package.json
@@ -1,10 +1,9 @@
 {
   "name": "openclaw-codex-app-server",
-  "version": "0.0.0",
+  "version": "0.5.0",

Review comment (Contributor): Oh that's not how we handle versioning... we leave the version as 0.0.0 in the package.json so it's clear we're running an unreleased version.

   "description": "Independent OpenClaw plugin for the Codex App Server protocol",
   "author": "PwrDrvr LLC",
   "license": "MIT",
-  "packageManager": "pnpm@10.29.3",

Review comment (Contributor): Why would this be deleted?

   "type": "module",
   "openclaw": {
     "extensions": [
@@ -19,16 +18,6 @@
       "pluginSdkVersion": "2026.3.22"
     }
   },
-  "scripts": {
-    "project:sync": "node ./.agents/skills/project-manager/scripts/sync-work-items.mjs",
-    "test": "vitest run",
-    "typecheck": "tsc --noEmit",
-    "smoke:app-server-permissions": "node ./scripts/app-server-permissions-smoke.mjs",
-    "smoke:app-server-thread-permissions": "node ./scripts/app-server-thread-permissions-smoke.mjs",
-    "pack:smoke": "node ./scripts/pack-smoke.mjs",
-    "release:metadata": "node ./scripts/release-metadata.mjs",
-    "release:apply-version": "node ./scripts/apply-release-version.mjs"
-  },
   "peerDependencies": {
     "openclaw": ">=2026.3.22"
   },
@@ -40,5 +29,15 @@
     "typescript": "^5.9.2",
     "vitest": "^3.2.4",
     "yaml": "^2.8.2"
+  },
+  "scripts": {
+    "project:sync": "node ./.agents/skills/project-manager/scripts/sync-work-items.mjs",
+    "test": "vitest run",
+    "typecheck": "tsc --noEmit",
+    "smoke:app-server-permissions": "node ./scripts/app-server-permissions-smoke.mjs",
+    "smoke:app-server-thread-permissions": "node ./scripts/app-server-thread-permissions-smoke.mjs",
+    "pack:smoke": "node ./scripts/pack-smoke.mjs",
+    "release:metadata": "node ./scripts/release-metadata.mjs",
+    "release:apply-version": "node ./scripts/apply-release-version.mjs"

Comment thread: eeelvn-bot marked this conversation as resolved.

   }
-}
+}