Conversation
Bridge to open-computer-use/SkyComputerUseClient binary over stdio JSON-RPC. Adds 10 MCP tools (cu_list_apps, cu_get_app_state, cu_click, etc.) with lazy init, per-tool timeouts, image passthrough, and macOS platform guard. 97 tests passing, lint clean. Co-authored-by: Claude <noreply@anthropic.com> Entire-Checkpoint: 8f81eabf57ca
Signed-off-by: Tommy Nguyen <tuannvm@hotmail.com> Entire-Checkpoint: de373f85600c
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 40 minutes and 11 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (8)
WalkthroughAdds a Playwright-backed browser automation tool ("browser") to MCP: new browser-use module (bridge, client, handlers, types, definitions), integrates the tool into tool registry and handlers, updates types to support image results, adds graceful shutdown for browser sessions, tests and documentation, and marks Playwright as a peer dependency. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client/User
participant Server as MCP Server
participant Handler as Browser Tool Handler
participant Bridge as BrowserUseBridge
participant PWClient as Playwright Client
participant Browser as Chromium Page
Client->>Server: invoke tool "browser" with action
Server->>Handler: dispatch to browserUseHandler.execute(args)
Handler->>Handler: parse/validate action (Zod)
Handler->>Bridge: call bridge.<action>(...)
Bridge->>PWClient: lazy import & call client function
PWClient->>Browser: perform page operation (navigate/click/screenshot)
Browser-->>PWClient: return result
PWClient-->>Bridge: return structured result (image/text)
Bridge-->>Handler: return result
Handler-->>Server: wrap as ToolResult and return
Server-->>Client: respond with ToolResult
alt Shutdown (SIGINT/SIGTERM)
Server->>Handler: shutdownBrowserSessions()
Handler->>Bridge: bridge.shutdown()
Bridge->>PWClient: close all browser sessions
PWClient-->>Bridge: closed
Bridge-->>Handler: done
end
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Signed-off-by: Tommy Nguyen <tuannvm@hotmail.com> Entire-Checkpoint: d5b3202aaa87
There was a problem hiding this comment.
Actionable comments posted: 14
🧹 Nitpick comments (6)
src/computer-use/__tests__/discovery.test.ts (1)
4-6: Hard-coded absolute path is fragile but acceptable as a probe.The bundled binary path is encoded twice (here and presumably in
src/computer-use/discovery.ts). If the Codex.app layout changes, this skip-guard silently passes by skipping. Consider importing a constant fromdiscovery.ts(e.g.,CODEX_APP_BINARY_PATH) instead of duplicating the literal.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/computer-use/__tests__/discovery.test.ts` around lines 4 - 6, The test currently duplicates the hard-coded Codex binary path literal when setting hasCodexBinary; import and use the single source of truth from discovery.ts (e.g., CODEX_APP_BINARY_PATH) instead of repeating the string so the probe stays in sync with discovery logic — replace the literal used by hasCodexBinary with the imported CODEX_APP_BINARY_PATH constant and remove the duplicated path literal in the test.src/computer-use/__tests__/handlers.test.ts (1)
33-49: Consider asserting onValidationErrortype, not justtoThrow().Bare
.rejects.toThrow()will pass for any thrown error, including unrelated failures (e.g., bridge init, missing binary on non-macOS CI). Persrc/computer-use/handlers.tslines 59-65,ZodErroris converted toValidationError— assert on that to ensure validation actually short-circuits before any bridge call.♻️ Proposed refactor
+import { ValidationError } from '../../errors.js'; @@ - await expect( - handler.execute(CU_TOOLS.CLICK, {}) - ).rejects.toThrow(); + await expect( + handler.execute(CU_TOOLS.CLICK, {}) + ).rejects.toThrow(ValidationError);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/computer-use/__tests__/handlers.test.ts` around lines 33 - 49, The tests currently use bare `.rejects.toThrow()` which will catch any error; update each failing test that calls handler.execute (for CU_TOOLS.CLICK, CU_TOOLS.GET_APP_STATE, CU_TOOLS.DRAG) to assert the promise rejects with the specific ValidationError type that handlers.ts maps ZodError to (i.e., use `.rejects.toThrowError(ValidationError)` or the equivalent matcher for ValidationError) so the spec verifies input validation short-circuits before any bridge/init errors.eslint.config.mjs (1)
49-58: Consider mirroring Node globals in the test config block.The test files override (lines 49-58) doesn't include
Buffer,setTimeout,clearTimeout,setInterval,clearInterval, orsetImmediate. If any of the newsrc/computer-use/__tests__/*.test.tsfiles (or future tests) reference those Node globals, lint will fail withno-undef. Consider duplicating the Node globals here too.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@eslint.config.mjs` around lines 49 - 58, The test config's globals object (the block that currently lists describe, test, expect, beforeEach, afterEach, beforeAll, afterAll, jest) is missing common Node globals and should be updated to mirror the Node globals used elsewhere; add Buffer, setTimeout, clearTimeout, setInterval, clearInterval, and setImmediate to that globals object so test files (e.g., src/computer-use/__tests__/*.test.ts) won't trigger no-undef lint errors.README.md (1)
104-104: Verify theelement_indexexample value type.The README example passes
element_index "42"(a quoted string). Perdocs/api-reference.mdline 488 and theClickSchema(element_index: z.string().optional()), this is correct, but it can read awkwardly to users who'd intuitively pass an integer. Consider noting that element indices come fromcu_get_app_stateoutput as strings, or otherwise no change required.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` at line 104, The README example uses a quoted "42" for element_index which is correct per ClickSchema (element_index: z.string().optional()), but is confusing; update the cu_click example to clarify that element_index is a string (it comes from the cu_get_app_state output) — either add a short parenthetical note next to element_index in the example stating "element_index must be a string (as returned by cu_get_app_state)" or show both forms (string example and a note) so users understand why it's quoted; reference the ClickSchema and cu_get_app_state when adding the clarification.src/computer-use/client.ts (2)
28-31: Fullprocess.envis forwarded to the subprocess.The entire parent environment (including any secrets like API keys) is passed to the external binary. Since the binary only needs macOS accessibility, consider forwarding a minimal allow-list (
PATH,HOME,USER,LANG, display-related vars) to reduce blast radius if the binary ever logs or leaks env.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/computer-use/client.ts` around lines 28 - 31, The child process is currently started with env: { ...process.env } which forwards all environment variables (including secrets); update the spawn call that creates this.process (the line using spawn(this.binary.path, this.binary.args, ...)) to pass a minimal allow-list instead of the whole env: build an env object that picks only required keys (e.g., PATH, HOME, USER, LANG, TERM and any macOS/display-related vars your binary needs) from process.env and use that object for the env option; ensure any required runtime variables for accessibility checks are included and fall back to safe defaults if missing.
112-117:disconnect()relies on default SIGTERM with no escalation.
process?.kill()sends SIGTERM and returns immediately; a misbehaving binary may never exit. Consider a short SIGTERM → SIGKILL escalation timer so the bridge can reliably reset state when a tool hangs.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/computer-use/client.ts` around lines 112 - 117, The disconnect() method currently calls this.process?.kill() (default SIGTERM) and returns immediately; update it to perform a SIGTERM → SIGKILL escalation: if this.process exists, call this.process.kill('SIGTERM'), attach a one-time exit listener (process.once('exit')) to clear a fallback timer, start a short timeout (e.g., 500–1000ms) that checks if the process is still running and, if so, calls this.process.kill('SIGKILL'), then regardless of exit path ensure you clear the timer, set this.process = null, this.connected = false, and call this.rejectAll(new Error('Client disconnected')); also guard against double-kill by checking process.pid or that this.process is still the same object before sending SIGKILL.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/commands/codex-setup.md:
- Line 7: The current line in codex-setup.md asserts "This is **not** a security
risk — it only affects your local machine." — change this to a balanced tradeoff
statement: explain that ad-hoc re-signing removes vendor signing and
hardened-runtime protections for the local Codex.app binary and can increase the
risk on that machine (e.g., allowing other processes or unsigned code to
interact with the app), while noting the scope is local; update the sentence to
describe that this is a local security tradeoff rather than claiming it is
risk-free.
- Line 68: The smoke-test command uses the GNU `timeout` binary which is absent
on stock macOS; update the command that pipes the initialize JSON into "$BINARY"
mcp to use `gtimeout` when available (fall back to `timeout` otherwise) or add a
note documenting the dependency; specifically modify the invocation that
currently calls `timeout 5 "$BINARY" mcp` (reference the "$BINARY" and `mcp`
usage) to prefer `gtimeout` on macOS so the command succeeds on macOS CI/dev
machines.
In `@docs/computer-use.md`:
- Around line 7-14: The fenced code block containing the ASCII diagram starting
with "Claude Code → codex-mcp-server" should include a language specifier (e.g.,
text) to satisfy markdownlint MD040; update that triple-backtick fence to use
```text so the block is identified as plain text (the ASCII diagram lines
"Claude Code", "codex-mcp-server", "ComputerUseBridge", "SkyComputerUseClient",
"stdio JSON-RPC", "macOS Accessibility APIs" remain unchanged).
- Line 20: Update the two occurrences of the Codex.app download URL in
docs/computer-use.md: replace the link target "https://codex.ai" with the
official OpenAI page "https://developers.openai.com/codex/app" (the visible text
can remain "Codex.app"), ensuring both instances mentioned in the comment (the
line referencing the SkyComputerUseClient binary and the later occurrence around
line 161) are changed so the `/Applications/Codex.app` detection note still
reads correctly and the link points to the official OpenAI download.
In `@README.md`:
- Around line 100-105: Add a language hint to the fenced code block in the
"Computer Use (macOS)" example in README.md to satisfy markdownlint MD040;
specifically update the triple-backtick fence that encloses the commands for
cu_status, cu_get_app_state and cu_click to include a language token such as
text (e.g., ```text) so the block is labeled consistently with the other
examples.
In `@src/computer-use/__tests__/types.test.ts`:
- Around line 38-45: The test iterates only mapped values so it won't catch
missing keys in CU_TO_BINARY; update the test in types.test.ts to assert that
the set of keys of CU_TO_BINARY equals the set of CU_TOOLS excluding
CU_TOOLS.STATUS (instead of only checking values), while still verifying each
mapped value exists in BINARY_TOOLS; use the unique symbols CU_TO_BINARY,
CU_TOOLS, and BINARY_TOOLS to locate and replace the current loop-based
assertions with a keys-based comparison plus the existing value containment
checks.
In `@src/computer-use/bridge.ts`:
- Around line 87-91: The shutdown() method currently disconnects and clears
this.client and this.initialized but leaves cu_status fields (binaryInfo and
error) populated, causing stale diagnostics; update shutdown() to also
clear/reset cu_status.binaryInfo and cu_status.error (e.g., set them to
undefined/null or their default empty state) so that after calling shutdown()
both the client and the cu_status state are fully cleaned up.
In `@src/computer-use/client.ts`:
- Around line 137-138: sendNotification/sendRequest currently do
this.process?.stdin?.write(msg) which silently drops messages if the child
process or its stdin is gone; change both sendNotification and sendRequest to
explicitly check that this.process and this.process.stdin exist and
this.process.stdin.writable (or !destroyed) before attempting to write, and if
not available immediately reject/return an error (for sendRequest reject the
Promise, for sendNotification return/fail fast). When writing, use the write()
return value and the write callback to surface write errors (invoke the
sendRequest reject or surface a notification failure), and ensure you handle the
case where write returns false (buffering) by listening for 'drain' or treating
it as backpressure error. Update the code paths referenced by the snippets (the
methods that construct msg and call this.process?.stdin?.write at the shown
locations) to implement these checks and callback-based error handling.
- Around line 62-64: The current check throws when initResult.result is falsy
which will incorrectly fail on legitimate null/empty results; change the success
condition to check for the presence of initResult.error instead. Update the
block that currently inspects initResult.result (the variables named initResult,
result, error) to throw only when initResult.error is non-null/defined and
otherwise proceed (i.e., treat a null/empty result as success).
- Around line 25-69: The connect() method can be called concurrently and
currently spawns multiple subprocesses; introduce and use an in-flight promise
(e.g. this.connectPromise) inside connect() so that if connect() is called while
a connection is being established callers await the same promise instead of
spawning another process; create the child process and attach handlers as you do
now, set this.connectPromise before awaiting sendRequest('initialize', ...), on
success set this.connected = true and clear this.connectPromise, and on any
error or process 'close' ensure you clear this.connectPromise, clean up/kill the
spawned process if needed and call rejectAll so state remains consistent (use
existing symbols connect(), this.connected, this.process, sendRequest,
rejectAll).
In `@src/computer-use/definitions.ts`:
- Around line 45-78: The JSON inputSchema for CU_TOOLS.CLICK currently only
requires 'app', allowing no click target; change the schema to require either
element_index OR both x and y by replacing the flat required with a oneOf clause
(one option: { required: ['app','element_index'] } and another: { required:
['app','x','y'] }) and update the runtime Zod validation used for CU_TOOLS.CLICK
to mirror this logic (validate presence of element_index OR both x and y and
preserve types for mouse_button and click_count) so the advertised schema and
runtime checks are consistent.
In `@src/computer-use/handlers.ts`:
- Around line 68-82: The status handler currently returns null for binary and
error when ComputerUseBridge.getBinaryInfo() and getError() are empty; update
handleStatus to perform a side-effect-free discovery probe via the bridge before
returning: call a discovery method on the ComputerUseBridge instance (e.g., a
new or existing probe/discover method on ComputerUseBridge) to attempt to locate
the binary and populate binaryInfo/error without starting the subprocess, then
use those probe results (falling back to the existing getBinaryInfo()/getError()
if present) when building the status object; ensure you reference
ComputerUseBridge.getInstance(), ComputerUseBridge.getBinaryInfo(),
ComputerUseBridge.getError(), and ComputerUseBridge.isReady() while avoiding
launching the real tool process.
- Around line 35-39: The code currently throws a raw Error when an unknown tool
name is encountered (CU_TO_BINARY lookup for toolName); replace the raw throw
with throwing the standardized ValidationError (throw new
ValidationError(`Unknown computer-use tool: ${toolName}`)) so the error follows
the project's error path, and add or update the import of ValidationError at the
top of the file if not already imported; keep the same message text to preserve
semantics.
In `@src/types.ts`:
- Around line 78-85: The ToolResult.content item should be a discriminated union
instead of a single shape that forces text for images; replace the current
inline object with two interfaces (e.g., TextContent { type: 'text'; text:
string; _meta?: Record<string, unknown> } and ImageContent { type: 'image';
data: string; mimeType: string; _meta?: Record<string, unknown> }) and change
ToolResult to content: Array<TextContent | ImageContent>; then update call sites
that assume item.text (for example the code in client.ts that uses item.text ||
'') to handle the two variants by checking item.type === 'text' before reading
text or treating images via data/mimeType.
---
Nitpick comments:
In `@eslint.config.mjs`:
- Around line 49-58: The test config's globals object (the block that currently
lists describe, test, expect, beforeEach, afterEach, beforeAll, afterAll, jest)
is missing common Node globals and should be updated to mirror the Node globals
used elsewhere; add Buffer, setTimeout, clearTimeout, setInterval,
clearInterval, and setImmediate to that globals object so test files (e.g.,
src/computer-use/__tests__/*.test.ts) won't trigger no-undef lint errors.
In `@README.md`:
- Line 104: The README example uses a quoted "42" for element_index which is
correct per ClickSchema (element_index: z.string().optional()), but is
confusing; update the cu_click example to clarify that element_index is a string
(it comes from the cu_get_app_state output) — either add a short parenthetical
note next to element_index in the example stating "element_index must be a
string (as returned by cu_get_app_state)" or show both forms (string example and
a note) so users understand why it's quoted; reference the ClickSchema and
cu_get_app_state when adding the clarification.
In `@src/computer-use/__tests__/discovery.test.ts`:
- Around line 4-6: The test currently duplicates the hard-coded Codex binary
path literal when setting hasCodexBinary; import and use the single source of
truth from discovery.ts (e.g., CODEX_APP_BINARY_PATH) instead of repeating the
string so the probe stays in sync with discovery logic — replace the literal
used by hasCodexBinary with the imported CODEX_APP_BINARY_PATH constant and
remove the duplicated path literal in the test.
In `@src/computer-use/__tests__/handlers.test.ts`:
- Around line 33-49: The tests currently use bare `.rejects.toThrow()` which
will catch any error; update each failing test that calls handler.execute (for
CU_TOOLS.CLICK, CU_TOOLS.GET_APP_STATE, CU_TOOLS.DRAG) to assert the promise
rejects with the specific ValidationError type that handlers.ts maps ZodError to
(i.e., use `.rejects.toThrowError(ValidationError)` or the equivalent matcher
for ValidationError) so the spec verifies input validation short-circuits before
any bridge/init errors.
In `@src/computer-use/client.ts`:
- Around line 28-31: The child process is currently started with env: {
...process.env } which forwards all environment variables (including secrets);
update the spawn call that creates this.process (the line using
spawn(this.binary.path, this.binary.args, ...)) to pass a minimal allow-list
instead of the whole env: build an env object that picks only required keys
(e.g., PATH, HOME, USER, LANG, TERM and any macOS/display-related vars your
binary needs) from process.env and use that object for the env option; ensure
any required runtime variables for accessibility checks are included and fall
back to safe defaults if missing.
- Around line 112-117: The disconnect() method currently calls
this.process?.kill() (default SIGTERM) and returns immediately; update it to
perform a SIGTERM → SIGKILL escalation: if this.process exists, call
this.process.kill('SIGTERM'), attach a one-time exit listener
(process.once('exit')) to clear a fallback timer, start a short timeout (e.g.,
500–1000ms) that checks if the process is still running and, if so, calls
this.process.kill('SIGKILL'), then regardless of exit path ensure you clear the
timer, set this.process = null, this.connected = false, and call
this.rejectAll(new Error('Client disconnected')); also guard against double-kill
by checking process.pid or that this.process is still the same object before
sending SIGKILL.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5b2d4254-1848-4ed5-986b-4290585bab4b
📒 Files selected for processing (19)
.claude/commands/codex-setup.md.gitignoreREADME.mddocs/api-reference.mddocs/computer-use.mdeslint.config.mjssrc/__tests__/index.test.tssrc/computer-use/__tests__/discovery.test.tssrc/computer-use/__tests__/handlers.test.tssrc/computer-use/__tests__/types.test.tssrc/computer-use/bridge.tssrc/computer-use/client.tssrc/computer-use/definitions.tssrc/computer-use/discovery.tssrc/computer-use/handlers.tssrc/computer-use/types.tssrc/tools/definitions.tssrc/tools/handlers.tssrc/types.ts
|
|
||
| ```bash | ||
| BINARY="/Applications/Codex.app/Contents/Resources/plugins/openai-bundled/plugins/computer-use/Codex Computer Use.app/Contents/SharedSupport/SkyComputerUseClient.app/Contents/MacOS/SkyComputerUseClient" | ||
| echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1.0"}}}' | timeout 5 "$BINARY" mcp 2>/dev/null | head -1 |
There was a problem hiding this comment.
Use a macOS-compatible timeout in the smoke test.
timeout is not installed on stock macOS, so this setup command can fail before testing the binary. Use gtimeout when available or document the dependency.
📝 Proposed command fallback
-echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1.0"}}}' | timeout 5 "$BINARY" mcp 2>/dev/null | head -1
+TIMEOUT_BIN="$(command -v timeout || command -v gtimeout || true)"
+if [ -z "$TIMEOUT_BIN" ]; then
+ echo "Install coreutils for timeout support: brew install coreutils"
+else
+ echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1.0"}}}' | "$TIMEOUT_BIN" 5 "$BINARY" mcp 2>/dev/null | head -1
+fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1.0"}}}' | timeout 5 "$BINARY" mcp 2>/dev/null | head -1 | |
| TIMEOUT_BIN="$(command -v timeout || command -v gtimeout || true)" | |
| if [ -z "$TIMEOUT_BIN" ]; then | |
| echo "Install coreutils for timeout support: brew install coreutils" | |
| else | |
| echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1.0"}}}' | "$TIMEOUT_BIN" 5 "$BINARY" mcp 2>/dev/null | head -1 | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/commands/codex-setup.md at line 68, The smoke-test command uses the
GNU `timeout` binary which is absent on stock macOS; update the command that
pipes the initialize JSON into "$BINARY" mcp to use `gtimeout` when available
(fall back to `timeout` otherwise) or add a note documenting the dependency;
specifically modify the invocation that currently calls `timeout 5 "$BINARY"
mcp` (reference the "$BINARY" and `mcp` usage) to prefer `gtimeout` on macOS so
the command succeeds on macOS CI/dev machines.
| ``` | ||
| Claude Code | ||
| → codex-mcp-server | ||
| → ComputerUseBridge (singleton, lazy init) | ||
| → SkyComputerUseClient (Codex.app subprocess) | ||
| → stdio JSON-RPC (newline-delimited) | ||
| → macOS Accessibility APIs | ||
| ``` |
There was a problem hiding this comment.
Add a language to the fenced code block.
markdownlint MD040 flags this block. Use text (or similar) since the content is an ASCII diagram.
✏️ Proposed fix
-```
+```text
Claude Code
→ codex-mcp-server📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ``` | |
| Claude Code | |
| → codex-mcp-server | |
| → ComputerUseBridge (singleton, lazy init) | |
| → SkyComputerUseClient (Codex.app subprocess) | |
| → stdio JSON-RPC (newline-delimited) | |
| → macOS Accessibility APIs | |
| ``` |
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/computer-use.md` around lines 7 - 14, The fenced code block containing
the ASCII diagram starting with "Claude Code → codex-mcp-server" should include
a language specifier (e.g., text) to satisfy markdownlint MD040; update that
triple-backtick fence to use ```text so the block is identified as plain text
(the ASCII diagram lines "Claude Code", "codex-mcp-server", "ComputerUseBridge",
"SkyComputerUseClient", "stdio JSON-RPC", "macOS Accessibility APIs" remain
unchanged).
|
|
||
| ## Setup | ||
|
|
||
| Requires [Codex.app](https://codex.ai) installed. The `SkyComputerUseClient` binary is detected automatically from `/Applications/Codex.app`. |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's check the file and surrounding context
fd docs/computer-use.mdRepository: tuannvm/codex-mcp-server
Length of output: 50
🏁 Script executed:
# Read the file to see lines around 20 and 161
cat -n docs/computer-use.md | sed -n '15,25p'Repository: tuannvm/codex-mcp-server
Length of output: 589
🏁 Script executed:
# Also check line 161 context
cat -n docs/computer-use.md | sed -n '156,166p'Repository: tuannvm/codex-mcp-server
Length of output: 585
🏁 Script executed:
# Search for SkyComputerUseClient in the codebase
rg -i "SkyComputerUseClient" --type md --type py --type js --type tsRepository: tuannvm/codex-mcp-server
Length of output: 1395
🌐 Web query:
Codex.app SkyComputerUseClient binary
💡 Result:
No high-confidence evidence found for a "SkyComputerUseClient binary" associated with Codex.app (OpenAI's coding agent desktop application). Searches across general web, OpenAI docs, GitHub issues, and company profiles returned extensive details on OpenAI Codex CLI/app binaries (e.g., codex.exe on Windows, bundled in DMG on macOS), computer-use MCP servers (computer-use-mcp), and app-server clients, but nothing matching "SkyComputerUseClient". It may be an internal/unreleased component, misremembered name (e.g., related to "computer-use" tools), or non-existent. Codex binaries are downloadable from GitHub releases or app stores; computer-use is a plugin/MCP feature requiring separate Node.js setup.
Citations:
- 1: Windows Codex App still fails to load native
computer-use-mcptool (invalid type: map, expected variant identifier) openai/codex#16574 - 2: https://developers.openai.com/codex/app
- 3: https://www.reddit.com/r/codex/comments/1snapwm/a_major_update_has_been_released_for_the_codex/
- 4: https://developers.openai.com/codex/app/computer-use
- 5: https://github.com/openai/codex/tree/main
- 6: https://apps.microsoft.com/detail/9plm9xgg6vks?gl=US&hl=en-US
- 7: https://skywork.ai/blog/how-to-use-openai-codex-cli
- 8: https://linkedin.com/company/codexapps
- 9: https://platform.openai.com/docs/models/computer-use-preview
- 10: https://developers.openai.com/docs/guides/tools-computer-use
🌐 Web query:
OpenAI Codex desktop app download
💡 Result:
The OpenAI Codex desktop app is available for download on Windows and macOS. It is the official command center for agentic software development using Codex, powered by ChatGPT. For Windows: - Download from the Microsoft Store: https://apps.microsoft.com/detail/9plm9xgg6vks?gl=US&hl=en-US - Or via command line: winget install Codex -s msstore For macOS: - Download from official OpenAI developer docs (direct link referenced but not inline; visit https://developers.openai.com/codex/app or https://developers.openai.com/codex/quickstart?setup=app). Choose Apple Silicon or Intel build as appropriate. Sign in with a ChatGPT account (Plus, Pro, etc.) or API key after installation. Linux support is upcoming (get notified). Note: Original OpenAI Codex (2021 code model) is deprecated; this refers to the 2025+ Codex coding agent product.
Citations:
- 1: https://developers.openai.com/codex/app/windows
- 2: https://apps.microsoft.com/detail/9plm9xgg6vks?gl=US&hl=en-US
- 3: https://developers.openai.com/codex/quickstart?setup=app
- 4: https://developers.openai.com/codex/app
Replace the https://codex.ai URL with the official OpenAI download location.
The link https://codex.ai is not the official distribution URL for Codex.app. Use the official OpenAI download page instead: https://developers.openai.com/codex/app. This should be updated at both lines 20 and 161.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/computer-use.md` at line 20, Update the two occurrences of the Codex.app
download URL in docs/computer-use.md: replace the link target "https://codex.ai"
with the official OpenAI page "https://developers.openai.com/codex/app" (the
visible text can remain "Codex.app"), ensuring both instances mentioned in the
comment (the line referencing the SkyComputerUseClient binary and the later
occurrence around line 161) are changed so the `/Applications/Codex.app`
detection note still reads correctly and the link points to the official OpenAI
download.
| **Computer Use (macOS):** | ||
| ``` | ||
| Use cu_status to check connection | ||
| Use cu_get_app_state with app "Finder" to see the desktop | ||
| Use cu_click with app "Safari" and element_index "42" to click a link | ||
| ``` |
There was a problem hiding this comment.
Minor: missing language on fenced code block (MD040).
markdownlint flagged this. For consistency with the other example blocks (which are also unlabeled but flagged identically), consider adding a hint like ```text to the new Computer Use example.
📝 Proposed fix
**Computer Use (macOS):**
-```
+```text
Use cu_status to check connection
Use cu_get_app_state with app "Finder" to see the desktop
Use cu_click with app "Safari" and element_index "42" to click a link</details>
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.22.0)</summary>
[warning] 101-101: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.
In @README.md around lines 100 - 105, Add a language hint to the fenced code
block in the "Computer Use (macOS)" example in README.md to satisfy markdownlint
MD040; specifically update the triple-backtick fence that encloses the commands
for cu_status, cu_get_app_state and cu_click to include a language token such as
text (e.g., ```text) so the block is labeled consistently with the other
examples.
</details>
<!-- fingerprinting:phantom:poseidon:nectarine:ae9b0301-780f-455f-9f77-406a0872617c -->
<!-- This is an auto-generated comment by CodeRabbit -->
| const msg = JSON.stringify({ jsonrpc: '2.0', id, method, params }) + '\n'; | ||
| this.process?.stdin?.write(msg); |
There was a problem hiding this comment.
Writes to stdin are silently dropped if the process is gone.
this.process?.stdin?.write(msg) is a no-op when stdin is unavailable (spawn failed, already closed, stdin destroyed). For sendRequest the caller eventually sees a timeout, but for sendNotification the notification is lost with no signal. Consider checking writable and returning/rejecting explicitly, and handling write's error callback.
🛡️ Proposed fix
- const msg = JSON.stringify({ jsonrpc: '2.0', id, method, params }) + '\n';
- this.process?.stdin?.write(msg);
+ const stdin = this.process?.stdin;
+ if (!stdin || !stdin.writable) {
+ clearTimeout(timer);
+ this.pending.delete(id);
+ reject(new Error(`Cannot send ${method}: stdin not writable`));
+ return;
+ }
+ const msg = JSON.stringify({ jsonrpc: '2.0', id, method, params }) + '\n';
+ stdin.write(msg, (err) => {
+ if (err) {
+ clearTimeout(timer);
+ this.pending.delete(id);
+ reject(err);
+ }
+ });Also applies to: 146-147
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/computer-use/client.ts` around lines 137 - 138,
sendNotification/sendRequest currently do this.process?.stdin?.write(msg) which
silently drops messages if the child process or its stdin is gone; change both
sendNotification and sendRequest to explicitly check that this.process and
this.process.stdin exist and this.process.stdin.writable (or !destroyed) before
attempting to write, and if not available immediately reject/return an error
(for sendRequest reject the Promise, for sendNotification return/fail fast).
When writing, use the write() return value and the write callback to surface
write errors (invoke the sendRequest reject or surface a notification failure),
and ensure you handle the case where write returns false (buffering) by
listening for 'drain' or treating it as backpressure error. Update the code
paths referenced by the snippets (the methods that construct msg and call
this.process?.stdin?.write at the shown locations) to implement these checks and
callback-based error handling.
| name: CU_TOOLS.CLICK, | ||
| description: | ||
| 'Click an element by index or pixel coordinates. Prefer element_index when available from accessibility tree.', | ||
| inputSchema: { | ||
| type: 'object', | ||
| properties: { | ||
| app: { | ||
| type: 'string', | ||
| description: 'App name or bundle identifier', | ||
| }, | ||
| element_index: { | ||
| type: 'string', | ||
| description: 'Element index from accessibility tree', | ||
| }, | ||
| x: { | ||
| type: 'number', | ||
| description: 'X coordinate in screenshot pixel coordinates', | ||
| }, | ||
| y: { | ||
| type: 'number', | ||
| description: 'Y coordinate in screenshot pixel coordinates', | ||
| }, | ||
| mouse_button: { | ||
| type: 'string', | ||
| enum: ['left', 'right', 'middle'], | ||
| description: 'Mouse button to click (default: left)', | ||
| }, | ||
| click_count: { | ||
| type: 'integer', | ||
| description: 'Number of clicks (default: 1)', | ||
| }, | ||
| }, | ||
| required: ['app'], | ||
| }, |
There was a problem hiding this comment.
Require a click target in the advertised schema.
Line 77 allows { app } with no element_index and no coordinates, so clients can issue an accepted-but-unexecutable click. Align the JSON schema and runtime Zod schema to require either element_index or both x/y.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/computer-use/definitions.ts` around lines 45 - 78, The JSON inputSchema
for CU_TOOLS.CLICK currently only requires 'app', allowing no click target;
change the schema to require either element_index OR both x and y by replacing
the flat required with a oneOf clause (one option: { required:
['app','element_index'] } and another: { required: ['app','x','y'] }) and update
the runtime Zod validation used for CU_TOOLS.CLICK to mirror this logic
(validate presence of element_index OR both x and y and preserve types for
mouse_button and click_count) so the advertised schema and runtime checks are
consistent.
| // Validate tool name. | ||
| const binaryToolName = CU_TO_BINARY[toolName]; | ||
| if (!binaryToolName) { | ||
| throw new Error(`Unknown computer-use tool: ${toolName}`); | ||
| } |
There was a problem hiding this comment.
Use a custom error class for unknown tools.
Line 38 throws a raw Error, bypassing the standardized tool error path. Prefer ValidationError here.
🔧 Proposed fix
const binaryToolName = CU_TO_BINARY[toolName];
if (!binaryToolName) {
- throw new Error(`Unknown computer-use tool: ${toolName}`);
+ throw new ValidationError(toolName, `Unknown computer-use tool: ${toolName}`);
}As per coding guidelines, “Use custom error classes ValidationError and ToolExecutionError for error handling”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/computer-use/handlers.ts` around lines 35 - 39, The code currently throws
a raw Error when an unknown tool name is encountered (CU_TO_BINARY lookup for
toolName); replace the raw throw with throwing the standardized ValidationError
(throw new ValidationError(`Unknown computer-use tool: ${toolName}`)) so the
error follows the project's error path, and add or update the import of
ValidationError at the top of the file if not already imported; keep the same
message text to preserve semantics.
| private handleStatus(): ToolResult { | ||
| const bridge = ComputerUseBridge.getInstance(); | ||
| const binaryInfo = bridge.getBinaryInfo(); | ||
| const error = bridge.getError(); | ||
|
|
||
| const status = { | ||
| connected: bridge.isReady(), | ||
| binary: binaryInfo | ||
| ? { | ||
| path: binaryInfo.path, | ||
| type: binaryInfo.type, | ||
| } | ||
| : null, | ||
| error: error?.message || null, | ||
| }; |
There was a problem hiding this comment.
Make cu_status probe binary discovery before reporting null state.
On a fresh process, getBinaryInfo() and getError() are both empty until a non-status tool attempts initialization, so cu_status cannot actually report the binary path or “not found” state. Consider doing a side-effect-free discovery probe here and returning that result without launching the subprocess.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/computer-use/handlers.ts` around lines 68 - 82, The status handler
currently returns null for binary and error when
ComputerUseBridge.getBinaryInfo() and getError() are empty; update handleStatus
to perform a side-effect-free discovery probe via the bridge before returning:
call a discovery method on the ComputerUseBridge instance (e.g., a new or
existing probe/discover method on ComputerUseBridge) to attempt to locate the
binary and populate binaryInfo/error without starting the subprocess, then use
those probe results (falling back to the existing getBinaryInfo()/getError() if
present) when building the status object; ensure you reference
ComputerUseBridge.getInstance(), ComputerUseBridge.getBinaryInfo(),
ComputerUseBridge.getError(), and ComputerUseBridge.isReady() while avoiding
launching the real tool process.
| export interface ToolResult { | ||
| content: Array<{ | ||
| type: 'text'; | ||
| type: 'text' | 'image'; | ||
| text: string; | ||
| data?: string; | ||
| mimeType?: string; | ||
| _meta?: Record<string, unknown>; | ||
| }>; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n src/types.ts | head -100Repository: tuannvm/codex-mcp-server
Length of output: 3276
🏁 Script executed:
rg "ToolResult" -A 5 -B 5Repository: tuannvm/codex-mcp-server
Length of output: 10420
🏁 Script executed:
fd -type f -name "*.ts" | head -20Repository: tuannvm/codex-mcp-server
Length of output: 238
🏁 Script executed:
rg "type.*image" -A 3 -B 3Repository: tuannvm/codex-mcp-server
Length of output: 819
🏁 Script executed:
rg "image" src/tools/handlers.ts src/computer-use/handlers.ts -A 2 -B 2Repository: tuannvm/codex-mcp-server
Length of output: 50
🏁 Script executed:
rg "CallToolResultSchema" -A 10 -B 2Repository: tuannvm/codex-mcp-server
Length of output: 3288
🏁 Script executed:
cat src/computer-use/client.ts | grep -A 20 "const content = "Repository: tuannvm/codex-mcp-server
Length of output: 605
🏁 Script executed:
cat -n src/__tests__/index.test.ts | sed -n '1,80p'Repository: tuannvm/codex-mcp-server
Length of output: 3441
🏁 Script executed:
rg "data.*mimeType" -B 5 -A 5Repository: tuannvm/codex-mcp-server
Length of output: 50
🏁 Script executed:
rg "screenshot" -A 5 -B 5 src/computer-use/client.tsRepository: tuannvm/codex-mcp-server
Length of output: 413
🏁 Script executed:
rg "get_app_state" -A 10 -B 2Repository: tuannvm/codex-mcp-server
Length of output: 9894
🏁 Script executed:
cat -n src/computer-use/client.ts | sed -n '60,120p'Repository: tuannvm/codex-mcp-server
Length of output: 2181
🏁 Script executed:
rg "ToolResult" src/types.ts -A 15Repository: tuannvm/codex-mcp-server
Length of output: 425
🏁 Script executed:
rg -i "zod.*toolresult|toolresult.*zod"Repository: tuannvm/codex-mcp-server
Length of output: 50
🏁 Script executed:
cat -n src/types.ts | sed -n '75,200p'Repository: tuannvm/codex-mcp-server
Length of output: 3465
🏁 Script executed:
rg "ImageContent|TextContent" -A 5 -B 2Repository: tuannvm/codex-mcp-server
Length of output: 50
🏁 Script executed:
cat -n src/server.ts | grep -A 20 "callTool\|CallToolResult" | head -50Repository: tuannvm/codex-mcp-server
Length of output: 50
🏁 Script executed:
cat -n src/server.ts | sed -n '1,100p'Repository: tuannvm/codex-mcp-server
Length of output: 3590
🏁 Script executed:
rg "class CodexMcpServer" -A 30Repository: tuannvm/codex-mcp-server
Length of output: 1332
Use discriminated union to separate text and image content shapes.
Lines 79-85 require text: string for all content items, including when type: 'image'. However, cu_get_app_state returns image content (screenshots) with data and mimeType, not text, per the MCP schema. The current interface is incompatible with MCP image content structures and requires a workaround in src/computer-use/client.ts (line 101: text: item.text || ''). Use discriminated unions to separate text and image content types:
🔧 Proposed type fix
+type ToolContent =
+ | {
+ type: 'text';
+ text: string;
+ _meta?: Record<string, unknown>;
+ }
+ | {
+ type: 'image';
+ data: string;
+ mimeType: string;
+ _meta?: Record<string, unknown>;
+ };
+
// Tool result interface matching MCP SDK expectations
export interface ToolResult {
- content: Array<{
- type: 'text' | 'image';
- text: string;
- data?: string;
- mimeType?: string;
- _meta?: Record<string, unknown>;
- }>;
+ content: ToolContent[];
structuredContent?: Record<string, unknown>;
isError?: boolean;
_meta?: Record<string, unknown>;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/types.ts` around lines 78 - 85, The ToolResult.content item should be a
discriminated union instead of a single shape that forces text for images;
replace the current inline object with two interfaces (e.g., TextContent { type:
'text'; text: string; _meta?: Record<string, unknown> } and ImageContent { type:
'image'; data: string; mimeType: string; _meta?: Record<string, unknown> }) and
change ToolResult to content: Array<TextContent | ImageContent>; then update
call sites that assume item.text (for example the code in client.ts that uses
item.text || '') to handle the two variants by checking item.type === 'text'
before reading text or treating images via data/mimeType.
… automation Remove dead src/computer-use/ module (macOS binary locked behind team ID check) and add cross-platform browser automation via Playwright. New module src/browser-use/ with 10 MCP tools: browser_launch, browser_screenshot, browser_click, browser_type, browser_scroll, browser_drag, browser_key, browser_navigate, browser_close, browser_status. - Singleton BrowserUseBridge manages concurrent sessions - Playwright is a peer dependency (lazy init, graceful errors) - SIGINT/SIGTERM handlers clean up browser sessions on shutdown - Key combinations auto-normalized (Cmd→Meta, Ctrl→Control, etc.) - 86 tests pass, lint clean Co-authored-by: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (4)
src/tools/handlers.ts (1)
583-605: Consider forwardingprogressTokenso browser tools also emit progress notifications.Operations like
browser_launch,browser_navigate, andbrowser_screenshotcan take several seconds (cold Chromium start, page load). The other tools in this file already stream progress; these new ones silently dropcontext(see_contextinBrowserUseToolHandler). At minimum, sending a "Launching browser…" / "Navigating…" notification would align with the rest of the registry.🛠️ Sketch
In
src/browser-use/handlers.ts, replace_contextwithcontextand emit a shortcontext.sendProgress('…')at the top of long-running cases (BROWSER_LAUNCH,BROWSER_NAVIGATE,BROWSER_SCREENSHOT).As per coding guidelines: "Tools must support streaming progress via MCP notifications/progress with progress context created from request._meta?.progressToken".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tools/handlers.ts` around lines 583 - 605, The browserHandler is dropping the progress context so browser tools can't emit streaming progress; change browserHandler to forward the incoming context into browserUseHandler.execute (preserve the context/progressToken from request._meta?.progressToken) and update BrowserUseToolHandler to use that context (remove _context usage). Inside BrowserUseToolHandler handle cases BROWSER_LAUNCH, BROWSER_NAVIGATE, and BROWSER_SCREENSHOT to immediately call context.sendProgress with a short message like "Launching browser…" / "Navigating…" / "Taking screenshot…" (when context exists) before the long-running work so MCP progress notifications are emitted.src/browser-use/client.ts (2)
92-98: Swallowing all close errors is too broad — at least log them.A bare
catch {}here means bugs like “page crashed” or “protocol error” will go completely unnoticed, making shutdown issues un-debuggable. Narrow to the expected “already closed” case or log at debug level.- try { - await browser.close(); - } catch { - // Browser may already be closed - } + try { + await browser.close(); + } catch (err) { + // Browser may already be closed; surface unexpected failures for debuggability + console.debug(`closeSession(${session.sessionId}) ignored error:`, err); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/browser-use/client.ts` around lines 92 - 98, In closeSession (function closeSession, parameter BrowserSession) don’t silently swallow all errors from browser.close(); instead catch and handle only the expected "already closed" case or log unexpected errors at debug/error level; update the try/catch to inspect the caught error (or its message/type) and if it’s not the benign already-closed/protocol-connection-lost case call processLogger.debug/error (or the module logger) with the error details so shutdown failures like "page crashed" or protocol errors are visible.
4-17: Minor: cache the in-flight promise, not just the resolved module.If
getPlaywright()is invoked concurrently before the firstimport()resolves, you’ll kick off multiple dynamic imports. Node caches the module so this is harmless in practice, but caching the promise is cleaner and avoids redundant work on cold start.-let playwrightModule: { chromium: { launch(opts?: Record<string, unknown>): Promise<Browser> } } | null = null; - -async function getPlaywright() { - if (!playwrightModule) { - try { - playwrightModule = await import('playwright'); - } catch { - throw new Error(...); - } - } - return playwrightModule; -} +let playwrightPromise: Promise<typeof import('playwright')> | null = null; + +function getPlaywright() { + if (!playwrightPromise) { + playwrightPromise = import('playwright').catch((err) => { + playwrightPromise = null; // allow retry after install + throw new ToolExecutionError('Playwright is not installed...', { cause: err }); + }); + } + return playwrightPromise; +}As a side benefit, typing the promise as
typeof import('playwright')removes the hand-rolled structural type and theas unknown as { newContext... }cast on line 28.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/browser-use/client.ts` around lines 4 - 17, The getPlaywright function currently caches only the resolved module in playwrightModule which allows concurrent callers to trigger multiple import() calls; change the cache to store the in-flight promise instead and type it as Promise<typeof import('playwright')> (or nullable) so concurrent calls await the same import promise; update references that await/import the module to await that promise and then use the resolved module (replace the hand-rolled structural type with typeof import('playwright') for playwrightModule).src/browser-use/types.ts (1)
99-118: Nit: drop identity entries and consider case-insensitive lookup.
Backspace,Enter,Tab, andEscapemap to themselves. Also, users will likely sendctrl+c/cmd+ain lowercase; the current map is case-sensitive soctrlpasses through unchanged and Playwright’skeyboard.press('ctrl+c')will not resolve the modifier. Consider normalizing case before lookup.- if (key.includes('+')) { - return key.split('+').map((part) => map[part.trim()] ?? part.trim()).join('+'); - } - return map[key] ?? key; + const lookup = (k: string) => { + const trimmed = k.trim(); + const cap = trimmed.charAt(0).toUpperCase() + trimmed.slice(1).toLowerCase(); + return map[trimmed] ?? map[cap] ?? trimmed; + }; + if (key.includes('+')) { + return key.split('+').map(lookup).join('+'); + } + return lookup(key);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/browser-use/types.ts` around lines 99 - 118, normalizeKey currently includes identity entries in the map and does a case-sensitive lookup, so keys like "ctrl+c" won't match; update normalizeKey to remove entries that map to themselves (e.g., Backspace, Enter, Tab, Escape) from the map and perform a case-insensitive lookup by normalizing parts before lookup (e.g., trim and toLowerCase each part, use a map keyed by lowercase aliases like "ctrl", "cmd", "option", "del" -> target tokens "Control","Meta","Alt","Delete", etc.), then join with '+' and fall back to the original trimmed part if no mapping exists; adjust both the single-key return path and the '+'-split branch to use this case-insensitive mapping logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/browser-use.md`:
- Around line 28-34: The example for the browser_status tool is misleading
because browser_status accepts no parameters; update the example JSON block
under the browser_status heading to show an empty object (i.e., replace the
current `{ "sessionId": "my-session" }` example with `{}` in the JSON fenced
code block) so the docs reflect the declared properties: {} / required: [] and
match the API reference for browser_status.
In `@src/browser-use/bridge.ts`:
- Around line 32-43: The launch() method has a TOCTOU race on sessionId: check
of this.sessions.has(sessionId) occurs before awaiting createBrowserSession,
allowing two concurrent launches to spawn orphaned browsers and allowing
shutdown() to clear the map then have a pending launch reinsert a session. Add a
private pending: Set<string> on the class, and in launch() reserve the slot by
adding sessionId to pending before awaiting createBrowserSession (throw if
pending already contains sessionId or sessions.has(sessionId)), then await
createBrowserSession, and in a finally block remove sessionId from pending; when
the created session resolves, insert it into this.sessions only if sessions
doesn't already contain it and the instance is not shut down (and if insertion
is refused, close the created browser to avoid leaks). Ensure
shutdown()/closeSession() still clears sessions and checks pending or prevents
new insertions as needed.
In `@src/browser-use/client.ts`:
- Line 1: The file fails Prettier formatting; run the project's formatter (e.g.
npm run format) to reformat this module so it passes prettier --check, then
re-stage the changes; ensure the import statement (import type { Browser,
BrowserContext, Page } from 'playwright';) and surrounding lines are adjusted to
match the project's Prettier configuration and commit the formatted file.
- Around line 10-14: Replace the bare Error throw in the catch block with the
project’s ToolExecutionError and preserve the original exception as the cause;
specifically, import ToolExecutionError (if not already) and in the catch for
the Playwright import/require, throw a new ToolExecutionError with the same
descriptive message about installing Playwright and attach the caught error as
the underlying cause (using the error’s cause/constructor convention used in the
codebase) so the original import failure is retained; update the catch block
around the Playwright import in client.ts to reference ToolExecutionError
instead of Error.
- Around line 19-42: In createBrowserSession, prevent leaking the launched
browser and validate opts.url before navigation: wrap the post-launch steps
(context/newContext, context.newPage, and page.goto) in a try/catch/finally so
that any thrown error results in await browser.close() (use the browser variable
referenced in the function) before rethrowing; also validate opts.url using a
URL parser (e.g., new URL(opts.url)) and enforce allowed protocols (http, https)
and non-empty host before calling page.goto, and if validation fails close the
browser and throw a clear error so no Chromium process is left running.
In `@src/browser-use/types.ts`:
- Line 1: The file containing the line "import { z } from 'zod';" fails Prettier
check; run the repository formatter (npm run format or make lint) to reformat
this file, fix any spacing/semicolon/newline issues Prettier reports, and
re-stage the updated file before pushing so the import line and surrounding code
conform to the project's Prettier rules.
- Around line 16-67: Update the URL fields to only accept HTTP(S) and reject
file/other schemes: replace LaunchSchema's url (currently z.string().optional())
and NavigateSchema's url (currently z.string().min(1)) with a validator that
enforces HTTP/HTTPS (e.g., z.httpUrl() if available, or
z.string().url().refine(u => /^https?:\/\//i.test(u), "only http(s) URLs
allowed")). Keep the optional/default semantics for LaunchSchema and preserve
NavigateSchema as required, and note that host-level checks (private IP,
localhost, link‑local, DNS rebinding) must still be enforced in the
bridge/client before calling page.goto().
- Around line 16-45: Remove the redundant .optional() chained before .default()
on fields that have defaults so Zod 4 applies defaults correctly: update
LaunchSchema (headless, viewportWidth, viewportHeight), ClickSchema (button,
clickCount) and ScrollSchema (amount) by deleting the .optional() calls that
precede .default(...). Keep the same types (z.boolean(), z.int(), z.enum(...),
etc.) and only remove the .optional() chaining so the schema output types are
non‑optional and defaults are applied as intended.
---
Nitpick comments:
In `@src/browser-use/client.ts`:
- Around line 92-98: In closeSession (function closeSession, parameter
BrowserSession) don’t silently swallow all errors from browser.close(); instead
catch and handle only the expected "already closed" case or log unexpected
errors at debug/error level; update the try/catch to inspect the caught error
(or its message/type) and if it’s not the benign
already-closed/protocol-connection-lost case call processLogger.debug/error (or
the module logger) with the error details so shutdown failures like "page
crashed" or protocol errors are visible.
- Around line 4-17: The getPlaywright function currently caches only the
resolved module in playwrightModule which allows concurrent callers to trigger
multiple import() calls; change the cache to store the in-flight promise instead
and type it as Promise<typeof import('playwright')> (or nullable) so concurrent
calls await the same import promise; update references that await/import the
module to await that promise and then use the resolved module (replace the
hand-rolled structural type with typeof import('playwright') for
playwrightModule).
In `@src/browser-use/types.ts`:
- Around line 99-118: normalizeKey currently includes identity entries in the
map and does a case-sensitive lookup, so keys like "ctrl+c" won't match; update
normalizeKey to remove entries that map to themselves (e.g., Backspace, Enter,
Tab, Escape) from the map and perform a case-insensitive lookup by normalizing
parts before lookup (e.g., trim and toLowerCase each part, use a map keyed by
lowercase aliases like "ctrl", "cmd", "option", "del" -> target tokens
"Control","Meta","Alt","Delete", etc.), then join with '+' and fall back to the
original trimmed part if no mapping exists; adjust both the single-key return
path and the '+'-split branch to use this case-insensitive mapping logic.
In `@src/tools/handlers.ts`:
- Around line 583-605: The browserHandler is dropping the progress context so
browser tools can't emit streaming progress; change browserHandler to forward
the incoming context into browserUseHandler.execute (preserve the
context/progressToken from request._meta?.progressToken) and update
BrowserUseToolHandler to use that context (remove _context usage). Inside
BrowserUseToolHandler handle cases BROWSER_LAUNCH, BROWSER_NAVIGATE, and
BROWSER_SCREENSHOT to immediately call context.sendProgress with a short message
like "Launching browser…" / "Navigating…" / "Taking screenshot…" (when context
exists) before the long-running work so MCP progress notifications are emitted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d505c623-0b16-496b-8d7c-4622c0e441cc
📒 Files selected for processing (15)
README.mddocs/api-reference.mddocs/browser-use.mdjest.config.mjssrc/__tests__/index.test.tssrc/browser-use/bridge.tssrc/browser-use/client.tssrc/browser-use/definitions.tssrc/browser-use/handlers.tssrc/browser-use/playwright.d.tssrc/browser-use/types.tssrc/index.tssrc/tools/definitions.tssrc/tools/handlers.tssrc/types.ts
✅ Files skipped from review due to trivial changes (2)
- jest.config.mjs
- src/browser-use/playwright.d.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- src/tests/index.test.ts
- src/tools/definitions.ts
- src/types.ts
| async launch(sessionId: string, opts?: { url?: string; headless?: boolean; viewportWidth?: number; viewportHeight?: number }): Promise<BrowserSession> { | ||
| const canUse = await this.checkAvailability(); | ||
| if (!canUse) { | ||
| throw new Error(this.checkError ?? 'Playwright is not installed'); | ||
| } | ||
| if (this.sessions.has(sessionId)) { | ||
| throw new Error(`Session "${sessionId}" already exists. Close it first or use a different session ID.`); | ||
| } | ||
| const session = await createBrowserSession(sessionId, opts); | ||
| this.sessions.set(sessionId, session); | ||
| return session; | ||
| } |
There was a problem hiding this comment.
TOCTOU on sessionId in launch() can leak Chromium instances.
sessions.has(sessionId) is checked before await createBrowserSession(...), and the slot isn't reserved until after the await. Two concurrent browser_launch calls with the same sessionId both pass the check, both spawn Chromium, and only the second set() wins the map — the first browser is orphaned (untracked, never closed by close() or shutdown()).
Same pattern allows shutdown() to race with an in-flight launch(): shutdown() calls closeSession on current entries and then clear()s the map, but the pending launch resolves afterward and re-inserts a session that will never be cleaned up.
🛠️ Proposed fix — reserve the slot before awaiting
async launch(sessionId: string, opts?: { url?: string; headless?: boolean; viewportWidth?: number; viewportHeight?: number }): Promise<BrowserSession> {
const canUse = await this.checkAvailability();
if (!canUse) {
throw new Error(this.checkError ?? 'Playwright is not installed');
}
- if (this.sessions.has(sessionId)) {
+ if (this.sessions.has(sessionId) || this.pending?.has(sessionId)) {
throw new Error(`Session "${sessionId}" already exists. Close it first or use a different session ID.`);
}
- const session = await createBrowserSession(sessionId, opts);
- this.sessions.set(sessionId, session);
- return session;
+ this.pending ??= new Set();
+ this.pending.add(sessionId);
+ try {
+ const session = await createBrowserSession(sessionId, opts);
+ this.sessions.set(sessionId, session);
+ return session;
+ } finally {
+ this.pending.delete(sessionId);
+ }
}(Declare private pending?: Set<string> on the class.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/browser-use/bridge.ts` around lines 32 - 43, The launch() method has a
TOCTOU race on sessionId: check of this.sessions.has(sessionId) occurs before
awaiting createBrowserSession, allowing two concurrent launches to spawn
orphaned browsers and allowing shutdown() to clear the map then have a pending
launch reinsert a session. Add a private pending: Set<string> on the class, and
in launch() reserve the slot by adding sessionId to pending before awaiting
createBrowserSession (throw if pending already contains sessionId or
sessions.has(sessionId)), then await createBrowserSession, and in a finally
block remove sessionId from pending; when the created session resolves, insert
it into this.sessions only if sessions doesn't already contain it and the
instance is not shut down (and if insertion is refused, close the created
browser to avoid leaks). Ensure shutdown()/closeSession() still clears sessions
and checks pending or prevents new insertions as needed.
Replace 3 browser MCP tools (browser, browser_screenshot, browser_interact) with a single `browser` tool dispatching 10 actions via Zod discriminated union. Per-action schema validation gives clean error messages. All handler methods have typed parameter signatures — zero unsafe casts. Co-authored-by: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/browser-use/bridge.ts (1)
19-30: Negative availability is cached permanently — installing Playwright later requires a server restart.Once
import('playwright')fails,this.available = falsesticks for the life of the process, so a subsequentstatuscall (or anylaunch) still reports "not installed" even after the user runsnpm install playwright. Consider caching only the positive result, or re-checking on each call whenavailable === false:🔧 Suggested tweak
async checkAvailability(): Promise<boolean> { - if (this.available !== null) return this.available; + if (this.available === true) return true; try { await import('playwright'); this.available = true; this.checkError = null; } catch { this.available = false; this.checkError = 'Playwright is not installed. Install with: npm install playwright && npx playwright install chromium'; } - return this.available; + return this.available ?? false; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/browser-use/bridge.ts` around lines 19 - 30, The checkAvailability method currently sets this.available = false on import failure and never retries; change it to only cache the positive result and avoid permanently caching negative results by not setting this.available to false on catch (or by setting this.available = null in the catch) and only setting this.available = true on success; ensure this.checkError is still populated when import('playwright') fails so subsequent calls to checkAvailability (or launch) will re-attempt the import rather than permanently reporting unavailable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/browser-use.md`:
- Around line 7-13: The fenced code block in the architecture diagram is missing
a language hint (MD040); update the triple-backtick fence that surrounds the
diagram (the block starting with "Claude Code → codex-mcp-server →
BrowserUseBridge...") to include a language hint such as ```text so the block
becomes a properly annotated fenced code block and satisfies markdownlint.
---
Nitpick comments:
In `@src/browser-use/bridge.ts`:
- Around line 19-30: The checkAvailability method currently sets this.available
= false on import failure and never retries; change it to only cache the
positive result and avoid permanently caching negative results by not setting
this.available to false on catch (or by setting this.available = null in the
catch) and only setting this.available = true on success; ensure this.checkError
is still populated when import('playwright') fails so subsequent calls to
checkAvailability (or launch) will re-attempt the import rather than permanently
reporting unavailable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 91c3e7cc-102b-460b-a23a-9e776cbf9603
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (13)
.gitignoreCLAUDE.mdREADME.mddocs/api-reference.mddocs/browser-use.mdpackage.jsonsrc/__tests__/index.test.tssrc/browser-use/bridge.tssrc/browser-use/definitions.tssrc/browser-use/handlers.tssrc/browser-use/types.tssrc/tools/handlers.tssrc/types.ts
✅ Files skipped from review due to trivial changes (3)
- package.json
- .gitignore
- src/browser-use/handlers.ts
🚧 Files skipped from review as they are similar to previous changes (4)
- src/tools/handlers.ts
- src/tests/index.test.ts
- src/types.ts
- src/browser-use/types.ts
- client.ts: cache in-flight promise instead of resolved module; use ToolExecutionError instead of bare Error; prevent browser leak on session creation failure; validate URLs are http(s) only; log close errors instead of silently swallowing - types.ts: remove redundant .optional() before .default(); add http(s) URL validation; case-insensitive normalizeKey with lowercase lookup - bridge.ts: only cache positive availability (retry import on failure); add TOCTOU guard on sessionId with pending set - handlers.ts: forward progress context; emit progress notifications for open/screenshot/navigate actions - docs/browser-use.md: add language hint to architecture diagram
Summary
Changes
Testing
Related Issues
Summary by CodeRabbit
New Features
Documentation
Tests
Chores