Skip to content

Feature/computer use integration#162

Open
tuannvm wants to merge 6 commits intomainfrom
feature/computer-use-integration
Open

Feature/computer use integration#162
tuannvm wants to merge 6 commits intomainfrom
feature/computer-use-integration

Conversation

@tuannvm
Copy link
Copy Markdown
Owner

@tuannvm tuannvm commented Apr 19, 2026

Summary

Changes

Testing

  • Tests pass locally
  • New tests added (if applicable)

Related Issues

Summary by CodeRabbit

  • New Features

    • Browser automation toolset (open, navigate, screenshot, click, type, key, scroll, drag, close, status)
    • Tool outputs now support image results
    • Graceful shutdown invoked on system signals
  • Documentation

    • New browser automation guide and README updates with examples
  • Tests

    • Expanded tests and mocks covering browser tool behaviors
  • Chores

    • Updated .gitignore, ESLint and Jest configs, and declared Playwright as a peer dependency

tuannvm and others added 2 commits April 18, 2026 22:17
Bridge to open-computer-use/SkyComputerUseClient binary over stdio JSON-RPC.
Adds 10 MCP tools (cu_list_apps, cu_get_app_state, cu_click, etc.) with
lazy init, per-tool timeouts, image passthrough, and macOS platform guard.
97 tests passing, lint clean.

Co-authored-by: Claude <noreply@anthropic.com>
Entire-Checkpoint: 8f81eabf57ca
Signed-off-by: Tommy Nguyen <tuannvm@hotmail.com>

Entire-Checkpoint: de373f85600c
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 19, 2026

Warning

Rate limit exceeded

@tuannvm has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 40 minutes and 11 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 40 minutes and 11 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 88a18e27-e202-4035-a799-e03c1d75c6c0

📥 Commits

Reviewing files that changed from the base of the PR and between 0a30c40 and 3057cde.

📒 Files selected for processing (8)
  • docs/browser-use.md
  • src/__tests__/index.test.ts
  • src/browser-use/bridge.ts
  • src/browser-use/client.ts
  • src/browser-use/definitions.ts
  • src/browser-use/handlers.ts
  • src/browser-use/playwright.d.ts
  • src/browser-use/types.ts

Walkthrough

Adds a Playwright-backed browser automation tool ("browser") to MCP: new browser-use module (bridge, client, handlers, types, definitions), integrates the tool into tool registry and handlers, updates types to support image results, adds graceful shutdown for browser sessions, tests and documentation, and marks Playwright as a peer dependency.

Changes

Cohort / File(s) Summary
Repository config & ignore
/.gitignore, /.gitignore (added entries), jest.config.mjs, eslint.config.mjs
Added local artifact ignores (.claude/, trash/, .entire/, test-mcp.mjs, tmux/); Jest now ignores /trash/; extended TS ESLint globals with runtime identifiers.
Docs & README
README.md, docs/api-reference.md, docs/browser-use.md, CLAUDE.md
Added Browser Use documentation, README updates for browser automation and examples, API reference tool annotations for browser and websearch, and new dedicated docs/browser-use.md.
Tool registry and handlers
src/tools/definitions.ts, src/tools/handlers.ts
Appended browserUseToolDefinitions to exported toolDefinitions; registered TOOLS.BROWSERbrowserUseHandler in toolHandlers; exported shutdownBrowserSessions() that delegates to bridge shutdown.
Types
src/types.ts
Added TOOLS.BROWSER and expanded ToolResult.content to accept 'image' with optional data and mimeType.
Browser-use module (new files)
src/browser-use/bridge.ts, src/browser-use/client.ts, src/browser-use/handlers.ts, src/browser-use/types.ts, src/browser-use/definitions.ts, src/browser-use/playwright.d.ts
Implemented singleton Bridge managing sessions and lazy Playwright import; Playwright-backed client operations (launch, navigate, click, screenshot, type, key, scroll, drag, close); Zod schemas and key normalization; tool definition and handler that routes actions and returns text/image ToolResult entries; ambient Playwright typings.
Tests
src/__tests__/index.test.ts
Added Jest mock for playwright, expanded tests to assert the browser tool exists, its schema action enum, handler presence, and behavioral coverage for multiple browser actions; ensures bridge shutdown after tests.
Entrypoint
src/index.ts
Registered SIGINT/SIGTERM handlers to call shutdownBrowserSessions() before exit for graceful cleanup.
Package metadata
package.json
Added peerDependencies.playwright: "^1.59.1" so Playwright is a peer dependency consumers install.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/User
    participant Server as MCP Server
    participant Handler as Browser Tool Handler
    participant Bridge as BrowserUseBridge
    participant PWClient as Playwright Client
    participant Browser as Chromium Page

    Client->>Server: invoke tool "browser" with action
    Server->>Handler: dispatch to browserUseHandler.execute(args)
    Handler->>Handler: parse/validate action (Zod)
    Handler->>Bridge: call bridge.<action>(...)
    Bridge->>PWClient: lazy import & call client function
    PWClient->>Browser: perform page operation (navigate/click/screenshot)
    Browser-->>PWClient: return result
    PWClient-->>Bridge: return structured result (image/text)
    Bridge-->>Handler: return result
    Handler-->>Server: wrap as ToolResult and return
    Server-->>Client: respond with ToolResult

    alt Shutdown (SIGINT/SIGTERM)
        Server->>Handler: shutdownBrowserSessions()
        Handler->>Bridge: bridge.shutdown()
        Bridge->>PWClient: close all browser sessions
        PWClient-->>Bridge: closed
        Bridge-->>Handler: done
    end
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 I hopped into code with a cheeky grin,

Playwright in paw to open and spin,
Sessions I tend and screenshots I bring,
Clicks and keypresses—oh what a ring!
Shutdown snug, nap time—ready to spring.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately captures the main change: adding Playwright-based browser automation (computer use) integration to the MCP server. While not exhaustively detailed, it clearly describes the primary objective and is sufficiently specific for history scanning.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/computer-use-integration

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: Tommy Nguyen <tuannvm@hotmail.com>

Entire-Checkpoint: d5b3202aaa87
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🧹 Nitpick comments (6)
src/computer-use/__tests__/discovery.test.ts (1)

4-6: Hard-coded absolute path is fragile but acceptable as a probe.

The bundled binary path is encoded twice (here and presumably in src/computer-use/discovery.ts). If the Codex.app layout changes, this skip-guard silently passes by skipping. Consider importing a constant from discovery.ts (e.g., CODEX_APP_BINARY_PATH) instead of duplicating the literal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/computer-use/__tests__/discovery.test.ts` around lines 4 - 6, The test
currently duplicates the hard-coded Codex binary path literal when setting
hasCodexBinary; import and use the single source of truth from discovery.ts
(e.g., CODEX_APP_BINARY_PATH) instead of repeating the string so the probe stays
in sync with discovery logic — replace the literal used by hasCodexBinary with
the imported CODEX_APP_BINARY_PATH constant and remove the duplicated path
literal in the test.
src/computer-use/__tests__/handlers.test.ts (1)

33-49: Consider asserting on ValidationError type, not just toThrow().

Bare .rejects.toThrow() will pass for any thrown error, including unrelated failures (e.g., bridge init, missing binary on non-macOS CI). Per src/computer-use/handlers.ts lines 59-65, ZodError is converted to ValidationError — assert on that to ensure validation actually short-circuits before any bridge call.

♻️ Proposed refactor
+import { ValidationError } from '../../errors.js';
@@
-      await expect(
-        handler.execute(CU_TOOLS.CLICK, {})
-      ).rejects.toThrow();
+      await expect(
+        handler.execute(CU_TOOLS.CLICK, {})
+      ).rejects.toThrow(ValidationError);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/computer-use/__tests__/handlers.test.ts` around lines 33 - 49, The tests
currently use bare `.rejects.toThrow()` which will catch any error; update each
failing test that calls handler.execute (for CU_TOOLS.CLICK,
CU_TOOLS.GET_APP_STATE, CU_TOOLS.DRAG) to assert the promise rejects with the
specific ValidationError type that handlers.ts maps ZodError to (i.e., use
`.rejects.toThrowError(ValidationError)` or the equivalent matcher for
ValidationError) so the spec verifies input validation short-circuits before any
bridge/init errors.
eslint.config.mjs (1)

49-58: Consider mirroring Node globals in the test config block.

The test files override (lines 49-58) doesn't include Buffer, setTimeout, clearTimeout, setInterval, clearInterval, or setImmediate. If any of the new src/computer-use/__tests__/*.test.ts files (or future tests) reference those Node globals, lint will fail with no-undef. Consider duplicating the Node globals here too.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eslint.config.mjs` around lines 49 - 58, The test config's globals object
(the block that currently lists describe, test, expect, beforeEach, afterEach,
beforeAll, afterAll, jest) is missing common Node globals and should be updated
to mirror the Node globals used elsewhere; add Buffer, setTimeout, clearTimeout,
setInterval, clearInterval, and setImmediate to that globals object so test
files (e.g., src/computer-use/__tests__/*.test.ts) won't trigger no-undef lint
errors.
README.md (1)

104-104: Verify the element_index example value type.

The README example passes element_index "42" (a quoted string). Per docs/api-reference.md line 488 and the ClickSchema (element_index: z.string().optional()), this is correct, but it can read awkwardly to users who'd intuitively pass an integer. Consider noting that element indices come from cu_get_app_state output as strings, or otherwise no change required.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 104, The README example uses a quoted "42" for
element_index which is correct per ClickSchema (element_index:
z.string().optional()), but is confusing; update the cu_click example to clarify
that element_index is a string (it comes from the cu_get_app_state output) —
either add a short parenthetical note next to element_index in the example
stating "element_index must be a string (as returned by cu_get_app_state)" or
show both forms (string example and a note) so users understand why it's quoted;
reference the ClickSchema and cu_get_app_state when adding the clarification.
src/computer-use/client.ts (2)

28-31: Full process.env is forwarded to the subprocess.

The entire parent environment (including any secrets like API keys) is passed to the external binary. Since the binary only needs macOS accessibility, consider forwarding a minimal allow-list (PATH, HOME, USER, LANG, display-related vars) to reduce blast radius if the binary ever logs or leaks env.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/computer-use/client.ts` around lines 28 - 31, The child process is
currently started with env: { ...process.env } which forwards all environment
variables (including secrets); update the spawn call that creates this.process
(the line using spawn(this.binary.path, this.binary.args, ...)) to pass a
minimal allow-list instead of the whole env: build an env object that picks only
required keys (e.g., PATH, HOME, USER, LANG, TERM and any macOS/display-related
vars your binary needs) from process.env and use that object for the env option;
ensure any required runtime variables for accessibility checks are included and
fall back to safe defaults if missing.

112-117: disconnect() relies on default SIGTERM with no escalation.

process?.kill() sends SIGTERM and returns immediately; a misbehaving binary may never exit. Consider a short SIGTERM → SIGKILL escalation timer so the bridge can reliably reset state when a tool hangs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/computer-use/client.ts` around lines 112 - 117, The disconnect() method
currently calls this.process?.kill() (default SIGTERM) and returns immediately;
update it to perform a SIGTERM → SIGKILL escalation: if this.process exists,
call this.process.kill('SIGTERM'), attach a one-time exit listener
(process.once('exit')) to clear a fallback timer, start a short timeout (e.g.,
500–1000ms) that checks if the process is still running and, if so, calls
this.process.kill('SIGKILL'), then regardless of exit path ensure you clear the
timer, set this.process = null, this.connected = false, and call
this.rejectAll(new Error('Client disconnected')); also guard against double-kill
by checking process.pid or that this.process is still the same object before
sending SIGKILL.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/commands/codex-setup.md:
- Line 7: The current line in codex-setup.md asserts "This is **not** a security
risk — it only affects your local machine." — change this to a balanced tradeoff
statement: explain that ad-hoc re-signing removes vendor signing and
hardened-runtime protections for the local Codex.app binary and can increase the
risk on that machine (e.g., allowing other processes or unsigned code to
interact with the app), while noting the scope is local; update the sentence to
describe that this is a local security tradeoff rather than claiming it is
risk-free.
- Line 68: The smoke-test command uses the GNU `timeout` binary which is absent
on stock macOS; update the command that pipes the initialize JSON into "$BINARY"
mcp to use `gtimeout` when available (fall back to `timeout` otherwise) or add a
note documenting the dependency; specifically modify the invocation that
currently calls `timeout 5 "$BINARY" mcp` (reference the "$BINARY" and `mcp`
usage) to prefer `gtimeout` on macOS so the command succeeds on macOS CI/dev
machines.

In `@docs/computer-use.md`:
- Around line 7-14: The fenced code block containing the ASCII diagram starting
with "Claude Code → codex-mcp-server" should include a language specifier (e.g.,
text) to satisfy markdownlint MD040; update that triple-backtick fence to use
```text so the block is identified as plain text (the ASCII diagram lines
"Claude Code", "codex-mcp-server", "ComputerUseBridge", "SkyComputerUseClient",
"stdio JSON-RPC", "macOS Accessibility APIs" remain unchanged).
- Line 20: Update the two occurrences of the Codex.app download URL in
docs/computer-use.md: replace the link target "https://codex.ai" with the
official OpenAI page "https://developers.openai.com/codex/app" (the visible text
can remain "Codex.app"), ensuring both instances mentioned in the comment (the
line referencing the SkyComputerUseClient binary and the later occurrence around
line 161) are changed so the `/Applications/Codex.app` detection note still
reads correctly and the link points to the official OpenAI download.

In `@README.md`:
- Around line 100-105: Add a language hint to the fenced code block in the
"Computer Use (macOS)" example in README.md to satisfy markdownlint MD040;
specifically update the triple-backtick fence that encloses the commands for
cu_status, cu_get_app_state and cu_click to include a language token such as
text (e.g., ```text) so the block is labeled consistently with the other
examples.

In `@src/computer-use/__tests__/types.test.ts`:
- Around line 38-45: The test iterates only mapped values so it won't catch
missing keys in CU_TO_BINARY; update the test in types.test.ts to assert that
the set of keys of CU_TO_BINARY equals the set of CU_TOOLS excluding
CU_TOOLS.STATUS (instead of only checking values), while still verifying each
mapped value exists in BINARY_TOOLS; use the unique symbols CU_TO_BINARY,
CU_TOOLS, and BINARY_TOOLS to locate and replace the current loop-based
assertions with a keys-based comparison plus the existing value containment
checks.

In `@src/computer-use/bridge.ts`:
- Around line 87-91: The shutdown() method currently disconnects and clears
this.client and this.initialized but leaves cu_status fields (binaryInfo and
error) populated, causing stale diagnostics; update shutdown() to also
clear/reset cu_status.binaryInfo and cu_status.error (e.g., set them to
undefined/null or their default empty state) so that after calling shutdown()
both the client and the cu_status state are fully cleaned up.

In `@src/computer-use/client.ts`:
- Around line 137-138: sendNotification/sendRequest currently do
this.process?.stdin?.write(msg) which silently drops messages if the child
process or its stdin is gone; change both sendNotification and sendRequest to
explicitly check that this.process and this.process.stdin exist and
this.process.stdin.writable (or !destroyed) before attempting to write, and if
not available immediately reject/return an error (for sendRequest reject the
Promise, for sendNotification return/fail fast). When writing, use the write()
return value and the write callback to surface write errors (invoke the
sendRequest reject or surface a notification failure), and ensure you handle the
case where write returns false (buffering) by listening for 'drain' or treating
it as backpressure error. Update the code paths referenced by the snippets (the
methods that construct msg and call this.process?.stdin?.write at the shown
locations) to implement these checks and callback-based error handling.
- Around line 62-64: The current check throws when initResult.result is falsy
which will incorrectly fail on legitimate null/empty results; change the success
condition to check for the presence of initResult.error instead. Update the
block that currently inspects initResult.result (the variables named initResult,
result, error) to throw only when initResult.error is non-null/defined and
otherwise proceed (i.e., treat a null/empty result as success).
- Around line 25-69: The connect() method can be called concurrently and
currently spawns multiple subprocesses; introduce and use an in-flight promise
(e.g. this.connectPromise) inside connect() so that if connect() is called while
a connection is being established callers await the same promise instead of
spawning another process; create the child process and attach handlers as you do
now, set this.connectPromise before awaiting sendRequest('initialize', ...), on
success set this.connected = true and clear this.connectPromise, and on any
error or process 'close' ensure you clear this.connectPromise, clean up/kill the
spawned process if needed and call rejectAll so state remains consistent (use
existing symbols connect(), this.connected, this.process, sendRequest,
rejectAll).

In `@src/computer-use/definitions.ts`:
- Around line 45-78: The JSON inputSchema for CU_TOOLS.CLICK currently only
requires 'app', allowing no click target; change the schema to require either
element_index OR both x and y by replacing the flat required with a oneOf clause
(one option: { required: ['app','element_index'] } and another: { required:
['app','x','y'] }) and update the runtime Zod validation used for CU_TOOLS.CLICK
to mirror this logic (validate presence of element_index OR both x and y and
preserve types for mouse_button and click_count) so the advertised schema and
runtime checks are consistent.

In `@src/computer-use/handlers.ts`:
- Around line 68-82: The status handler currently returns null for binary and
error when ComputerUseBridge.getBinaryInfo() and getError() are empty; update
handleStatus to perform a side-effect-free discovery probe via the bridge before
returning: call a discovery method on the ComputerUseBridge instance (e.g., a
new or existing probe/discover method on ComputerUseBridge) to attempt to locate
the binary and populate binaryInfo/error without starting the subprocess, then
use those probe results (falling back to the existing getBinaryInfo()/getError()
if present) when building the status object; ensure you reference
ComputerUseBridge.getInstance(), ComputerUseBridge.getBinaryInfo(),
ComputerUseBridge.getError(), and ComputerUseBridge.isReady() while avoiding
launching the real tool process.
- Around line 35-39: The code currently throws a raw Error when an unknown tool
name is encountered (CU_TO_BINARY lookup for toolName); replace the raw throw
with throwing the standardized ValidationError (throw new
ValidationError(`Unknown computer-use tool: ${toolName}`)) so the error follows
the project's error path, and add or update the import of ValidationError at the
top of the file if not already imported; keep the same message text to preserve
semantics.

In `@src/types.ts`:
- Around line 78-85: The ToolResult.content item should be a discriminated union
instead of a single shape that forces text for images; replace the current
inline object with two interfaces (e.g., TextContent { type: 'text'; text:
string; _meta?: Record<string, unknown> } and ImageContent { type: 'image';
data: string; mimeType: string; _meta?: Record<string, unknown> }) and change
ToolResult to content: Array<TextContent | ImageContent>; then update call sites
that assume item.text (for example the code in client.ts that uses item.text ||
'') to handle the two variants by checking item.type === 'text' before reading
text or treating images via data/mimeType.

---

Nitpick comments:
In `@eslint.config.mjs`:
- Around line 49-58: The test config's globals object (the block that currently
lists describe, test, expect, beforeEach, afterEach, beforeAll, afterAll, jest)
is missing common Node globals and should be updated to mirror the Node globals
used elsewhere; add Buffer, setTimeout, clearTimeout, setInterval,
clearInterval, and setImmediate to that globals object so test files (e.g.,
src/computer-use/__tests__/*.test.ts) won't trigger no-undef lint errors.

In `@README.md`:
- Line 104: The README example uses a quoted "42" for element_index which is
correct per ClickSchema (element_index: z.string().optional()), but is
confusing; update the cu_click example to clarify that element_index is a string
(it comes from the cu_get_app_state output) — either add a short parenthetical
note next to element_index in the example stating "element_index must be a
string (as returned by cu_get_app_state)" or show both forms (string example and
a note) so users understand why it's quoted; reference the ClickSchema and
cu_get_app_state when adding the clarification.

In `@src/computer-use/__tests__/discovery.test.ts`:
- Around line 4-6: The test currently duplicates the hard-coded Codex binary
path literal when setting hasCodexBinary; import and use the single source of
truth from discovery.ts (e.g., CODEX_APP_BINARY_PATH) instead of repeating the
string so the probe stays in sync with discovery logic — replace the literal
used by hasCodexBinary with the imported CODEX_APP_BINARY_PATH constant and
remove the duplicated path literal in the test.

In `@src/computer-use/__tests__/handlers.test.ts`:
- Around line 33-49: The tests currently use bare `.rejects.toThrow()` which
will catch any error; update each failing test that calls handler.execute (for
CU_TOOLS.CLICK, CU_TOOLS.GET_APP_STATE, CU_TOOLS.DRAG) to assert the promise
rejects with the specific ValidationError type that handlers.ts maps ZodError to
(i.e., use `.rejects.toThrowError(ValidationError)` or the equivalent matcher
for ValidationError) so the spec verifies input validation short-circuits before
any bridge/init errors.

In `@src/computer-use/client.ts`:
- Around line 28-31: The child process is currently started with env: {
...process.env } which forwards all environment variables (including secrets);
update the spawn call that creates this.process (the line using
spawn(this.binary.path, this.binary.args, ...)) to pass a minimal allow-list
instead of the whole env: build an env object that picks only required keys
(e.g., PATH, HOME, USER, LANG, TERM and any macOS/display-related vars your
binary needs) from process.env and use that object for the env option; ensure
any required runtime variables for accessibility checks are included and fall
back to safe defaults if missing.
- Around line 112-117: The disconnect() method currently calls
this.process?.kill() (default SIGTERM) and returns immediately; update it to
perform a SIGTERM → SIGKILL escalation: if this.process exists, call
this.process.kill('SIGTERM'), attach a one-time exit listener
(process.once('exit')) to clear a fallback timer, start a short timeout (e.g.,
500–1000ms) that checks if the process is still running and, if so, calls
this.process.kill('SIGKILL'), then regardless of exit path ensure you clear the
timer, set this.process = null, this.connected = false, and call
this.rejectAll(new Error('Client disconnected')); also guard against double-kill
by checking process.pid or that this.process is still the same object before
sending SIGKILL.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5b2d4254-1848-4ed5-986b-4290585bab4b

📥 Commits

Reviewing files that changed from the base of the PR and between a57acb6 and 7584735.

📒 Files selected for processing (19)
  • .claude/commands/codex-setup.md
  • .gitignore
  • README.md
  • docs/api-reference.md
  • docs/computer-use.md
  • eslint.config.mjs
  • src/__tests__/index.test.ts
  • src/computer-use/__tests__/discovery.test.ts
  • src/computer-use/__tests__/handlers.test.ts
  • src/computer-use/__tests__/types.test.ts
  • src/computer-use/bridge.ts
  • src/computer-use/client.ts
  • src/computer-use/definitions.ts
  • src/computer-use/discovery.ts
  • src/computer-use/handlers.ts
  • src/computer-use/types.ts
  • src/tools/definitions.ts
  • src/tools/handlers.ts
  • src/types.ts

Comment thread .claude/commands/codex-setup.md Outdated
Comment thread .claude/commands/codex-setup.md Outdated

```bash
BINARY="/Applications/Codex.app/Contents/Resources/plugins/openai-bundled/plugins/computer-use/Codex Computer Use.app/Contents/SharedSupport/SkyComputerUseClient.app/Contents/MacOS/SkyComputerUseClient"
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1.0"}}}' | timeout 5 "$BINARY" mcp 2>/dev/null | head -1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use a macOS-compatible timeout in the smoke test.

timeout is not installed on stock macOS, so this setup command can fail before testing the binary. Use gtimeout when available or document the dependency.

📝 Proposed command fallback
-echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1.0"}}}' | timeout 5 "$BINARY" mcp 2>/dev/null | head -1
+TIMEOUT_BIN="$(command -v timeout || command -v gtimeout || true)"
+if [ -z "$TIMEOUT_BIN" ]; then
+  echo "Install coreutils for timeout support: brew install coreutils"
+else
+  echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1.0"}}}' | "$TIMEOUT_BIN" 5 "$BINARY" mcp 2>/dev/null | head -1
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1.0"}}}' | timeout 5 "$BINARY" mcp 2>/dev/null | head -1
TIMEOUT_BIN="$(command -v timeout || command -v gtimeout || true)"
if [ -z "$TIMEOUT_BIN" ]; then
echo "Install coreutils for timeout support: brew install coreutils"
else
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1.0"}}}' | "$TIMEOUT_BIN" 5 "$BINARY" mcp 2>/dev/null | head -1
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/codex-setup.md at line 68, The smoke-test command uses the
GNU `timeout` binary which is absent on stock macOS; update the command that
pipes the initialize JSON into "$BINARY" mcp to use `gtimeout` when available
(fall back to `timeout` otherwise) or add a note documenting the dependency;
specifically modify the invocation that currently calls `timeout 5 "$BINARY"
mcp` (reference the "$BINARY" and `mcp` usage) to prefer `gtimeout` on macOS so
the command succeeds on macOS CI/dev machines.

Comment thread docs/computer-use.md Outdated
Comment on lines +7 to +14
```
Claude Code
→ codex-mcp-server
→ ComputerUseBridge (singleton, lazy init)
→ SkyComputerUseClient (Codex.app subprocess)
→ stdio JSON-RPC (newline-delimited)
→ macOS Accessibility APIs
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a language to the fenced code block.

markdownlint MD040 flags this block. Use text (or similar) since the content is an ASCII diagram.

✏️ Proposed fix
-```
+```text
 Claude Code
   → codex-mcp-server
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```
Claude Code
→ codex-mcp-server
→ ComputerUseBridge (singleton, lazy init)
→ SkyComputerUseClient (Codex.app subprocess)
→ stdio JSON-RPC (newline-delimited)
→ macOS Accessibility APIs
```
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/computer-use.md` around lines 7 - 14, The fenced code block containing
the ASCII diagram starting with "Claude Code → codex-mcp-server" should include
a language specifier (e.g., text) to satisfy markdownlint MD040; update that
triple-backtick fence to use ```text so the block is identified as plain text
(the ASCII diagram lines "Claude Code", "codex-mcp-server", "ComputerUseBridge",
"SkyComputerUseClient", "stdio JSON-RPC", "macOS Accessibility APIs" remain
unchanged).

Comment thread docs/computer-use.md Outdated

## Setup

Requires [Codex.app](https://codex.ai) installed. The `SkyComputerUseClient` binary is detected automatically from `/Applications/Codex.app`.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's check the file and surrounding context
fd docs/computer-use.md

Repository: tuannvm/codex-mcp-server

Length of output: 50


🏁 Script executed:

# Read the file to see lines around 20 and 161
cat -n docs/computer-use.md | sed -n '15,25p'

Repository: tuannvm/codex-mcp-server

Length of output: 589


🏁 Script executed:

# Also check line 161 context
cat -n docs/computer-use.md | sed -n '156,166p'

Repository: tuannvm/codex-mcp-server

Length of output: 585


🏁 Script executed:

# Search for SkyComputerUseClient in the codebase
rg -i "SkyComputerUseClient" --type md --type py --type js --type ts

Repository: tuannvm/codex-mcp-server

Length of output: 1395


🌐 Web query:

Codex.app SkyComputerUseClient binary

💡 Result:

No high-confidence evidence found for a "SkyComputerUseClient binary" associated with Codex.app (OpenAI's coding agent desktop application). Searches across general web, OpenAI docs, GitHub issues, and company profiles returned extensive details on OpenAI Codex CLI/app binaries (e.g., codex.exe on Windows, bundled in DMG on macOS), computer-use MCP servers (computer-use-mcp), and app-server clients, but nothing matching "SkyComputerUseClient". It may be an internal/unreleased component, misremembered name (e.g., related to "computer-use" tools), or non-existent. Codex binaries are downloadable from GitHub releases or app stores; computer-use is a plugin/MCP feature requiring separate Node.js setup.

Citations:


🌐 Web query:

OpenAI Codex desktop app download

💡 Result:

The OpenAI Codex desktop app is available for download on Windows and macOS. It is the official command center for agentic software development using Codex, powered by ChatGPT. For Windows: - Download from the Microsoft Store: https://apps.microsoft.com/detail/9plm9xgg6vks?gl=US&hl=en-US - Or via command line: winget install Codex -s msstore For macOS: - Download from official OpenAI developer docs (direct link referenced but not inline; visit https://developers.openai.com/codex/app or https://developers.openai.com/codex/quickstart?setup=app). Choose Apple Silicon or Intel build as appropriate. Sign in with a ChatGPT account (Plus, Pro, etc.) or API key after installation. Linux support is upcoming (get notified). Note: Original OpenAI Codex (2021 code model) is deprecated; this refers to the 2025+ Codex coding agent product.

Citations:


Replace the https://codex.ai URL with the official OpenAI download location.

The link https://codex.ai is not the official distribution URL for Codex.app. Use the official OpenAI download page instead: https://developers.openai.com/codex/app. This should be updated at both lines 20 and 161.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/computer-use.md` at line 20, Update the two occurrences of the Codex.app
download URL in docs/computer-use.md: replace the link target "https://codex.ai"
with the official OpenAI page "https://developers.openai.com/codex/app" (the
visible text can remain "Codex.app"), ensuring both instances mentioned in the
comment (the line referencing the SkyComputerUseClient binary and the later
occurrence around line 161) are changed so the `/Applications/Codex.app`
detection note still reads correctly and the link points to the official OpenAI
download.

Comment thread README.md Outdated
Comment on lines 100 to 105
**Computer Use (macOS):**
```
Use cu_status to check connection
Use cu_get_app_state with app "Finder" to see the desktop
Use cu_click with app "Safari" and element_index "42" to click a link
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Minor: missing language on fenced code block (MD040).

markdownlint flagged this. For consistency with the other example blocks (which are also unlabeled but flagged identically), consider adding a hint like ```text to the new Computer Use example.

📝 Proposed fix
 **Computer Use (macOS):**
-```
+```text
 Use cu_status to check connection
 Use cu_get_app_state with app "Finder" to see the desktop
 Use cu_click with app "Safari" and element_index "42" to click a link
</details>

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.22.0)</summary>

[warning] 101-101: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @README.md around lines 100 - 105, Add a language hint to the fenced code
block in the "Computer Use (macOS)" example in README.md to satisfy markdownlint
MD040; specifically update the triple-backtick fence that encloses the commands
for cu_status, cu_get_app_state and cu_click to include a language token such as
text (e.g., ```text) so the block is labeled consistently with the other
examples.


</details>

<!-- fingerprinting:phantom:poseidon:nectarine:ae9b0301-780f-455f-9f77-406a0872617c -->

<!-- This is an auto-generated comment by CodeRabbit -->

Comment thread src/computer-use/client.ts Outdated
Comment on lines +137 to +138
const msg = JSON.stringify({ jsonrpc: '2.0', id, method, params }) + '\n';
this.process?.stdin?.write(msg);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Writes to stdin are silently dropped if the process is gone.

this.process?.stdin?.write(msg) is a no-op when stdin is unavailable (spawn failed, already closed, stdin destroyed). For sendRequest the caller eventually sees a timeout, but for sendNotification the notification is lost with no signal. Consider checking writable and returning/rejecting explicitly, and handling write's error callback.

🛡️ Proposed fix
-      const msg = JSON.stringify({ jsonrpc: '2.0', id, method, params }) + '\n';
-      this.process?.stdin?.write(msg);
+      const stdin = this.process?.stdin;
+      if (!stdin || !stdin.writable) {
+        clearTimeout(timer);
+        this.pending.delete(id);
+        reject(new Error(`Cannot send ${method}: stdin not writable`));
+        return;
+      }
+      const msg = JSON.stringify({ jsonrpc: '2.0', id, method, params }) + '\n';
+      stdin.write(msg, (err) => {
+        if (err) {
+          clearTimeout(timer);
+          this.pending.delete(id);
+          reject(err);
+        }
+      });

Also applies to: 146-147

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/computer-use/client.ts` around lines 137 - 138,
sendNotification/sendRequest currently do this.process?.stdin?.write(msg) which
silently drops messages if the child process or its stdin is gone; change both
sendNotification and sendRequest to explicitly check that this.process and
this.process.stdin exist and this.process.stdin.writable (or !destroyed) before
attempting to write, and if not available immediately reject/return an error
(for sendRequest reject the Promise, for sendNotification return/fail fast).
When writing, use the write() return value and the write callback to surface
write errors (invoke the sendRequest reject or surface a notification failure),
and ensure you handle the case where write returns false (buffering) by
listening for 'drain' or treating it as backpressure error. Update the code
paths referenced by the snippets (the methods that construct msg and call
this.process?.stdin?.write at the shown locations) to implement these checks and
callback-based error handling.

Comment thread src/computer-use/definitions.ts Outdated
Comment on lines +45 to +78
name: CU_TOOLS.CLICK,
description:
'Click an element by index or pixel coordinates. Prefer element_index when available from accessibility tree.',
inputSchema: {
type: 'object',
properties: {
app: {
type: 'string',
description: 'App name or bundle identifier',
},
element_index: {
type: 'string',
description: 'Element index from accessibility tree',
},
x: {
type: 'number',
description: 'X coordinate in screenshot pixel coordinates',
},
y: {
type: 'number',
description: 'Y coordinate in screenshot pixel coordinates',
},
mouse_button: {
type: 'string',
enum: ['left', 'right', 'middle'],
description: 'Mouse button to click (default: left)',
},
click_count: {
type: 'integer',
description: 'Number of clicks (default: 1)',
},
},
required: ['app'],
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Require a click target in the advertised schema.

Line 77 allows { app } with no element_index and no coordinates, so clients can issue an accepted-but-unexecutable click. Align the JSON schema and runtime Zod schema to require either element_index or both x/y.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/computer-use/definitions.ts` around lines 45 - 78, The JSON inputSchema
for CU_TOOLS.CLICK currently only requires 'app', allowing no click target;
change the schema to require either element_index OR both x and y by replacing
the flat required with a oneOf clause (one option: { required:
['app','element_index'] } and another: { required: ['app','x','y'] }) and update
the runtime Zod validation used for CU_TOOLS.CLICK to mirror this logic
(validate presence of element_index OR both x and y and preserve types for
mouse_button and click_count) so the advertised schema and runtime checks are
consistent.

Comment thread src/computer-use/handlers.ts Outdated
Comment on lines +35 to +39
// Validate tool name.
const binaryToolName = CU_TO_BINARY[toolName];
if (!binaryToolName) {
throw new Error(`Unknown computer-use tool: ${toolName}`);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use a custom error class for unknown tools.

Line 38 throws a raw Error, bypassing the standardized tool error path. Prefer ValidationError here.

🔧 Proposed fix
     const binaryToolName = CU_TO_BINARY[toolName];
     if (!binaryToolName) {
-      throw new Error(`Unknown computer-use tool: ${toolName}`);
+      throw new ValidationError(toolName, `Unknown computer-use tool: ${toolName}`);
     }

As per coding guidelines, “Use custom error classes ValidationError and ToolExecutionError for error handling”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/computer-use/handlers.ts` around lines 35 - 39, The code currently throws
a raw Error when an unknown tool name is encountered (CU_TO_BINARY lookup for
toolName); replace the raw throw with throwing the standardized ValidationError
(throw new ValidationError(`Unknown computer-use tool: ${toolName}`)) so the
error follows the project's error path, and add or update the import of
ValidationError at the top of the file if not already imported; keep the same
message text to preserve semantics.

Comment thread src/computer-use/handlers.ts Outdated
Comment on lines +68 to +82
private handleStatus(): ToolResult {
const bridge = ComputerUseBridge.getInstance();
const binaryInfo = bridge.getBinaryInfo();
const error = bridge.getError();

const status = {
connected: bridge.isReady(),
binary: binaryInfo
? {
path: binaryInfo.path,
type: binaryInfo.type,
}
: null,
error: error?.message || null,
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Make cu_status probe binary discovery before reporting null state.

On a fresh process, getBinaryInfo() and getError() are both empty until a non-status tool attempts initialization, so cu_status cannot actually report the binary path or “not found” state. Consider doing a side-effect-free discovery probe here and returning that result without launching the subprocess.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/computer-use/handlers.ts` around lines 68 - 82, The status handler
currently returns null for binary and error when
ComputerUseBridge.getBinaryInfo() and getError() are empty; update handleStatus
to perform a side-effect-free discovery probe via the bridge before returning:
call a discovery method on the ComputerUseBridge instance (e.g., a new or
existing probe/discover method on ComputerUseBridge) to attempt to locate the
binary and populate binaryInfo/error without starting the subprocess, then use
those probe results (falling back to the existing getBinaryInfo()/getError() if
present) when building the status object; ensure you reference
ComputerUseBridge.getInstance(), ComputerUseBridge.getBinaryInfo(),
ComputerUseBridge.getError(), and ComputerUseBridge.isReady() while avoiding
launching the real tool process.

Comment thread src/types.ts
Comment on lines 78 to 85
export interface ToolResult {
content: Array<{
type: 'text';
type: 'text' | 'image';
text: string;
data?: string;
mimeType?: string;
_meta?: Record<string, unknown>;
}>;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n src/types.ts | head -100

Repository: tuannvm/codex-mcp-server

Length of output: 3276


🏁 Script executed:

rg "ToolResult" -A 5 -B 5

Repository: tuannvm/codex-mcp-server

Length of output: 10420


🏁 Script executed:

fd -type f -name "*.ts" | head -20

Repository: tuannvm/codex-mcp-server

Length of output: 238


🏁 Script executed:

rg "type.*image" -A 3 -B 3

Repository: tuannvm/codex-mcp-server

Length of output: 819


🏁 Script executed:

rg "image" src/tools/handlers.ts src/computer-use/handlers.ts -A 2 -B 2

Repository: tuannvm/codex-mcp-server

Length of output: 50


🏁 Script executed:

rg "CallToolResultSchema" -A 10 -B 2

Repository: tuannvm/codex-mcp-server

Length of output: 3288


🏁 Script executed:

cat src/computer-use/client.ts | grep -A 20 "const content = "

Repository: tuannvm/codex-mcp-server

Length of output: 605


🏁 Script executed:

cat -n src/__tests__/index.test.ts | sed -n '1,80p'

Repository: tuannvm/codex-mcp-server

Length of output: 3441


🏁 Script executed:

rg "data.*mimeType" -B 5 -A 5

Repository: tuannvm/codex-mcp-server

Length of output: 50


🏁 Script executed:

rg "screenshot" -A 5 -B 5 src/computer-use/client.ts

Repository: tuannvm/codex-mcp-server

Length of output: 413


🏁 Script executed:

rg "get_app_state" -A 10 -B 2

Repository: tuannvm/codex-mcp-server

Length of output: 9894


🏁 Script executed:

cat -n src/computer-use/client.ts | sed -n '60,120p'

Repository: tuannvm/codex-mcp-server

Length of output: 2181


🏁 Script executed:

rg "ToolResult" src/types.ts -A 15

Repository: tuannvm/codex-mcp-server

Length of output: 425


🏁 Script executed:

rg -i "zod.*toolresult|toolresult.*zod"

Repository: tuannvm/codex-mcp-server

Length of output: 50


🏁 Script executed:

cat -n src/types.ts | sed -n '75,200p'

Repository: tuannvm/codex-mcp-server

Length of output: 3465


🏁 Script executed:

rg "ImageContent|TextContent" -A 5 -B 2

Repository: tuannvm/codex-mcp-server

Length of output: 50


🏁 Script executed:

cat -n src/server.ts | grep -A 20 "callTool\|CallToolResult" | head -50

Repository: tuannvm/codex-mcp-server

Length of output: 50


🏁 Script executed:

cat -n src/server.ts | sed -n '1,100p'

Repository: tuannvm/codex-mcp-server

Length of output: 3590


🏁 Script executed:

rg "class CodexMcpServer" -A 30

Repository: tuannvm/codex-mcp-server

Length of output: 1332


Use discriminated union to separate text and image content shapes.

Lines 79-85 require text: string for all content items, including when type: 'image'. However, cu_get_app_state returns image content (screenshots) with data and mimeType, not text, per the MCP schema. The current interface is incompatible with MCP image content structures and requires a workaround in src/computer-use/client.ts (line 101: text: item.text || ''). Use discriminated unions to separate text and image content types:

🔧 Proposed type fix
+type ToolContent =
+  | {
+      type: 'text';
+      text: string;
+      _meta?: Record<string, unknown>;
+    }
+  | {
+      type: 'image';
+      data: string;
+      mimeType: string;
+      _meta?: Record<string, unknown>;
+    };
+
 // Tool result interface matching MCP SDK expectations
 export interface ToolResult {
-  content: Array<{
-    type: 'text' | 'image';
-    text: string;
-    data?: string;
-    mimeType?: string;
-    _meta?: Record<string, unknown>;
-  }>;
+  content: ToolContent[];
   structuredContent?: Record<string, unknown>;
   isError?: boolean;
   _meta?: Record<string, unknown>;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/types.ts` around lines 78 - 85, The ToolResult.content item should be a
discriminated union instead of a single shape that forces text for images;
replace the current inline object with two interfaces (e.g., TextContent { type:
'text'; text: string; _meta?: Record<string, unknown> } and ImageContent { type:
'image'; data: string; mimeType: string; _meta?: Record<string, unknown> }) and
change ToolResult to content: Array<TextContent | ImageContent>; then update
call sites that assume item.text (for example the code in client.ts that uses
item.text || '') to handle the two variants by checking item.type === 'text'
before reading text or treating images via data/mimeType.

… automation

Remove dead src/computer-use/ module (macOS binary locked behind team ID
check) and add cross-platform browser automation via Playwright.

New module src/browser-use/ with 10 MCP tools: browser_launch,
browser_screenshot, browser_click, browser_type, browser_scroll,
browser_drag, browser_key, browser_navigate, browser_close, browser_status.

- Singleton BrowserUseBridge manages concurrent sessions
- Playwright is a peer dependency (lazy init, graceful errors)
- SIGINT/SIGTERM handlers clean up browser sessions on shutdown
- Key combinations auto-normalized (Cmd→Meta, Ctrl→Control, etc.)
- 86 tests pass, lint clean

Co-authored-by: Claude <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (4)
src/tools/handlers.ts (1)

583-605: Consider forwarding progressToken so browser tools also emit progress notifications.

Operations like browser_launch, browser_navigate, and browser_screenshot can take several seconds (cold Chromium start, page load). The other tools in this file already stream progress; these new ones silently drop context (see _context in BrowserUseToolHandler). At minimum, sending a "Launching browser…" / "Navigating…" notification would align with the rest of the registry.

🛠️ Sketch

In src/browser-use/handlers.ts, replace _context with context and emit a short context.sendProgress('…') at the top of long-running cases (BROWSER_LAUNCH, BROWSER_NAVIGATE, BROWSER_SCREENSHOT).

As per coding guidelines: "Tools must support streaming progress via MCP notifications/progress with progress context created from request._meta?.progressToken".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/handlers.ts` around lines 583 - 605, The browserHandler is dropping
the progress context so browser tools can't emit streaming progress; change
browserHandler to forward the incoming context into browserUseHandler.execute
(preserve the context/progressToken from request._meta?.progressToken) and
update BrowserUseToolHandler to use that context (remove _context usage). Inside
BrowserUseToolHandler handle cases BROWSER_LAUNCH, BROWSER_NAVIGATE, and
BROWSER_SCREENSHOT to immediately call context.sendProgress with a short message
like "Launching browser…" / "Navigating…" / "Taking screenshot…" (when context
exists) before the long-running work so MCP progress notifications are emitted.
src/browser-use/client.ts (2)

92-98: Swallowing all close errors is too broad — at least log them.

A bare catch {} here means bugs like “page crashed” or “protocol error” will go completely unnoticed, making shutdown issues un-debuggable. Narrow to the expected “already closed” case or log at debug level.

-  try {
-    await browser.close();
-  } catch {
-    // Browser may already be closed
-  }
+  try {
+    await browser.close();
+  } catch (err) {
+    // Browser may already be closed; surface unexpected failures for debuggability
+    console.debug(`closeSession(${session.sessionId}) ignored error:`, err);
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/browser-use/client.ts` around lines 92 - 98, In closeSession (function
closeSession, parameter BrowserSession) don’t silently swallow all errors from
browser.close(); instead catch and handle only the expected "already closed"
case or log unexpected errors at debug/error level; update the try/catch to
inspect the caught error (or its message/type) and if it’s not the benign
already-closed/protocol-connection-lost case call processLogger.debug/error (or
the module logger) with the error details so shutdown failures like "page
crashed" or protocol errors are visible.

4-17: Minor: cache the in-flight promise, not just the resolved module.

If getPlaywright() is invoked concurrently before the first import() resolves, you’ll kick off multiple dynamic imports. Node caches the module so this is harmless in practice, but caching the promise is cleaner and avoids redundant work on cold start.

-let playwrightModule: { chromium: { launch(opts?: Record<string, unknown>): Promise<Browser> } } | null = null;
-
-async function getPlaywright() {
-  if (!playwrightModule) {
-    try {
-      playwrightModule = await import('playwright');
-    } catch {
-      throw new Error(...);
-    }
-  }
-  return playwrightModule;
-}
+let playwrightPromise: Promise<typeof import('playwright')> | null = null;
+
+function getPlaywright() {
+  if (!playwrightPromise) {
+    playwrightPromise = import('playwright').catch((err) => {
+      playwrightPromise = null; // allow retry after install
+      throw new ToolExecutionError('Playwright is not installed...', { cause: err });
+    });
+  }
+  return playwrightPromise;
+}

As a side benefit, typing the promise as typeof import('playwright') removes the hand-rolled structural type and the as unknown as { newContext... } cast on line 28.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/browser-use/client.ts` around lines 4 - 17, The getPlaywright function
currently caches only the resolved module in playwrightModule which allows
concurrent callers to trigger multiple import() calls; change the cache to store
the in-flight promise instead and type it as Promise<typeof
import('playwright')> (or nullable) so concurrent calls await the same import
promise; update references that await/import the module to await that promise
and then use the resolved module (replace the hand-rolled structural type with
typeof import('playwright') for playwrightModule).
src/browser-use/types.ts (1)

99-118: Nit: drop identity entries and consider case-insensitive lookup.

Backspace, Enter, Tab, and Escape map to themselves. Also, users will likely send ctrl+c / cmd+a in lowercase; the current map is case-sensitive so ctrl passes through unchanged and Playwright’s keyboard.press('ctrl+c') will not resolve the modifier. Consider normalizing case before lookup.

-  if (key.includes('+')) {
-    return key.split('+').map((part) => map[part.trim()] ?? part.trim()).join('+');
-  }
-  return map[key] ?? key;
+  const lookup = (k: string) => {
+    const trimmed = k.trim();
+    const cap = trimmed.charAt(0).toUpperCase() + trimmed.slice(1).toLowerCase();
+    return map[trimmed] ?? map[cap] ?? trimmed;
+  };
+  if (key.includes('+')) {
+    return key.split('+').map(lookup).join('+');
+  }
+  return lookup(key);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/browser-use/types.ts` around lines 99 - 118, normalizeKey currently
includes identity entries in the map and does a case-sensitive lookup, so keys
like "ctrl+c" won't match; update normalizeKey to remove entries that map to
themselves (e.g., Backspace, Enter, Tab, Escape) from the map and perform a
case-insensitive lookup by normalizing parts before lookup (e.g., trim and
toLowerCase each part, use a map keyed by lowercase aliases like "ctrl", "cmd",
"option", "del" -> target tokens "Control","Meta","Alt","Delete", etc.), then
join with '+' and fall back to the original trimmed part if no mapping exists;
adjust both the single-key return path and the '+'-split branch to use this
case-insensitive mapping logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/browser-use.md`:
- Around line 28-34: The example for the browser_status tool is misleading
because browser_status accepts no parameters; update the example JSON block
under the browser_status heading to show an empty object (i.e., replace the
current `{ "sessionId": "my-session" }` example with `{}` in the JSON fenced
code block) so the docs reflect the declared properties: {} / required: [] and
match the API reference for browser_status.

In `@src/browser-use/bridge.ts`:
- Around line 32-43: The launch() method has a TOCTOU race on sessionId: check
of this.sessions.has(sessionId) occurs before awaiting createBrowserSession,
allowing two concurrent launches to spawn orphaned browsers and allowing
shutdown() to clear the map then have a pending launch reinsert a session. Add a
private pending: Set<string> on the class, and in launch() reserve the slot by
adding sessionId to pending before awaiting createBrowserSession (throw if
pending already contains sessionId or sessions.has(sessionId)), then await
createBrowserSession, and in a finally block remove sessionId from pending; when
the created session resolves, insert it into this.sessions only if sessions
doesn't already contain it and the instance is not shut down (and if insertion
is refused, close the created browser to avoid leaks). Ensure
shutdown()/closeSession() still clears sessions and checks pending or prevents
new insertions as needed.

In `@src/browser-use/client.ts`:
- Line 1: The file fails Prettier formatting; run the project's formatter (e.g.
npm run format) to reformat this module so it passes prettier --check, then
re-stage the changes; ensure the import statement (import type { Browser,
BrowserContext, Page } from 'playwright';) and surrounding lines are adjusted to
match the project's Prettier configuration and commit the formatted file.
- Around line 10-14: Replace the bare Error throw in the catch block with the
project’s ToolExecutionError and preserve the original exception as the cause;
specifically, import ToolExecutionError (if not already) and in the catch for
the Playwright import/require, throw a new ToolExecutionError with the same
descriptive message about installing Playwright and attach the caught error as
the underlying cause (using the error’s cause/constructor convention used in the
codebase) so the original import failure is retained; update the catch block
around the Playwright import in client.ts to reference ToolExecutionError
instead of Error.
- Around line 19-42: In createBrowserSession, prevent leaking the launched
browser and validate opts.url before navigation: wrap the post-launch steps
(context/newContext, context.newPage, and page.goto) in a try/catch/finally so
that any thrown error results in await browser.close() (use the browser variable
referenced in the function) before rethrowing; also validate opts.url using a
URL parser (e.g., new URL(opts.url)) and enforce allowed protocols (http, https)
and non-empty host before calling page.goto, and if validation fails close the
browser and throw a clear error so no Chromium process is left running.

In `@src/browser-use/types.ts`:
- Line 1: The file containing the line "import { z } from 'zod';" fails Prettier
check; run the repository formatter (npm run format or make lint) to reformat
this file, fix any spacing/semicolon/newline issues Prettier reports, and
re-stage the updated file before pushing so the import line and surrounding code
conform to the project's Prettier rules.
- Around line 16-67: Update the URL fields to only accept HTTP(S) and reject
file/other schemes: replace LaunchSchema's url (currently z.string().optional())
and NavigateSchema's url (currently z.string().min(1)) with a validator that
enforces HTTP/HTTPS (e.g., z.httpUrl() if available, or
z.string().url().refine(u => /^https?:\/\//i.test(u), "only http(s) URLs
allowed")). Keep the optional/default semantics for LaunchSchema and preserve
NavigateSchema as required, and note that host-level checks (private IP,
localhost, link‑local, DNS rebinding) must still be enforced in the
bridge/client before calling page.goto().
- Around line 16-45: Remove the redundant .optional() chained before .default()
on fields that have defaults so Zod 4 applies defaults correctly: update
LaunchSchema (headless, viewportWidth, viewportHeight), ClickSchema (button,
clickCount) and ScrollSchema (amount) by deleting the .optional() calls that
precede .default(...). Keep the same types (z.boolean(), z.int(), z.enum(...),
etc.) and only remove the .optional() chaining so the schema output types are
non‑optional and defaults are applied as intended.

---

Nitpick comments:
In `@src/browser-use/client.ts`:
- Around line 92-98: In closeSession (function closeSession, parameter
BrowserSession) don’t silently swallow all errors from browser.close(); instead
catch and handle only the expected "already closed" case or log unexpected
errors at debug/error level; update the try/catch to inspect the caught error
(or its message/type) and if it’s not the benign
already-closed/protocol-connection-lost case call processLogger.debug/error (or
the module logger) with the error details so shutdown failures like "page
crashed" or protocol errors are visible.
- Around line 4-17: The getPlaywright function currently caches only the
resolved module in playwrightModule which allows concurrent callers to trigger
multiple import() calls; change the cache to store the in-flight promise instead
and type it as Promise<typeof import('playwright')> (or nullable) so concurrent
calls await the same import promise; update references that await/import the
module to await that promise and then use the resolved module (replace the
hand-rolled structural type with typeof import('playwright') for
playwrightModule).

In `@src/browser-use/types.ts`:
- Around line 99-118: normalizeKey currently includes identity entries in the
map and does a case-sensitive lookup, so keys like "ctrl+c" won't match; update
normalizeKey to remove entries that map to themselves (e.g., Backspace, Enter,
Tab, Escape) from the map and perform a case-insensitive lookup by normalizing
parts before lookup (e.g., trim and toLowerCase each part, use a map keyed by
lowercase aliases like "ctrl", "cmd", "option", "del" -> target tokens
"Control","Meta","Alt","Delete", etc.), then join with '+' and fall back to the
original trimmed part if no mapping exists; adjust both the single-key return
path and the '+'-split branch to use this case-insensitive mapping logic.

In `@src/tools/handlers.ts`:
- Around line 583-605: The browserHandler is dropping the progress context so
browser tools can't emit streaming progress; change browserHandler to forward
the incoming context into browserUseHandler.execute (preserve the
context/progressToken from request._meta?.progressToken) and update
BrowserUseToolHandler to use that context (remove _context usage). Inside
BrowserUseToolHandler handle cases BROWSER_LAUNCH, BROWSER_NAVIGATE, and
BROWSER_SCREENSHOT to immediately call context.sendProgress with a short message
like "Launching browser…" / "Navigating…" / "Taking screenshot…" (when context
exists) before the long-running work so MCP progress notifications are emitted.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d505c623-0b16-496b-8d7c-4622c0e441cc

📥 Commits

Reviewing files that changed from the base of the PR and between 7584735 and 7255d86.

📒 Files selected for processing (15)
  • README.md
  • docs/api-reference.md
  • docs/browser-use.md
  • jest.config.mjs
  • src/__tests__/index.test.ts
  • src/browser-use/bridge.ts
  • src/browser-use/client.ts
  • src/browser-use/definitions.ts
  • src/browser-use/handlers.ts
  • src/browser-use/playwright.d.ts
  • src/browser-use/types.ts
  • src/index.ts
  • src/tools/definitions.ts
  • src/tools/handlers.ts
  • src/types.ts
✅ Files skipped from review due to trivial changes (2)
  • jest.config.mjs
  • src/browser-use/playwright.d.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/tests/index.test.ts
  • src/tools/definitions.ts
  • src/types.ts

Comment thread docs/browser-use.md Outdated
Comment thread src/browser-use/bridge.ts Outdated
Comment on lines +32 to +43
async launch(sessionId: string, opts?: { url?: string; headless?: boolean; viewportWidth?: number; viewportHeight?: number }): Promise<BrowserSession> {
const canUse = await this.checkAvailability();
if (!canUse) {
throw new Error(this.checkError ?? 'Playwright is not installed');
}
if (this.sessions.has(sessionId)) {
throw new Error(`Session "${sessionId}" already exists. Close it first or use a different session ID.`);
}
const session = await createBrowserSession(sessionId, opts);
this.sessions.set(sessionId, session);
return session;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

TOCTOU on sessionId in launch() can leak Chromium instances.

sessions.has(sessionId) is checked before await createBrowserSession(...), and the slot isn't reserved until after the await. Two concurrent browser_launch calls with the same sessionId both pass the check, both spawn Chromium, and only the second set() wins the map — the first browser is orphaned (untracked, never closed by close() or shutdown()).

Same pattern allows shutdown() to race with an in-flight launch(): shutdown() calls closeSession on current entries and then clear()s the map, but the pending launch resolves afterward and re-inserts a session that will never be cleaned up.

🛠️ Proposed fix — reserve the slot before awaiting
 async launch(sessionId: string, opts?: { url?: string; headless?: boolean; viewportWidth?: number; viewportHeight?: number }): Promise<BrowserSession> {
   const canUse = await this.checkAvailability();
   if (!canUse) {
     throw new Error(this.checkError ?? 'Playwright is not installed');
   }
-  if (this.sessions.has(sessionId)) {
+  if (this.sessions.has(sessionId) || this.pending?.has(sessionId)) {
     throw new Error(`Session "${sessionId}" already exists. Close it first or use a different session ID.`);
   }
-  const session = await createBrowserSession(sessionId, opts);
-  this.sessions.set(sessionId, session);
-  return session;
+  this.pending ??= new Set();
+  this.pending.add(sessionId);
+  try {
+    const session = await createBrowserSession(sessionId, opts);
+    this.sessions.set(sessionId, session);
+    return session;
+  } finally {
+    this.pending.delete(sessionId);
+  }
 }

(Declare private pending?: Set<string> on the class.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/browser-use/bridge.ts` around lines 32 - 43, The launch() method has a
TOCTOU race on sessionId: check of this.sessions.has(sessionId) occurs before
awaiting createBrowserSession, allowing two concurrent launches to spawn
orphaned browsers and allowing shutdown() to clear the map then have a pending
launch reinsert a session. Add a private pending: Set<string> on the class, and
in launch() reserve the slot by adding sessionId to pending before awaiting
createBrowserSession (throw if pending already contains sessionId or
sessions.has(sessionId)), then await createBrowserSession, and in a finally
block remove sessionId from pending; when the created session resolves, insert
it into this.sessions only if sessions doesn't already contain it and the
instance is not shut down (and if insertion is refused, close the created
browser to avoid leaks). Ensure shutdown()/closeSession() still clears sessions
and checks pending or prevents new insertions as needed.

Comment thread src/browser-use/client.ts
Comment thread src/browser-use/client.ts Outdated
Comment thread src/browser-use/client.ts
Comment thread src/browser-use/types.ts
Comment thread src/browser-use/types.ts Outdated
Comment thread src/browser-use/types.ts Outdated
Replace 3 browser MCP tools (browser, browser_screenshot, browser_interact)
with a single `browser` tool dispatching 10 actions via Zod discriminated
union. Per-action schema validation gives clean error messages. All handler
methods have typed parameter signatures — zero unsafe casts.

Co-authored-by: Claude <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/browser-use/bridge.ts (1)

19-30: Negative availability is cached permanently — installing Playwright later requires a server restart.

Once import('playwright') fails, this.available = false sticks for the life of the process, so a subsequent status call (or any launch) still reports "not installed" even after the user runs npm install playwright. Consider caching only the positive result, or re-checking on each call when available === false:

🔧 Suggested tweak
   async checkAvailability(): Promise<boolean> {
-    if (this.available !== null) return this.available;
+    if (this.available === true) return true;
     try {
       await import('playwright');
       this.available = true;
       this.checkError = null;
     } catch {
       this.available = false;
       this.checkError = 'Playwright is not installed. Install with: npm install playwright && npx playwright install chromium';
     }
-    return this.available;
+    return this.available ?? false;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/browser-use/bridge.ts` around lines 19 - 30, The checkAvailability method
currently sets this.available = false on import failure and never retries;
change it to only cache the positive result and avoid permanently caching
negative results by not setting this.available to false on catch (or by setting
this.available = null in the catch) and only setting this.available = true on
success; ensure this.checkError is still populated when import('playwright')
fails so subsequent calls to checkAvailability (or launch) will re-attempt the
import rather than permanently reporting unavailable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/browser-use.md`:
- Around line 7-13: The fenced code block in the architecture diagram is missing
a language hint (MD040); update the triple-backtick fence that surrounds the
diagram (the block starting with "Claude Code → codex-mcp-server →
BrowserUseBridge...") to include a language hint such as ```text so the block
becomes a properly annotated fenced code block and satisfies markdownlint.

---

Nitpick comments:
In `@src/browser-use/bridge.ts`:
- Around line 19-30: The checkAvailability method currently sets this.available
= false on import failure and never retries; change it to only cache the
positive result and avoid permanently caching negative results by not setting
this.available to false on catch (or by setting this.available = null in the
catch) and only setting this.available = true on success; ensure this.checkError
is still populated when import('playwright') fails so subsequent calls to
checkAvailability (or launch) will re-attempt the import rather than permanently
reporting unavailable.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 91c3e7cc-102b-460b-a23a-9e776cbf9603

📥 Commits

Reviewing files that changed from the base of the PR and between 7255d86 and 0a30c40.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (13)
  • .gitignore
  • CLAUDE.md
  • README.md
  • docs/api-reference.md
  • docs/browser-use.md
  • package.json
  • src/__tests__/index.test.ts
  • src/browser-use/bridge.ts
  • src/browser-use/definitions.ts
  • src/browser-use/handlers.ts
  • src/browser-use/types.ts
  • src/tools/handlers.ts
  • src/types.ts
✅ Files skipped from review due to trivial changes (3)
  • package.json
  • .gitignore
  • src/browser-use/handlers.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/tools/handlers.ts
  • src/tests/index.test.ts
  • src/types.ts
  • src/browser-use/types.ts

Comment thread docs/browser-use.md Outdated
- client.ts: cache in-flight promise instead of resolved module; use
  ToolExecutionError instead of bare Error; prevent browser leak on
  session creation failure; validate URLs are http(s) only; log close
  errors instead of silently swallowing
- types.ts: remove redundant .optional() before .default(); add http(s)
  URL validation; case-insensitive normalizeKey with lowercase lookup
- bridge.ts: only cache positive availability (retry import on failure);
  add TOCTOU guard on sessionId with pending set
- handlers.ts: forward progress context; emit progress notifications
  for open/screenshot/navigate actions
- docs/browser-use.md: add language hint to architecture diagram
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant