feat(agent-mcp-interface): bootstrap stack with UI, profiles, MCP, CDP attach, tab attribution #1333

Merged
Dani Akash (DaniAkash) merged 23 commits into main from feat/agent-mcp-interface-bootstrap
Jun 23, 2026

@DaniAkash

Summary

Lands the agent-mcp-interface bootstrap stack and its companion UI as a working end-to-end loop. The cockpit is a Hono app under apps/agent-mcp-interface that mounts into apps/server at /cockpit, borrows the server's live BrowserSession, and exposes a per-agent MCP endpoint at /cockpit/mcp/:slug that proxies the real BrowserOS browser-tool catalogue with a permission gate in front. A WXT extension under apps/agent-mcp-ui consumes the same AppType via hono-rpc and renders the cockpit home, agent create/edit, MCP directory, governance, replay, and live-run surfaces. A live tab-activity feedback loop publishes "which agent is on which tab right now" through GET /cockpit/tabs/activity so the homepage runs against real data.

What lands

apps/agent-mcp-interface (cockpit Hono app):

  • Package skeleton, env reader, port constants (PROD_API_PORT=9100, COCKPIT_MOUNT_PREFIX=/cockpit, DEV_STANDALONE_PORT=9200).
  • Agent profile CRUD on <browserosDir>/mcp-interface/agents/<id>.json with slug rotation, MCP URL migration, and CLI snippet helpers.
  • Per-agent MCP server (/mcp/:slug) that imports the real BROWSER_TOOLS catalogue from @browseros/server/tools/browser/registry, runs each dispatch through a verb-mapped permission gate, then hands off to executeTool. Navigate is gated against javascript:, file:, data: URLs.
  • Site rules + permissions catalog + a check(agent, verb, domain) API used by every dispatch.
  • createCockpitRoutes mounts the whole app into apps/server/src/api/routes/index.ts at /cockpit, wires setBrowserSession, and runs migrateMcpUrls on boot.
  • Standalone main.ts keeps working for solo dev (binds on DEV_STANDALONE_PORT); since #1248 (feat(agent-mcp-interface): attach to browseros browser over CDP at boot) it also bootstraps its own CDP attach via bootstrapBrowserosBrowser so tools fire against the running BrowserOS browser.
  • GET /tabs/activity publishes the in-memory tab-activity registry. Every successful browser-tool call records { agentId, slug, pageId, targetId, lastToolName, lastToolAt } keyed by stable CDP target id. Status (active within 5 s, idle afterwards) is derived at read time. Closed tabs are evicted lazily on snapshot.
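
The dispatch path in the list above (map the tool to a verb, run the permission check, reject unsafe navigate targets, then hand off) could be sketched roughly as follows. This is an illustration only: `TOOL_VERBS`, `AgentRules`, `isNavigableUrl`, and `gateDispatch` are hypothetical names, and the rule shape is an assumption. Only the three blocked schemes and the `check(agent, verb, domain)` signature come from the description.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not the package's API.
type Verb = 'read' | 'navigate' | 'act';

interface AgentRules {
  agentId: string;
  // domain -> verbs the agent may use there; '*' is an assumed wildcard domain
  allow: Record<string, Verb[]>;
}

// Assumed tool -> verb mapping; the real catalogue is larger.
const TOOL_VERBS: Record<string, Verb> = {
  read: 'read',
  navigate: 'navigate',
  click: 'act',
};

const BLOCKED_SCHEMES = ['javascript:', 'file:', 'data:'];

function isNavigableUrl(url: string): boolean {
  const lowered = url.trim().toLowerCase();
  return !BLOCKED_SCHEMES.some((scheme) => lowered.startsWith(scheme));
}

function check(agent: AgentRules, verb: Verb, domain: string): boolean {
  const verbs = agent.allow[domain] ?? agent.allow['*'] ?? [];
  return verbs.indexOf(verb) !== -1;
}

type GateResult = { ok: true } | { ok: false; reason: string };

function gateDispatch(
  agent: AgentRules,
  tool: string,
  domain: string,
  url?: string,
): GateResult {
  const verb = TOOL_VERBS[tool];
  if (verb === undefined) return { ok: false, reason: `unknown tool: ${tool}` };
  // The scheme check runs before the permission check for navigate calls.
  if (verb === 'navigate' && url !== undefined && !isNavigableUrl(url)) {
    return { ok: false, reason: 'blocked url scheme' };
  }
  if (!check(agent, verb, domain)) {
    return { ok: false, reason: `${verb} not permitted on ${domain}` };
  }
  return { ok: true }; // the caller would hand off to the tool executor here
}
```

Keeping the gate as a pure function that returns a result, rather than throwing, would let the wrapper turn a denial into an MCP error response without touching the executor.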

apps/agent-mcp-ui (WXT extension):

  • WXT setup, react-router v7, Tailwind, shadcn primitives.
  • Cockpit homepage: hero, waiting strip, running grid, recent activity. Drives off useCockpitData() which polls /cockpit/tabs/activity every 1500 ms; active records become live agent cards, idle records become recent activity rows.
  • Agents directory + new-agent wizard + edit flow. URL widgets (the mcpUrl copy + mcp add snippet) are sourced from a single modules/api/mcp-endpoint builder so the wizard, edit flow, and MCP directory render identical strings; the module carries a TODO marking the temporary cockpit-mount target.
  • Governance surfaces: permissions, site rules, grants, audit, replay, live-run with handoff banner + activity panel.
  • Onboarding flow.
  • Hono-rpc client with dev-launcher ?apiUrl= override, sessionStorage cache, and the production fallback at http://127.0.0.1:9100/cockpit.
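
The client's base-URL resolution (dev-launcher override, session cache, production fallback) might look something like the sketch below. The precedence order, the `CACHE_KEY` name, the `KeyValueStore` interface, and `resolveApiUrl` itself are assumptions made for illustration. Only the `?apiUrl=` parameter and the fallback URL come from the description.

```typescript
// Hypothetical sketch; the storage key and function names are assumptions.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const PRODUCTION_FALLBACK = 'http://127.0.0.1:9100/cockpit';
const CACHE_KEY = 'cockpit.apiUrl'; // assumed key name

function resolveApiUrl(search: string, store: KeyValueStore): string {
  // 1. A ?apiUrl= override wins and is cached for the rest of the session.
  const match = /[?&]apiUrl=([^&]*)/.exec(search);
  if (match && match[1]) {
    const override = decodeURIComponent(match[1]);
    store.setItem(CACHE_KEY, override);
    return override;
  }
  // 2. Otherwise reuse a value cached by an earlier page load.
  const cached = store.getItem(CACHE_KEY);
  if (cached) return cached;
  // 3. Fall back to the production cockpit mount.
  return PRODUCTION_FALLBACK;
}
```

In the extension the call would presumably be `resolveApiUrl(window.location.search, sessionStorage)`; taking the store as a parameter keeps the function testable outside a browser.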

Why mount inside apps/server today

apps/agent-mcp-interface does not yet attach to CDP on its own when it runs mounted inside apps/server, so it borrows the server's live BrowserSession. Standalone mode (bun src/main.ts) does attach its own CDP, so solo dev keeps working. When BrowserOS Chromium ships direct CDP integration for the interface package, the unmount is three commits, outlined in the header comment of the cockpit module (cockpit.ts): drop the route from apps/server, flip the port constants the UI client reads, and drop the legacy /cockpit/mcp/<slug> arm from the slug parser once profile migration has run.
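
A slug parser with both arms could be as small as the sketch below. `parseMcpSlug` and the allowed slug characters are hypothetical. The PR only states that a legacy `/cockpit/mcp/<slug>` arm exists and is removed after migration.

```typescript
// Hypothetical sketch; the real parser's name and slug grammar are not shown in the PR.
function parseMcpSlug(pathname: string): string | null {
  // Current arm: /mcp/<slug>. Legacy arm: /cockpit/mcp/<slug>.
  // Dropping the legacy arm later means deleting the optional group.
  const match = /^(?:\/cockpit)?\/mcp\/([A-Za-z0-9_-]+)\/?$/.exec(pathname);
  return match ? (match[1] ?? null) : null;
}
```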

Server-side guarantees

Zero diffs in apps/server/src/tools/browser/. The interface uses the catalogue as a library and adds its own dispatch wrapper, registry, and routes without touching framework.ts, executeTool, or the ToolContext shape.

End-to-end verification

Walked through the live cockpit on this branch against a running BrowserOS browser (full transcript in the dev session that produced PR #1331):

  • initialize round-trips with protocolVersion: 2024-11-05, server identifies as BrowserOS / Claude Code.
  • tools/call tabs { action: 'list' } runs but does not record (no page arg, extractor returns null).
  • tools/call navigate { page: 1, url: 'https://example.com' } lands on the real browser and GET /tabs/activity returns a record with the right slug, agent id, target id, page id, url, title, and status: active.
  • A follow-up read { page: 1 } updates the existing record (no duplicate), advances lastToolName and lastToolAt.
  • After 5 s of no activity the record flips to status: idle.
  • A failed dispatch (navigate { page: 999 }) returns isError: true and does NOT advance the registry.
  • Closing the recorded tab evicts the record on the next snapshot read.
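
The registry behaviour verified above (one record per target id, status derived at read time, lazy eviction) can be modelled with a small in-memory map. This is a sketch under assumptions, not the package's code: `TabActivityRegistry`, the `isOpen` callback, and the inclusive 5 s boundary are all hypothetical. The record fields are the ones listed in the description.

```typescript
// Hypothetical sketch of an in-memory registry; not the package's actual code.
interface TabRecord {
  agentId: string;
  slug: string;
  pageId: number;
  targetId: string;
  lastToolName: string;
  lastToolAt: number; // epoch milliseconds
}

type TabStatus = 'active' | 'idle';
type TabSnapshot = TabRecord & { status: TabStatus };

const ACTIVE_WINDOW_MS = 5000;

class TabActivityRegistry {
  private readonly records = new Map<string, TabRecord>();

  // Intended to be called only after a successful tool call,
  // so a failed dispatch never advances the registry.
  record(entry: TabRecord): void {
    // Keyed by target id: a follow-up call overwrites, it never duplicates.
    this.records.set(entry.targetId, entry);
  }

  // Status is computed here rather than stored; closed tabs are dropped lazily.
  snapshot(now: number, isOpen: (targetId: string) => boolean): TabSnapshot[] {
    const out: TabSnapshot[] = [];
    for (const [targetId, rec] of Array.from(this.records)) {
      if (!isOpen(targetId)) {
        this.records.delete(targetId);
        continue;
      }
      const status: TabStatus =
        now - rec.lastToolAt <= ACTIVE_WINDOW_MS ? 'active' : 'idle';
      out.push({ ...rec, status });
    }
    return out;
  }
}
```

Deriving status at read time means no timer is needed to flip a record from active to idle; the next `GET /tabs/activity` simply sees a larger elapsed time.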

Test plan

  • bun --cwd apps/agent-mcp-interface test passes (120+ tests).
  • bun --cwd apps/agent-mcp-ui test passes.
  • bunx tsc --noEmit clean in both packages.
  • bunx biome ci . clean from the package root.
  • bun dev:watch boots, the cockpit serves on http://127.0.0.1:9100/cockpit, and the agent-mcp-ui extension shows agents at /newtab.html#/agents/.
  • An MCP tool call to /cockpit/mcp/<slug> with a page arg surfaces a record on /cockpit/tabs/activity and on the homepage's RunningGrid card within ~2 s.

Follow-ups (not in this PR)

  • PR 2 of the homepage arc: TakeoverRegistry + POST /cockpit/tabs/:targetId/takeover + MCP-error gate in the same dispatch wrapper.
  • PR 3 of the homepage arc: "Take over" / "Resume agent" UI affordances + WaitingStrip "User driving" section + ActivityPanel pause/resume routed.
  • Update onboarding's hardcoded claude mcp add http://127.0.0.1:9000/mcp snippet so it points at the real server-side MCP route.
  • Drop the temporary cockpit mount and flip the port constants once BrowserOS Chromium ships direct CDP integration for the interface package.

Dani Akash (DaniAkash) and others added 21 commits June 12, 2026 16:29
First slice of the new BrowserOS v2 backend, per the architecture plan.

  packages/browseros-agent/apps/agent-mcp-interface/
    package.json       hono dep only; @browseros/agent-mcp-interface
    tsconfig.json      extends monorepo root, composite + emit decls
    biome.json         extends "//", noConsole/noProcessEnv on by default
    src/
      shared/port.ts       PROD_API_PORT = 9200 (distinct from 9000/9100/9300)
      lib/logger.ts        Structured JSON to stderr, pino-shaped fields
      lib/errors.ts        HttpError; { error: string } JSON shape
      env.ts               Single chokepoint for process.env reads
      local-server-url.ts  Write-once-after-bind module singleton
      routes/system.ts     /system/health, /system/version, /system/url
      server.ts            Chained .route('/') composition; exports
                           type AppType = typeof routes for the future
                           agent-mcp-ui hono-rpc client
      main.ts              Bun.serve on 127.0.0.1:PROD_API_PORT; sets
                           localServerUrl + logs the bound URL
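
A write-once-after-bind singleton like the one described for local-server-url.ts might be shaped as below. The function names and the choice to throw on a second write or an early read are assumptions; the commit only names the pattern.

```typescript
// Hypothetical sketch of a write-once module singleton; names are illustrative.
let localServerUrl: string | null = null;

// Called once, after the server has bound its port.
function setLocalServerUrl(url: string): void {
  if (localServerUrl !== null) {
    throw new Error('localServerUrl is already set');
  }
  localServerUrl = url;
}

// Readers fail loudly if they run before the bind completed.
function getLocalServerUrl(): string {
  if (localServerUrl === null) {
    throw new Error('localServerUrl read before the server bound');
  }
  return localServerUrl;
}
```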

Wiring:
  packages/browseros-agent/tsconfig.json  references += this package
  packages/browseros-agent/.fallowrc.json entry += src/main.ts

Verified:
  bun run --filter @browseros/agent-mcp-interface typecheck  clean
  bunx biome check apps/agent-mcp-interface/                 clean
  bunx fallow check                                          no new findings
  bun src/main.ts + curl /system/{health,version,url}        ok

byok ai-sdk path and the existing apps/server / apps/agent packages
are not touched.
Pulls the agent-mcp-interface entry back out of the monorepo-wide
packages/browseros-agent/.fallowrc.json and gives the package its
own .fallowrc.json. Running `bun run fallow` from inside the
package now analyses just this package against its single entry
(src/main.ts); running fallow from the agent root no longer drags
this package in.

Also adds a `fallow` script to the package.json so the command is
reachable via `bun run --filter @browseros/agent-mcp-interface fallow`.
Second package of the BrowserOS v2 split. WXT extension that targets
the new-tab page and (eventually) the side panel + content overlays.
First slice proves the full pipe: React + shadcn (base-vega) renders,
TanStack Router resolves routes, TanStack Query + react-query-kit
hooks fetch through a typed hono-rpc client, and the runtime closes
the loop against the agent-mcp-interface server bound on the same
machine.

  packages/browseros-agent/apps/agent-mcp-ui/
    package.json         WXT, React 19, Tailwind 4, TanStack Router 1.x,
                         TanStack Query 5, react-query-kit, @base-ui/react,
                         @tabler/icons-react, @browseros/agent-mcp-interface
                         workspace dep (type-only AppType + the
                         PROD_API_PORT constant)
    tsconfig.json        extends .wxt/tsconfig.json, jsx react-jsx,
                         types chrome + bun, paths @/* -> ./*
    biome.json           extends "//", enables tailwindDirectives,
                         ignores routeTree.gen.ts / .wxt / .output / dist,
                         relaxes lint inside components/ui + ai-elements
    wxt.config.ts        @wxt-dev/module-react, manifest with
                         chrome_url_overrides.newtab + 5 permissions,
                         Vite plugins: tanstackRouter (enforce: 'pre',
                         autoCodeSplitting off for now) + tailwindcss
    components.json      shadcn base-vega style, neutral, tabler icons
    entrypoints/app/     index.html + main.tsx (QueryClientProvider +
                         RouterProvider) + styles.css (Tailwind 4
                         @theme inline tokens)
    lib/utils.ts         cn() helper (clsx + tailwind-merge)
    components/ui/       button.tsx, card.tsx, badge.tsx via shadcn CLI
    modules/api/
      client.ts          hc<AppType>(baseUrl) + lazy Proxy that re-
                         resolves the URL on each property access
      queryClient.ts     retry: 1, staleTime: 30_000
      parseResponse.ts   ApiError thrower with .status + .body
      system.hooks.ts    First react-query-kit hooks against the
                         /system/{health,version,url} endpoints
    routes/              __root.tsx + index.tsx (file-based routing)
    routeTree.gen.ts     Generated by tanstackRouter plugin
    screens/cockpit/     Minimal Cockpit.tsx that calls useSystemHealth +
                         useSystemVersion, renders shadcn Card + Badge,
                         shows the interface server's name + version on
                         success and a copy-pasteable hint on failure
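
The "lazy Proxy that re-resolves the URL on each property access" noted for client.ts can be illustrated generically. `lazyClient` is a hypothetical helper; in the real module the `build` argument would presumably wrap `hc<AppType>(baseUrl)`, which is left out here so the sketch runs without dependencies.

```typescript
// Hypothetical sketch: a Proxy that rebuilds the underlying client on each access.
function lazyClient<T extends object>(
  resolveUrl: () => string,
  build: (baseUrl: string) => T,
): T {
  return new Proxy({} as T, {
    get(_target, prop) {
      // Re-resolve the base URL every time, so a changed URL is picked up
      // without re-importing the module.
      const client = build(resolveUrl());
      return Reflect.get(client, prop);
    },
  });
}
```

Rebuilding on every access trades a little allocation for never holding a stale base URL, which matters when the URL can change after module load.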

Also exposes shared/port from @browseros/agent-mcp-interface as a real
runtime export so the UI can dial in to PROD_API_PORT. The server
type export stays type-only ("default": null).

Verified:
  bun run --filter @browseros/agent-mcp-interface typecheck   clean
  bun run --filter @browseros/agent-mcp-ui typecheck          clean
  bunx biome check (both packages)                            clean
  bunx wxt build (chrome-mv3 production)                      clean
  bun src/main.ts + curl /system/{health,version}             ok
The existing apps/agent extension routes with react-router v7 +
HashRouter. agent-mcp-ui was deviating from that for no real reason —
TanStack Router's typed loaders aren't exercised at the bootstrap
stage, and the plugin-order collision with WXT's module-react had
already forced autoCodeSplitting off. Matching the in-repo precedent
removes the codegen step, drops two deps, simplifies wxt.config.ts,
and shaves ~48 kB off the production bundle (384 → 336 kB).

Changes:

  package.json    -@tanstack/react-router, -@tanstack/router-plugin
                  +react-router ^7.12.0

  wxt.config.ts   Drop prependRouterPlugin helper + the enforce:'pre'
                  workaround. Plugins now: [tailwindcss()]. Module-react
                  injects @vitejs/plugin-react as before.

  routes/         Deleted. __root.tsx, index.tsx, routeTree.gen.ts gone.

  entrypoints/app/App.tsx  New. Code-based HashRouter + Routes + Route,
                           matching apps/agent's pattern.

  entrypoints/app/main.tsx Drops createRouter + RouterProvider +
                           module augmentation; renders <App /> inside
                           QueryClientProvider + StrictMode.

  biome.json      Drops the !routeTree.gen.ts ignore.

Verified:
  bun install                                              clean
  bun run --filter @browseros/agent-mcp-ui typecheck       clean
  bunx biome check                                         clean
  bunx wxt build                                           clean, 336 kB
  bun src/main.ts + curl /system/{health,version}          ok
Mirrors the existing apps/agent launch shape so `bun run dev` boots
BrowserOS with this extension installed instead of stock Chromium.

  web-ext.config.ts   defineWebExtConfig with BrowserOS binary path,
                      dev-sane chromiumArgs (--use-mock-keychain,
                      --disable-browseros-server,
                      --disable-browseros-extensions,
                      --browseros-dock-icon=dev), and a
                      worktree+package-scoped Chromium profile under
                      /tmp/browseros-dev-<worktree>-<packageHash>.
                      Profile dir distinct from apps/agent's so the
                      two dev runs never share state.

                      BROWSEROS_CDP_PORT / BROWSEROS_SERVER_PORT /
                      BROWSEROS_EXTENSION_PORT / BROWSEROS_USER_DATA_DIR
                      / BROWSEROS_BINARY env overrides supported with
                      the same names the agent extension already uses.

  .env.example        Documents the env vars; copy to .env.development
                      to enable them. Every entry is optional.

  package.json        dev → bun --env-file=.env.development wxt
                      build:dev → same, for the development-mode build

Verified:
  bun --print 'await import("./web-ext.config.ts").then(m => m.default)'
    resolves the BrowserOS binary path + the per-worktree+package profile dir
  bunx wxt prepare         clean
  bun run typecheck        clean
  bunx biome check         clean
  bunx wxt build           clean, 336 kB
* feat(agent): Remote Hermes provider — Cloudflare-managed Fly VM runtime (#1174)

* chore: bump .internal-docs submodule

* feat(agent): add Remote Hermes provider backed by Cloudflare control plane

Adds a new 'remote-hermes' provider that runs the Hermes agent in a
managed Fly VM provisioned via the Cloudflare agent-control-worker.
Per-install VM identified by browserosId; chat turns proxy through the
worker (HTTP + SSE), and the VM dispatches tool calls back to the
laptop's local BrowserOS MCP over a single WebSocket held open by
apps/server. No API key, base URL or model required at provider-add.

Agent UI
- New provider type 'remote-hermes' with Sparkles icon, surfaced first
  in the Settings template grid with the orange "Recommended" treatment.
- Add-provider flow needs only a name; backend handles credentials.
- Inline boot pill (RemoteHermesBootPill) shows live progress through
  the cold-start stages (pulling_image, booting, healthchecking).
- Delete provider triggers /remote-hermes/destroy for the last entry.

apps/server
- lib/remote-hermes/: env, HS256 JWT minting (jose), frame parser,
  partysocket-backed WS bridge with mutex/refcount/idle-close, RPC
  router dispatching browseros tool calls to 127.0.0.1:<port>/mcp,
  ProtocolEvent -> AI SDK UIMessageStream translator, and the turn
  streamer with cold-start polling (180s budget against /vm/status).
- /chat forks on provider='remote-hermes' and pipes the worker's SSE
  through createUIMessageStreamResponse - side panel sees the same
  AI SDK stream format as any other provider.
- New /remote-hermes route: POST /start, POST /destroy, GET /status.
  Lifecycle endpoints fire-and-forget; status proxies the worker.

Plus collateral lint:fix touch-ups in eval/probeAgent/managedBlock.

* fix(agent): address remote-hermes review feedback

- bridge: close+null any existing socket at the top of doOpen() so a
  ReconnectingWebSocket that opens AFTER our 5s OPEN_DEADLINE_MS rejected
  cannot fire 'open' on the bridge's behalf and start a parallel
  ping/idle-sweep loop on the wrong socket. Every event handler now
  checks `this.socket === sock` and returns early when the dispatched
  socket isn't the live one.
- turn: derive the cold-start error message and the boot-poll comment
  from COLD_START_BUDGET_MS instead of the stale literal "90 seconds"
  left over from the original 90s budget.
- frames: drop unused PONG_FRAME export. pong frames are only synthesized
  by Cloudflare's setWebSocketAutoResponse worker-side; the laptop never
  emits one.
- remote-hermes route: drop the dead `method: 'POST'` parameter from
  fireVmLifecycle; both call sites passed the literal and the function
  always POSTs.
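
The stale-socket guard from the bridge fix can be shown in isolation. `FakeSocket` and this `Bridge` are hypothetical stand-ins (the real bridge is partysocket-backed and handles more events); the sketch only demonstrates the close-and-null step and the `this.socket === sock` identity check.

```typescript
// Hypothetical sketch; FakeSocket stands in for a reconnecting WebSocket.
type Listener = () => void;

class FakeSocket {
  private listeners: Listener[] = [];
  closed = false;
  onOpen(fn: Listener): void {
    this.listeners.push(fn);
  }
  emitOpen(): void {
    for (const fn of this.listeners) fn();
  }
  close(): void {
    this.closed = true;
  }
}

class Bridge {
  private socket: FakeSocket | null = null;
  openCount = 0;

  doOpen(): FakeSocket {
    // Close and null any previous socket before creating a new one.
    if (this.socket !== null) {
      this.socket.close();
      this.socket = null;
    }
    const sock = new FakeSocket();
    this.socket = sock;
    sock.onOpen(() => {
      // Ignore events dispatched by a socket that is no longer the live one.
      if (this.socket !== sock) return;
      this.openCount += 1;
    });
    return sock;
  }
}
```

Capturing `sock` in the closure is what makes the identity check possible: a late 'open' from an abandoned socket compares unequal to the live one and is dropped.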

* refactor(server): rework Remote Hermes layering to match KlavisClient pattern

Addresses code-quality feedback on the original PR. Three problems with
the previous shape:

1. Per-request env reads. loadRemoteHermesEnv() ran on every /chat fork
   and every /remote-hermes/* handler. Now AGENT_RUNNER_JWT_SECRET lives
   in INLINED_ENV (build-time inlined) and the worker URL is
   EXTERNAL_URLS.AGENT_CONTROL_WORKER. Both read once at module load,
   never re-read.

2. Mixed responsibilities in lib/remote-hermes/. The flat dump conflated
   the HTTP wire client, the WS bridge, the SSE pump, JWT minting, env
   parsing, and the route handler logic. Split following the existing
   Klavis precedent:

     lib/clients/remote-hermes/
       remote-hermes-client.ts   raw HTTP wrapper (mintJwt + fetch)
       ws-bridge.ts              persistent WS, refcount/idle/race-safe
       auth.ts, frames.ts, rpc-router.ts, event-translator.ts
       constants.ts              module-internal tunables only

     api/services/remote-hermes/
       remote-hermes-service.ts  high-level facade. owns bridge lifecycle,
                                  exposes warm/teardown/status/streamTurn,
                                  no env or fetch lives here

3. Hidden singleton + inline lifecycle logic in handlers. getBridge()
   was a module-level singleton constructed inside the /chat handler.
   Now the service is constructed once in createHttpServer() when
   INLINED_ENV.AGENT_RUNNER_JWT_SECRET is present (warn-only when absent,
   matches Klavis behaviour), threaded into ChatRouteDeps + the new
   RemoteHermesRouteDeps, and closed on Application.shutdown.

Net effect:
  /chat fork: 50 lines -> 10 lines
  /remote-hermes routes: 119 lines -> 47 lines
  lib/clients/remote-hermes: cleaner per-file responsibility
  All [remote-hermes] template logs replaced with structured fields
  `module: 'remote-hermes'`, matching the rest of the codebase.

Shared constants moved to packages/shared/src/constants/hermes.ts:
  REMOTE_HERMES_PROVIDER_TYPE, REMOTE_HERMES_AGENT_KIND,
  REMOTE_HERMES_DEFAULT_AGENT_ID.

EXTERNAL_URLS.AGENT_CONTROL_WORKER added — matches KLAVIS_PROXY shape.

No behaviour change: chat turns, boot pill, cold-start poll, WS bridge
race fixes from the prior commit, /vm/start /vm/destroy /vm/status all
preserved end-to-end.

* fix(server): strip duplicated browseros_browseros_ tool name prefix

Tool cards in the side panel rendered as "Mcp browseros browseros
suggest app connection" instead of "suggest_app_connection". Cause:

acpx normalizes the VM catalog's "<server>.<tool>" dot into an
underscore when emitting tool names. Combined with the MCP server name
we configure ("browseros") and the catalog server name (also "browseros"
since the worker fix in this branch), the wire name becomes
"browseros_browseros_suggest_app_connection".

Our existing strip only handled the double-underscore acpx prefix and
the dot-separated catalog prefix. Add a third pattern that matches
"<word>_<word>_" only when the two word groups are identical, so we
never chew the head off an unrelated tool that happens to start
"browseros_".
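
The "identical word groups only" rule maps naturally onto a regex backreference. The sketch below is a minimal illustration; `stripDuplicatedPrefix` and the character class are assumptions, and the two pre-existing patterns mentioned above are not reproduced.

```typescript
// Hypothetical sketch; covers only the duplicated "<word>_<word>_" pattern.
// \1 must repeat the first group exactly, so "browseros_other_" does not match.
const DUPLICATED_PREFIX = /^([A-Za-z0-9]+)_\1_/;

function stripDuplicatedPrefix(toolName: string): string {
  return toolName.replace(DUPLICATED_PREFIX, '');
}
```

For example, `stripDuplicatedPrefix('browseros_browseros_suggest_app_connection')` yields `'suggest_app_connection'`, while `'browseros_navigate'` is returned unchanged.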

* refactor(remote-hermes): post-review code-quality sweep

apps/server:
- Drop the now-unused Remote Hermes event translator — the runtime
  service emits AI SDK UI Message Stream parts directly, so the laptop
  just forwards them.
- warm()/teardown() now throw on non-2xx from the worker so the route's
  .catch logs a real error instead of swallowing the failure.
- pumpEvents() dismisses the boot pill in finally — handles the edge
  where the stream ends before any non-`start` part arrives.
- /chat fork logs a real reason when remote-hermes hits a server with
  the service unconfigured.

apps/agent:
- Add REMOTE_HERMES_PROVIDER_TYPE in lib/llm-providers/types.ts (local
  mirror of @browseros/shared since the WXT bundle doesn't depend on
  the shared package) and use it in isRemoteHermesType +
  ProviderTemplatesSection. Replace the double-filter pin pattern with
  a sort + Fragment-after-Hermes layout so the order is expressed
  declaratively.

* chore(agent): trim mcp-manager dead exports and unused devDeps (#1176)

* chore(agent): trim mcp-manager dead exports and unused devDeps

Drives the browseros-agent fallow report from 17 issues down to 4
(the residual 4 are all in the remote-hermes surface added by #1174
and out of scope here).

mcp-manager:
- Slim the public barrel to the 7 symbols real consumers import
  (routes, main, the two reconcile/service test files). Drop the
  re-exports of BROWSEROS_MCP_SERVER_NAME, BROWSEROS_MCP_STDIO_SERVER_NAME,
  getMcpManager, ReconcileUrlInput, InstallAgentResult, McpAgentId,
  McpAgentRow, ReconcileResult, UninstallAgentResult — all of which
  are only used by sibling files inside mcp-manager/ itself.
- Narrow BROWSEROS_SERVER_NAMES and planFor in service.ts to
  module-private (used only inside service.ts). This also resolves
  the AgentServerPlan private-type-leak since planFor becomes private.
- Drop the dangling McpAgentIdentifier type alias (zero consumers).

UI:
- Export IntegrationsSectionProps so the exported IntegrationsSection
  no longer references a private type.
- Narrow AGENT_PRESENTATION in integrations-section.helpers.ts to
  module-private (used only by presentationFor() in the same file).

Deps:
- Remove dotenv and picocolors from devDependencies — neither is
  imported anywhere in the package (dotenv is even explicitly
  documented as not needed in the agent README).

* chore(agent): restore dotenv/picocolors, fix remaining fallow findings, gate CI

Restoring the two devDeps that were wrongly dropped (build-script tests
on CI failed: `Cannot find package 'picocolors' / 'dotenv'`). Both are
actually consumed by `scripts/build/{server,cli}.ts` and `scripts/build/log.ts`,
but those entry points are outside fallow's discovery (the package.json
script paths use parent-directory traversal that fallow skips). Listing
both in `.fallowrc.json` `ignoreDependencies` matches the existing
convention used for `pino-pretty`.

Fixes the 4 remaining fallow findings introduced by #1174 so the new CI
gate can be green from day one:
- Export `RemoteHermesBootPillProps` (purely additive — resolves the
  private-type-leak on `RemoteHermesBootPill`, and incidentally surfaces
  the embedded `RemoteHermesVmStatus` as part of its public shape).
- Export `SocketState` in `ws-bridge.ts` (purely additive — resolves the
  private-type-leak on the diagnostic-exposed `snapshot()` method).
- Annotate the orphan `PostTurnResult` with `// fallow-ignore-next-line
  unused-type` so it stays available for follow-up wiring without
  blocking CI.

Adds a `runner / Fallow` job to `.github/workflows/code-quality.yml`
parallel to Biome and Typecheck. Same shape: checkout → setup-bun →
`bun ci` → `bun fallow`. PRs that touch `packages/browseros-agent/**`
now gate on the dead-code report.

* feat(agent-mcp-ui): install AI Elements via shadcn registry

48 components from elements.ai-sdk.dev pulled in via:
  bunx shadcn@latest add @ai-elements/<each component>

Catalog covered:
  Chatbot     attachments, chain-of-thought, checkpoint, confirmation,
              context, conversation, inline-citation, message,
              model-selector, plan, prompt-input, queue, reasoning,
              shimmer, sources, suggestion, task, tool
  Code        agent, artifact, code-block, commit,
              environment-variables, file-tree, jsx-preview,
              package-info, sandbox, schema-display, snippet,
              stack-trace, terminal, test-results, web-preview
  Voice       audio-player, mic-selector, persona, speech-input,
              transcription, voice-selector
  Workflow    canvas, connection, controls, edge, node, panel,
              toolbar
  Utilities   image, open-in-chat

Files land in components/ai-elements/ and consume shadcn primitives
from components/ui/, which the same install grew from 3 to 25 to
cover all the dependencies (accordion, command, dialog, dropdown,
hover-card, popover, scroll-area, select, tabs, tooltip, etc.).

Peer deps brought in (devDependencies left untouched; runtime only):
  ai, streamdown, @streamdown/{cjk,code,math,mermaid}, shiki,
  @xyflow/react, motion, @rive-app/react-webgl2, media-chrome,
  cmdk, embla-carousel-react, lucide-react, nanoid, tokenlens,
  use-stick-to-bottom, react-jsx-parser, ansi-to-react,
  @radix-ui/react-use-controllable-state.

@base-ui/react bumped to ^1.5.0 because the components target a
newer API surface than the previous ^1.0.0-beta.6 exposed.

Six ai-elements files (attachments, context, inline-citation, plan,
prompt-input, voice-selector) ship with `closeDelay`/`openDelay`/
event-handler shapes that don't typecheck against @base-ui/react
1.5.0. Bundled JS runs fine (unknown props get spread onto DOM and
silently ignored, event handler arities are forgiving at runtime),
only tsc strictness flags them. Marked with `// @ts-nocheck` at
the top of each, with a comment explaining the posture.

biome.json now disables both formatter and linter for
components/ui/** and components/ai-elements/** (third-party
drop-ins; both shadcn-installed quote style and the AI Elements
internal patterns differ from the repo style). organizeImports
assist action also off on the same paths.

Verified:
  bun run typecheck   clean
  bunx biome check    clean (89 files)
  bunx wxt build      clean, chrome-mv3 397 kB

* feat(agent-mcp-ui): cockpit shell + sidebar + 4 routes

Foundation pass on top of the WXT bootstrap. Sidebar with hover-expand
behaviour matching apps/agent's idiom, four routed surfaces (cockpit,
agents, governance, mcp), plus a stub /agents/new for the future
wizard. The cockpit page is the full dashboard design from the
prototype, hooked up to mock react-query-kit hooks shaped so the
eventual swap to real agent-mcp-interface routes is a fetcher body
change.

Layout
  entrypoints/app/App.tsx              HashRouter with single layout route
                                       (CockpitShell) wrapping 5 children
  entrypoints/app/main.tsx             TooltipProvider added, @fontsource
                                       imports for Schibsted Grotesk +
                                       Newsreader italic + JetBrains Mono
  components/layout/CockpitShell.tsx   fixed sidebar (w-14 collapsed,
                                       w-64 expanded, 150ms collapse
                                       delay) + main outlet, pl-14 offset
  components/layout/PlaceholderScreen  shared "coming soon" composite

Sidebar
  components/sidebar/AppSidebar.tsx        branding + navigation, no
                                           user footer until we have a
                                           setting to surface
  components/sidebar/SidebarBranding.tsx   orange B mark + wordmark
  components/sidebar/SidebarNavigation.tsx 4 lucide-iconed NavLinks
                                           with base-ui Tooltip (render
                                           prop, not asChild) on
                                           collapsed

Cockpit surfaces
  components/cockpit/CockpitHero.tsx     hero with serif italic accent
  components/cockpit/WaitingStrip.tsx    container for approvals +
                                         handoffs
  components/cockpit/ApprovalBanner.tsx  3-button approval card
  components/cockpit/HandoffRow.tsx      amber "take over" row
  components/cockpit/RunningGrid.tsx     auto-fill grid + live count
                                         chip + AddAgentTile at the end
  components/cockpit/RunningCard.tsx     mini-screencast + label +
                                         status + task + watch/stop
  components/cockpit/AddAgentTile.tsx    dashed-border "+ New profile"
                                         tile linking to /agents/new
  components/cockpit/RecentActivity.tsx  list container with flagged-
                                         count chip
  components/cockpit/ActivityRow.tsx     per-row status icon + agent
                                         dot + jump-to action
  components/cockpit/StatusBadge.tsx     token-driven status pill
  components/cockpit/MiniScreencast.tsx  placeholder card-top tile

Placeholder screens
  screens/cockpit/Cockpit.tsx            rewrite: composes the surfaces
                                         above against mock hooks
  screens/agents/Agents.tsx              placeholder
  screens/governance/Governance.tsx      placeholder
  screens/mcp/Mcp.tsx                    placeholder
  screens/new-agent/NewAgent.tsx         placeholder

Data
  modules/api/agents.hooks.ts      useAgents (mock; 3 running rows)
  modules/api/waiting.hooks.ts     useApprovals + useHandoffs (1 + 1)
  modules/api/activity.hooks.ts    useRecentActivity (4 rows: blocked,
                                   needs-human, allowed, done)
  lib/status.ts                    RunStatus union + STATUS_META map +
                                   isActiveStatus / isEndedStatus
                                   helpers; single source of truth so
                                   colors stay consistent across the
                                   cockpit, audit, and activity log

Design tokens
  entrypoints/app/styles.css       full BrowserOS warm-cream palette
                                   wired via @theme inline (shadcn
                                   primitives re-pointed at BrowserOS
                                   surfaces; bespoke ink scale, status
                                   palette, accent ink, shadows,
                                   pulse-dot / fade keyframes). Body
                                   carries the design's layered radial
                                   gradient. Selection uses accent
                                   tint instead of chrome blue.

Deps
  @fontsource-variable/schibsted-grotesk
  @fontsource-variable/jetbrains-mono
  @fontsource/newsreader (400-italic + 500-italic only)

Verified:
  bun run --filter @browseros/agent-mcp-ui typecheck       clean
  bunx biome check                                         clean (113 files)
  bunx wxt build                                           clean (880 kB:
                                                          317 kB JS,
                                                          80 kB CSS,
                                                          ~480 kB fonts)

* fix(agent-mcp-ui): declare the browserOS permission

Without `browserOS` in the manifest's permissions array, BrowserOS
Chromium's new-tab override gate refuses the extension's claim and
the cockpit never replaces chrome://newtab. The apps/agent extension
sits on the same permission for the same reason.

Also adds a near-neighbour permission that the cockpit will reach for soon:
  webNavigation  for routing the future live-run jump targets
  (the rest were already declared)

* fix(agent-mcp-ui): switch to WXT's conventional newtab entrypoint

Renames entrypoints/app/ to entrypoints/newtab/ so WXT auto-wires
manifest.chrome_url_overrides.newtab against the generated
newtab.html. Drops the hand-rolled chrome_url_overrides block from
wxt.config.ts since WXT now manages it.

The output file is newtab.html instead of app.html; nothing
internal references the page by filename so no other code change
is needed. Build output verified: manifest carries
{ chrome_url_overrides: { newtab: 'newtab.html' } } automatically.

Reference: https://wxt.dev/guide/essentials/entrypoints.html#newtab

* chore(agent-mcp-ui): wire react-doctor into lint

Lint now runs biome and react-doctor concurrently as a single pass via concurrently --group, so findings from both tools land in one terminal output and the combined exit code is non-zero if either fails. react-doctor is invoked through bunx (no devDep yet) so the bun release-age gate doesn't block its newest versions. Config in doctor.config.json mirrors biome's vendored-path ignores (components/ui, components/ai-elements, build outputs).

* chore(agent-mcp-ui): add .gitignore and untrack .wxt artifacts

Mirrors apps/agent's .gitignore. .wxt/ is regenerated on every wxt dev/build, so the seven previously-tracked files in it churned the diff for no reason.

* chore: added verbose flag

* chore(agent-mcp-ui): split lint and react-doctor into separate scripts

Running them together wasn't worth the extra plumbing. lint stays biome-only; react-doctor moves to a dedicated lint:doctor script with --verbose. Drops concurrently from devDeps.

* feat(agent-mcp-ui): new-agent wizard at /agents/new

Replaces the placeholder with a 4-section wizard (harness, logins, tool approvals, ACL rules) plus a sticky preview rail showing the MCP URL and an Add-to-harness CTA. Form wires through react-hook-form with a zod schema and shadcn's Form primitive; the submit fires useCreateAgent (react-query-kit mutation, mocked latency) and flips the rail into an added state with a Done button that returns to /agents.

Adds shadcn form/label/radio-group/toggle/toggle-group primitives. form.tsx is hand-written for the base-vega Label since the registry copy depends on @radix-ui/react-label which the project doesn't ship. Approvals row uses ToggleGroup with single-select; ACL rows split into a toggle button + sibling trash button to clear the nested-interactive lint warning.

doctor.config.json sets deadCode:false because react-doctor's unused-file rule cannot trace WXT's entry resolution and was flagging every shipped component.

* feat(agent-mcp-ui): governance hub + audit tab

Replaces the placeholder at /governance with a 4-tab hub (Audit, Permissions, Site Rules, Grants) backed by nested routes. The shell renders a sticky header with a pulse-dot live-run counter and a shadcn Tabs nav whose triggers navigate per-tab URLs; the matched sub-route renders inside the Outlet.

Audit tab ships: filter chips (All/Running/Blocked/Completed) via ToggleGroup, run-count line, and a list of AuditRow cards (status icon + agent/harness + status pill + meta) that link out to /governance/audit/:runId/replay when clicked. Runs come from a new useRuns mock that mirrors the eventual hono-rpc shape.

Permissions, Site Rules, and Grants render a small ComingSoonTab stub so the tab nav feels complete and the URL space is reserved.

* fix(agent-mcp-ui): tab + chip active states

Base-ui Toggle emits aria-pressed, not data-state=on / data-pressed=true, so the governance filter chips and the new-agent approval toggles weren't actually flipping color when selected. Swapping the selectors to aria-pressed: ties the visual state to the primitive's real attribute.

Active governance tab now shows an accent-orange underline. The base shadcn tabs.tsx applies after:bg-foreground and after:bottom-[-5px] in its own className, and Tailwind utility cascade ordering meant my override classes were being beaten. Adding ! to after:bg-accent and after:bottom-[-1px] wins the cascade and lands the underline right on the TabsList's bottom border.

* fix(agent-mcp-ui): tabs primitive matches base-ui's data-orientation

Base-ui's Tabs root emits data-orientation=horizontal but the shadcn base-vega tabs.tsx was selecting on data-horizontal (no attribute value), which never matched. With the gate failing, the active-tab underline's after:inset-x-0 and after:h-0.5 were dropped, leaving the pseudo-element at 0x0 and invisible. Side-by-side with the prototype made the gap obvious: same accent color and bottom offset, but no underline rendered. Swapping the four group-data-horizontal/group-data-vertical selectors to group-data-[orientation=horizontal] / group-data-[orientation=vertical] lets them match what base-ui actually emits; the active trigger now paints the 2px accent underline at the TabsList border, matching the prototype.

* feat(agent-mcp-ui): live-run view with approval and handoff overlays

New full-bleed route /run/:runId sits outside the CockpitShell wrapper and gives every agent run its own watch view: a stubbed browser viewport on the left with fake chrome, a centred site host placeholder, the persistent agent-driving badge, and a working pill spelling out the live action; a docked activity panel on the right with the action log, a pinned approval card when the run needs an OK, a pinned handoff notice when it needs the user, an inline block notice when Site Rules killed an action, plus elapsed/tokens/steps stats and pause/stop controls.

Approval card honours the v1 UX spec's three-button shape (Allow once, Always allow on domain, Block) and the scope sentence that pins the permission to the current domain. Handoff banner is a full overlay over the viewport with the amber top strip, dimmed page, and an I'm-not-a-robot challenge stub standing in for the real site's CAPTCHA/2FA; the matching in-panel HandoffNotice means the user can resume from either surface. Local state dismisses approvals, handoff, and block notices since the backend isn't wired yet.

Cockpit RunningCard 'Watch' button now navigates to /run/<agentId>; run fixtures key by agent id so the cockpit-to-live flow lands on real data for the three running agents. Mock useRun mirrors the eventual hono-rpc /runs/:id + SSE shape.

* ci: run code-quality on PRs targeting feat/agent-mcp-interface-bootstrap

Stacked PRs on the agent-mcp-interface bootstrap branch were skipping biome / typecheck / fallow because the workflow's pull_request filter only matched main and dev. Adding the explicit branch lets the quality gate fire on every stacked PR before it lands on the parent.

---------

Co-authored-by: shivammittal274 <56757235+shivammittal274@users.noreply.github.com>
* feat(agent-mcp-ui): replay view at /governance/audit/:runId/replay

Full-bleed replay player that sits outside CockpitShell so the recorded run gets the whole viewport. The top bar shows the task title, agent and harness, status pill, and a stat strip (duration, tokens, steps, approvals). The body splits into a reconstructed browser viewport on the left (fake chrome + site host placeholder + caption pill that tracks the playhead), a transport with play/pause/restart + native range-input scrubber (overlaid with accent track, kind-coloured bookmark dots for approval/block/done frames, and an accent thumb) + 1x/2x/4x speed toggle, and a right rail Action Timeline whose rows highlight the current frame, dim future frames, and click-to-seek.

Playback wallclock lives in usePlayback hook (the project's one allowed useEffect case: starting and cancelling setInterval tied to play state). Scrubber is a real <input type="range"> styled transparently over the visual track so we get native click-to-position, keyboard arrows, Home/End, and screen-reader semantics for free; bookmark buttons sit at z-10 above the track and below the input, so direct clicks still seek to their frame. Mock useReplay keyed by run id mirrors the eventual /runs/:id/replay shape.

* feat(agent-mcp-ui): agents directory at /agents with revoke flow

Replaces the /agents placeholder with a real directory of configured agent profiles. Header shows the configured-count pill and a primary Add agent CTA that lands the user on the existing /agents/new wizard; the body renders one row per profile with the harness icon chip, name + harness, scope summary (logins, ACL rules, blocked actions, always-allow grants), last-run timestamp, status badge (Configured / Paused / Disabled), and Edit + Revoke buttons. Empty state renders a dashed coming-soon-style card with its own Add CTA.

Revoke runs through shadcn AlertDialog (not window.confirm) so focus trapping and ARIA semantics ship for free. useDeleteAgent's onSuccess writes back to the agent-profiles cache via setQueryData so the row vanishes immediately without a refetch, per the project's no-parallel-state-over-cache rule. Edit currently navigates to /agents/:id/edit which is a placeholder slot until the new-agent wizard grows an edit mode.

Adds shadcn alert-dialog primitive. Mock useAgentProfiles returns seven profiles spanning every status.

* fix(agent-mcp-ui): honour prefers-reduced-motion + snapshot CockpitShell ref

Adds a global prefers-reduced-motion: reduce media block in styles.css that collapses every animation and transition to a 0.01ms no-op for users with vestibular sensitivities, satisfying WCAG 2.3.3 across the cockpit (pulse-dot live indicators, sidebar expand, replay scrubber transition, in-app fade-ups).

Refactors CockpitShell's unmount cleanup to snapshot the timeout ref object into a stable local before closing over it in the cleanup, which is the React docs' canonical pattern for refs in effects and resolves the missing-effect-dependencies warning. Behaviour is unchanged: the cleanup still clears whatever timeout id is current at unmount time.

react-doctor score moves from 74/100 with 2 findings to 100/100 with 0 findings.

* feat(agent-mcp-ui): mcp registry at /mcp

Replaces the /mcp placeholder with the per-agent MCP endpoint registry. Reads every configured profile from useAgentProfiles and renders one card per profile: harness icon chip + name + harness, slug + CLI hint, status pill, the dark URL block with a copy button, and a Regenerate URL + Add to {harness} button pair. The Add CTA flips into a brief Added confirmation; the copy button flips into a check icon for 1.5s so the user knows the clipboard write landed.

Adds useRegenerateMcpUrl mock mutation that rotates the slug and writes the new URL straight back into the agent-profiles cache via setQueryData, so the row reflects the new endpoint without a refetch. Shape mirrors the eventual hono-rpc surface. Skip the per-harness /mcp/setup-* helper screens for now per the running plan; we land them with the onboarding work.

* fix(agent-mcp-ui): keep selected text readable on dark surfaces

Global ::selection only set a background, so on light-on-dark surfaces like the MCP URL block the cream text disappeared into the accent-tint selection highlight. Pinning the foreground to ink keeps every selection readable: dark ink on light tint everywhere, including the live-run viewport caption and the MCP code block.

* chore(browseros-agent): bump biome to 2.5.0

2.5.0 just cleared the bunfig release-age gate so the local install matches what CI's version: latest has already been pulling. Updates the package.json pin plus the schema references in the root biome.json and apps/agent-mcp-ui/biome.json. apps/agent-mcp-ui stays clean under both bun run lint and biome ci. The pre-existing diagnostics surfacing on apps/eval, apps/server, apps/agent, packages/shared, and scripts/dev are unchanged from before the bump.
…ants + edit wizard (#1223)

* feat(agent-mcp-ui): first-launch onboarding flow at /onboarding

Adds a four-step onboarding flow sitting full-bleed outside the CockpitShell. Left brand column carries the BrowserOS logo, a Newsreader-italic pull quote, and three value props (fast & token-cheap / logged in as you / under your control). Right column shows step dots up top and one of four step panels: Welcome (set up vs reconnect), Import Logins (Chrome-quit gate, profile picker with default Work + Personal selected, Keychain notice, progress card, summary), Connect to Claude (one-click add or copyable CLI fallback, success card), Ready (two starter prompts with copy buttons, Open BrowserOS CTA). Reconnect and Open BrowserOS both navigate to /.

Adds useImportChromeSessions and useConnectToClaude mock mutations (react-query-kit createMutation) whose shape matches the eventual hono-rpc surfaces. CHROME_PROFILES seeds three profiles totalling 55 sites and 14 logins; STARTER_PROMPTS reuses the prompt strings already surfaced elsewhere in the cockpit. Skip first-launch gating for now: per the running plan's open questions, the where-does-the-flag-live decision lives with the backend SSE work.

* feat(agent-mcp-ui): governance permissions, site rules, grants tabs

Rounds out the governance hub. The three placeholder tabs now ship real surfaces:

Permissions: read-only catalog of the six action categories grouped into the three buckets (Auto / Ask / Block) every new agent inherits from. Read straight from new-agent.schemas' APPROVAL_CATEGORIES so the wizard's default verdicts and the catalog can't drift. Layout is a three-column lg grid with verdict-coloured bucket cards.

Site Rules: list of (label, domain, action) blocks the browser enforces directly. Each row carries a coloured action badge, the domain in mono, and a delete button. An inline 'Add a rule' form expands into a react-hook-form + zod editor with three fields (label, domain, action select) and submits through useAddSiteRule. setQueryData writes on both add and delete keep the list as the cache's source of truth.

Grants: the always-allow ledger. Per-row action + domain + grantee + when + optional note + Revoke button. Revoke routes through a shadcn AlertDialog explaining the consequence (future attempts re-prompt, existing runs unaffected). useRevokeGrant's onSuccess drops the row from the cache.

ComingSoonTab is removed; nothing imports it any more.

* feat(agent-mcp-ui): edit-mode wizard at /agents/:id/edit + recent-activity replay link

Closes the two loose ends called out in the running plan.

The new-agent wizard now accepts an optional mode prop ('create' | 'edit'); /agents/:id/edit renders it with mode=edit. In edit mode the data hook reads the agent id from useParams, fetches the wizard-shape values via useAgentProfileDetail (a mock that synthesises full NewAgentValues from an AgentProfile summary), drives the form's reactive values prop, and routes submit through useUpdateAgent. The mutation's onSuccess patches the agent-profiles cache so the directory's row reflects the rename immediately. Header copy, submit CTA, pending label, and success card all flip to edit-mode strings ('Edit agent', 'Save changes to X', 'Saving…', 'X updated'); the copy-from-existing card is hidden in edit mode.

Cockpit recent-activity now lands done rows on the replay route. Added an optional runId to the ActivityRow type and a new History-icon Replay button on done rows that links to /governance/audit/:runId/replay. The "Codex · Log calls" done row points at run-concur-may so the demo lands on real fixture data.

* fix(agent-mcp-ui): keep AddSiteRuleForm mounted until mutation settles

Previously the form ran close() synchronously after onSubmit, before addRule.mutate could resolve. Today the mock always succeeds; once a real backend lands, a 4xx would silently drop the row and lose the user's input. Widening the onSubmit prop to forward react-query-kit's mutation options lets the parent hand close to the mutation's onSuccess, so the form stays mounted on failure and a FormMessage can surface the error once we wire one in.
# Conflicts:
#	packages/browseros-agent/apps/agent/modules/chat/chat-session.hooks.ts
#	packages/browseros-agent/apps/agent/screens/ai-settings/NewProviderDialog.tsx
#	packages/browseros-agent/apps/agent/screens/ai-settings/ProviderTemplatesSection.tsx
#	packages/browseros-agent/bun.lock
…1224)

* chore(agent-mcp-interface): foundation for phase 1

Adds the storage helper Phase 1 leans on, plus the deps and env reads that helper + the upcoming agent routes need.

env.ts now also exposes BROWSEROS_DIR overrides and an isDevelopment flag (still the only sanctioned process.env reader). src/lib/browseros-dir.ts resolves <homedir>/.browseros (or .browseros-dev under NODE_ENV=development) with the env override winning; the package writes everything under <browserosDir>/mcp-interface/. src/lib/storage.ts wraps readJson/writeJson/listFiles/removeFile/ensureDir/fileExists around the interface root, validates every read and write with a supplied zod schema, refuses absolute paths or .. escapes, and writes through a <name>.tmp rename so a mid-write crash leaves either prior contents or nothing.
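
The path rules above (no absolute paths, no `..` escapes, everything rooted under the interface directory) can be sketched as a pure function. `resolveInsideRoot` is a hypothetical name, not the helper in src/lib/storage.ts, and the sketch already inspects raw segments before normalizing, which is the behaviour the later hardening fix in this log describes. The `<name>.tmp` rename step is not shown.

```typescript
import { isAbsolute, join, normalize, sep } from "node:path";

// Hypothetical helper: resolves a caller-supplied relative path under `root`,
// or throws if the path could land anywhere else.
function resolveInsideRoot(root: string, relativePath: string): string {
  if (isAbsolute(relativePath)) {
    throw new Error("absolute paths are not allowed");
  }
  // Check the raw segments first: normalize() would collapse
  // "agents/../config.json" to "config.json" and hide the traversal.
  const segments = relativePath.split(/[\\/]+/);
  if (segments.includes("..")) {
    throw new Error("path traversal is not allowed");
  }
  const normalizedRoot = normalize(root);
  const rootWithSep = normalizedRoot.endsWith(sep) ? normalizedRoot : normalizedRoot + sep;
  const resolved = normalize(join(root, relativePath));
  // Belt and braces: the resolved path must still sit under the root.
  if (!resolved.startsWith(rootWithSep)) {
    throw new Error("path escapes the storage root");
  }
  return resolved;
}
```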

Adds bun test wiring + tests/_helpers/temp-browseros-dir.ts so every test gets an isolated tmp root. 13 storage tests pass.

* feat(agent-mcp-interface): agent profile schemas + service

schemas.ts is the wire contract the UI's typed client picks up via AppType. Mirrors the existing UI wizard shape (NewAgentValues) and adds the storage shape (server-managed id / slug / mcpUrl / status / timestamps) plus the directory projection used by GET / responses.

service.ts wraps the storage helper with file-backed CRUD: one profile per file at <browserosDir>/mcp-interface/agents/<id>.json keyed by nanoid(8). Slug is the user-facing identifier and is kept unique across all profiles via uniqueSlug (which tries numeric suffixes up to -99 before throwing). mcpUrl is recomputed from getLocalServerUrl on every read so a port change between boots doesn't strand the stored value. lib/slug.ts mirrors the UI's toSlug so wizard preview and persisted slug match.
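
A rough sketch of the slug uniqueness rule, assuming suffixes start at -2 (as the race, race-2, ... sequence later in this log suggests) and stop at -99. The normalisation inside `toSlug` and the `"agent"` fallback are guesses; the real lib/slug.ts is not shown in the commit.

```typescript
// Hypothetical normaliser: lowercase, collapse every non-alphanumeric run to "-".
function toSlug(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Returns the base slug if free, else the first free "<base>-<n>" for n in 2..maxSuffix.
function uniqueSlug(name: string, taken: ReadonlySet<string>, maxSuffix = 99): string {
  const base = toSlug(name) || "agent"; // assumption: fallback for names with no usable characters
  if (!taken.has(base)) return base;
  for (let n = 2; n <= maxSuffix; n++) {
    const candidate = `${base}-${n}`;
    if (!taken.has(candidate)) return candidate;
  }
  throw new Error(`no free slug for "${base}" up to -${maxSuffix}`);
}
```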

15 service tests cover create / list / detail / update (rename + slug rotation, slug stability when name unchanged) / remove / regenerate / parallel updates. Plus the original 13 storage tests still pass. 28 tests total in 354ms.

* feat(agent-mcp-interface): /agents CRUD routes

Thin Hono layer over routes/agents/service.ts: zValidator rejects malformed bodies with structured 400s, missing-id paths surface 404 via HttpError, the rest just translate HTTP shape. Chained into server.ts via .route('/', agentsRoute) so AppType automatically picks up POST /agents, GET /agents, GET /agents/:id, PATCH /agents/:id, DELETE /agents/:id, POST /agents/:id/mcp-url:regenerate.

Five route-level integration tests drive the typed client (hc<AppType>) against app.fetch with no real port bind, in an isolated tmp <browserosDir> per case. Covers the full lifecycle, every 404 path, the 400 zod path, slug collision through the route, and parallel updates of two profiles. The regenerate slug regex was tightened to allow nanoid-suffixed multi-hyphen slugs (toSlug normalises any _ in the nanoid output to -).

33 tests / 88 expect calls pass in under 100ms.

* feat(agent-mcp-ui): swap six agent hooks for real client calls

Replaces the in-memory mocks for useAgentProfiles, useAgentProfileDetail, useCreateAgent, useUpdateAgent, useDeleteAgent, and useRegenerateMcpUrl with hono-rpc calls through the existing client + parseResponse pair. Strips the MOCK_AGENT_PROFILES fixture, the profileToWizardValues synthesiser, the buildMcpUrl/toSlug/nanoid mock helpers, and the artificial setTimeout latencies — the cache surface seen by every consumer (Agents directory, new-agent wizard create + edit, MCP registry regenerate / delete dialogs) stays byte-identical because the wire types now flow from AppType and match what the UI already expected.

useAgents (cockpit running grid) stays on its three-row MOCK_AGENTS fixture; that hook becomes Phase 4's projection over the runs store, called out in a top-of-file comment.

UI typecheck, lint, lint:doctor all clean. react-doctor holds at 100/100.

* fix(agent-mcp-interface): harden phase 1 against the three greptile findings

Four independent fixes; reviewer suggestions on PR #1224 covered them all.

1. Storage path guard now inspects the raw input for '..' segments before normalize collapses them. 'agents/../config.json' previously normalized to 'config.json' and slipped past the rooted-prefix check, which would have let any future route forwarding a path-shaped id read or delete files at the mcp-interface/ root. Storage tests cover read/write/remove on a lateral-traversal path.

2. Service layer validates the id shape (matches the nanoid alphabet, length-capped) inside loadById and remove. Traversal-shaped ids on any read/write/delete path now resolve as not-found rather than reaching the storage layer. Service test exercises four evil ids across all four entry points.

3. loadAll uses Promise.allSettled + logger.warn instead of Promise.all so a single corrupt agent json (manual edit, partial migration, half-written file on a weird FS) gets logged + skipped rather than rejecting the whole call. Without this, one bad file would brick list and create until the user manually deleted it. Test writes a garbage file alongside a valid one and confirms list returns only the valid one + create still works.

4. AsyncMutex serialises create / update / regenerateMcpUrl so the read-snapshot → uniqueSlug → write window cannot race against itself. Closes the TOCTOU window where two concurrent same-name creates could both pass the uniqueness check and write the same slug. Reads stay lock-free. Mutex has its own unit tests (FIFO ordering, rejection doesn't block subsequent tasks). Service test fires 10 parallel creates with the same name and asserts 10 distinct slugs come back (race, race-2, ..., race-10).
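
The serialisation in item 4 can be sketched with a promise-chain mutex. This is an illustrative implementation, not the project's AsyncMutex; `runExclusive`, `createAgent`, and the in-memory `takenSlugs` set are hypothetical stand-ins for the service and its file store.

```typescript
// Promise-chain mutex: tasks run one at a time in FIFO order.
class AsyncMutex {
  private tail: Promise<void> = Promise.resolve();

  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Swallow the outcome on the chain so a rejected task does not block later ones.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

const takenSlugs = new Set<string>(); // stand-in for the on-disk profiles
const createMutex = new AsyncMutex();

// Read snapshot -> pick a free slug -> write, all inside the lock.
function createAgent(name: string): Promise<string> {
  return createMutex.runExclusive(async () => {
    const snapshot = new Set(takenSlugs);
    // Simulated I/O gap; without the mutex, concurrent callers would all see the same snapshot.
    await new Promise<void>((resolve) => setTimeout(resolve, 1));
    let slug = name;
    let n = 2;
    while (snapshot.has(slug)) slug = `${name}-${n++}`;
    takenSlugs.add(slug);
    return slug;
  });
}
```

Firing `createAgent("race")` ten times in parallel should then yield ten distinct slugs, because each call only reads its snapshot after the previous write has landed.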

39 tests / 125 expect calls pass. Lint + typecheck clean.
…pi (#1231)

* feat(agent-mcp-interface): add domain glob matcher + approval catalog seed

* feat(agent-mcp-interface): file-backed site-rules service

* feat(agent-mcp-interface): wire /site-rules and /permissions/catalog routes

* feat(agent-mcp-interface): permissions.check api for executor pre-flight

* feat(agent-mcp-ui): swap site-rules + permissions catalog hooks to real client

* fix(agent-mcp-interface): enforce admin site rules + warn on catalog fallback
* feat(agent-mcp-interface): browser executor interface + deterministic stub

* feat(agent-mcp-interface): wire /mcp/:slug via MCP SDK web-standard transport

* feat(agent-mcp-interface): permission gate + navigate tool through MCP

* feat(agent-mcp-interface): add read, click, type, attach, submit tools

* test(agent-mcp-interface): pin delete-agent-slug-404s-immediately invariant

* fix(agent-mcp-interface): plug stub leak + reject non-http navigate + attach traversal
…r wiring (#1234)

* fix(agent-mcp-ui): clone-from card uses real profiles and hydrates every field

* fix(agent-mcp-ui): hide logins step until vault import lands

* fix(agent-mcp-ui): pin new-agent rail CTA to viewport bottom

* feat(agent-mcp-interface): wire agent-mcp-manager into create + delete

* feat(agent-mcp-ui): surface real harness install outcome on the success card

* feat(agent-mcp-ui): cover all agent-mcp-manager harnesses + shared HarnessIcon

* feat(agent-mcp-ui): real brand marks via @svgl shadcn registry + drop BrowserOS pill

* fix(agent-mcp-interface): reconcile harness link on update + regenerate

* fix(agent-mcp-ui): invalidate profile caches after create/update/delete/regenerate

* fix: handle clone-fetch failure + remove-before-uninstall on delete
…ps/server (#1235)

* feat(server): export browser tool surface + session for cockpit reuse

* feat(agent-mcp-interface): adopt @browseros/server real tool catalogue with permission wrapper

* feat(server): mount cockpit inside apps/server runtime with mcpUrl migration

* test(agent-mcp-interface): pin migrateMcpUrls rewrite + re-install behavior

* fix(server,eval): cast around workspace zod version cross-pollination

* fix(cockpit): isolate uninstall + catch migration; reject non-http navigate; log run dispatch

- migrate-mcp-urls: wrap uninstallForAgent in its own try/catch so a
  throw there does not skip installForAgent and leave the harness
  pointing at a dead URL while the profile JSON carries the new one.
- cockpit.ts: add .catch on migrateMcpUrls so a top-level rejection
  (e.g. listFiles hitting EACCES) is logged instead of swallowed as an
  unhandled promise rejection.
- mcp/register: reject javascript:, file:, and data: URLs at the
  navigate wrapper before the permission gate, restoring the
  defense-in-depth the old per-tool wrapper had. The real navigate
  tool's schema is z.string().optional() with no scheme check.
- mcp/register: log a warning when the run tool dispatches. A dedicated
  catalog verb for arbitrary script execution is the proper fix; the
  log keeps dispatches auditable until that lands.
- Integration test: lock the navigate scheme guard with explicit cases
  for javascript:, file:, and data:.
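
A sketch of the navigate scheme guard described above, assuming it runs on the raw `url` argument before the permission gate. `isNavigateUrlAllowed` is a hypothetical name, and rejecting unparsable input is an assumption of this sketch rather than documented behaviour.

```typescript
// Schemes the commit says are rejected at the navigate wrapper.
const BLOCKED_SCHEMES: ReadonlySet<string> = new Set(["javascript:", "file:", "data:"]);

function isNavigateUrlAllowed(rawUrl: string | undefined): boolean {
  // The tool schema marks the url as optional, so an absent url has nothing to check.
  if (rawUrl === undefined) return true;
  let parsed: URL;
  try {
    parsed = new URL(rawUrl.trim());
  } catch {
    return false; // assumption: anything the URL parser rejects is refused
  }
  // URL.protocol is already lower-cased, so "JavaScript:" cannot slip through.
  return !BLOCKED_SCHEMES.has(parsed.protocol);
}
```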

* feat(cockpit-ui): confirm before rotating MCP URL; drop redundant Add-to-harness button

The MCP page had two paper cuts:

1. Regenerate URL fired straight on click. Rotating destroys the
   previously-issued URL and re-installs the harness entry under a new
   slug, so anywhere the old URL was pasted by hand stops working.
   Now the button opens a shadcn AlertDialog explaining the impact
   (auto-reinstall via reconcileHarnessLink, but external paste-ins go
   dead) and only fires the mutation on confirm. Matches the pattern
   used by DeleteAgentDialog.

2. The "Add to <harness>" button only flipped a local "Added" badge
   for 1.8s; it never triggered an install because the install already
   ran when the agent was created. Removed the button and updated the
   header + empty-state copy to say so explicitly.

* fix(cockpit): mount standalone server under /cockpit prefix

The merged-runtime refactor switched the UI client and the harness
install URLs to a single shape: `http://127.0.0.1:<port>/cockpit/...`.
That works against `createCockpitRoutes` because apps/server mounts
the cockpit under `.route('/cockpit', ...)`. The standalone entry
point in `src/main.ts` was still serving at the root, so every UI
request returned 404.

Wraps the Hono `server` in a parent that mounts it under
COCKPIT_MOUNT_PREFIX and updates `localServerUrl` to include the
prefix. The buildMcpUrl helper now produces the same shape in both
runtimes, so harness configs stay valid across a switch.

Also runs `migrateMcpUrls` at standalone boot — same sweep the
production factory does — so profiles created before this change
get their stored mcpUrl + harness install entries rewritten to the
new shape on first start.

* fix(cockpit-ui): reconcile agent-profiles list after regenerate URL

GET /agents is sorted by `updatedAt` DESC server-side, and regenerate
bumps that field — so the rotated row needs to jump to the top of
the directory and the MCP page. The previous handler only patched
`mcpUrl` in place, leaving the sort order stale until the next page
load.

Swaps the detail-cache invalidation (the GET /agents/:id wire shape
doesn't carry slug or mcpUrl, so the invalidation was a no-op) for
a list-cache invalidation, while keeping the optimistic `mcpUrl`
patch so the new URL still appears without a network round-trip.
…ot (#1248)

* feat(agent-mcp-interface): attach to browseros browser over cdp at boot

The standalone cockpit ran the route surface but never set the
process-wide BrowserSession, so every MCP tools/call short-circuited
with "browser session not connected". Production worked because the
merged runtime in @browseros/server called createCockpitRoutes with
its live session; standalone had no such hand-off.

Mirrors @browseros/server's bootstrap directly: connect a CdpBackend
to the configured port, wrap in Browser, hand the session to
setBrowserSession at boot. Configurable via the BROWSEROS_COCKPIT_CDP_PORT
env var, defaults to 49337 (IANA dynamic / private range, no known
collision with registered services).

Soft-fails when the browser is not reachable. The cockpit still
serves the UI, profile CRUD, harness installs, and tools/list; only
tools/call keeps the existing "session not connected" wire shape
until the user restarts the cockpit with the browser up.
exitOnReconnectFailure: false on the CdpBackend so a transient drop
degrades the session instead of killing the cockpit process.

CdpClient is injected through BrowserBootstrapDeps so the unit test
covers the connect-success, connect-fail, and disconnect-swallows-errors
paths without opening a socket.

* fix(cockpit): guard signal handler + harden DI shape

Three small cleanups from the PR review:

- Add an `exiting` guard around the SIGINT/SIGTERM cleanup so a
  back-to-back delivery (supervisor sends both) does not restart
  `disconnect()` on an already-closing CDP connection.
- Add a `setTimeout(() => process.exit(1), 5000).unref()` kill switch
  before `disconnect()` so a hung inner `cdp.disconnect()` (half-open
  socket, network stall) cannot leave the process unkillable except
  via SIGKILL.
- Replace `BrowserBootstrapDeps`'s two independent override fields
  with a single bundled `inject` object so callers cannot mix a stub
  `cdpFactory` with the default `buildSession`. The default
  `buildSession` casts to a real `CdpBackend`, so a partial override
  would compile but blow up at the first `Browser` call. Tests
  updated accordingly.
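
The first two cleanups can be sketched as a small factory with injected dependencies so it is testable without real signals. `createShutdownHandler` and its `deps` shape are hypothetical; the exit code after a clean disconnect and the cancelling of the timer are assumptions of this sketch.

```typescript
interface ShutdownDeps {
  disconnect: () => Promise<void>; // e.g. the CDP disconnect
  exit: (code: number) => void;    // e.g. process.exit
  killAfterMs?: number;
}

function createShutdownHandler(deps: ShutdownDeps): () => Promise<void> {
  let exiting = false;
  return async () => {
    // Guard: a second SIGINT/SIGTERM must not restart the cleanup.
    if (exiting) return;
    exiting = true;
    // Kill switch: force an exit if disconnect() hangs.
    const killTimer = setTimeout(() => deps.exit(1), deps.killAfterMs ?? 5000);
    // unref() exists on Node timers; it keeps the timer from holding the process open.
    (killTimer as unknown as { unref?: () => void }).unref?.();
    try {
      await deps.disconnect();
    } catch {
      // A failed disconnect should not prevent the exit below.
    } finally {
      clearTimeout(killTimer);
    }
    deps.exit(0);
  };
}
```

In a Node or Bun entry point this would be registered once per signal, for example `process.on("SIGINT", handler)` and `process.on("SIGTERM", handler)` with `exit: process.exit`.
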
Brings the bootstrap stack up to date with 72 commits on main since
the previous sync. Conflicts and follow-up:

- apps/server/src/api/server.ts: main refactored every route into
  createApiRoutes (apps/server/src/api/routes/index.ts) and a new
  KlavisService. Took main's server.ts verbatim and spliced the
  `/cockpit` mount into createApiRoutes next to `/mcp` and
  `/mcp-manager`, where it now lives alongside the other API routes.
- Browser tool catalogue grew from 10 to 16. New tools: download,
  evaluate, pdf, tab_groups, upload, windows. Cockpit TOOL_TO_VERB
  extended:
    upload -> upload (catalog already has the verb)
    windows, tab_groups -> navigate (mutate site context)
    pdf, download, evaluate -> input (no dedicated verb yet)
  `evaluate` joins `run` in the arbitrary-script audit log path
  (same risk class; same fix needed once the catalog grows a verb).
- Integration test catalogue expectation updated to the 16-tool
  surface so `tools/list` stays green.
- Auto-merge handled the eval cast comments, the server package.json
  exports map, code-quality workflow, and bun.lock cleanly.
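
The TOOL_TO_VERB extension listed above, written out as a lookup. Only the six new tools and the run/evaluate audit pairing come from the commit; the `Verb` type, the Map shape, and the helper names are illustrative.

```typescript
type Verb = "navigate" | "input" | "upload";

// Mapping for the six tools added when the catalogue grew from 10 to 16.
const NEW_TOOL_TO_VERB = new Map<string, Verb>([
  ["upload", "upload"],       // catalog already has the verb
  ["windows", "navigate"],    // mutates site context
  ["tab_groups", "navigate"], // mutates site context
  ["pdf", "input"],           // no dedicated verb yet
  ["download", "input"],      // no dedicated verb yet
  ["evaluate", "input"],      // no dedicated verb yet
]);

// Tools that execute arbitrary script and therefore go through the audit log path.
const ARBITRARY_SCRIPT_TOOLS: ReadonlySet<string> = new Set(["run", "evaluate"]);

function verbForNewTool(tool: string): Verb | undefined {
  return NEW_TOOL_TO_VERB.get(tool);
}

function needsScriptAuditLog(tool: string): boolean {
  return ARBITRARY_SCRIPT_TOOLS.has(tool);
}
```
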
…gent create/edit (#1328)

The wizard, the edit flow, and the MCP directory each constructed the
per-agent MCP URL inline. The wizard's pre-save preview returned
http://127.0.0.1:9000/mcp/<slug>, which is wrong on both axes: port
9000 is the CDP socket, and the cockpit lives under /cockpit while it
borrows apps/server's BrowserSession.

Introduce modules/api/mcp-endpoint as the canonical source. The
copy widget on /agents/new and /agents/:id/edit, the McpRow in the
directory, and the slug parser all flow through it. A TODO at the top
of the module marks the temporary cockpit-mount target and lists the
three commits the future unmount will need.
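
A sketch of what a canonical endpoint module might look like, assuming the `http://127.0.0.1:<port>/cockpit/mcp/<slug>` shape described earlier in this log. The function names are hypothetical and the port is passed in rather than hard-coded, since the module's real signature is not shown.

```typescript
const COCKPIT_MOUNT_PREFIX = "/cockpit";

// Builds the per-agent MCP URL under the cockpit mount.
function buildMcpUrl(port: number, slug: string): string {
  return `http://127.0.0.1:${port}${COCKPIT_MOUNT_PREFIX}/mcp/${encodeURIComponent(slug)}`;
}

// Extracts the slug from an MCP URL, or returns null if the URL has a different shape.
function parseSlugFromMcpUrl(url: string): string | null {
  try {
    const { pathname } = new URL(url);
    const prefix = `${COCKPIT_MOUNT_PREFIX}/mcp/`;
    if (!pathname.startsWith(prefix)) return null;
    const rest = pathname.slice(prefix.length);
    if (rest === "" || rest.includes("/")) return null;
    return decodeURIComponent(rest);
  } catch {
    return null; // unparsable URL or malformed percent-encoding
  }
}
```

Routing the wizard preview, the directory row, and the slug parser through one pair of functions like this means a future change to the mount prefix only has to be made in one place.
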
#1331)

* feat(agent-mcp-interface): tab activity registry + GET /tabs/activity route

Wraps the existing executeTool dispatch in mcp/register.ts so every
successful browser-tool call is recorded against the calling agent
and the targeted CDP target id. Failed dispatches and tools without
a page arg (tab_groups, windows, run) are skipped.

Records live in an in-memory map keyed by targetId; status is
derived at read time (active for 5s after the last tool, idle
afterwards). Closed tabs are evicted lazily when the next snapshot
read finds the pageId no longer maps to the original targetId
(pageIds are reused after close).
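
The read-time status derivation and lazy eviction described above can be sketched as follows. This is an assumption-laden illustration rather than the registry in the PR: the record fields, the numeric `pageId`, and the `resolveTargetId` callback (standing in for the PageManager lookup) are all hypothetical.

```typescript
type TabStatus = "active" | "idle";

// Hypothetical record shape; the real registry may carry different fields.
interface TabRecord {
  agentSlug: string;
  pageId: number;
  targetId: string;
  lastToolAt: number; // epoch milliseconds of the last successful tool call
}

const ACTIVE_WINDOW_MS = 5_000;

class TabActivityRegistry {
  private records = new Map<string, TabRecord>();

  // Record a successful tool call; the map is keyed by CDP target id.
  record(rec: TabRecord): void {
    this.records.set(rec.targetId, rec);
  }

  // Build a snapshot, evicting records whose pageId no longer maps to the
  // original targetId (the tab closed and the pageId may have been reused).
  snapshot(
    resolveTargetId: (pageId: number) => string | undefined,
    now: number = Date.now(),
  ): Array<TabRecord & { status: TabStatus }> {
    const out: Array<TabRecord & { status: TabStatus }> = [];
    this.records.forEach((rec, targetId) => {
      if (resolveTargetId(rec.pageId) !== targetId) {
        this.records.delete(targetId);
        return;
      }
      const status: TabStatus = now - rec.lastToolAt < ACTIVE_WINDOW_MS ? "active" : "idle";
      out.push({ ...rec, status });
    });
    return out;
  }

  size(): number {
    return this.records.size;
  }

  clear(): void {
    this.records.clear();
  }
}
```

Deriving status at read time means no timer is needed to flip a record from active to idle; the cost is that eviction only happens when something reads a snapshot.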

The GET /tabs/activity route surfaces the current snapshot and is
mounted into the AppType chain so the UI hono-rpc client picks it up
automatically. No server-package edits anywhere; the cockpit reads
PageManager via the shared BrowserSession singleton it already owns.

* feat(agent-mcp-ui): drive cockpit homepage from real tab-activity polling

useTabsActivity polls GET /cockpit/tabs/activity every 1500ms via the
existing hono-rpc client; cockpit.data composes that with the existing
mocked approvals/handoffs so the screen calls one hook only. Active
records become RunningGrid cards (status=running); idle records become
RecentActivity rows (status=done) with a relative-time string. Helper
file derives a stable per-slug color, parses the site from the URL,
and formats the relative timestamp.
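
The three helpers mentioned above could be written along these lines. The names `colorForSlug`, `siteFromUrl`, and `formatRelative`, the palette, the hash, and the time buckets are assumptions chosen for illustration; the PR does not spell out the actual implementations.

```typescript
// Hypothetical palette; any fixed list works as long as it never changes order.
const PALETTE: string[] = ["#2563eb", "#16a34a", "#d97706", "#dc2626", "#7c3aed", "#0891b2"];

// Stable per-slug color: hash the slug and index into the palette.
function colorForSlug(slug: string): string {
  let hash = 0;
  for (let i = 0; i < slug.length; i++) {
    hash = (hash * 31 + slug.charCodeAt(i)) >>> 0; // keep within uint32
  }
  return PALETTE[hash % PALETTE.length] ?? "#2563eb";
}

// Site label for a tab URL: hostname without a leading "www.".
function siteFromUrl(url: string): string {
  try {
    const host = new URL(url).hostname.replace(/^www\./, "");
    return host === "" ? "unknown" : host;
  } catch {
    return "unknown";
  }
}

// Relative-time string such as "30s ago" or "1m ago".
function formatRelative(thenMs: number, nowMs: number = Date.now()): string {
  const seconds = Math.max(0, Math.floor((nowMs - thenMs) / 1000));
  if (seconds < 5) return "just now";
  if (seconds < 60) return `${seconds}s ago`;
  const minutes = Math.floor(seconds / 60);
  if (minutes < 60) return `${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}
```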

Mocked useAgents / useRecentActivity stay in place for any other
surface that imports them; the homepage just stops consuming them.

* fix(cockpit): test-isolation clear() + honest isPending + harness TODO

- TabActivityRegistry gains a clear() escape hatch next to size(); the
  routes/tabs test now calls it in afterEach so a stale record from
  one test cannot surface in another that re-attaches a session.
- useCockpitData isPending now OR-combines tabs/approvals/handoffs so
  any future caller wiring a spinner sees the actual loading state.
- tabsToAgentRows annotates the hardcoded harness with a TODO pointing
  at the PR-3 profile join so the simplification stays visible.
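
The isPending change amounts to an OR over the underlying queries. A minimal sketch, assuming each query result exposes a boolean `isPending`; `QueryLike` and `combinePending` are hypothetical names, not the hook's real internals.

```typescript
// Hypothetical minimal shape of a query result.
interface QueryLike {
  isPending: boolean;
}

// True while any of the composed queries is still loading.
function combinePending(...queries: QueryLike[]): boolean {
  return queries.some((q) => q.isPending);
}

// Example: tabs has loaded, approvals is still pending, handoffs has loaded.
const pending = combinePending(
  { isPending: false },
  { isPending: true },
  { isPending: false },
);
```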
@greptile-apps

greptile-apps Bot commented Jun 23, 2026

Contributor

Too many files changed for review. (271 files found, 100 file limit)

Conflicts and resolutions:
- apps/server/package.json: take main's newer @ai-sdk/* pins, keep
  our @browseros/agent-mcp-interface workspace dep.
- apps/eval/src/agents/single-agent.ts: take main's onStepFinish-only
  shape. Main migrated away from experimental_onToolCallStart /
  experimental_onToolCallFinish (removed in ai-sdk v6.0.208); the
  per-tool-call logic now lives inside onStepFinish's toolCalls loop
  on this branch too.
- apps/eval/src/agents/orchestrated/backends/tool-loop/tool-loop-executor-backend.ts:
  same migration; drop the obsolete header comment about the
  experimental_on* workaround.
- bun.lock: take main, re-run bun install to reconcile against the
  merged package.json.
@github-actions

github-actions Bot commented Jun 23, 2026

Contributor

✅ Tests passed — 1393/1397

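| Suite | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| agent | 271/271 | 0 | 0 |
| build | 26/26 | 0 | 0 |
| eval | 91/91 | 0 | 0 |
| server-agent | 301/301 | 0 | 0 |
| server-api | 138/138 | 0 | 0 |
| server-browser | 10/10 | 0 | 0 |
| server-integration | 10/10 | 0 | 0 |
| server-lib | 251/252 | 0 | 1 |
| server-root | 47/50 | 0 | 3 |
| server-tools | 248/248 | 0 | 0 |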


…-zod workspace

CI typecheck on the merge commit failed in the eval package: the
onStepFinish destructure inherited from main resolves to implicit-any
under this branch's workspace where the cockpit pins zod v4 and the
server pins zod v3, so the ai-sdk generate() option type widens.
Reintroduce the explicit `any` annotations + biome-ignore comments that
the branch carried before the merge took main's cleaner shape; the
runtime contract is unchanged.

single-agent.ts: `onStepFinish: async (step: any)` with biome-ignore.
tool-loop-executor-backend.ts: typed destructure literal with two
biome-ignores on the toolCalls and toolResults any fields.
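
The two annotation shapes described in this commit might look roughly like the sketch below. It is self-contained and does not import ai-sdk: `handleStep`, `countStep`, `seenTools`, and the `options` object are hypothetical stand-ins, and the `toolName` field on each call is an assumption about the step shape.

```typescript
// Tool names observed across steps; stands in for the real per-tool-call logic.
const seenTools: string[] = [];

// Shape 1: annotate the whole step parameter as `any`.
// biome-ignore lint/suspicious/noExplicitAny: inferred option type is implicit-any in this workspace
function handleStep(step: any): void {
  for (const call of step.toolCalls ?? []) {
    seenTools.push(String(call.toolName));
  }
}

// Shape 2: typed destructure literal; only the two array fields stay `any`.
function countStep({
  toolCalls,
  toolResults,
}: {
  // biome-ignore lint/suspicious/noExplicitAny: element type is not recoverable here
  toolCalls: any[];
  // biome-ignore lint/suspicious/noExplicitAny: element type is not recoverable here
  toolResults: any[];
}): { calls: number; results: number } {
  return { calls: toolCalls.length, results: toolResults.length };
}

// Hypothetical options object; the real one is passed to the ai-sdk call.
const options = {
  // biome-ignore lint/suspicious/noExplicitAny: see handleStep
  onStepFinish: async (step: any): Promise<void> => handleStep(step),
};
```

The second shape keeps the `any` confined to two fields, so a typo in a destructured field name is still caught by the compiler.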
@DaniAkash Dani Akash (DaniAkash) merged commit 4a04f84 into main Jun 23, 2026
15 of 16 checks passed
@DaniAkash Dani Akash (DaniAkash) deleted the feat/agent-mcp-interface-bootstrap branch June 23, 2026 15:38