feat(cockpit): tab activity registry + homepage feedback loop (PR 1/3) #1331
… route Wraps the existing executeTool dispatch in mcp/register.ts so every successful browser-tool call is recorded against the calling agent and the targeted CDP target id. Failed dispatches and tools without a page arg (tab_groups, windows, run) are skipped. Records live in an in-memory map keyed by targetId; status is derived at read time (active for 5s after the last tool, idle afterwards). Closed tabs are evicted lazily when the next snapshot read finds the pageId no longer maps to the original targetId (pageIds are reused after close). The GET /tabs/activity route surfaces the current snapshot and is mounted into the AppType chain so the UI hono-rpc client picks it up automatically. No server-package edits anywhere; the cockpit reads PageManager via the shared BrowserSession singleton it already owns.
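A minimal sketch of the registry behaviour described above, assuming hypothetical type and method shapes (`RawRecord`, `ResolveTargetId`, and the `now` parameters are illustrative, not taken from the PR). It shows the three properties the description names: records keyed by targetId, status derived at read time, and lazy eviction when a pageId no longer resolves to the recorded targetId.

```typescript
// Hypothetical shapes; the PR's real types may differ.
type TabStatus = "active" | "idle";

interface RawRecord {
  agentId: string;
  slug: string;
  pageId: number;
  targetId: string;
  toolName: string;
  lastToolAt: number; // epoch ms of the most recent successful tool call
}

interface TabActivityRecord extends RawRecord {
  status: TabStatus;
}

// Assumed lookup: which CDP target a pageId currently points at, if any.
type ResolveTargetId = (pageId: number) => string | undefined;

const ACTIVE_WINDOW_MS = 5_000; // "active for 5s after the last tool"

class TabActivityRegistry {
  private readonly records = new Map<string, RawRecord>();

  recordTool(input: Omit<RawRecord, "lastToolAt">, now: number = Date.now()): void {
    // Keyed by targetId, so a reused pageId never overwrites another tab's record.
    this.records.set(input.targetId, { ...input, lastToolAt: now });
  }

  snapshot(resolve: ResolveTargetId, now: number = Date.now()): TabActivityRecord[] {
    const out: TabActivityRecord[] = [];
    for (const [targetId, rec] of this.records) {
      // Lazy eviction: the tab was closed, or its pageId now belongs to another target.
      if (resolve(rec.pageId) !== targetId) {
        this.records.delete(targetId);
        continue;
      }
      // Status is computed at read time rather than stored.
      const status: TabStatus = now - rec.lastToolAt <= ACTIVE_WINDOW_MS ? "active" : "idle";
      out.push({ ...rec, status });
    }
    return out;
  }

  size(): number {
    return this.records.size;
  }
}
```

Deriving status on read means no timer is needed to flip a record from active to idle; the trade-off is that closed tabs linger in the map until the next snapshot.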
…ling useTabsActivity polls GET /cockpit/tabs/activity every 1500ms via the existing hono-rpc client; cockpit.data composes that with the existing mocked approvals/handoffs so the screen calls one hook only. Active records become RunningGrid cards (status=running); idle records become RecentActivity rows (status=done) with a relative-time string. Helper file derives a stable per-slug color, parses the site from the URL, and formats the relative timestamp. Mocked useAgents / useRecentActivity stay in place for any other surface that imports them; the homepage just stops consuming them.
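The three helpers mentioned above could look roughly like the sketch below. The function names, the palette, the hash, and the time buckets are all assumptions for illustration; the PR's helper file may differ in every detail.

```typescript
// Hypothetical palette; "stable" means the same slug always maps to the same entry.
const PALETTE = ["#d9480f", "#2b8a3e", "#1864ab", "#862e9c", "#5c940d"] as const;

function colorForSlug(slug: string): string {
  let hash = 0;
  for (let i = 0; i < slug.length; i++) {
    // Simple multiplicative string hash, kept as an unsigned 32-bit integer.
    hash = (hash * 31 + slug.charCodeAt(i)) >>> 0;
  }
  return PALETTE[hash % PALETTE.length] ?? PALETTE[0];
}

function siteFromUrl(url: string): string {
  try {
    const host = new URL(url).hostname;
    // URLs such as about:blank parse but have no hostname; fall back to the raw string.
    return host ? host.replace(/^www\./, "") : url;
  } catch {
    return url; // not a parseable absolute URL; show it unchanged
  }
}

function relativeTime(fromMs: number, nowMs: number = Date.now()): string {
  const seconds = Math.max(0, Math.floor((nowMs - fromMs) / 1000));
  if (seconds < 5) return "just now";
  if (seconds < 60) return `${seconds}s ago`;
  const minutes = Math.floor(seconds / 60);
  if (minutes < 60) return `${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}
```

Passing `nowMs` explicitly keeps the formatter pure, which makes it easy to unit-test without faking the clock.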
✅ Tests passed — 1393/1397
Greptile Summary
This PR introduces a tab-activity registry for the BrowserOS cockpit: an in-memory map keyed by stable CDP
Confidence Score: 4/5
Safe to merge: zero changes to apps/server, no schema/migration impact, and all new behaviour is additive behind a polling endpoint the UI was already scaffolded for.
The registry logic, eviction strategy, and route wiring are all correct and well-tested. The main concerns are non-blocking: the route-test cleanup leaves the internal Map unpurged (the comment says "evict by detaching the session" but no eviction actually occurs), the CockpitData isPending flag only tracks the tabs query so it could mislead a future loading-gate, and the hardcoded harness value is a known PR-1 simplification that risks being forgotten across the three-PR arc.
routes.test.ts deserves a second look on the cleanup strategy; cockpit.data.ts and cockpit.helpers.ts have the minor gaps noted above.
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Agent as Agent (MCP Client)
participant Register as mcp/register.ts
participant Registry as TabActivityRegistry
participant PageMgr as PageManager
participant Route as GET /tabs/activity
participant UI as Cockpit UI (1500ms poll)
Agent->>Register: "tool call (e.g. navigate {page:1, url:...})"
Register->>Register: executeTool(...)
Register->>Register: extractPageId(toolName, rawArgs)
alt "pageId is valid & result.isError === false"
Register->>PageMgr: "getInfo(pageId) → live {targetId, url, title}"
Register->>Registry: "recordTool({agentId, slug, pageId, targetId, toolName})"
Registry->>Registry: records.set(targetId, RawRecord)
end
UI->>Route: GET /cockpit/tabs/activity
Route->>Registry: snapshot()
Registry->>PageMgr: getInfo(pageId) per record
alt pageId still maps to same targetId
Registry-->>Route: "TabActivityRecord (status = active|idle)"
else tab closed / pageId reused
Registry->>Registry: records.delete(targetId)
end
Route-->>UI: "{ tabs: TabActivityRecord[] }"
UI->>UI: tabsToAgentRows (active → RunningGrid)
UI->>UI: tabsToActivityRows (idle → RecentActivity)
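The recording branch in the diagram (run the tool, extract the page id, record only on success) can be sketched as a small post-dispatch step. Everything here is an assumption for illustration: `recordIfEligible`, the `RecordDeps` shape, and the explicit `PAGELESS_TOOLS` set are hypothetical, and the sketch treats a missing `isError` as success, which may be looser than the `result.isError === false` check shown in the diagram.

```typescript
interface ToolResult { isError?: boolean }
interface PageInfo { targetId: string; url: string; title: string }
interface RecordInput { agentId: string; pageId: number; targetId: string; toolName: string }

// Assumed dependencies: a live page lookup and the registry's record call.
interface RecordDeps {
  getInfo(pageId: number): PageInfo | undefined;
  recordTool(input: RecordInput): void;
}

// Tools the description lists as having no page argument.
const PAGELESS_TOOLS: ReadonlySet<string> = new Set(["tab_groups", "windows", "run"]);

function extractPageId(toolName: string, rawArgs: unknown): number | undefined {
  if (PAGELESS_TOOLS.has(toolName)) return undefined;
  if (typeof rawArgs !== "object" || rawArgs === null) return undefined;
  const page = (rawArgs as Record<string, unknown>).page;
  return typeof page === "number" && Number.isInteger(page) ? page : undefined;
}

// Call this after the tool dispatch has resolved. Returns true when a record was written.
function recordIfEligible(
  deps: RecordDeps,
  agentId: string,
  toolName: string,
  rawArgs: unknown,
  result: ToolResult,
): boolean {
  if (result.isError) return false; // failed dispatches are skipped
  const pageId = extractPageId(toolName, rawArgs);
  if (pageId === undefined) return false; // no page argument, nothing to attribute
  const info = deps.getInfo(pageId);
  if (!info) return false; // page vanished between dispatch and lookup
  deps.recordTool({ agentId, pageId, targetId: info.targetId, toolName });
  return true;
}
```

Keeping the recording step separate from the dispatch itself means a failure in the bookkeeping path cannot change what the agent receives as the tool result.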
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
packages/browseros-agent/apps/agent-mcp-interface/tests/routes/tabs/routes.test.ts:24-29
**Misleading cleanup comment + fragile registry isolation**
`setBrowserSession(null)` causes `snapshot()` to short-circuit with `return []`, but it does **not** clear the internal `records` Map. The comment "we evict by detaching the session" is inaccurate — no eviction happens; the entries simply stay in the Map unread. If a future test records a `targetId` that doesn't collide with an existing key AND that same `targetId`/`pageId` pair happens to be resolvable in a subsequent test's session stub, the stale record would surface in `snapshot()`. Adding a `clear(): void` method to `TabActivityRegistry` (next to the existing test-only `size()`) and calling it here would make the isolation explicit and safe.
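To make the isolation problem concrete, here is a reduced sketch under assumed names (`Registry`, `setSession`, and `Session` are stand-ins, not the PR's API). Detaching the session only hides records from `snapshot()`; an explicit `clear()` is what actually empties the Map.

```typescript
// Hypothetical stand-in for the browser session the registry reads from.
interface Session {
  resolveTargetId(pageId: number): string | undefined;
}

class Registry {
  private readonly records = new Map<string, { pageId: number }>();
  private session: Session | null = null;

  setSession(session: Session | null): void {
    this.session = session;
  }

  record(targetId: string, pageId: number): void {
    this.records.set(targetId, { pageId });
  }

  snapshot(): string[] {
    const session = this.session;
    if (!session) return []; // short-circuit: nothing is read, nothing is evicted
    return [...this.records]
      .filter(([targetId, rec]) => session.resolveTargetId(rec.pageId) === targetId)
      .map(([targetId]) => targetId);
  }

  size(): number {
    return this.records.size;
  }

  // Test-only escape hatch: drops every record regardless of session state.
  clear(): void {
    this.records.clear();
  }
}
```

In a test file the cleanup would then be a single `registry.clear()` call in `afterEach`, assuming the runner exposes that hook.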
### Issue 2 of 3
packages/browseros-agent/apps/agent-mcp-ui/screens/cockpit/cockpit.data.ts:42
**`isPending` only reflects the tabs query**
`isPending: tabs.isPending` ignores `approvals.isPending` and `handoffs.isPending`. Any caller that gates a loading spinner on `CockpitData.isPending` would treat the page as fully loaded while approvals/handoffs are still resolving, which could cause a brief flash of empty sections. Consider `isPending: tabs.isPending || approvals.isPending || handoffs.isPending` (or, since approvals/handoffs are currently mocked and resolve synchronously, at least document why the narrower scope is intentional).
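A small sketch of the OR-combination suggested above. `QueryLike` and `combinePending` are hypothetical names; the only assumption about the query objects is that each exposes a boolean `isPending`.

```typescript
// Minimal slice of a query result: only the flag this helper needs.
interface QueryLike {
  isPending: boolean;
}

// True while any of the given queries is still pending.
function combinePending(...queries: ReadonlyArray<QueryLike>): boolean {
  return queries.some((query) => query.isPending);
}
```

With this in place the composed hook could return `isPending: combinePending(tabs, approvals, handoffs)`, so adding a fourth query later only means passing one more argument.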
### Issue 3 of 3
packages/browseros-agent/apps/agent-mcp-ui/screens/cockpit/cockpit.helpers.ts:58
**`harness` hardcoded to `'Claude Code'`**
Every `AgentRow` gets `harness: 'Claude Code'` regardless of the agent's actual harness, since `TabActivityRecord` doesn't carry that field. If the cockpit ever surfaces agents running under a different harness, the displayed value will silently be wrong. The PR description acknowledges this is a PR-1 simplification; it would be worth a `// TODO(PR-3): derive from agent profile` comment here so the debt stays visible and isn't forgotten across the three-PR arc.
Reviews (1): Last reviewed commit: "feat(agent-mcp-ui): drive cockpit homepa..."
- TabActivityRegistry gains a clear() escape hatch next to size(); the routes/tabs test now calls it in afterEach so a stale record from one test cannot surface in another that re-attaches a session.
- useCockpitData isPending now OR-combines tabs/approvals/handoffs so any future caller wiring a spinner sees the actual loading state.
- tabsToAgentRows annotates the hardcoded harness with a TODO pointing at the PR-3 profile join so the simplification stays visible.
a928660 into feat/agent-mcp-interface-bootstrap
…P attach, tab attribution (#1333)

* feat(agent-mcp-interface): bootstrap package skeleton

First slice of the new BrowserOS v2 backend, per the architecture plan.

packages/browseros-agent/apps/agent-mcp-interface/
  package.json            hono dep only; @browseros/agent-mcp-interface
  tsconfig.json           extends monorepo root, composite + emit decls
  biome.json              extends "//", noConsole/noProcessEnv on by default
  src/
    shared/port.ts        PROD_API_PORT = 9200 (distinct from 9000/9100/9300)
    lib/logger.ts         Structured JSON to stderr, pino-shaped fields
    lib/errors.ts         HttpError; { error: string } JSON shape
    env.ts                Single chokepoint for process.env reads
    local-server-url.ts   Write-once-after-bind module singleton
    routes/system.ts      /system/health, /system/version, /system/url
    server.ts             Chained .route('/') composition; exports type AppType = typeof routes for the future agent-mcp-ui hono-rpc client
    main.ts               Bun.serve on 127.0.0.1:PROD_API_PORT; sets localServerUrl + logs the bound URL

Wiring:
  packages/browseros-agent/tsconfig.json     references += this package
  packages/browseros-agent/.fallowrc.json    entry += src/main.ts

Verified:
  bun run --filter @browseros/agent-mcp-interface typecheck    clean
  bunx biome check apps/agent-mcp-interface/                   clean
  bunx fallow check                                            no new findings
  bun src/main.ts + curl /system/{health,version,url}          ok

byok ai-sdk path and the existing apps/server / apps/agent packages are not touched.

* chore: minor edits

* chore(agent-mcp-interface): scope fallow config to the package

Pulls the agent-mcp-interface entry back out of the monorepo-wide packages/browseros-agent/.fallowrc.json and gives the package its own .fallowrc.json. Running `bun run fallow` from inside the package now analyses just this package against its single entry (src/main.ts); running fallow from the agent root no longer drags this package in.

Also adds a `fallow` script to the package.json so the command is reachable via `bun run --filter @browseros/agent-mcp-interface fallow`.
* Revert "chore(agent-mcp-interface): scope fallow config to the package" This reverts commit 02390dca173fbb9253534b1868fa64f6540a6286. * feat(agent-mcp-ui): bootstrap WXT extension surface Second package of the BrowserOS v2 split. WXT extension that targets the new-tab page and (eventually) the side panel + content overlays. First slice proves the full pipe: React + shadcn (base-vega) renders, TanStack Router resolves routes, TanStack Query + react-query-kit hooks fetch through a typed hono-rpc client, and the runtime closes the loop against the agent-mcp-interface server bound on the same machine. packages/browseros-agent/apps/agent-mcp-ui/ package.json WXT, React 19, Tailwind 4, TanStack Router 1.x, TanStack Query 5, react-query-kit, @base-ui/react, @tabler/icons-react, @browseros/agent-mcp-interface workspace dep (type-only AppType + the PROD_API_PORT constant) tsconfig.json extends .wxt/tsconfig.json, jsx react-jsx, types chrome + bun, paths @/* -> ./* biome.json extends "//", enables tailwindDirectives, ignores routeTree.gen.ts / .wxt / .output / dist, relaxes lint inside components/ui + ai-elements wxt.config.ts @wxt-dev/module-react, manifest with chrome_url_overrides.newtab + 5 permissions, Vite plugins: tanstackRouter (enforce: 'pre', autoCodeSplitting off for now) + tailwindcss components.json shadcn base-vega style, neutral, tabler icons entrypoints/app/ index.html + main.tsx (QueryClientProvider + RouterProvider) + styles.css (Tailwind 4 @theme inline tokens) lib/utils.ts cn() helper (clsx + tailwind-merge) components/ui/ button.tsx, card.tsx, badge.tsx via shadcn CLI modules/api/ client.ts hc<AppType>(baseUrl) + lazy Proxy that re- resolves the URL on each property access queryClient.ts retry: 1, staleTime: 30_000 parseResponse.ts ApiError thrower with .status + .body system.hooks.ts First react-query-kit hooks against the /system/{health,version,url} endpoints routes/ __root.tsx + index.tsx (file-based routing) routeTree.gen.ts Generated by 
tanstackRouter plugin screens/cockpit/ Minimal Cockpit.tsx that calls useSystemHealth + useSystemVersion, renders shadcn Card + Badge, shows the interface server's name + version on success and a copy-pasteable hint on failure Also exposes shared/port from @browseros/agent-mcp-interface as a real runtime export so the UI can dial in to PROD_API_PORT. The server type export stays type-only ("default": null). Verified: bun run --filter @browseros/agent-mcp-interface typecheck clean bun run --filter @browseros/agent-mcp-ui typecheck clean bunx biome check (both packages) clean bunx wxt build (chrome-mv3 production) clean bun src/main.ts + curl /system/{health,version} ok * refactor(agent-mcp-ui): swap TanStack Router for react-router v7 The existing apps/agent extension routes with react-router v7 + HashRouter. agent-mcp-ui was deviating from that for no real reason — TanStack Router's typed loaders aren't exercised at the bootstrap stage, and the plugin-order collision with WXT's module-react had already forced autoCodeSplitting off. Matching the in-repo precedent removes the codegen step, drops two deps, simplifies wxt.config.ts, and shaves ~48 kB off the production bundle (384 → 336 kB). Changes: package.json -@tanstack/react-router, -@tanstack/router-plugin +react-router ^7.12.0 wxt.config.ts Drop prependRouterPlugin helper + the enforce:'pre' workaround. Plugins now: [tailwindcss()]. Module-react injects @vitejs/plugin-react as before. routes/ Deleted. __root.tsx, index.tsx, routeTree.gen.ts gone. entrypoints/app/App.tsx New. Code-based HashRouter + Routes + Route, matching apps/agent's pattern. entrypoints/app/main.tsx Drops createRouter + RouterProvider + module augmentation; renders <App /> inside QueryClientProvider + StrictMode. biome.json Drops the !routeTree.gen.ts ignore. 
Verified: bun install clean bun run --filter @browseros/agent-mcp-ui typecheck clean bunx biome check clean bunx wxt build clean, 336 kB bun src/main.ts + curl /system/{health,version} ok * feat(agent-mcp-ui): wire wxt dev launch into BrowserOS Chromium Mirrors the existing apps/agent launch shape so `bun run dev` boots BrowserOS with this extension installed instead of stock Chromium. web-ext.config.ts defineWebExtConfig with BrowserOS binary path, dev-sane chromiumArgs (--use-mock-keychain, --disable-browseros-server, --disable-browseros-extensions, --browseros-dock-icon=dev), and a worktree+package-scoped Chromium profile under /tmp/browseros-dev-<worktree>-<packageHash>. Profile dir distinct from apps/agent's so the two dev runs never share state. BROWSEROS_CDP_PORT / BROWSEROS_SERVER_PORT / BROWSEROS_EXTENSION_PORT / BROWSEROS_USER_DATA_DIR / BROWSEROS_BINARY env overrides supported with the same names the agent extension already uses. .env.example Documents the env vars; copy to .env.development to enable them. Every entry is optional. package.json dev → bun --env-file=.env.development wxt build:dev → same, for the development-mode build Verified: bun --print 'await import("./web-ext.config.ts").then(m => m.default)' resolves the BrowserOS binary path + the per-worktree+package profile dir bunx wxt prepare clean bun run typecheck clean bunx biome check clean bunx wxt build clean, 336 kB * feat(agent-mcp-ui): cockpit + new-agent + governance + live-run (#1193) * feat(agent): Remote Hermes provider — Cloudflare-managed Fly VM runtime (#1174) * chore: bump .internal-docs submodule * feat(agent): add Remote Hermes provider backed by Cloudflare control plane Adds a new 'remote-hermes' provider that runs the Hermes agent in a managed Fly VM provisioned via the Cloudflare agent-control-worker. 
Per-install VM identified by browserosId; chat turns proxy through the worker (HTTP + SSE), and the VM dispatches tool calls back to the laptop's local BrowserOS MCP over a single WebSocket held open by apps/server. No API key, base URL or model required at provider-add.

Agent UI
- New provider type 'remote-hermes' with Sparkles icon, surfaced first in the Settings template grid with the orange "Recommended" treatment.
- Add-provider flow needs only a name; backend handles credentials.
- Inline boot pill (RemoteHermesBootPill) shows live progress through the cold-start stages (pulling_image, booting, healthchecking).
- Delete provider triggers /remote-hermes/destroy for the last entry.

apps/server
- lib/remote-hermes/: env, HS256 JWT minting (jose), frame parser, partysocket-backed WS bridge with mutex/refcount/idle-close, RPC router dispatching browseros tool calls to 127.0.0.1:<port>/mcp, ProtocolEvent -> AI SDK UIMessageStream translator, and the turn streamer with cold-start polling (180s budget against /vm/status).
- /chat forks on provider='remote-hermes' and pipes the worker's SSE through createUIMessageStreamResponse - side panel sees the same AI SDK stream format as any other provider.
- New /remote-hermes route: POST /start, POST /destroy, GET /status. Lifecycle endpoints fire-and-forget; status proxies the worker.

Plus collateral lint:fix touch-ups in eval/probeAgent/managedBlock.

* fix(agent): address remote-hermes review feedback

- bridge: close+null any existing socket at the top of doOpen() so a ReconnectingWebSocket that opens AFTER our 5s OPEN_DEADLINE_MS rejected cannot fire 'open' on the bridge's behalf and start a parallel ping/idle-sweep loop on the wrong socket. Every event handler now checks `this.socket === sock` and returns early when the dispatched socket isn't the live one.
- turn: derive the cold-start error message and the boot-poll comment from COLD_START_BUDGET_MS instead of the stale literal "90 seconds" left over from the original 90s budget.
- frames: drop unused PONG_FRAME export. pong frames are only synthesized by Cloudflare's setWebSocketAutoResponse worker-side; the laptop never emits one.
- remote-hermes route: drop the dead `method: 'POST'` parameter from fireVmLifecycle; both call sites passed the literal and the function always POSTs.

* refactor(server): rework Remote Hermes layering to match KlavisClient pattern

Addresses code-quality feedback on the original PR. Three problems with the previous shape:

1. Per-request env reads. loadRemoteHermesEnv() ran on every /chat fork and every /remote-hermes/* handler. Now AGENT_RUNNER_JWT_SECRET lives in INLINED_ENV (build-time inlined) and the worker URL is EXTERNAL_URLS.AGENT_CONTROL_WORKER. Both read once at module load, never re-read.

2. Mixed responsibilities in lib/remote-hermes/. The flat dump conflated the HTTP wire client, the WS bridge, the SSE pump, JWT minting, env parsing, and the route handler logic. Split following the existing Klavis precedent:

   lib/clients/remote-hermes/
     remote-hermes-client.ts    raw HTTP wrapper (mintJwt + fetch)
     ws-bridge.ts               persistent WS, refcount/idle/race-safe
     auth.ts, frames.ts, rpc-router.ts, event-translator.ts
     constants.ts               module-internal tunables only
   api/services/remote-hermes/
     remote-hermes-service.ts   high-level facade. owns bridge lifecycle, exposes warm/teardown/status/streamTurn, no env or fetch lives here

3. Hidden singleton + inline lifecycle logic in handlers. getBridge() was a module-level singleton constructed inside the /chat handler. Now the service is constructed once in createHttpServer() when INLINED_ENV.AGENT_RUNNER_JWT_SECRET is present (warn-only when absent, matches Klavis behaviour), threaded into ChatRouteDeps + the new RemoteHermesRouteDeps, and closed on Application.shutdown.
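Point 3 above replaces a hidden module-level singleton with a service that is built once and passed to the routes. A rough sketch of that shape follows; the class body, `createDeps`, `shutdown`, and the `Logger` interface are illustrative assumptions, and only the names AGENT_RUNNER_JWT_SECRET and the warn-only behaviour come from the commit message.

```typescript
interface Logger {
  warn(message: string, fields?: Record<string, unknown>): void;
}

// Stand-in for the real service; only lifecycle matters for this sketch.
class RemoteHermesService {
  private readonly jwtSecret: string;
  closed = false;

  constructor(jwtSecret: string) {
    this.jwtSecret = jwtSecret;
  }

  isConfigured(): boolean {
    return this.jwtSecret.length > 0;
  }

  close(): void {
    this.closed = true;
  }
}

// What the route handlers receive: the service, or null when it is unconfigured.
interface RouteDeps {
  remoteHermes: RemoteHermesService | null;
}

// Runs once at server construction; handlers never read env themselves.
function createDeps(env: { AGENT_RUNNER_JWT_SECRET?: string }, log: Logger): RouteDeps {
  const secret = env.AGENT_RUNNER_JWT_SECRET;
  if (!secret) {
    // Warn-only: the server still boots, the provider is simply unavailable.
    log.warn("remote hermes disabled: secret not configured", { module: "remote-hermes" });
    return { remoteHermes: null };
  }
  return { remoteHermes: new RemoteHermesService(secret) };
}

// Called from application shutdown so the service's resources are released once.
function shutdown(deps: RouteDeps): void {
  deps.remoteHermes?.close();
}
```

Because handlers only see `RouteDeps`, a test can pass `{ remoteHermes: null }` or a fake service without touching module state.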
Net effect:
  /chat fork: 50 lines -> 10 lines
  /remote-hermes routes: 119 lines -> 47 lines
  lib/clients/remote-hermes: cleaner per-file responsibility

All [remote-hermes] template logs replaced with structured fields `module: 'remote-hermes'`, matching the rest of the codebase.

Shared constants moved to packages/shared/src/constants/hermes.ts: REMOTE_HERMES_PROVIDER_TYPE, REMOTE_HERMES_AGENT_KIND, REMOTE_HERMES_DEFAULT_AGENT_ID. EXTERNAL_URLS.AGENT_CONTROL_WORKER added; matches KLAVIS_PROXY shape.

No behaviour change: chat turns, boot pill, cold-start poll, WS bridge race fixes from the prior commit, /vm/start /vm/destroy /vm/status all preserved end-to-end.

* fix(server): strip duplicated browseros_browseros_ tool name prefix

Tool cards in the side panel rendered as "Mcp browseros browseros suggest app connection" instead of "suggest_app_connection". Cause: acpx normalizes the VM catalog's "<server>.<tool>" dot into an underscore when emitting tool names. Combined with the MCP server name we configure ("browseros") and the catalog server name (also "browseros" since the worker fix in this branch), the wire name becomes "browseros_browseros_suggest_app_connection".

Our existing strip only handled the double-underscore acpx prefix and the dot-separated catalog prefix. Add a third pattern that matches "<word>_<word>_" only when the two word groups are identical, so we never chew the head off an unrelated tool that happens to start "browseros_".

* refactor(remote-hermes): post-review code-quality sweep

apps/server:
- Drop the now-unused Remote Hermes event translator; the runtime service emits AI SDK UI Message Stream parts directly, so the laptop just forwards them.
- warm()/teardown() now throw on non-2xx from the worker so the route's .catch logs a real error instead of swallowing the failure.
- pumpEvents() dismisses the boot pill in finally, which handles the edge where the stream ends before any non-`start` part arrives.
- /chat fork logs a real reason when remote-hermes hits a server with the service unconfigured.

apps/agent:
- Add REMOTE_HERMES_PROVIDER_TYPE in lib/llm-providers/types.ts (local mirror of @browseros/shared since the WXT bundle doesn't depend on the shared package) and use it in isRemoteHermesType + ProviderTemplatesSection. Replace the double-filter pin pattern with a sort + Fragment-after-Hermes layout so the order is expressed declaratively.

* chore(agent): trim mcp-manager dead exports and unused devDeps (#1176)

* chore(agent): trim mcp-manager dead exports and unused devDeps

Drives the browseros-agent fallow report from 17 issues down to 4 (the residual 4 are all in the remote-hermes surface added by #1174 and out of scope here).

mcp-manager:
- Slim the public barrel to the 7 symbols real consumers import (routes, main, the two reconcile/service test files). Drop the re-exports of BROWSEROS_MCP_SERVER_NAME, BROWSEROS_MCP_STDIO_SERVER_NAME, getMcpManager, ReconcileUrlInput, InstallAgentResult, McpAgentId, McpAgentRow, ReconcileResult, UninstallAgentResult, all of which are only used by sibling files inside mcp-manager/ itself.
- Narrow BROWSEROS_SERVER_NAMES and planFor in service.ts to module-private (used only inside service.ts). This also resolves the AgentServerPlan private-type-leak since planFor becomes private.
- Drop the dangling McpAgentIdentifier type alias (zero consumers).

UI:
- Export IntegrationsSectionProps so the exported IntegrationsSection no longer references a private type.
- Narrow AGENT_PRESENTATION in integrations-section.helpers.ts to module-private (used only by presentationFor() in the same file).

Deps:
- Remove dotenv and picocolors from devDependencies; neither is imported anywhere in the package (dotenv is even explicitly documented as not needed in the agent README).
* chore(agent): restore dotenv/picocolors, fix remaining fallow findings, gate CI

Restoring the two devDeps that were wrongly dropped (build-script tests on CI failed: `Cannot find package 'picocolors' / 'dotenv'`). Both are actually consumed by `scripts/build/{server,cli}.ts` and `scripts/build/log.ts`, but those entry points are outside fallow's discovery (the package.json script paths use parent-directory traversal that fallow skips). Listing both in `.fallowrc.json` `ignoreDependencies` matches the existing convention used for `pino-pretty`.

Fixes the 4 remaining fallow findings introduced by #1174 so the new CI gate can be green from day one:
- Export `RemoteHermesBootPillProps` (purely additive; resolves the private-type-leak on `RemoteHermesBootPill`, and incidentally surfaces the embedded `RemoteHermesVmStatus` as part of its public shape).
- Export `SocketState` in `ws-bridge.ts` (purely additive; resolves the private-type-leak on the diagnostic-exposed `snapshot()` method).
- Annotate the orphan `PostTurnResult` with `// fallow-ignore-next-line unused-type` so it stays available for follow-up wiring without blocking CI.

Adds a `runner / Fallow` job to `.github/workflows/code-quality.yml` parallel to Biome and Typecheck. Same shape: checkout → setup-bun → `bun ci` → `bun fallow`. PRs that touch `packages/browseros-agent/**` now gate on the dead-code report.
* feat(agent-mcp-ui): install AI Elements via shadcn registry 48 components from elements.ai-sdk.dev pulled in via: bunx shadcn@latest add @ai-elements/<each component> Catalog covered: Chatbot attachments, chain-of-thought, checkpoint, confirmation, context, conversation, inline-citation, message, model-selector, plan, prompt-input, queue, reasoning, shimmer, sources, suggestion, task, tool Code agent, artifact, code-block, commit, environment-variables, file-tree, jsx-preview, package-info, sandbox, schema-display, snippet, stack-trace, terminal, test-results, web-preview Voice audio-player, mic-selector, persona, speech-input, transcription, voice-selector Workflow canvas, connection, controls, edge, node, panel, toolbar Utilities image, open-in-chat Files land in components/ai-elements/ and consume shadcn primitives from components/ui/, which the same install grew from 3 to 25 to cover all the dependencies (accordion, command, dialog, dropdown, hover-card, popover, scroll-area, select, tabs, tooltip, etc.). Peer deps brought in (devDependencies left untouched; runtime only): ai, streamdown, @streamdown/{cjk,code,math,mermaid}, shiki, @xyflow/react, motion, @rive-app/react-webgl2, media-chrome, cmdk, embla-carousel-react, lucide-react, nanoid, tokenlens, use-stick-to-bottom, react-jsx-parser, ansi-to-react, @radix-ui/react-use-controllable-state. @base-ui/react bumped to ^1.5.0 because the components target a newer API surface than the previous ^1.0.0-beta.6 exposed. Six ai-elements files (attachments, context, inline-citation, plan, prompt-input, voice-selector) ship with `closeDelay`/`openDelay`/ event-handler shapes that don't typecheck against @base-ui/react 1.5.0. Bundled JS runs fine (unknown props get spread onto DOM and silently ignored, event handler arities are forgiving at runtime), only tsc strictness flags them. Marked with `// @ts-nocheck` at the top of each, with a comment explaining the posture. 
biome.json now disables both formatter and linter for components/ui/** and components/ai-elements/** (third-party drop-ins; both shadcn-installed quote style and the AI Elements internal patterns differ from the repo style). organizeImports assist action also off on the same paths. Verified: bun run typecheck clean bunx biome check clean (89 files) bunx wxt build clean, chrome-mv3 397 kB * feat(agent-mcp-ui): cockpit shell + sidebar + 4 routes Foundation pass on top of the WXT bootstrap. Sidebar with hover-expand behaviour matching apps/agent's idiom, four routed surfaces (cockpit, agents, governance, mcp), plus a stub /agents/new for the future wizard. The cockpit page is the full dashboard design from the prototype, hooked up to mock react-query-kit hooks shaped so the eventual swap to real agent-mcp-interface routes is a fetcher body change. Layout entrypoints/app/App.tsx HashRouter with single layout route (CockpitShell) wrapping 5 children entrypoints/app/main.tsx TooltipProvider added, @fontsource imports for Schibsted Grotesk + Newsreader italic + JetBrains Mono components/layout/CockpitShell.tsx fixed sidebar (w-14 collapsed, w-64 expanded, 150ms collapse delay) + main outlet, pl-14 offset components/layout/PlaceholderScreen shared "coming soon" composite Sidebar components/sidebar/AppSidebar.tsx branding + navigation, no user footer until we have a setting to surface components/sidebar/SidebarBranding.tsx orange B mark + wordmark components/sidebar/SidebarNavigation.tsx 4 lucide-iconed NavLinks with base-ui Tooltip (render prop, not asChild) on collapsed Cockpit surfaces components/cockpit/CockpitHero.tsx hero with serif italic accent components/cockpit/WaitingStrip.tsx container for approvals + handoffs components/cockpit/ApprovalBanner.tsx 3-button approval card components/cockpit/HandoffRow.tsx amber "take over" row components/cockpit/RunningGrid.tsx auto-fill grid + live count chip + AddAgentTile at the end components/cockpit/RunningCard.tsx 
mini-screencast + label + status + task + watch/stop components/cockpit/AddAgentTile.tsx dashed-border "+ New profile" tile linking to /agents/new components/cockpit/RecentActivity.tsx list container with flagged- count chip components/cockpit/ActivityRow.tsx per-row status icon + agent dot + jump-to action components/cockpit/StatusBadge.tsx token-driven status pill components/cockpit/MiniScreencast.tsx placeholder card-top tile Placeholder screens screens/cockpit/Cockpit.tsx rewrite: composes the surfaces above against mock hooks screens/agents/Agents.tsx placeholder screens/governance/Governance.tsx placeholder screens/mcp/Mcp.tsx placeholder screens/new-agent/NewAgent.tsx placeholder Data modules/api/agents.hooks.ts useAgents (mock; 3 running rows) modules/api/waiting.hooks.ts useApprovals + useHandoffs (1 + 1) modules/api/activity.hooks.ts useRecentActivity (4 rows: blocked, needs-human, allowed, done) lib/status.ts RunStatus union + STATUS_META map + isActiveStatus / isEndedStatus helpers; single source of truth so colors stay consistent across the cockpit, audit, and activity log Design tokens entrypoints/app/styles.css full BrowserOS warm-cream palette wired via @theme inline (shadcn primitives re-pointed at BrowserOS surfaces; bespoke ink scale, status palette, accent ink, shadows, pulse-dot / fade keyframes). Body carries the design's layered radial gradient. Selection uses accent tint instead of chrome blue. Deps @fontsource-variable/schibsted-grotesk @fontsource-variable/jetbrains-mono @fontsource/newsreader (400-italic + 500-italic only) Verified: bun run --filter @browseros/agent-mcp-ui typecheck clean bunx biome check clean (113 files) bunx wxt build clean (880 kB: 317 kB JS, 80 kB CSS, ~480 kB fonts) * fix(agent-mcp-ui): declare the browserOS permission Without `browserOS` in the manifest's permissions array, BrowserOS Chromium's new-tab override gate refuses the extension's claim and the cockpit never replaces chrome://newtab. 
The apps/agent extension sits on the same permission for the same reason. Adds two near-neighbours that the cockpit will reach for soon: webNavigation for routing the future live-run jump targets (the rest were already declared)

* fix(agent-mcp-ui): switch to WXT's conventional newtab entrypoint

Renames entrypoints/app/ to entrypoints/newtab/ so WXT auto-wires manifest.chrome_url_overrides.newtab against the generated newtab.html. Drops the hand-rolled chrome_url_overrides block from wxt.config.ts since WXT now manages it. The output file is newtab.html instead of app.html; nothing internal references the page by filename so no other code change is needed.

Build output verified: manifest carries { chrome_url_overrides: { newtab: 'newtab.html' } } automatically.

Reference: https://wxt.dev/guide/essentials/entrypoints.html#newtab

* chore(agent-mcp-ui): wire react-doctor into lint

Lint now runs biome and react-doctor concurrently as a single pass via concurrently --group, so findings from both tools land in one terminal output and the combined exit code is non-zero if either fails. react-doctor is invoked through bunx (no devDep yet) so the bun release-age gate doesn't block its newest versions. Config in doctor.config.json mirrors biome's vendored-path ignores (components/ui, components/ai-elements, build outputs).

* chore(agent-mcp-ui): add .gitignore and untrack .wxt artifacts

Mirrors apps/agent's .gitignore. .wxt/ is regenerated on every wxt dev/build, so the seven previously-tracked files in it churned the diff for no reason.

* chore: added verbose flag

* chore(agent-mcp-ui): split lint and react-doctor into separate scripts

Running them together wasn't worth the extra plumbing. lint stays biome-only; react-doctor moves to a dedicated lint:doctor script with --verbose. Drops concurrently from devDeps.
* feat(agent-mcp-ui): new-agent wizard at /agents/new Replaces the placeholder with a 4-section wizard (harness, logins, tool approvals, ACL rules) plus a sticky preview rail showing the MCP URL and an Add-to-harness CTA. Form wires through react-hook-form with a zod schema and shadcn's Form primitive; the submit fires useCreateAgent (react-query-kit mutation, mocked latency) and flips the rail into an added state with a Done button that returns to /agents. Adds shadcn form/label/radio-group/toggle/toggle-group primitives. form.tsx is hand-written for the base-vega Label since the registry copy depends on @radix-ui/react-label which the project doesn't ship. Approvals row uses ToggleGroup with single-select; ACL rows split into a toggle button + sibling trash button to clear the nested-interactive lint warning. doctor.config.json sets deadCode:false because react-doctor's unused-file rule cannot trace WXT's entry resolution and was flagging every shipped component. * feat(agent-mcp-ui): governance hub + audit tab Replaces the placeholder at /governance with a 4-tab hub (Audit, Permissions, Site Rules, Grants) backed by nested routes. The shell renders a sticky header with a pulse-dot live-run counter and a shadcn Tabs nav whose triggers navigate per-tab URLs; the matched sub-route renders inside the Outlet. Audit tab ships: filter chips (All/Running/Blocked/Completed) via ToggleGroup, run-count line, and a list of AuditRow cards (status icon + agent/harness + status pill + meta) that link out to /governance/audit/:runId/replay when clicked. Runs come from a new useRuns mock that mirrors the eventual hono-rpc shape. Permissions, Site Rules, and Grants render a small ComingSoonTab stub so the tab nav feels complete and the URL space is reserved. 
* fix(agent-mcp-ui): tab + chip active states Base-ui Toggle emits aria-pressed, not data-state=on / data-pressed=true, so the governance filter chips and the new-agent approval toggles weren't actually flipping color when selected. Swapping the selectors to aria-pressed: ties the visual state to the primitive's real attribute. Active governance tab now shows an accent-orange underline. The base shadcn tabs.tsx applies after:bg-foreground and after:bottom-[-5px] in its own className, and Tailwind utility cascade ordering meant my override classes were being beaten. Adding ! to after:bg-accent and after:bottom-[-1px] wins the cascade and lands the underline right on the TabsList's bottom border. * fix(agent-mcp-ui): tabs primitive matches base-ui's data-orientation Base-ui's Tabs root emits data-orientation=horizontal but the shadcn base-vega tabs.tsx was selecting on data-horizontal (no attribute value), which never matched. With the gate failing, the active-tab underline's after:inset-x-0 and after:h-0.5 were dropped, leaving the pseudo-element at 0x0 and invisible. Side-by-side with the prototype made the gap obvious: same accent color and bottom offset, but no underline rendered. Swapping the four group-data-horizontal/group-data-vertical selectors to group-data-[orientation=horizontal] / group-data-[orientation=vertical] lets them match what base-ui actually emits; the active trigger now paints the 2px accent underline at the TabsList border, matching the prototype. 
* feat(agent-mcp-ui): live-run view with approval and handoff overlays New full-bleed route /run/:runId sits outside the CockpitShell wrapper and gives every agent run its own watch view: a stubbed browser viewport on the left with fake chrome, a centred site host placeholder, the persistent agent-driving badge, and a working pill spelling out the live action; a docked activity panel on the right with the action log, a pinned approval card when the run needs an OK, a pinned handoff notice when it needs the user, an inline block notice when Site Rules killed an action, plus elapsed/tokens/steps stats and pause/stop controls. Approval card honours the v1 UX spec's three-button shape (Allow once, Always allow on domain, Block) and the scope sentence that pins the permission to the current domain. Handoff banner is a full overlay over the viewport with the amber top strip, dimmed page, and an I'm-not-a-robot challenge stub standing in for the real site's CAPTCHA/2FA; the matching in-panel HandoffNotice means the user can resume from either surface. Local state dismisses approvals, handoff, and block notices since the backend isn't wired yet. Cockpit RunningCard 'Watch' button now navigates to /run/<agentId>; run fixtures key by agent id so the cockpit-to-live flow lands on real data for the three running agents. Mock useRun mirrors the eventual hono-rpc /runs/:id + SSE shape. * ci: run code-quality on PRs targeting feat/agent-mcp-interface-bootstrap Stacked PRs on the agent-mcp-interface bootstrap branch were skipping biome / typecheck / fallow because the workflow's pull_request filter only matched main and dev. Adding the explicit branch lets the quality gate fire on every stacked PR before it lands on the parent. 
--------- Co-authored-by: shivammittal274 <56757235+shivammittal274@users.noreply.github.com> * feat(agent-mcp-ui): replay + agents directory + mcp registry (#1221) * feat(agent-mcp-ui): replay view at /governance/audit/:runId/replay Full-bleed replay player that sits outside CockpitShell so the recorded run gets the whole viewport. The top bar shows the task title, agent and harness, status pill, and a stat strip (duration, tokens, steps, approvals). The body splits into a reconstructed browser viewport on the left (fake chrome + site host placeholder + caption pill that tracks the playhead), a transport with play/pause/restart + native range-input scrubber (overlaid with accent track, kind-coloured bookmark dots for approval/block/done frames, and an accent thumb) + 1x/2x/4x speed toggle, and a right rail Action Timeline whose rows highlight the current frame, dim future frames, and click-to-seek. Playback wallclock lives in usePlayback hook (the project's one allowed useEffect case: starting and cancelling setInterval tied to play state). Scrubber is a real <input type="range"> styled transparently over the visual track so we get native click-to-position, keyboard arrows, Home/End, and screen-reader semantics for free; bookmark buttons sit at z-10 above the track and below the input, so direct clicks still seek to their frame. Mock useReplay keyed by run id mirrors the eventual /runs/:id/replay shape. * feat(agent-mcp-ui): agents directory at /agents with revoke flow Replaces the /agents placeholder with a real directory of configured agent profiles. Header shows the configured-count pill and a primary Add agent CTA that lands the user on the existing /agents/new wizard; the body renders one row per profile with the harness icon chip, name + harness, scope summary (logins, ACL rules, blocked actions, always-allow grants), last-run timestamp, status badge (Configured / Paused / Disabled), and Edit + Revoke buttons. 
Empty state renders a dashed coming-soon-style card with its own Add CTA. Revoke runs through shadcn AlertDialog (not window.confirm) so focus trapping and ARIA semantics ship for free. useDeleteAgent's onSuccess writes back to the agent-profiles cache via setQueryData so the row vanishes immediately without a refetch, per the project's no-parallel-state-over-cache rule. Edit currently navigates to /agents/:id/edit which is a placeholder slot until the new-agent wizard grows an edit mode. Adds shadcn alert-dialog primitive. Mock useAgentProfiles returns seven profiles spanning every status. * fix(agent-mcp-ui): honour prefers-reduced-motion + snapshot CockpitShell ref Adds a global prefers-reduced-motion: reduce media block in styles.css that collapses every animation and transition to a 0.01ms no-op for users with vestibular sensitivities, satisfying WCAG 2.3.3 across the cockpit (pulse-dot live indicators, sidebar expand, replay scrubber transition, in-app fade-ups). Refactors CockpitShell's unmount cleanup to snapshot the timeout ref object into a stable local before closing over it in the cleanup, which is the React docs' canonical pattern for refs in effects and resolves the missing-effect-dependencies warning. Behaviour is unchanged: the cleanup still clears whatever timeout id is current at unmount time. react-doctor score moves from 74/100 with 2 findings to 100/100 with 0 findings. * feat(agent-mcp-ui): mcp registry at /mcp Replaces the /mcp placeholder with the per-agent MCP endpoint registry. Reads every configured profile from useAgentProfiles and renders one card per profile: harness icon chip + name + harness, slug + CLI hint, status pill, the dark URL block with a copy button, and a Regenerate URL + Add to {harness} button pair. The Add CTA flips into a brief Added confirmation; the copy button flips into a check icon for 1.5s so the user knows the clipboard write landed. 
Adds useRegenerateMcpUrl mock mutation that rotates the slug and writes the new URL straight back into the agent-profiles cache via setQueryData, so the row reflects the new endpoint without a refetch. Shape mirrors the eventual hono-rpc surface. Skip the per-harness /mcp/setup-* helper screens for now per the running plan; we land them with the onboarding work. * fix(agent-mcp-ui): keep selected text readable on dark surfaces Global ::selection only set a background, so on light-on-dark surfaces like the MCP URL block the cream text disappeared into the accent-tint selection highlight. Pinning the foreground to ink keeps every selection readable: dark ink on light tint everywhere, including the live-run viewport caption and the MCP code block. * chore(browseros-agent): bump biome to 2.5.0 2.5.0 just cleared the bunfig release-age gate so the local install matches what CI's version: latest has already been pulling. Updates the package.json pin plus the schema references in the root biome.json and apps/agent-mcp-ui/biome.json. apps/agent-mcp-ui stays clean under both bun run lint and biome ci. The pre-existing diagnostics surfacing on apps/eval, apps/server, apps/agent, packages/shared, and scripts/dev are unchanged from before the bump. * feat(agent-mcp-ui): onboarding + governance permissions/site-rules/grants + edit wizard (#1223) * feat(agent-mcp-ui): first-launch onboarding flow at /onboarding Adds a four-step onboarding flow sitting full-bleed outside the CockpitShell. Left brand column carries the BrowserOS logo, a Newsreader-italic pull quote, and three value props (fast & token-cheap / logged in as you / under your control). 
Right column shows step dots up top and one of four step panels: Welcome (set up vs reconnect), Import Logins (Chrome-quit gate, profile picker with default Work + Personal selected, Keychain notice, progress card, summary), Connect to Claude (one-click add or copyable CLI fallback, success card), Ready (two starter prompts with copy buttons, Open BrowserOS CTA). Reconnect and Open BrowserOS both navigate to /. Adds useImportChromeSessions and useConnectToClaude mock mutations (react-query-kit createMutation) whose shape matches the eventual hono-rpc surfaces. CHROME_PROFILES seeds three profiles totalling 55 sites and 14 logins; STARTER_PROMPTS reuses the prompt strings already surfaced elsewhere in the cockpit. Skip first-launch gating for now: per the running plan's open questions, the where-does-the-flag-live decision lives with the backend SSE work. * feat(agent-mcp-ui): governance permissions, site rules, grants tabs Rounds out the governance hub. The three placeholder tabs now ship real surfaces: Permissions: read-only catalog of the six action categories grouped into the three buckets (Auto / Ask / Block) every new agent inherits from. Read straight from new-agent.schemas' APPROVAL_CATEGORIES so the wizard's default verdicts and the catalog can't drift. Layout is a three-column lg grid with verdict-coloured bucket cards. Site Rules: list of (label, domain, action) blocks the browser enforces directly. Each row carries a coloured action badge, the domain in mono, and a delete button. An inline 'Add a rule' form expands into a react-hook-form + zod editor with three fields (label, domain, action select) and submits through useAddSiteRule. setQueryData writes on both add and delete keep the list as the cache's source of truth. Grants: the always-allow ledger. Per-row action + domain + grantee + when + optional note + Revoke button. Revoke routes through a shadcn AlertDialog explaining the consequence (future attempts re-prompt, existing runs unaffected). 
useRevokeGrant's onSuccess drops the row from the cache. ComingSoonTab is removed; nothing imports it any more. * feat(agent-mcp-ui): edit-mode wizard at /agents/:id/edit + recent-activity replay link Closes the two loose ends called out in the running plan. The new-agent wizard now accepts an optional mode prop ('create' | 'edit'); /agents/:id/edit renders it with mode=edit. In edit mode the data hook reads the agent id from useParams, fetches the wizard-shape values via useAgentProfileDetail (a mock that synthesises full NewAgentValues from an AgentProfile summary), drives the form's reactive values prop, and routes submit through useUpdateAgent. The mutation's onSuccess patches the agent-profiles cache so the directory's row reflects the rename immediately. Header copy, submit CTA, pending label, and success card all flip to edit-mode strings ('Edit agent', 'Save changes to X', 'Saving…', 'X updated'); the copy-from-existing card is hidden in edit mode. Cockpit recent-activity now lands done rows on the replay route. Added an optional runId to the ActivityRow type and a new History-icon Replay button on done rows that links to /governance/audit/:runId/replay. The Codex . Log calls done row points at run-concur-may so the demo lands on real fixture data. * fix(agent-mcp-ui): keep AddSiteRuleForm mounted until mutation settles Previously the form ran close() synchronously after onSubmit, before addRule.mutate could resolve. Today the mock always succeeds; once a real backend lands, a 4xx would silently drop the row and lose the user's input. Widening the onSubmit prop to forward react-query-kit's mutation options lets the parent hand close to the mutation's onSuccess, so the form stays mounted on failure and a FormMessage can surface the error once we wire one in. 
* feat(agent-mcp-interface): phase 1 — agent profiles CRUD end-to-end (#1224) * chore(agent-mcp-interface): foundation for phase 1 Adds the storage helper Phase 1 leans on, plus the deps and env reads that helper + the upcoming agent routes need. env.ts now also exposes BROWSEROS_DIR overrides and an isDevelopment flag (still the only sanctioned process.env reader). src/lib/browseros-dir.ts resolves <homedir>/.browseros (or .browseros-dev under NODE_ENV=development) with the env override winning; the package writes everything under <browserosDir>/mcp-interface/. src/lib/storage.ts wraps readJson/writeJson/listFiles/removeFile/ensureDir/fileExists around the interface root, validates every read and write with a supplied zod schema, refuses absolute paths or .. escapes, and writes through a <name>.tmp rename so a mid-write crash leaves either prior contents or nothing. Adds bun test wiring + tests/_helpers/temp-browseros-dir.ts so every test gets an isolated tmp root. 13 storage tests pass. * feat(agent-mcp-interface): agent profile schemas + service schemas.ts is the wire contract the UI's typed client picks up via AppType. Mirrors the existing UI wizard shape (NewAgentValues) and adds the storage shape (server-managed id / slug / mcpUrl / status / timestamps) plus the directory projection used by GET / responses. service.ts wraps the storage helper with file-backed CRUD: one profile per file at <browserosDir>/mcp-interface/agents/<id>.json keyed by nanoid(8). Slug is the user-facing identifier and is uniqued across all profiles via uniqueSlug (which collides up to -99 before throwing). mcpUrl is recomputed from getLocalServerUrl on every read so a port change between boots doesn't strand the stored value. lib/slug.ts mirrors the UI's toSlug so wizard preview and persisted slug match. 15 service tests cover create / list / detail / update (rename + slug rotation, slug stability when name unchanged) / remove / regenerate / parallel updates. 
Plus the original 13 storage tests still pass. 28 tests total in 354ms. * feat(agent-mcp-interface): /agents CRUD routes Thin Hono layer over routes/agents/service.ts: zValidator rejects malformed bodies with structured 400s, missing-id paths surface 404 via HttpError, the rest just translate HTTP shape. Chained into server.ts via .route('/', agentsRoute) so AppType automatically picks up POST /agents, GET /agents, GET /agents/:id, PATCH /agents/:id, DELETE /agents/:id, POST /agents/:id/mcp-url:regenerate. Five route-level integration tests drive the typed client (hc<AppType>) against app.fetch with no real port bind, in an isolated tmp <browserosDir> per case. Covers the full lifecycle, every 404 path, the 400 zod path, slug collision through the route, and parallel updates of two profiles. The regenerate slug regex was tightened to allow nanoid-suffixed multi-hyphen slugs (toSlug normalises any _ in the nanoid output to -). 33 tests / 88 expect calls pass in under 100ms. * feat(agent-mcp-ui): swap six agent hooks for real client calls Replaces the in-memory mocks for useAgentProfiles, useAgentProfileDetail, useCreateAgent, useUpdateAgent, useDeleteAgent, and useRegenerateMcpUrl with hono-rpc calls through the existing client + parseResponse pair. Strips the MOCK_AGENT_PROFILES fixture, the profileToWizardValues synthesiser, the buildMcpUrl/toSlug/nanoid mock helpers, and the artificial setTimeout latencies — the cache surface seen by every consumer (Agents directory, new-agent wizard create + edit, MCP registry regenerate / delete dialogs) stays byte-identical because the wire types now flow from AppType and match what the UI already expected. useAgents (cockpit running grid) stays on its three-row MOCK_AGENTS fixture; that hook becomes Phase 4's projection over the runs store, called out in a top-of-file comment. UI typecheck, lint, lint:doctor all clean. react-doctor holds at 100/100. 
* fix(agent-mcp-interface): harden phase 1 against the three greptile findings Three independent fixes; reviewer suggestions on PR #1224 covered them all. 1. Storage path guard now inspects the raw input for '..' segments before normalize collapses them. 'agents/../config.json' previously normalized to 'config.json' and slipped past the rooted-prefix check, which would have let any future route forwarding a path-shaped id read or delete files at the mcp-interface/ root. Storage tests cover read/write/remove on a lateral-traversal path. 2. Service layer validates the id shape (matches the nanoid alphabet, length-capped) inside loadById and remove. Traversal-shaped ids on any read/write/delete path now resolve as not-found rather than reaching the storage layer. Service test exercises four evil ids across all four entry points. 3. loadAll uses Promise.allSettled + logger.warn instead of Promise.all so a single corrupt agent json (manual edit, partial migration, half-written file on a weird FS) gets logged + skipped rather than rejecting the whole call. Without this, one bad file would brick list and create until the user manually deleted it. Test writes a garbage file alongside a valid one and confirms list returns only the valid one + create still works. 4. AsyncMutex serialises create / update / regenerateMcpUrl so the read-snapshot → uniqueSlug → write window cannot race against itself. Closes the TOCTOU window where two concurrent same-name creates could both pass the uniqueness check and write the same slug. Reads stay lock-free. Mutex has its own unit tests (FIFO ordering, rejection doesn't block subsequent tasks). Service test fires 10 parallel creates with the same name and asserts 10 distinct slugs come back (race, race-2, ..., race-10). 39 tests / 125 expect calls pass. Lint + typecheck clean. 
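The first fix in the commit above (inspect the raw input for `..` segments before normalization collapses them) can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the package's storage code: `assertSafeRelativePath` is a hypothetical name, and the real helper's signature and error types are not shown in this log.

```typescript
// Hypothetical helper: the name, return shape, and error messages are assumptions.
function assertSafeRelativePath(relPath: string): string {
  // Reject absolute paths (leading slash or backslash, or a Windows drive prefix).
  if (/^([\\/]|[A-Za-z]:)/.test(relPath)) {
    throw new Error(`absolute path rejected: ${relPath}`);
  }

  // Inspect the RAW segments first. Normalizing before this check is the bug:
  // "agents/../config.json" collapses to "config.json" and looks harmless.
  const segments = relPath.split(/[\\/]+/);
  if (segments.some((segment) => segment === "..")) {
    throw new Error(`path traversal rejected: ${relPath}`);
  }

  // Only now normalize: drop empty and "." segments, join with "/".
  return segments
    .filter((segment) => segment !== "" && segment !== ".")
    .join("/");
}
```

Checking the raw segments keeps the guard independent of whatever the later normalization step does, so a lateral path such as `agents/../config.json` is refused even though its normalized form would sit inside the root.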
* feat(agent-mcp-interface): site rules + permissions catalog + check api (#1231) * feat(agent-mcp-interface): add domain glob matcher + approval catalog seed * feat(agent-mcp-interface): file-backed site-rules service * feat(agent-mcp-interface): wire /site-rules and /permissions/catalog routes * feat(agent-mcp-interface): permissions.check api for executor pre-flight * feat(agent-mcp-ui): swap site-rules + permissions catalog hooks to real client * fix(agent-mcp-interface): enforce admin site rules + warn on catalog fallback * feat(agent-mcp-interface): per-agent MCP server + executor stub (#1232) * feat(agent-mcp-interface): browser executor interface + deterministic stub * feat(agent-mcp-interface): wire /mcp/:slug via MCP SDK web-standard transport * feat(agent-mcp-interface): permission gate + navigate tool through MCP * feat(agent-mcp-interface): add read, click, type, attach, submit tools * test(agent-mcp-interface): pin delete-agent-slug-404s-immediately invariant * fix(agent-mcp-interface): plug stub leak + reject non-http navigate + attach traversal * feat(agent-mcp-interface): live integration polish + agent-mcp-manager wiring (#1234) * fix(agent-mcp-ui): clone-from card uses real profiles and hydrates every field * fix(agent-mcp-ui): hide logins step until vault import lands * fix(agent-mcp-ui): pin new-agent rail CTA to viewport bottom * feat(agent-mcp-interface): wire agent-mcp-manager into create + delete * feat(agent-mcp-ui): surface real harness install outcome on the success card * feat(agent-mcp-ui): cover all agent-mcp-manager harnesses + shared HarnessIcon * feat(agent-mcp-ui): real brand marks via @svgl shadcn registry + drop BrowserOS pill * fix(agent-mcp-interface): reconcile harness link on update + regenerate * fix(agent-mcp-ui): invalidate profile caches after create/update/delete/regenerate * fix: handle clone-fetch failure + remove-before-uninstall on delete * feat(agent-mcp-interface): adopt real browser tools + mount inside 
apps/server (#1235) * feat(server): export browser tool surface + session for cockpit reuse * feat(agent-mcp-interface): adopt @browseros/server real tool catalogue with permission wrapper * feat(server): mount cockpit inside apps/server runtime with mcpUrl migration * test(agent-mcp-interface): pin migrateMcpUrls rewrite + re-install behavior * fix(server,eval): cast around workspace zod version cross-pollination * fix(cockpit): isolate uninstall + catch migration; reject non-http navigate; log run dispatch - migrate-mcp-urls: wrap uninstallForAgent in its own try/catch so a throw there does not skip installForAgent and leave the harness pointing at a dead URL while the profile JSON carries the new one. - cockpit.ts: add .catch on migrateMcpUrls so a top-level rejection (e.g. listFiles hitting EACCES) is logged instead of swallowed as an unhandled promise rejection. - mcp/register: reject javascript:, file:, and data: URLs at the navigate wrapper before the permission gate, restoring the defense-in-depth the old per-tool wrapper had. The real navigate tool's schema is z.string().optional() with no scheme check. - mcp/register: log a warning when the run tool dispatches. A dedicated catalog verb for arbitrary script execution is the proper fix; the log keeps dispatches auditable until that lands. - Integration test: lock the navigate scheme guard with explicit cases for javascript:, file:, and data:. * feat(cockpit-ui): confirm before rotating MCP URL; drop redundant Add-to-harness button The MCP page had two paper cuts: 1. Regenerate URL fired straight on click. Rotating destroys the previously-issued URL and re-installs the harness entry under a new slug, so anywhere the old URL was pasted by hand stops working. Now the button opens a shadcn AlertDialog explaining the impact (auto-reinstall via reconcileHarnessLink, but external paste-ins go dead) and only fires the mutation on confirm. Matches the pattern used by DeleteAgentDialog. 2. 
The "Add to <harness>" button only flipped a local "Added" badge for 1.8s; it never triggered an install because the install already ran when the agent was created. Removed the button and updated the header + empty-state copy to say so explicitly. * fix(cockpit): mount standalone server under /cockpit prefix The merged-runtime refactor switched the UI client and the harness install URLs to a single shape: `http://127.0.0.1:<port>/cockpit/...`. That works against `createCockpitRoutes` because apps/server mounts the cockpit under `.route('/cockpit', ...)`. The standalone entry point in `src/main.ts` was still serving at the root, so every UI request returned 404. Wraps the Hono `server` in a parent that mounts it under COCKPIT_MOUNT_PREFIX and updates `localServerUrl` to include the prefix. The buildMcpUrl helper now produces the same shape in both runtimes, so harness configs stay valid across a switch. Also runs `migrateMcpUrls` at standalone boot — same sweep the production factory does — so profiles created before this change get their stored mcpUrl + harness install entries rewritten to the new shape on first start. * fix(cockpit-ui): reconcile agent-profiles list after regenerate URL GET /agents is sorted by `updatedAt` DESC server-side, and regenerate bumps that field — so the rotated row needs to jump to the top of the directory and the MCP page. The previous handler only patched `mcpUrl` in place, leaving the sort order stale until the next page load. Swaps the detail-cache invalidation (the GET /agents/:id wire shape doesn't carry slug or mcpUrl, so the invalidation was a no-op) for a list-cache invalidation, while keeping the optimistic `mcpUrl` patch so the new URL still appears without a network round-trip. 
* feat(agent-mcp-interface): attach to browseros browser over CDP at boot (#1248) * feat(agent-mcp-interface): attach to browseros browser over cdp at boot The standalone cockpit ran the route surface but never set the process-wide BrowserSession, so every MCP tools/call short-circuited with "browser session not connected". Production worked because the merged runtime in @browseros/server called createCockpitRoutes with its live session; standalone had no such hand-off. Mirrors @browseros/server's bootstrap directly: connect a CdpBackend to the configured port, wrap in Browser, hand the session to setBrowserSession at boot. Configurable via the BROWSEROS_COCKPIT_CDP_PORT env var, defaults to 49337 (IANA dynamic / private range, no known collision with registered services). Soft-fails when the browser is not reachable. The cockpit still serves the UI, profile CRUD, harness installs, and tools/list; only tools/call keeps the existing "session not connected" wire shape until the user restarts the cockpit with the browser up. exitOnReconnectFailure: false on the CdpBackend so a transient drop degrades the session instead of killing the cockpit process. CdpClient is injected through BrowserBootstrapDeps so the unit test covers the connect-success, connect-fail, and disconnect-swallows-errors paths without opening a socket. * fix(cockpit): guard signal handler + harden DI shape Three small cleanups from the PR review: - Add an `exiting` guard around the SIGINT/SIGTERM cleanup so a back-to-back delivery (supervisor sends both) does not restart `disconnect()` on an already-closing CDP connection. - Add a `setTimeout(() => process.exit(1), 5000).unref()` kill switch before `disconnect()` so a hung inner `cdp.disconnect()` (half-open socket, network stall) cannot leave the process unkillable except via SIGKILL. 
- Replace `BrowserBootstrapDeps`'s two independent override fields with a single bundled `inject` object so callers cannot mix a stub `cdpFactory` with the default `buildSession`. The default `buildSession` casts to a real `CdpBackend`, so a partial override would compile but blow up at the first `Browser` call. Tests updated accordingly. * feat(agent-mcp-ui): single source for the MCP endpoint URL shown in agent create/edit (#1328) The wizard, the edit flow, and the MCP directory each constructed the per-agent MCP URL inline. The wizard's pre-save preview returned http://127.0.0.1:9000/mcp/<slug>, which is wrong on both axes: port 9000 is the CDP socket, and the cockpit lives under /cockpit while it borrows apps/server's BrowserSession. Introduce modules/api/mcp-endpoint as the canonical source. The copy widget on /agents/new and /agents/:id/edit, the McpRow in the directory, and the slug parser all flow through it. A TODO at the top of the module marks the temporary cockpit-mount target and lists the three commits the future unmount will need. * feat(cockpit): tab activity registry + homepage feedback loop (PR 1/3) (#1331) * feat(agent-mcp-interface): tab activity registry + GET /tabs/activity route Wraps the existing executeTool dispatch in mcp/register.ts so every successful browser-tool call is recorded against the calling agent and the targeted CDP target id. Failed dispatches and tools without a page arg (tab_groups, windows, run) are skipped. Records live in an in-memory map keyed by targetId; status is derived at read time (active for 5s after the last tool, idle afterwards). Closed tabs are evicted lazily when the next snapshot read finds the pageId no longer maps to the original targetId (pageIds are reused after close). The GET /tabs/activity route surfaces the current snapshot and is mounted into the AppType chain so the UI hono-rpc client picks it up automatically. 
No server-package edits anywhere; the cockpit reads PageManager via the shared BrowserSession singleton it already owns. * feat(agent-mcp-ui): drive cockpit homepage from real tab-activity polling useTabsActivity polls GET /cockpit/tabs/activity every 1500ms via the existing hono-rpc client; cockpit.data composes that with the existing mocked approvals/handoffs so the screen calls one hook only. Active records become RunningGrid cards (status=running); idle records become RecentActivity rows (status=done) with a relative-time string. Helper file derives a stable per-slug color, parses the site from the URL, and formats the relative timestamp. Mocked useAgents / useRecentActivity stay in place for any other surface that imports them; the homepage just stops consuming them. * fix(cockpit): test-isolation clear() + honest isPending + harness TODO - TabActivityRegistry gains a clear() escape hatch next to size(); the routes/tabs test now calls it in afterEach so a stale record from one test cannot surface in another that re-attaches a session. - useCockpitData isPending now OR-combines tabs/approvals/handoffs so any future caller wiring a spinner sees the actual loading state. - tabsToAgentRows annotates the hardcoded harness with a TODO pointing at the PR-3 profile join so the simplification stays visible. * fix(eval): restore any-typing on ai-sdk onStepFinish params for mixed-zod workspace CI typecheck on the merge commit failed in the eval package: the onStepFinish destructure inherited from main resolves to implicit-any under this branch's workspace where the cockpit pins zod v4 and the server pins zod v3, so the ai-sdk generate() option type widens. Re-introduce the explicit `any` annotations + biome-ignore comments that the branch carried before the merge took main's cleaner shape; the runtime contract is unchanged. Single-agent.ts: `onStepFinish: async (step: any)` with biome-ignore. 
Tool-loop-executor-backend.ts: typed destructure literal with two biome-ignores on the toolCalls and toolResults any fields. --------- Co-authored-by: shivammittal274 <56757235+shivammittal274@users.noreply.github.com>
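The tab-activity commit above describes three behaviours: page ids are extracted only for tools that take a `page` arg, status is derived at read time from a 5 s window, and closed tabs are evicted lazily when the page id no longer maps to the stored target id. A minimal sketch of how these could fit together follows. The names `TabActivityRegistry`, `recordTool`, `snapshot`, `size` and `clear` come from the commit messages; the record fields, the constructor shape, and the `currentTargetId` lookup (standing in for the `PageManager` read) are assumptions made for illustration.

```typescript
// Illustrative sketch only: field names and signatures are assumptions.
const ACTIVE_WINDOW_MS = 5000;
const PAGE_TOOLS = new Set([
  "act", "diff", "download", "evaluate", "grep", "navigate", "pdf",
  "read", "screenshot", "snapshot", "tabs", "upload", "wait",
]);

// Returns the targeted page id, or null for tools without a usable `page` arg.
function extractPageId(tool: string, rawArgs: Record<string, unknown>): number | null {
  if (!PAGE_TOOLS.has(tool)) return null;
  const page = rawArgs.page;
  return typeof page === "number" && Number.isInteger(page) && page > 0 ? page : null;
}

interface StoredRecord {
  targetId: string; // stable CDP target id (never reused)
  pageId: number; // reused after a tab closes
  agentSlug: string;
  lastTool: string;
  lastToolAt: number; // epoch ms of the last successful dispatch
}

interface TabActivityRecord extends StoredRecord {
  status: "active" | "idle";
}

class TabActivityRegistry {
  private readonly records = new Map<string, StoredRecord>();
  private readonly currentTargetId: (pageId: number) => string | undefined;
  private readonly now: () => number;

  // `currentTargetId` stands in for the PageManager lookup; `now` is injectable for tests.
  constructor(
    currentTargetId: (pageId: number) => string | undefined,
    now: () => number = Date.now,
  ) {
    this.currentTargetId = currentTargetId;
    this.now = now;
  }

  recordTool(entry: Omit<StoredRecord, "lastToolAt">): void {
    this.records.set(entry.targetId, { ...entry, lastToolAt: this.now() });
  }

  snapshot(): TabActivityRecord[] {
    const at = this.now();
    const out: TabActivityRecord[] = [];
    for (const rec of Array.from(this.records.values())) {
      // Lazy eviction: the page id is gone or now belongs to a different target.
      if (this.currentTargetId(rec.pageId) !== rec.targetId) {
        this.records.delete(rec.targetId);
        continue;
      }
      // Status is derived at read time, never stored.
      const status: TabActivityRecord["status"] =
        at - rec.lastToolAt < ACTIVE_WINDOW_MS ? "active" : "idle";
      out.push({ ...rec, status });
    }
    return out;
  }

  size(): number {
    return this.records.size;
  }

  clear(): void {
    this.records.clear();
  }
}
```

Deriving `status` on read means no timer is needed to flip records from active to idle, and comparing target ids (rather than page ids) is what keeps a reused page id from resurrecting a closed tab's record.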
Summary
First of three PRs from the cockpit-homepage plan. Every successful browser-tool dispatch the cockpit observes is recorded against the calling agent and the targeted CDP target id; the homepage polls
GET /cockpit/tabs/activityevery 1500 ms and renders the live view. No takeover gate yet (PR 2), no UI redesign beyond rewiring the data hooks. Zero diffs inapps/server.What changes
apps/agent-mcp-interface:
- lib/tab-activity/extract-page-id.ts is a pure helper that reads rawArgs.page for tools that accept it (act, diff, download, evaluate, grep, navigate, pdf, read, screenshot, snapshot, tabs, upload, wait) and rejects non-integer / non-positive values. Tools without a page (tab_groups, windows, run) yield null.
- lib/tab-activity/registry.ts is the in-memory map keyed by stable CDP targetId. recordTool writes; snapshot() reads. status is derived at read time (active within 5 s of the last tool, idle afterwards). Closed tabs are evicted lazily on the next snapshot read by checking PageManager.getInfo(pageId)?.targetId === storedTargetId (pageIds are reused, target ids are not). The registry takes an optional now clock for deterministic tests.
- lib/tab-activity/index.ts exports a process-wide singleton bound to the existing getBrowserSession accessor.
- mcp/register.ts wraps the existing executeTool call with recordSuccessfulDispatch(...). Failed dispatches and tools without a page arg are skipped. The new helper is extracted so the dispatcher function stays under biome's cognitive-complexity threshold.
- routes/tabs/index.ts exposes GET /tabs/activity returning { tabs: TabActivityRecord[] }. Mounted in server.ts so the chained AppType picks it up automatically.
apps/agent-mcp-ui:
- modules/api/tabs.hooks.ts defines useTabsActivity with react-query-kit, polling every 1500 ms via the existing hono-rpc client.
- screens/cockpit/cockpit.helpers.ts derives per-slug colours, parses the site from the URL, formats relative time, and maps TabActivityRecord[] to AgentRow[] (active records, status=running) and ActivityRow[] (idle records, status=done).
- screens/cockpit/cockpit.data.ts is the new aggregator hook. The homepage screen calls it and nothing else.
- Cockpit.tsx drops the mocked useAgents / useRecentActivity calls and consumes useCockpitData() instead. Approvals + handoffs stay on their mocks until later PRs supply them.
Server-side guarantees
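The recording hook sits around the dispatch rather than inside it, which is what keeps the server package untouched. A minimal sketch of that wrap-and-record pattern follows. The `ToolResult` and `Recorder` shapes, the `isError` flag, the `(toolName, rawArgs)` signature of `extractPageId`, and the `dispatchAndRecord` name are all assumptions for illustration; the real `recordSuccessfulDispatch` signature is not shown in this description.

```typescript
// Hypothetical shapes; the real types live in the cockpit packages.
type ToolResult = { isError?: boolean };
type Recorder = (agentId: string, pageId: number, toolName: string) => void;

// Tools the description lists as accepting a `page` argument.
const PAGE_TOOLS = new Set([
  "act", "diff", "download", "evaluate", "grep", "navigate", "pdf",
  "read", "screenshot", "snapshot", "tabs", "upload", "wait",
]);

// Returns the page id for page-scoped tools, or null when the tool has no
// page argument or the value is not a positive integer.
function extractPageId(toolName: string, rawArgs: unknown): number | null {
  if (!PAGE_TOOLS.has(toolName)) return null;
  if (typeof rawArgs !== "object" || rawArgs === null) return null;
  const page = (rawArgs as { page?: unknown }).page;
  return typeof page === "number" && Number.isInteger(page) && page > 0
    ? page
    : null;
}

// Wraps a dispatch function without modifying it: the inner call runs first,
// and only a successful, page-scoped result is handed to the recorder.
// If `execute` throws, the error propagates and nothing is recorded.
async function dispatchAndRecord(
  execute: (toolName: string, rawArgs: unknown) => Promise<ToolResult>,
  record: Recorder,
  agentId: string,
  toolName: string,
  rawArgs: unknown,
): Promise<ToolResult> {
  const result = await execute(toolName, rawArgs);
  const pageId = extractPageId(toolName, rawArgs);
  if (!result.isError && pageId !== null) {
    record(agentId, pageId, toolName);
  }
  return result;
}
```

Because the wrapper only observes the arguments and the result, the wrapped dispatch function needs no changes of its own.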
No edits to apps/server/src/tools/browser/framework.ts, no edits to executeTool, no edits to BROWSER_TOOLS. The cockpit reads PageManager via the shared BrowserSession singleton it already owns.
Test plan
- extractPageId (8 cases): every tool with a page, every tool without, missing / non-integer / non-object args.
- TabActivityRegistry (9 cases): record + snapshot, dedup-by-targetId, status timing with injectable clock, eviction on pageId reuse + outright deletion, no-session no-op, multi-tab independence, sort by lastToolAt desc, last-write-wins on shared target.
- /tabs/activity route: empty-state shape + populated shape after recording.
- apps/agent-mcp-interface tests still pass.
- bunx tsc --noEmit clean in both packages.
- bunx biome ci clean.
Manual walk-through against a live BrowserOS (reviewer action):
- apps/server against a running BrowserOS browser (CDP up).
- RunningGrid is empty.
- tools/call navigate { page: 1, url: 'https://example.com' } against /cockpit/mcp/<slug> for any agent profile.
- curl http://127.0.0.1:9100/cockpit/tabs/activity should now show a record with agentId, slug, targetId, pageId: 1, url: 'https://example.com/', title: 'Example Domain', lastToolName: 'navigate', status: 'active'.
- RunningGrid should show one card within 2 seconds, labelled with the slug.
- RecentActivity with status done and a relative-time chip.
- /tabs/activity on the next read (caught by the eviction-on-read pass).
- navigate { page: 999, url: '...' }). No new record appears, the previous record's lastToolAt does NOT advance.
Follow-ups (PR 2 and PR 3)
- TakeoverRegistry, POST /cockpit/tabs/:targetId/takeover + /release, and an MCP-error gate in the same wrapper.
- WaitingStrip.
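For reviewers who want the registry semantics at a glance, the behaviour described under What changes (status derived at read time, lazy eviction when a pageId no longer maps to the stored targetId, injectable clock) can be sketched roughly as below. This is an illustration under assumptions, not the code in registry.ts: `TabActivitySketch`, `StoredRecord`, and `TargetLookup` are hypothetical names, the record carries fewer fields than the real TabActivityRecord, and treating exactly 5000 ms as still active is a guess at the boundary.

```typescript
type Status = "active" | "idle";

// Hypothetical, trimmed-down record; the real one also carries url, title, slug.
interface StoredRecord {
  agentId: string;
  targetId: string;
  pageId: number;
  lastToolName: string;
  lastToolAt: number;
}
type SnapshotRecord = StoredRecord & { status: Status };

// Stand-in for PageManager.getInfo(pageId)?.targetId; undefined means the
// page is closed or unknown.
type TargetLookup = (pageId: number) => string | undefined;

const ACTIVE_WINDOW_MS = 5000;

class TabActivitySketch {
  private readonly records = new Map<string, StoredRecord>();
  private readonly lookupTargetId: TargetLookup;
  private readonly now: () => number;

  constructor(lookupTargetId: TargetLookup, now: () => number = () => Date.now()) {
    this.lookupTargetId = lookupTargetId;
    this.now = now;
  }

  // One record per targetId; a later call on the same target overwrites it.
  recordTool(agentId: string, pageId: number, toolName: string): void {
    const targetId = this.lookupTargetId(pageId);
    if (targetId === undefined) return; // unknown page: nothing to record
    this.records.set(targetId, {
      agentId,
      targetId,
      pageId,
      lastToolName: toolName,
      lastToolAt: this.now(),
    });
  }

  // Evicts stale records, derives status from the clock, sorts newest first.
  snapshot(): SnapshotRecord[] {
    const at = this.now();
    const out: SnapshotRecord[] = [];
    for (const rec of Array.from(this.records.values())) {
      // pageIds are reused after close, so a mismatch means the tab is gone.
      if (this.lookupTargetId(rec.pageId) !== rec.targetId) {
        this.records.delete(rec.targetId);
        continue;
      }
      const status: Status =
        at - rec.lastToolAt <= ACTIVE_WINDOW_MS ? "active" : "idle";
      out.push({ ...rec, status });
    }
    return out.sort((a, b) => b.lastToolAt - a.lastToolAt);
  }
}
```

Storing only the timestamp and deriving status on read means no timer is needed to flip a record from active to idle, and the injected clock lets a test advance time without sleeping.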