Parallel multi-session is blocked by design — land the #1474 self-healing stack + make the opt-in broker path reachable (SSOT #1359 / audit #1457 alignment) #1480
Investigation context: openchrome-mcp 1.12.7 (npm latest), Node v20.19.6, macOS (Darwin 25.3.0, arm64), MCP host = Claude Code with the global registration openchrome serve --auto-launch (stdio, default port 9222, default profile ~/.openchrome/profile). This issue consolidates a root-cause analysis with a direction/SSOT alignment review of the fixes already in flight, and catalogs additional wiring gaps found during the investigation. It is meant to sit under the SSOT (#1359) / audit (#1457) umbrella and to act as the tracking issue for the #1474 stack + follow-ups.
TL;DR
Parallel multi-session is structurally blocked by design; it is not a regression. Since #1376 ("Stabilize shared-profile controller ownership") introduced the per-(port, userDataDir) controller-owner lock, N identically-registered sessions resolve to 1 working owner + (N−1) hard failures, surfaced to the host as a bare Failed to reconnect to openchrome: -32000.
1. Symptom & deterministic repro
A second serve --auto-launch against the same key prints a rich remediation to stderr and exit(2)s before the MCP handshake, so the host discards it and the user only ever sees -32000.
2. Root cause (two-part)
Default serve --auto-launch is single-owner-per-(port, userDataDir) with no broker auto-attach fallback. acquireControllerLock() (src/utils/controller-lock.ts) creates ~/.openchrome/locks/<key>.json with openSync(..., 'wx'); the second process gets EEXIST → DuplicateControllerError → refuse to start. With N identical registrations, N−1 are structurally guaranteed to fail.
The lock treats "owner PID alive" as "owner healthy." No CDP-reachability probe, no heartbeat/lease/TTL on the default owner. A half-zombie owner holds the lock forever; the orphaned managed Chrome (and Tier-3 headed-fallback children on basePort+100, e.g. 9322, plus headless 9666) can linger un-reaped.
Code anchors: src/utils/controller-lock.ts, src/utils/duplicate-controller-diagnostics.ts, src/chrome/launcher.ts (SingletonLock), src/chrome/process-watchdog.ts, src/chrome/headed-fallback.ts, src/chrome/auto-connect.ts (note: explicitly refuses to attach to the managed ~/.openchrome/profile, so the managed profile always takes launch mode).
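The exclusive-create behavior described above can be reproduced in isolation. The following is a minimal sketch, not the code in src/utils/controller-lock.ts: the names tryAcquireLock, readOwnerPid and LockResult are hypothetical, the lock file lives in a temporary directory, and the payload is reduced to a PID. It only shows why the second caller deterministically loses.

```typescript
import { closeSync, mkdtempSync, openSync, readFileSync, writeSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical result type for this sketch; not OpenChrome's actual API.
type LockResult =
  | { acquired: true; lockPath: string }
  | { acquired: false; lockPath: string; ownerPid: number | null };

// Read the PID recorded in an existing lock file, or null if unreadable.
function readOwnerPid(lockPath: string): number | null {
  try {
    const parsed = JSON.parse(readFileSync(lockPath, "utf8")) as { pid?: unknown };
    return typeof parsed.pid === "number" ? parsed.pid : null;
  } catch {
    return null;
  }
}

// The 'wx' flag makes openSync fail with EEXIST when the file already
// exists, so exactly one caller can create the lock.
function tryAcquireLock(lockPath: string, pid: number): LockResult {
  try {
    const fd = openSync(lockPath, "wx");
    try {
      writeSync(fd, JSON.stringify({ pid }));
    } finally {
      closeSync(fd);
    }
    return { acquired: true, lockPath };
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
    return { acquired: false, lockPath, ownerPid: readOwnerPid(lockPath) };
  }
}

// Two "sessions" racing for the same key: the first wins, the second is refused.
const lockDir = mkdtempSync(join(tmpdir(), "lock-demo-"));
const demoLockPath = join(lockDir, "demo-key.json");
const firstAttempt = tryAcquireLock(demoLockPath, 1001);
const secondAttempt = tryAcquireLock(demoLockPath, 1002);
console.log(firstAttempt.acquired, secondAttempt.acquired);
```

Note that nothing in this sketch checks whether the recorded owner is still doing useful work, which is the second half of the root cause.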
3. Why this is by-design (not a regression)
#1376 "Stabilize shared-profile controller ownership" (merged 2026-05-27) added the owner lock on purpose: "independent MCP processes pointing at the same (port, userDataDir) can race on target cleanup/reconnect and cause stale sessions or MCP disconnects." The lock converts flaky concurrency into deterministic single ownership.
The in-code remediation is explicit: "Refusing to start a second direct controller … use the future broker/shared-owner topology when available."
Implication for the fix direction: the SSOT-aligned move is not "make broker the default." It is "make the default single-owner path self-healing + observable, and make the opt-in broker path reachable, portable, and documented."
4. The fix already in flight: the #1474 stack
Issue #1474 ("Parallel sessions deadlock on controller lock; host sees only -32000", OPEN) is the canonical bug. Three stacked PRs (all OPEN, all MERGEABLE, base develop):
#1477 (base: develop). acquireControllerLockWithHealthCheck(): on live-owner collision, probe owner CDP /json/version; if unreachable past a boot-grace window, atomically take over the stale lock. A healthy owner is never evicted. src/index.ts awaits it on --auto-launch.
#1478 (base: fix/1474-controller-lock-health-aware, #1477). owner-self-release.ts: on terminal watchdog-exhausted, release the lock and exit 70 so the host respawns a fresh owner. Anti-flap: only the terminal event surrenders ownership; chrome-died and a single relaunch-failed do not.
#1479 (base: fix/1474-owner-self-release, #1478). duplicate-controller-error-server.ts: a degraded stdio responder that completes initialize, then surfaces remediation via portable MCP surfaces (notifications/message, a diagnostic tool, and a structured JSON-RPC error whose data carries port, profile, owner pid, lock path, and ordered remediations) instead of a bare -32000.
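The health-aware takeover rule can be read as a small decision table. The sketch below is an illustration under assumptions, not the implementation in #1477: OwnerObservation, LockDecision and decideLockAction are hypothetical names, the handling of a dead owner PID and the "wait" outcome inside the grace window are assumed, and the actual CDP probe is left out so the logic stays pure.

```typescript
// Hypothetical snapshot of what a second process observes about the lock owner.
interface OwnerObservation {
  ownerPidAlive: boolean; // is the PID recorded in the lock file still running?
  cdpReachable: boolean;  // did a probe of the owner's CDP /json/version succeed?
  lockAgeMs: number;      // time since the lock file was created
}

type LockDecision =
  | "take-over-dead-owner"   // assumption: a dead PID means the lock is stale
  | "refuse-healthy-owner"   // a healthy owner is never evicted
  | "wait-boot-grace"        // assumption: owner may still be launching Chrome
  | "take-over-stale-owner"; // live PID, but unreachable past the grace window

function decideLockAction(obs: OwnerObservation, bootGraceMs: number): LockDecision {
  if (!obs.ownerPidAlive) return "take-over-dead-owner";
  if (obs.cdpReachable) return "refuse-healthy-owner";
  if (obs.lockAgeMs < bootGraceMs) return "wait-boot-grace";
  return "take-over-stale-owner";
}

// Illustrative grace window; the real value is not stated in this issue.
const BOOT_GRACE_MS = 15_000;

const healthyOwner = decideLockAction(
  { ownerPidAlive: true, cdpReachable: true, lockAgeMs: 60_000 }, BOOT_GRACE_MS);
const bootingOwner = decideLockAction(
  { ownerPidAlive: true, cdpReachable: false, lockAgeMs: 2_000 }, BOOT_GRACE_MS);
const halfZombieOwner = decideLockAction(
  { ownerPidAlive: true, cdpReachable: false, lockAgeMs: 60_000 }, BOOT_GRACE_MS);

console.log(healthyOwner, bootingOwner, halfZombieOwner);
```

The ordering matters: reachability is checked before lock age, so an old but healthy owner is still refused rather than taken over.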
5. Alignment analysis vs SSOT #1359 / audit #1457 / roadmap #1463
SSOT #1359 = "host-neutral MCP browser harness for real Chrome; the MCP protocol is the product boundary; no hidden host-specific behavior." Audit #1457 = "direction adherence high; achievement gated by primitives built, wiring/enforcement missing; + develop↔main divergence (Pillar B stack on main, absent on develop)."
#1477: Turns the Pillar-B safety primitive (controller-lock) from advisory/deadlock-prone into enforcing + self-healing. This is exactly the audit's "wire the primitive to the real call path" remedy. Preserves the single-owner invariant (no split-brain, via boot-grace + multi-probe). Does not change the opt-in default → respects D3.
#1478: Same Pillar-B reliability axis; makes lifecycle facts truthful (a dead owner stops claiming ownership). The anti-flap guard keeps it conservative. No new host-specific behavior.
#1479: Moves the failure story from discarded stderr onto portable MCP surfaces (notification + tool + structured error data). This is the literal product boundary of #1359: "a feature belongs in OpenChrome only if exposed through portable MCP surfaces." Best alignment of the three.
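To make "structured error data" concrete, here is a hedged sketch of what such a JSON-RPC error response could look like. The field names in DuplicateControllerData and the helper buildDuplicateControllerError are assumptions for illustration; only the list of carried facts (port, profile, owner pid, lock path, ordered remediations) and the -32000 code come from this issue, and the port, pid and path values are placeholders.

```typescript
// Hypothetical payload shape; the real field names in #1479 may differ.
interface DuplicateControllerData {
  port: number;
  profile: string;
  ownerPid: number;
  lockPath: string;
  remediations: string[]; // ordered: first entry is the preferred fix
}

interface JsonRpcErrorResponse {
  jsonrpc: "2.0";
  id: number | string | null;
  error: { code: number; message: string; data: DuplicateControllerData };
}

// -32000 sits in the JSON-RPC range reserved for implementation-defined
// server errors; the added value is the machine-readable `data` member.
function buildDuplicateControllerError(
  id: number | string | null,
  data: DuplicateControllerData,
): JsonRpcErrorResponse {
  return {
    jsonrpc: "2.0",
    id,
    error: {
      code: -32000,
      message: `Another controller (pid ${data.ownerPid}) owns port ${data.port}`,
      data,
    },
  };
}

const exampleResponse = buildDuplicateControllerError(1, {
  port: 9222,
  profile: "~/.openchrome/profile",
  ownerPid: 4242, // placeholder pid
  lockPath: "~/.openchrome/locks/<key>.json",
  remediations: [
    "Use the session that already owns the lock",
    "Start one owner with serve --auto-launch --broker, then register other sessions with serve --connect-broker",
  ],
});

console.log(JSON.stringify(exampleResponse, null, 2));
```

A host that ignores `data` still sees a readable message, while a host that inspects it can present the remediations in order.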
Keeping broker opt-in (no default change): ✅ Required. D3 froze broker as opt-in; default session exempt from TTL. Auto-electing broker in the default path would contradict the recorded decision and the "no hidden behavior" non-identity.
Targeting develop: ✅ Correct, but verify divergence. #1457 M0 flagged the Pillar-B stack as main-only / absent on develop; back-merge #1455 has since merged (2026-05-29). Confirm src/utils/controller-lock.ts et al. now exist on develop so the stack rebases cleanly and doesn't reintroduce divergence.
Verdict: the #1474 stack is on-direction and SSOT-consistent. It fixes (a) the deadlock and (b) the observability gap without disturbing the opt-in broker policy. It is the "enforcement/wiring" the audit asked for, applied to Pillar B. Recommendation: merge #1477 → #1478 → #1479 as-is.
6. Remaining gaps NOT covered by the #1474 stack (newly found)
The #1474 stack makes the default path safe and legible, and auto-recovers a dead owner — but it still leaves (N−1) concurrent sessions non-functional (they get a clear error instead of a cryptic one). For hosts that genuinely want concurrent sessions, the opt-in broker path must be reachable, portable, and documented. These are "wiring/enforcement missing" items in the #1457 sense:
G1: Broker flags are hidden in the user-facing CLI (discoverability/Pillar A). node dist/index.js serve --help lists --broker/--connect-broker, but the bin wrapper openchrome serve --help does not (--pilot, --hybrid, etc. show; the broker flags do not). The one documented escape hatch is invisible at the surface most operators read. → Reconcile the two help surfaces.
G2: Orphan Chrome leak on owner death (Pillar B isolation/cleanup). When an owner dies/half-zombies, Tier-3 headed-fallback children (basePort+100, e.g. 9322) and headless instances (e.g. 9666) can survive un-reaped; reap-orphans does not collect them. → Extend reaper + watchdog teardown to cover fallback/headless descendants and verify owner-self-release (#1478) triggers full child cleanup.
G3: No portable host-registration recipe for concurrent sharing (Pillar A). Today an operator must hand-author: one serve --auto-launch --broker owner first, then switch every session's registration to serve --connect-broker. --connect-broker exit(2)s if no broker is published (no auto-start), and the chicken-and-egg/SPOF are undocumented. → Have mcp-client-config emit a broker-topology registration (owner + client) and document the recipe in README; surface the "no broker found: start one with …" hint over MCP (per #1479's pattern), not just stderr.
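As a rough illustration of what a broker-topology registration for G3 might contain, the sketch below builds the owner and client entries from the two command lines quoted above. Everything else is an assumption: McpServerEntry, brokerTopology and the mcpServers wrapper follow the command/args shape many MCP hosts use for stdio servers, and nothing here reflects what mcp-client-config emits today.

```typescript
// Assumed host-config entry shape (command + args); adjust to the target host.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface BrokerTopology {
  owner: McpServerEntry;  // started once; owns Chrome and publishes the broker
  client: McpServerEntry; // registered in every concurrent session
}

// Flags are taken from this issue; the function itself is hypothetical.
function brokerTopology(): BrokerTopology {
  return {
    owner: { command: "openchrome", args: ["serve", "--auto-launch", "--broker"] },
    client: { command: "openchrome", args: ["serve", "--connect-broker"] },
  };
}

const topology = brokerTopology();

// What each session's registration could look like under this assumption.
const sessionRegistration = { mcpServers: { openchrome: topology.client } };

console.log(JSON.stringify(sessionRegistration, null, 2));
console.log("owner command:", [topology.owner.command, ...topology.owner.args].join(" "));
```

The ordering constraint from G3 still applies: the owner entry has to be running before any client entry connects, because --connect-broker exits when no broker is published.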
7. Proposed action plan (phased)
Phase 4 (optional, separate decision): explicit --auto-broker opt-in. G4, only if D3 is revisited to permit a convenience mode; ship behind a flag + trust config, default unchanged.
8. Acceptance criteria
Documented broker-topology registration enables ≥2 sessions to operate concurrently against one shared Chrome via --connect-broker. (G3)
After owner death, no orphan Chrome (managed, headed-fallback, or headless) remains. (G2)
Default --auto-launch behavior and broker opt-in policy are unchanged unless an explicit new flag is introduced. (D3 compliance)
9. Evidence appendix (investigation machine)
Live owner lock: ~/.openchrome/locks/port-9222-...profile.json → pid 24035, v1.12.7, stdio/auto.
Managed Chrome: pid 95492 --remote-debugging-port=9222 --user-data-dir=~/.openchrome/profile (+ watchdog pid 95493 that SIGKILLs Chrome on owner death).
Orphans observed: pid 805 (headed-fallback, port 9322), pid 32230 (--headless=new, port 9666) — un-reaped (relates to G2).
auto-connect.ts boundary confirmed: refuses to attach to the managed profile → managed profile always launch-mode (relevant to why no implicit sharing exists today).
TL;DR
Parallel multi-session is structurally blocked by design; it is not a regression. Since #1376 ("Stabilize shared-profile controller ownership") introduced the per-(port, userDataDir) controller-owner lock, N identically-registered sessions resolve to 1 working owner + (N−1) hard failures, surfaced to the host as a bare Failed to reconnect to openchrome: -32000.
--broker/--connect-broker, #1379) is opt-in by recorded decision (roadmap #1463 D3), so making the broker the default would contradict the SSOT.
develop, then close the remaining wiring/discoverability gaps (G1–G4) so concurrent hosts have a reachable, documented, portable path, without changing the opt-in default that D3 froze.
1. Symptom & deterministic repro
With ≥2 concurrent host sessions sharing one global registration (openchrome serve --auto-launch, same port+profile):
Failed to reconnect to openchrome: -32000.
Captured on the investigation machine (one healthy owner holding the lock):
A second serve --auto-launch against the same key prints a rich remediation to stderr and exit(2)s before the MCP handshake, so the host discards it and the user only ever sees -32000.
2. Root cause (two-part)
Default serve --auto-launch is single-owner-per-(port, userDataDir) with no broker auto-attach fallback. acquireControllerLock() (src/utils/controller-lock.ts) creates ~/.openchrome/locks/<key>.json with openSync(..., 'wx'); the second process gets EEXIST → DuplicateControllerError → refuse to start. With N identical registrations, N−1 are structurally guaranteed to fail.
The lock treats "owner PID alive" as "owner healthy." No CDP-reachability probe, no heartbeat/lease/TTL on the default owner. A half-zombie owner holds the lock forever; the orphaned managed Chrome (and Tier-3 headed-fallback children on basePort+100, e.g. 9322, plus headless 9666) can linger un-reaped.
Code anchors: src/utils/controller-lock.ts, src/utils/duplicate-controller-diagnostics.ts, src/chrome/launcher.ts (SingletonLock), src/chrome/process-watchdog.ts, src/chrome/headed-fallback.ts, src/chrome/auto-connect.ts (note: explicitly refuses to attach to the managed ~/.openchrome/profile, so the managed profile always takes launch mode).
3. Why this is by-design (not a regression)
#1376 "Stabilize shared-profile controller ownership" (merged 2026-05-27) added the owner lock on purpose: "independent MCP processes pointing at the same (port, userDataDir) can race on target cleanup/reconnect and cause stale sessions or MCP disconnects." The lock converts flaky concurrency into deterministic single ownership.
--broker HTTP owner + --connect-broker stdio proxy, discovery under ~/.openchrome/brokers).
--broker/--connect-broker), discovery via ~/.openchrome/brokers, sliding idle-TTL lease expiry (PR-3 #1460; default session exempt), multi-tenant requires explicit trust config.
Implication for the fix direction: the SSOT-aligned move is not "make broker the default." It is "make the default single-owner path self-healing + observable, and make the opt-in broker path reachable, portable, and documented."
4. The fix already in flight — #1474 stack
Issue #1474 ("Parallel sessions deadlock on controller lock; host sees only -32000", OPEN) is the canonical bug. Three stacked PRs (all OPEN, all MERGEABLE, base develop):
#1477 (base: develop). acquireControllerLockWithHealthCheck(): on live-owner collision, probe owner CDP /json/version; if unreachable past a boot-grace window, atomically take over the stale lock. A healthy owner is never evicted. src/index.ts awaits it on --auto-launch.
#1478 (base: fix/1474-controller-lock-health-aware, #1477). owner-self-release.ts: on terminal watchdog-exhausted, release the lock and exit 70 so the host respawns a fresh owner. Anti-flap: only the terminal event surrenders ownership; chrome-died and a single relaunch-failed do not.
#1479 (base: fix/1474-owner-self-release, #1478). duplicate-controller-error-server.ts: a degraded stdio responder that completes initialize, then surfaces remediation via portable MCP surfaces (notifications/message, a diagnostic tool, and a structured JSON-RPC error whose data carries port, profile, owner pid, lock path, and ordered remediations) instead of a bare -32000.
Stacking/merge order: #1477 → #1478 → #1479.
5. Alignment analysis vs SSOT #1359 / audit #1457 / roadmap #1463
SSOT #1359 = "host-neutral MCP browser harness for real Chrome; the MCP protocol is the product boundary; no hidden host-specific behavior." Audit #1457 = "direction adherence high; achievement gated by primitives built, wiring/enforcement missing; + develop↔main divergence (Pillar B stack on main, absent on develop)."
#1479: Moves the failure story from discarded stderr onto portable MCP surfaces (notification + tool + structured error data). This is the literal product boundary of #1359: "a feature belongs in OpenChrome only if exposed through portable MCP surfaces." Best alignment of the three.
Targeting develop: ✅ Correct, but verify divergence. #1457 M0 flagged the Pillar-B stack as main-only / absent on develop; back-merge #1455 has since merged (2026-05-29). Confirm src/utils/controller-lock.ts et al. now exist on develop so the stack rebases cleanly and doesn't reintroduce divergence.
Verdict: the #1474 stack is on-direction and SSOT-consistent. It fixes (a) the deadlock and (b) the observability gap without disturbing the opt-in broker policy. It is the "enforcement/wiring" the audit asked for, applied to Pillar B. Recommendation: merge #1477 → #1478 → #1479 as-is.
6. Remaining gaps NOT covered by the #1474 stack (newly found)
The #1474 stack makes the default path safe and legible, and auto-recovers a dead owner — but it still leaves (N−1) concurrent sessions non-functional (they get a clear error instead of a cryptic one). For hosts that genuinely want concurrent sessions, the opt-in broker path must be reachable, portable, and documented. These are "wiring/enforcement missing" items in the #1457 sense:
G1: Broker flags are hidden in the user-facing CLI (discoverability/Pillar A). node dist/index.js serve --help lists --broker/--connect-broker, but the bin wrapper openchrome serve --help does not (--pilot, --hybrid, etc. show; the broker flags do not). The one documented escape hatch is invisible at the surface most operators read. → Reconcile the two help surfaces.
G2: Orphan Chrome leak on owner death (Pillar B isolation/cleanup). When an owner dies/half-zombies, Tier-3 headed-fallback children (basePort+100, e.g. 9322) and headless instances (e.g. 9666) can survive un-reaped; reap-orphans does not collect them. → Extend reaper + watchdog teardown to cover fallback/headless descendants and verify owner-self-release (#1478) triggers full child cleanup.
G3: No portable host-registration recipe for concurrent sharing (Pillar A). Today an operator must hand-author: one serve --auto-launch --broker owner first, then switch every session's registration to serve --connect-broker. --connect-broker exit(2)s if no broker is published (no auto-start), and the chicken-and-egg/SPOF are undocumented. → Have mcp-client-config emit a broker-topology registration (owner + client) and document the recipe in README; surface the "no broker found: start one with …" hint over MCP (per #1479's pattern), not just stderr.
--auto-broker), never as the default --auto-launch behavior. Gate behind a flag + trust config; lease/TTL already exists (#1460). Treat as a separate proposal, not part of the #1474 stack.
7. Proposed action plan (phased)
develop. Confirm post-#1455 develop carries the controller-lock/broker primitives so the stack applies cleanly. Outcome: no more permanent deadlock; no more bare -32000; half-zombie owners auto-recover.
Phase 4 (optional, separate decision): explicit --auto-broker opt-in. G4, only if D3 is revisited to permit a convenience mode; ship behind a flag + trust config, default unchanged.
8. Acceptance criteria
-32000) on the non-owner sessions. (#1479)
openchrome serve --help (bin) lists --broker/--connect-broker. (G1)
Documented broker-topology registration enables ≥2 sessions to operate concurrently against one shared Chrome via --connect-broker. (G3)
Default --auto-launch behavior and broker opt-in policy are unchanged unless an explicit new flag is introduced. (D3 compliance)
9. Evidence appendix (investigation machine)
Live owner lock: ~/.openchrome/locks/port-9222-...profile.json → pid 24035, v1.12.7, stdio/auto.
Managed Chrome: pid 95492 --remote-debugging-port=9222 --user-data-dir=~/.openchrome/profile (+ watchdog pid 95493 that SIGKILLs Chrome on owner death).
Orphans observed: pid 805 (headed-fallback, port 9322), pid 32230 (--headless=new, port 9666), both un-reaped (relates to G2).
auto-connect.ts boundary confirmed: refuses to attach to the managed profile → managed profile always launch-mode (relevant to why no implicit sharing exists today).
Refs: SSOT #1359 · audit #1457 · back-merge #1455 · controller-lock #1376 · diagnostics #1377 · broker foundation #1379 · lease TTL #1460 · roadmap/D3 #1463 · deadlock bug #1474 · fixes #1477/#1478/#1479.