lijunzh · lijunzh · May 10, 2026 · May 10, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,6 +8,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).
 
 ### Changed (BREAKING for sub-agent dispatch shape)
 
+- **`InvokeAgent` / `SpawnAgent` are now synchronous tool delegations — bg-spawn dispatch path retired** (#1366 phase 1). Dispatch now drives the sub-agent's inference loop inline on the parent's task and returns the sub-agent's final answer as the `ToolCallResult.output`, the same shape as every other tool. Pre-#1366 (i.e., post-#1163) every InvokeAgent call was `tokio::spawn`'d into a `run_bg_agent` task that returned a `task_id` immediately; results reached the model via auto-drain on a future iteration or via an explicit `WaitForMail` call against the mailbox bridge from #1336. That whole async surface is gone from the public dispatch path: the `inline_only=true` recursion-guard branch in `execute_sub_agent` is now the only public path; the `inline_only=false` bg-spawn block (and `run_bg_agent`, `BgRegistry`, `ChildAgentActivity` event family, mailbox infrastructure, activity-pill UI, child-activity overlay, `WaitForMail` tool) becomes dead from the public surface and will be deleted across phases 2–5 of #1366. Architectural rationale (DESIGN.md "Execution Model" section, post-#1366): the LLM API caps throughput, not concurrency, so async dispatch buys little while costing a lot of surface area; sub-agents' real value is context isolation + tool restriction + specialization, all of which work fine with sync delegation. Codex's ExecCell and gemini-CLI's `SubagentGroupDisplay` patterns inspired the swap. Deliberately NO runtime config knob — per DESIGN.md P1 ("compile decisions in, don't configure them at runtime"), the sync swap is the only behavior; a previous draft that gated it behind `KODA_SUBAGENT_SYNC` was rejected as a principle violation. The `tool_dispatch.rs` change is two lines: `inline_only=false` → `true` at the InvokeAgent and SpawnAgent callsites. The InvokeAgent tool description and module-level rustdoc are rewritten to reflect sync semantics: drops the "Spawn a sub-agent in the background" / "Returns IMMEDIATELY with a task_id" framing in favor of "Delegate a task and get its final answer back as the tool result. The call BLOCKS until the sub-agent finishes." The `[ERROR: sub-agent` marker convention is preserved AND extended to recognize the dispatcher's `Err` arm format (`"Error invoking sub-agent: {chain}"`) as the second structural-failure shape. Schema unchanged — `agent_name` + `prompt` required, no `background` param (the regression guard from #1163 still passes). Tests: 6 e2e cases in `e2e_agent_test.rs` migrated from `collect_bg_events_after(...).find_map(AgentStatus::Completed { summary }/...)` to direct `events.iter().find_map(EngineEvent::ToolCallResult { name == "InvokeAgent", output })` — the pre-#1163 wait-free assertion shape returns, now without the `tokio::spawn` race window. Two e2e cases in `e2e_sub_agent_trust_test.rs` lose their `#1321/#1323/#1327` deflake waits because the race they guarded against (parent-loop termination vs bg-agent completion) does not exist on the sync path. The `bg_agent_iter_counter_advances_via_status_channel` test is deleted entirely — it pinned the `Running { iter: n }` heartbeat from the bg path, which is now unreachable from the public dispatcher; phase 2 of #1366 will delete the underlying code, at which point the test would have been dead anyway. The 4 description-pinning tests in `tools/agent.rs` are rewritten: `test_invoke_agent_description_documents_spawn_only_model` becomes `test_invoke_agent_description_documents_sync_dispatch_model` and asserts on `synchronously` / `BLOCKS` / `final answer` while explicitly forbidding `task_id` / `WaitForMail` / `auto-drain` / `in the background` (a regression that re-introduces them would mislead the model about dispatch shape); `test_invoke_agent_top_level_documents_drain_semantics` and `test_invoke_agent_description_discourages_immediate_wait` are deleted (drain semantics and the WaitForMail anti-pattern only existed because of bg-spawn). Wire format unchanged — ACP / headless clients are unaffected; sub-agent `EngineEvent::SubAgentStart` still fires at dispatch time. Verified: 1394 koda-core lib tests pass (−3: env-var test never created + 1 description test deleted + 1 description test deleted; −1 from the deleted bg_agent_iter_counter test that lived in the integration suite, not lib), 656 koda-cli lib tests pass, all `koda-core` integration suites pass, `cargo clippy --workspace --lib --tests --all-features` clean, `cargo fmt --check` clean.
+
 - **Background-task management trio retired — `WaitTask` / `ListBackgroundTasks` / `CancelTask` removed from the LLM tool surface** (#1325 Phase 5b). The three tools added in #996 Layer 2 are superseded by `WaitForMail` plus the mailbox bridge that landed in #1336: `notify_parent_mailbox` fires from `run_bg_agent` the moment any bg-agent exits, so the parent's mailbox watch-sequence increments before the existing drain-injection path adds the `Role::Tool` row — the model now needs exactly one tool to block on background work, and that tool unifies cleanly with the peer-messaging surface from #1325 Phase 3 (`SendMessage` / `WaitForMail`). Listing in-flight tasks and explicit per-task cancellation never had Codex/Claude-Code/Gemini-CLI equivalents and were a koda-specific surface from the pre-mailbox era — dropping them removes ~1,500 LOC (the 1,397-line `bg_task_tools.rs` minus the ~95 lines of `TaskId` / `parse_task_id` that moved to a new `tools/task_id.rs` module so the TUI's `/cancel <id>` slash command keeps working). The early-dispatch branch in `tool_dispatch::execute_one_tool` that routed the trio around `ToolRegistry::execute()` (it needed `Arc<ChildAgentRegistry>` plus the caller's spawner identity) is also gone, simplifying the dispatch path. `META_TOOLS` in `skill_scope` and `CANONICAL` in `tool_normalize` lost the trio; in their place are the post-#1325 meta tools (`SpawnAgent` / `SendMessage` / `WaitForMail`). The five snake_case aliases (`list_background_tasks`, `list_bg_tasks`, `cancel_task`, `wait_task`, `wait_for_task`) are removed too, so a model emitting any of those names now hits the unknown-tool fallback — the correct signal that the tool is gone. The `InvokeAgent` tool description and module-level rustdoc are rewritten to point at `WaitForMail` instead of `WaitTask` (a regression test inverts the assertion: the description must reference `WaitForMail` AND must NOT reference `WaitTask`, so a future copy-paste can't silently re-introduce the dead pointer). Test migration: `e2e_agent_test::test_sub_agent_cache_hit_skips_llm` switched its barrier between two `InvokeAgent` calls from `WaitTask({"task_ids":["agent:1"]})` to `WaitForMail({})` — the bg-completion mail from #1336's bridge fires AFTER `sub_agent_cache.put`, so `WaitForMail` gives the same happens-before edge that pinned the cache hit before. Test counts: `BUILTIN_TOOLS` 25→22, `tool_wiring_test::HIGHER_LAYER_DISPATCH` is now empty (the bypass list existed exclusively for the bg-task trio). The TUI rendering for historical `WaitTask` / `ListBackgroundTasks` rows in `wait_task_format.rs` (534 lines), `tui_render.rs`, and `history_render.rs` stays put — pre-5b session DBs persisted on disk still contain rows with these tool names and resuming an old session must render them correctly. The underlying registries (`ChildAgentRegistry`, `BgRegistry`, `ChildTaskSnapshot`) are untouched: they still drive the TUI `child_activity_overlay`, the `/agents` and `/cancel` slash commands, and the `notify_parent_mailbox` walk on bg-agent exit — Phase 5b is a tool-surface retirement, not a runtime simplification. DESIGN.md updated: the layer-history of #996 marks Layers 2 and G retired with forward-pointers to the mailbox surface; the "considered and rejected: Codex collab v2" entry rewritten to "considered and selectively adopted" reflecting that #1325 Phases 1–5 actually shipped the mailbox half (Phases 1–2 vendored Codex's `mailbox.rs` substrate, Phase 3 added the peer tools, Phase 4 added per-agent paths and a registry, Phase 5a added `SpawnAgent` plus the completion bridge, Phase 5b is this PR). What's still rejected: `close_agent` / `resume_agent` and persistent ThreadId-addressed agents — koda agents stay session-scoped, no cross-session resume. Net adopted footprint is ~1,500 LOC, half the original 3,000-LOC estimate, because the mailbox machinery is shared between peer messaging and bg-completion delivery. Verified: 1387 `koda-core` lib tests pass (−25 from the deleted `bg_task_tools.rs` test mod), 634 `koda-cli` lib tests pass, all `koda-core` integration suites pass, `cargo clippy --workspace --lib --tests --all-features -- -D warnings` clean.
 
 - **`InvokeAgent` is now spawn-only — the `background:bool` parameter is gone** (#1163, Lean A). Pre-#1163 the model picked between foreground (blocking, returns the sub-agent's final output as the tool result) and background (returns a task_id immediately, result auto-injects on a future iteration) by setting a required `background` boolean on every `InvokeAgent` call. The schema marked it required and the runtime parser rejected missing/wrong-type values, but model compliance with `required` is best-effort and the failure mode of "models silently default to `background:false` and serialize parallel fan-out into blocking calls" was the actual issue that opened #1232: a session showed 10/10 sub-agent calls blocking sequentially even when the model's plan had explicit parallel intent. Lean A collapses dispatch to a single shape that matches Codex's `spawn_agent` and Claude Code's `TaskCreate` — every `InvokeAgent` call spawns a `tokio::spawn`-ed sub-agent and returns its `task_id` synchronously; results auto-drain as user messages on a future parent iteration; `WaitTask([task_id, ...])` is reserved for the rare case where the parent has run out of useful concurrent work AND the next step strictly depends on the sub-agent output. Two semantic fixes fell out of the refactor for free: `SubAgentStart` events now fire on the parent's sink at dispatch time (pre-#1163 the bg path emitted only an `Info { "\u{1f680} ... launched in background" }` line and ACP / headless / e2e clients watching for `SubAgentStart` saw nothing on bg dispatches), and the sub-agent result cache check moved above the spawn block so cache hits short-circuit without spawning a `tokio` task or eating a registry slot. Internals: `execute_sub_agent` gained a private `inline_only:bool` recursion-guard parameter (used by `run_bg_agent` to drive the inference loop inside its own spawned task without re-entering the spawn path); the `parse_background_required` validator and its 7-test mod block were deleted. Test rework: 5 e2e tests in `koda-core/tests/e2e_agent_test.rs` migrated from fg-path-only event surfaces (raw `Info`, `ToolCallResult.output` containing sub-agent text, raw `ToolCallStart{is_sub_agent:true}`) to the post-#1163 surfaces (`AgentStatus::Completed.summary` for sub-agent output, `ChildAgentActivity::ToolStart` for forwarded sub-agent tool counts, `ChildAgentActivity::Info` for forwarded info lines). The `invoke_agent_and_take_calls` test helper now polls for terminal bg status before reading recorded provider calls (pre-#1163 the inline path completed inside `run_inference`, so reads were race-free "by accident"). DESIGN.md, the `child_agent.rs` / `session.rs` / `bg_task_tools.rs` module docs, the `ListBackgroundTasks` model-facing tool description, and the `/agents` TUI command doc all updated for the spawn-only mental model. Total: 1332 lib tests + 12 e2e tests + 634 koda-cli tests green; `cargo clippy --workspace -- -D warnings` clean.

diff --git a/koda-core/src/tool_dispatch.rs b/koda-core/src/tool_dispatch.rs
@@ -306,23 +306,37 @@ pub(crate) async fn execute_one_tool(
             // Phase 5 PR-4 of #934: hand the parent's effective policy
             // to the child so `compose()` can stack restrictions.
             &policy,
-            // Layer 4 of #996 + #1076: foreground sub-agents are not
-            // tracked in the bg-agent registry, so there is no
-            // `ChildStatusEmitter` to fan out per-iteration heartbeats
-            // to. Pass `None` to skip the per-iteration emit.
+            // Layer 4 of #996 + #1076: sync sub-agents are not tracked
+            // in the bg-agent registry. Pass `None` to skip the per-
+            // iteration emit — the FG path at sub_agent_dispatch.rs
+            // documents this as deliberate. Live activity flows
+            // through `ChildAgentActivity` events on the parent's
+            // `EngineSink` instead.
             None,
-            // #1108 P2a: pass the InvokeAgent tool_call_id so any
-            // bg-sub-agent reservation can record it. The drain
-            // hook in the inference loop will persist the bg sub-
-            // agent's narrative trace to `session_events` keyed by
-            // this id, so the transcript renderer can fold it under
-            // the parent's `InvokeAgent` tool result.
+            // #1108 P2a: pass the InvokeAgent tool_call_id so the
+            // narrative trace rows are folded under the parent's
+            // `InvokeAgent` tool result in the persisted transcript.
             Some(&tc.id),
-            // **#1163 (Lean A)**: `inline_only=false` is the public
-            // dispatch path — every InvokeAgent call spawns a bg
-            // task and returns its task_id immediately. The `true`
-            // value is reserved for `run_bg_agent`'s recursion guard.
-            false,
+            // **#1366 phase 1**: sub-agents are now synchronous
+            // delegations. `inline_only=true` selects the FG path that
+            // drives the sub-agent's inference loop inline and returns
+            // its final output as the tool result — no more bg-spawn-
+            // and-poll round-trip via task_id + WaitForMail.
+            //
+            // Architectural rationale: the LLM API caps throughput,
+            // not concurrency, so async dispatch buys little while
+            // costing a lot of surface area (mailbox / WaitForMail /
+            // pill / overlay). Sub-agents' real value is context
+            // isolation + tool restriction + specialization, all of
+            // which work fine with sync delegation.
+            //
+            // Pre-#1366 this was `false` (bg-spawn) and `true` was
+            // reserved for `run_bg_agent`'s recursion guard. With
+            // bg-spawn dead from the public surface, `true` IS the
+            // public path; phase 2 deletes the now-unreachable bg
+            // branch + mailbox + activity-pill UI in one large
+            // deletion PR.
+            true,
         );
         match Box::pin(fut).await {
             Ok(output) => (output, true, None),
@@ -389,7 +403,10 @@ pub(crate) async fn execute_one_tool(
             &policy,
             None,
             Some(&remapped_tc.id),
-            false,
+            // **#1366 phase 1**: SpawnAgent is the codex-v2 surface
+            // alias for InvokeAgent; both paths share the sync swap.
+            // See InvokeAgent branch above for the full rationale.
+            true,
         );
         match Box::pin(fut).await {
             Ok(output) => (output, true, None),