Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).

### Changed (BREAKING for sub-agent dispatch shape)

- **`InvokeAgent` / `SpawnAgent` are now synchronous tool delegations — bg-spawn dispatch path retired** (#1366 phase 1). Dispatch now drives the sub-agent's inference loop inline on the parent's task and returns the sub-agent's final answer as the `ToolCallResult.output`, the same shape as every other tool. Pre-#1366 (i.e., post-#1163) every InvokeAgent call was `tokio::spawn`'d into a `run_bg_agent` task that returned a `task_id` immediately; results reached the model via auto-drain on a future iteration or via an explicit `WaitForMail` call against the mailbox bridge from #1336. That whole async surface is gone from the public dispatch path: the `inline_only=true` recursion-guard branch in `execute_sub_agent` is now the only public path; the `inline_only=false` bg-spawn block (and `run_bg_agent`, `BgRegistry`, `ChildAgentActivity` event family, mailbox infrastructure, activity-pill UI, child-activity overlay, `WaitForMail` tool) becomes dead from the public surface and will be deleted across phases 2–5 of #1366. Architectural rationale (DESIGN.md "Execution Model" section, post-#1366): the LLM API caps throughput, not concurrency, so async dispatch buys little while costing a lot of surface area; sub-agents' real value is context isolation + tool restriction + specialization, all of which work fine with sync delegation. Codex's ExecCell and gemini-CLI's `SubagentGroupDisplay` patterns inspired the swap. Deliberately NO runtime config knob — per DESIGN.md P1 ("compile decisions in, don't configure them at runtime"), the sync swap is the only behavior; a previous draft that gated it behind `KODA_SUBAGENT_SYNC` was rejected as a principle violation. The `tool_dispatch.rs` change is two lines: `inline_only=false` → `true` at the InvokeAgent and SpawnAgent callsites. The InvokeAgent tool description and module-level rustdoc are rewritten to reflect sync semantics: drops the "Spawn a sub-agent in the background" / "Returns IMMEDIATELY with a task_id" framing in favor of "Delegate a task and get its final answer back as the tool result. The call BLOCKS until the sub-agent finishes." The `[ERROR: sub-agent` marker convention is preserved AND extended to recognize the dispatcher's `Err` arm format (`"Error invoking sub-agent: {chain}"`) as the second structural-failure shape. Schema unchanged — `agent_name` + `prompt` required, no `background` param (the regression guard from #1163 still passes). Tests: 6 e2e cases in `e2e_agent_test.rs` migrated from `collect_bg_events_after(...).find_map(AgentStatus::Completed { summary }/...)` to direct `events.iter().find_map(EngineEvent::ToolCallResult { name == "InvokeAgent", output })` — the pre-#1163 wait-free assertion shape returns, now without the `tokio::spawn` race window. Two e2e cases in `e2e_sub_agent_trust_test.rs` lose their `#1321/#1323/#1327` deflake waits because the race they guarded against (parent-loop termination vs bg-agent completion) does not exist on the sync path. The `bg_agent_iter_counter_advances_via_status_channel` test is deleted entirely — it pinned the `Running { iter: n }` heartbeat from the bg path, which is now unreachable from the public dispatcher; phase 2 of #1366 will delete the underlying code, at which point the test would have been dead anyway. The 4 description-pinning tests in `tools/agent.rs` are rewritten: `test_invoke_agent_description_documents_spawn_only_model` becomes `test_invoke_agent_description_documents_sync_dispatch_model` and asserts on `synchronously` / `BLOCKS` / `final answer` while explicitly forbidding `task_id` / `WaitForMail` / `auto-drain` / `in the background` (a regression that re-introduces them would mislead the model about dispatch shape); `test_invoke_agent_top_level_documents_drain_semantics` and `test_invoke_agent_description_discourages_immediate_wait` are deleted (drain semantics and the WaitForMail anti-pattern only existed because of bg-spawn). Wire format unchanged — ACP / headless clients are unaffected; sub-agent `EngineEvent::SubAgentStart` still fires at dispatch time. Verified: 1394 koda-core lib tests pass (−3: env-var test never created + 1 description test deleted + 1 description test deleted; −1 from the deleted bg_agent_iter_counter test that lived in the integration suite, not lib), 656 koda-cli lib tests pass, all `koda-core` integration suites pass, `cargo clippy --workspace --lib --tests --all-features` clean, `cargo fmt --check` clean.

- **Background-task management trio retired — `WaitTask` / `ListBackgroundTasks` / `CancelTask` removed from the LLM tool surface** (#1325 Phase 5b). The three tools added in #996 Layer 2 are superseded by `WaitForMail` plus the mailbox bridge that landed in #1336: `notify_parent_mailbox` fires from `run_bg_agent` the moment any bg-agent exits, so the parent's mailbox watch-sequence increments before the existing drain-injection path adds the `Role::Tool` row — the model now needs exactly one tool to block on background work, and that tool unifies cleanly with the peer-messaging surface from #1325 Phase 3 (`SendMessage` / `WaitForMail`). Listing in-flight tasks and explicit per-task cancellation never had Codex/Claude-Code/Gemini-CLI equivalents and were a koda-specific surface from the pre-mailbox era — dropping them removes ~1,500 LOC (the 1,397-line `bg_task_tools.rs` minus the ~95 lines of `TaskId` / `parse_task_id` that moved to a new `tools/task_id.rs` module so the TUI's `/cancel <id>` slash command keeps working). The early-dispatch branch in `tool_dispatch::execute_one_tool` that routed the trio around `ToolRegistry::execute()` (it needed `Arc<ChildAgentRegistry>` plus the caller's spawner identity) is also gone, simplifying the dispatch path. `META_TOOLS` in `skill_scope` and `CANONICAL` in `tool_normalize` lost the trio; in their place are the post-#1325 meta tools (`SpawnAgent` / `SendMessage` / `WaitForMail`). The five snake_case aliases (`list_background_tasks`, `list_bg_tasks`, `cancel_task`, `wait_task`, `wait_for_task`) are removed too, so a model emitting any of those names now hits the unknown-tool fallback — the correct signal that the tool is gone. The `InvokeAgent` tool description and module-level rustdoc are rewritten to point at `WaitForMail` instead of `WaitTask` (a regression test inverts the assertion: the description must reference `WaitForMail` AND must NOT reference `WaitTask`, so a future copy-paste can't silently re-introduce the dead pointer). Test migration: `e2e_agent_test::test_sub_agent_cache_hit_skips_llm` switched its barrier between two `InvokeAgent` calls from `WaitTask({"task_ids":["agent:1"]})` to `WaitForMail({})` — the bg-completion mail from #1336's bridge fires AFTER `sub_agent_cache.put`, so `WaitForMail` gives the same happens-before edge that pinned the cache hit before. Test counts: `BUILTIN_TOOLS` 25→22, `tool_wiring_test::HIGHER_LAYER_DISPATCH` is now empty (the bypass list existed exclusively for the bg-task trio). The TUI rendering for historical `WaitTask` / `ListBackgroundTasks` rows in `wait_task_format.rs` (534 lines), `tui_render.rs`, and `history_render.rs` stays put — pre-5b session DBs persisted on disk still contain rows with these tool names and resuming an old session must render them correctly. The underlying registries (`ChildAgentRegistry`, `BgRegistry`, `ChildTaskSnapshot`) are untouched: they still drive the TUI `child_activity_overlay`, the `/agents` and `/cancel` slash commands, and the `notify_parent_mailbox` walk on bg-agent exit — Phase 5b is a tool-surface retirement, not a runtime simplification. DESIGN.md updated: the layer-history of #996 marks Layers 2 and G retired with forward-pointers to the mailbox surface; the "considered and rejected: Codex collab v2" entry rewritten to "considered and selectively adopted" reflecting that #1325 Phases 1–5 actually shipped the mailbox half (Phases 1–2 vendored Codex's `mailbox.rs` substrate, Phase 3 added the peer tools, Phase 4 added per-agent paths and a registry, Phase 5a added `SpawnAgent` plus the completion bridge, Phase 5b is this PR). What's still rejected: `close_agent` / `resume_agent` and persistent ThreadId-addressed agents — koda agents stay session-scoped, no cross-session resume. Net adopted footprint is ~1,500 LOC, half the original 3,000-LOC estimate, because the mailbox machinery is shared between peer messaging and bg-completion delivery. Verified: 1387 `koda-core` lib tests pass (−25 from the deleted `bg_task_tools.rs` test mod), 634 `koda-cli` lib tests pass, all `koda-core` integration suites pass, `cargo clippy --workspace --lib --tests --all-features -- -D warnings` clean.

- **`InvokeAgent` is now spawn-only — the `background:bool` parameter is gone** (#1163, Lean A). Pre-#1163 the model picked between foreground (blocking, returns the sub-agent's final output as the tool result) and background (returns a task_id immediately, result auto-injects on a future iteration) by setting a required `background` boolean on every `InvokeAgent` call. The schema marked it required and the runtime parser rejected missing/wrong-type values, but model compliance with `required` is best-effort and the failure mode of "models silently default to `background:false` and serialize parallel fan-out into blocking calls" was the actual issue that opened #1232: a session showed 10/10 sub-agent calls blocking sequentially even when the model's plan had explicit parallel intent. Lean A collapses dispatch to a single shape that matches Codex's `spawn_agent` and Claude Code's `TaskCreate` — every `InvokeAgent` call spawns a `tokio::spawn`-ed sub-agent and returns its `task_id` synchronously; results auto-drain as user messages on a future parent iteration; `WaitTask([task_id, ...])` is reserved for the rare case where the parent has run out of useful concurrent work AND the next step strictly depends on the sub-agent output. Two semantic fixes fell out of the refactor for free: `SubAgentStart` events now fire on the parent's sink at dispatch time (pre-#1163 the bg path emitted only an `Info { "\u{1f680} ... launched in background" }` line and ACP / headless / e2e clients watching for `SubAgentStart` saw nothing on bg dispatches), and the sub-agent result cache check moved above the spawn block so cache hits short-circuit without spawning a `tokio` task or eating a registry slot. Internals: `execute_sub_agent` gained a private `inline_only:bool` recursion-guard parameter (used by `run_bg_agent` to drive the inference loop inside its own spawned task without re-entering the spawn path); the `parse_background_required` validator and its 7-test mod block were deleted. Test rework: 5 e2e tests in `koda-core/tests/e2e_agent_test.rs` migrated from fg-path-only event surfaces (raw `Info`, `ToolCallResult.output` containing sub-agent text, raw `ToolCallStart{is_sub_agent:true}`) to the post-#1163 surfaces (`AgentStatus::Completed.summary` for sub-agent output, `ChildAgentActivity::ToolStart` for forwarded sub-agent tool counts, `ChildAgentActivity::Info` for forwarded info lines). The `invoke_agent_and_take_calls` test helper now polls for terminal bg status before reading recorded provider calls (pre-#1163 the inline path completed inside `run_inference`, so reads were race-free "by accident"). DESIGN.md, the `child_agent.rs` / `session.rs` / `bg_task_tools.rs` module docs, the `ListBackgroundTasks` model-facing tool description, and the `/agents` TUI command doc all updated for the spawn-only mental model. Total: 1332 lib tests + 12 e2e tests + 634 koda-cli tests green; `cargo clippy --workspace -- -D warnings` clean.
Expand Down
49 changes: 33 additions & 16 deletions koda-core/src/tool_dispatch.rs
Original file line number Diff line number Diff line change
Expand Up @@ -306,23 +306,37 @@ pub(crate) async fn execute_one_tool(
// Phase 5 PR-4 of #934: hand the parent's effective policy
// to the child so `compose()` can stack restrictions.
&policy,
// Layer 4 of #996 + #1076: foreground sub-agents are not
// tracked in the bg-agent registry, so there is no
// `ChildStatusEmitter` to fan out per-iteration heartbeats
// to. Pass `None` to skip the per-iteration emit.
// Layer 4 of #996 + #1076: sync sub-agents are not tracked
// in the bg-agent registry. Pass `None` to skip the per-
// iteration emit — the FG path at sub_agent_dispatch.rs
// documents this as deliberate. Live activity flows
// through `ChildAgentActivity` events on the parent's
// `EngineSink` instead.
None,
// #1108 P2a: pass the InvokeAgent tool_call_id so any
// bg-sub-agent reservation can record it. The drain
// hook in the inference loop will persist the bg sub-
// agent's narrative trace to `session_events` keyed by
// this id, so the transcript renderer can fold it under
// the parent's `InvokeAgent` tool result.
// #1108 P2a: pass the InvokeAgent tool_call_id so the
// narrative trace rows are folded under the parent's
// `InvokeAgent` tool result in the persisted transcript.
Some(&tc.id),
// **#1163 (Lean A)**: `inline_only=false` is the public
// dispatch path — every InvokeAgent call spawns a bg
// task and returns its task_id immediately. The `true`
// value is reserved for `run_bg_agent`'s recursion guard.
false,
// **#1366 phase 1**: sub-agents are now synchronous
// delegations. `inline_only=true` selects the FG path that
// drives the sub-agent's inference loop inline and returns
// its final output as the tool result — no more bg-spawn-
// and-poll round-trip via task_id + WaitForMail.
//
// Architectural rationale: the LLM API caps throughput,
// not concurrency, so async dispatch buys little while
// costing a lot of surface area (mailbox / WaitForMail /
// pill / overlay). Sub-agents' real value is context
// isolation + tool restriction + specialization, all of
// which work fine with sync delegation.
//
// Pre-#1366 this was `false` (bg-spawn) and `true` was
// reserved for `run_bg_agent`'s recursion guard. With
// bg-spawn dead from the public surface, `true` IS the
// public path; phase 2 deletes the now-unreachable bg
// branch + mailbox + activity-pill UI in one large
// deletion PR.
true,
);
match Box::pin(fut).await {
Ok(output) => (output, true, None),
Expand Down Expand Up @@ -389,7 +403,10 @@ pub(crate) async fn execute_one_tool(
&policy,
None,
Some(&remapped_tc.id),
false,
// **#1366 phase 1**: SpawnAgent is the codex-v2 surface
// alias for InvokeAgent; both paths share the sync swap.
// See InvokeAgent branch above for the full rationale.
true,
);
match Box::pin(fut).await {
Ok(output) => (output, true, None),
Expand Down
Loading
Loading