Merged
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).

## [Unreleased]

### Fixed

- **Sub-agent empty final response no longer surfaces `"(no output)"` to the parent** (#1369). When a sub-agent's final LLM response in `execute_sub_agent` came back with `tool_calls.is_empty() && content.is_none()` (or `Some("")`), the dispatcher used to return the literal sentinel `"(no output)"` as the sub-agent's answer. Pre-#1366 (bg-spawn era) this was largely invisible because the sentinel landed in the auto-drained completion mail; post-#1366 phase 1 (sync dispatch) the sentinel surfaces directly as `ToolCallResult.output` on the parent's `InvokeAgent` call, and the parent (a) treats it as the agent's actual answer, (b) re-invokes the same sub-agent hoping for a real one, or (c) hits the same edge in its own loop and emits the generic "Model produced an empty response after tool use" warning. The user-visible symptom from the #1366 phase-1 validation was the parent calling `InvokeAgent explore` three times in a row with each returning `(no output)` before the parent itself bailed empty. NOT a Gemini-specific bug — every provider's response normalization (`providers/{anthropic,gemini,openai_compat}.rs`) reaches `LlmResponse { content: None, tool_calls: vec![] }` for any model that emits a bare stop signal with no text payload (Gemini hits it more often after long tool chains; Claude and GPT can produce it too); the parent inference loop already had an empty-response Warn handler at `inference.rs:1184-1192` but the sub-agent dispatch path didn't, which is the asymmetry that became the bug. Fix mirrors `claude_code_src/src/tools/AgentTool/agentToolUtils.ts::finalizeAgentTool` (~line 297, vendored at `peers/claude_code_src/`): if the final assistant turn is empty, walk backward through the sub-agent's assistant message history for the most recent non-empty trimmed text and surface THAT as the answer (the model said something useful at some point in the run; let the parent see it). 
Falls through to a structured marker (`"[sub-agent '{name}' finished after {iter} turn(s) without producing any text response. The model may have hit a provider-specific stop condition (e.g. Gemini's bare-stop after long tool chains). Try rephrasing the prompt, simplifying the task, or switching models.]"`) only when no recoverable text exists anywhere in the run — which means something is genuinely wrong (e.g. every assistant turn was a pure tool-call), and the marker is actionable rather than the bare sentinel. The recovery scan is implemented as a free function `recover_last_assistant_text(messages: &[ChatMessage]) -> Option<String>` next to `workspace_provision_failure_marker` so the contract is testable without invoking the 1300-line dispatcher loop. Considered alternatives: (a) gemini-cli's required `complete_task` tool + grace turn pattern (`packages/core/src/tools/complete-task.ts` + `agents/local-executor.ts:361-381`) — stricter and more elegant but requires updating every built-in sub-agent prompt to mandate the call, plus per-agent designer effort, deferred to #1370; (b) a synthetic re-prompt grace turn ("Please summarize your findings.") — costs one extra LLM call per occurrence and only helps when the model has more to say, scan-back is strictly cheaper and more robust for the common case. Tests: 7 new unit tests in `sub_agent_dispatch::recover_last_assistant_text_tests` pin the helper's contract directly (happy path, content-less skip, whitespace-only filtering, trim guarantee, user/system filtering, empty-input edge, no-text-anywhere None return); 1 new e2e in `e2e_agent_test.rs::sub_agent_empty_final_response_falls_back_to_recovered_text` pins the dispatcher integration end-to-end including the regression guard that the literal `"(no output)"` string MUST NOT reach the parent. 
Verified: 1401 koda-core lib tests pass (+7 new), all integration suites pass (+1 new), `cargo clippy --workspace --lib --tests --all-features` clean, `cargo fmt --check` clean.
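The recovery order described above can be sketched as a standalone function (a hedged sketch: `ChatMessage` is simplified to two fields, and `final_answer` is an illustrative name, not the actual dispatcher API):

```rust
// Simplified stand-in for the real koda-core message type (assumption:
// the actual ChatMessage carries tool_calls, tool_call_id, images, etc.).
struct ChatMessage {
    role: &'static str,
    content: Option<String>,
}

// Priority: (1) the final response's own non-blank text, (2) the most
// recent non-blank assistant text in history, (3) a structured,
// actionable marker -- never the bare "(no output)" sentinel.
fn final_answer(
    final_content: Option<&str>,
    history: &[ChatMessage],
    agent: &str,
    iter: usize,
) -> String {
    match final_content.map(str::trim) {
        Some(t) if !t.is_empty() => t.to_string(),
        _ => history
            .iter()
            .rev()
            .filter(|m| m.role == "assistant")
            .find_map(|m| {
                m.content
                    .as_deref()
                    .map(str::trim)
                    .filter(|t| !t.is_empty())
                    .map(String::from)
            })
            .unwrap_or_else(|| {
                format!(
                    "[sub-agent '{agent}' finished after {iter} turn(s) \
                     without producing any text response.]"
                )
            }),
    }
}
```

The scan-back only fires when the final turn is blank, so the common path (model summarizes at the end) pays nothing.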

### Changed (BREAKING for sub-agent dispatch shape)

- **`InvokeAgent` / `SpawnAgent` are now synchronous tool delegations — bg-spawn dispatch path retired** (#1366 phase 1). Dispatch now drives the sub-agent's inference loop inline on the parent's task and returns the sub-agent's final answer as the `ToolCallResult.output`, the same shape as every other tool. Pre-#1366 (i.e., post-#1163) every InvokeAgent call was `tokio::spawn`'d into a `run_bg_agent` task that returned a `task_id` immediately; results reached the model via auto-drain on a future iteration or via an explicit `WaitForMail` call against the mailbox bridge from #1336. That whole async surface is gone from the public dispatch path: the `inline_only=true` recursion-guard branch in `execute_sub_agent` is now the only public path; the `inline_only=false` bg-spawn block (and `run_bg_agent`, `BgRegistry`, `ChildAgentActivity` event family, mailbox infrastructure, activity-pill UI, child-activity overlay, `WaitForMail` tool) becomes dead from the public surface and will be deleted across phases 2–5 of #1366. Architectural rationale (DESIGN.md "Execution Model" section, post-#1366): the LLM API caps throughput, not concurrency, so async dispatch buys little while costing a lot of surface area; sub-agents' real value is context isolation + tool restriction + specialization, all of which work fine with sync delegation. Codex's ExecCell and gemini-cli's `SubagentGroupDisplay` patterns inspired the swap. Deliberately NO runtime config knob — per DESIGN.md P1 ("compile decisions in, don't configure them at runtime"), the sync swap is the only behavior; a previous draft that gated it behind `KODA_SUBAGENT_SYNC` was rejected as a principle violation. The `tool_dispatch.rs` change is two lines: `inline_only=false` → `true` at the InvokeAgent and SpawnAgent callsites.
The InvokeAgent tool description and module-level rustdoc are rewritten to reflect sync semantics: drops the "Spawn a sub-agent in the background" / "Returns IMMEDIATELY with a task_id" framing in favor of "Delegate a task and get its final answer back as the tool result. The call BLOCKS until the sub-agent finishes." The `[ERROR: sub-agent` marker convention is preserved AND extended to recognize the dispatcher's `Err` arm format (`"Error invoking sub-agent: {chain}"`) as the second structural-failure shape. Schema unchanged — `agent_name` + `prompt` required, no `background` param (the regression guard from #1163 still passes). Tests: 6 e2e cases in `e2e_agent_test.rs` migrated from `collect_bg_events_after(...).find_map(AgentStatus::Completed { summary }/...)` to direct `events.iter().find_map(EngineEvent::ToolCallResult { name == "InvokeAgent", output })` — the pre-#1163 wait-free assertion shape returns, now without the `tokio::spawn` race window. Two e2e cases in `e2e_sub_agent_trust_test.rs` lose their `#1321/#1323/#1327` deflake waits because the race they guarded against (parent-loop termination vs bg-agent completion) does not exist on the sync path. The `bg_agent_iter_counter_advances_via_status_channel` test is deleted entirely — it pinned the `Running { iter: n }` heartbeat from the bg path, which is now unreachable from the public dispatcher; phase 2 of #1366 will delete the underlying code, at which point the test would have been dead anyway. 
The 4 description-pinning tests in `tools/agent.rs` are rewritten: `test_invoke_agent_description_documents_spawn_only_model` becomes `test_invoke_agent_description_documents_sync_dispatch_model` and asserts on `synchronously` / `BLOCKS` / `final answer` while explicitly forbidding `task_id` / `WaitForMail` / `auto-drain` / `in the background` (a regression that re-introduces them would mislead the model about dispatch shape); `test_invoke_agent_top_level_documents_drain_semantics` and `test_invoke_agent_description_discourages_immediate_wait` are deleted (drain semantics and the WaitForMail anti-pattern only existed because of bg-spawn). Wire format unchanged — ACP / headless clients are unaffected; sub-agent `EngineEvent::SubAgentStart` still fires at dispatch time. Verified: 1394 koda-core lib tests pass (−3: env-var test never created + 1 description test deleted + 1 description test deleted; −1 from the deleted bg_agent_iter_counter test that lived in the integration suite, not lib), 656 koda-cli lib tests pass, all `koda-core` integration suites pass, `cargo clippy --workspace --lib --tests --all-features` clean, `cargo fmt --check` clean.
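The assertion-shape migration described above can be sketched with a pared-down event enum (a hypothetical stand-in — the real `EngineEvent` carries more variants and richer fields):

```rust
// Hypothetical, pared-down stand-in for koda-core's EngineEvent --
// only the shape needed to show the sync-dispatch assertion.
#[derive(Debug)]
enum EngineEvent {
    ToolCallResult { name: String, output: String },
    Other,
}

// Post-#1366: the sub-agent's answer is just a tool result on the
// parent's event stream -- no bg-event collection, no wait loop,
// no tokio::spawn race window.
fn sub_agent_answer(events: &[EngineEvent]) -> Option<&str> {
    events.iter().find_map(|e| match e {
        EngineEvent::ToolCallResult { name, output } if name == "InvokeAgent" => {
            Some(output.as_str())
        }
        _ => None,
    })
}
```

Compared with the retired `collect_bg_events_after(...)` shape, the assertion no longer needs to wait for a completion event before inspecting the answer.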
243 changes: 240 additions & 3 deletions koda-core/src/sub_agent_dispatch.rs
@@ -1723,9 +1723,43 @@ pub(crate) fn execute_sub_agent<'a>(
}

if response.tool_calls.is_empty() {
// **#1369**: pre-fix this branch did
// `response.content.unwrap_or_else(|| "(no output)".to_string())`
// and returned. That worked when the model produced a
// final summary, but a sub-agent loop can also exit
// here with `content == None` (or `Some("")`) when the
// model emits a bare stop signal after a long tool
// chain — a documented Gemini behavior, and reachable
// from any provider per `providers/{anthropic,gemini,
// openai_compat}.rs` which all normalize empty text
// to `None`. The literal sentinel `"(no output)"`
// then surfaced as the sub-agent's "answer", and the
// parent (especially after #1366 phase 1's sync
// dispatch) treated that string as the agent's
// conclusion — either re-invoking in a loop or acting
// on garbage. Fix mirrors `claude_code_src/src/tools/
// AgentTool/agentToolUtils.ts::finalizeAgentTool`
// (~line 297, also vendored under `peers/`): if the
// final assistant turn is empty, walk backward through
// history for the most recent assistant text. Falls
// through to a structured marker only if the entire
// run produced no assistant text at all (which means
// something is genuinely wrong, not just a quiet last
// turn). #1370 will layer a stricter `complete_task`
// contract on top of this; the scan-back stays as the
// robust fallback.
let result = match response.content.as_deref() {
Some(text) if !text.trim().is_empty() => text.to_string(),
_ => recover_last_assistant_text(&messages).unwrap_or_else(|| {
format!(
"[sub-agent '{agent_name}' finished after {iter} turn(s) without \
producing any text response. The model may have hit a \
provider-specific stop condition (e.g. Gemini's bare-stop after \
long tool chains). Try rephrasing the prompt, simplifying the \
task, or switching models.]"
)
}),
};
// Cache the result for future identical calls
sub_agent_cache.put(agent_name, prompt, &result);
// Release workspace; surface branch hint if agent left changes.
Expand Down Expand Up @@ -2045,6 +2079,209 @@ fn workspace_provision_failure_marker(agent_name: &str, reason: &str) -> String
)
}

/// Walk `messages` backward and return the most recent assistant
/// message's non-empty trimmed text content, if any.
///
/// **#1369**: used as a fallback when the final LLM response in a
/// sub-agent loop comes back with neither tool calls nor text
/// content (a documented Gemini behavior, but reachable from any
/// provider per `providers/{anthropic,gemini,openai_compat}.rs`
/// which all normalize empty content to `None`). Pre-fix the
/// dispatch loop returned the literal sentinel `"(no output)"`
/// here, which the parent then mis-treated as the sub-agent's
/// actual answer.
///
/// Mirrors the pattern from `claude_code_src/src/tools/AgentTool/
/// agentToolUtils.ts::finalizeAgentTool` (~line 297, also vendored
/// at `peers/claude_code_src/`):
///
/// > *"If the final assistant message is a pure tool_use block
/// > (loop exited mid-turn), fall back to the most recent
/// > assistant message that has text content."*
///
/// Filters out empty / whitespace-only content so a `Some("")`
/// row from the DB doesn't shadow earlier real text. Stops at
/// the first hit (most recent, which is what the model is most
/// likely treating as its conclusion).
///
/// Returns `None` only if the entire run produced no assistant
/// text — which means something is genuinely wrong (e.g. the
/// model never said anything, only called tools). Caller surfaces
/// a structured error in that case rather than an empty string.
fn recover_last_assistant_text(messages: &[ChatMessage]) -> Option<String> {
messages
.iter()
.rev()
.filter(|m| m.role == "assistant")
.find_map(|m| {
m.content
.as_deref()
.map(str::trim)
.filter(|t| !t.is_empty())
.map(|t| t.to_string())
})
}

#[cfg(test)]
mod recover_last_assistant_text_tests {
//! **#1369** regression tests for [`super::recover_last_assistant_text`].
//!
//! The helper guards the post-#1366-phase-1 sub-agent dispatch
//! path: when the final LLM response in a sub-agent loop comes
//! back with neither tool calls nor text content, the dispatcher
//! falls back to scanning prior assistant turns for the most
//! recent useful text. Pre-#1369 this branch returned the bare
//! sentinel `"(no output)"`, which the parent (post-sync-dispatch)
//! treated as the sub-agent's actual answer — producing the
//! parent-loops-on-junk behavior reported in the #1366 phase-1
//! validation. Pinning the helper directly is much cheaper and
//! more legible than an end-to-end dispatch test, since the
//! actual integration sits inside the 1300-line
//! `execute_sub_agent` loop and would need a mock provider that
//! returns `LlmResponse { content: None, tool_calls: vec![] }`.

use super::recover_last_assistant_text;
use crate::providers::ChatMessage;

fn assistant(text: &str) -> ChatMessage {
ChatMessage::text("assistant", text)
}

fn user(text: &str) -> ChatMessage {
ChatMessage::text("user", text)
}

fn system(text: &str) -> ChatMessage {
ChatMessage::text("system", text)
}

fn assistant_no_content() -> ChatMessage {
// Mirrors what the DB-load path produces for a tool-call-only
// assistant turn: role is "assistant" but `content` is `None`.
ChatMessage {
role: "assistant".to_string(),
content: None,
tool_calls: None,
tool_call_id: None,
images: None,
}
}

/// Happy path: most recent assistant turn has text — use it.
/// This shouldn't actually be hit by the dispatcher (the dispatcher
/// only calls the helper when the LATEST response is empty), but
/// pinning it keeps the helper's contract self-consistent: "return
/// the most recent assistant text" with no special edge for
/// position.
#[test]
fn returns_most_recent_assistant_text_when_present() {
let messages = vec![
system("you are an explorer agent"),
user("explore the repo"),
assistant("I'll start by listing files."),
assistant("I found three Rust crates."),
];
assert_eq!(
recover_last_assistant_text(&messages),
Some("I found three Rust crates.".to_string())
);
}

/// The bug we're actually fixing: latest assistant turn is
/// content-less (a pure tool-call row from the DB load path),
/// so we walk further back and surface the prior turn's text.
/// Mirrors claude_code_src's framing: "If the final assistant
/// message is a pure tool_use block (loop exited mid-turn),
/// fall back to the most recent assistant message that has
/// text content."
#[test]
fn skips_content_less_assistant_turns_and_finds_earlier_text() {
let messages = vec![
system("you are an explorer agent"),
user("explore the repo"),
assistant("I see three crates: koda-core, koda-cli, koda-sandbox."),
assistant_no_content(), // pure tool-call turn
assistant_no_content(), // another pure tool-call turn
];
assert_eq!(
recover_last_assistant_text(&messages),
Some("I see three crates: koda-core, koda-cli, koda-sandbox.".to_string())
);
}

/// Whitespace-only content must NOT shadow earlier real text.
/// Without the `.trim().is_empty()` guard, a `Some(" \n")` row
/// from a quirky provider would short-circuit the scan and
/// surface a useless string — same UX failure as the original
/// `(no output)` bug, just in disguise.
#[test]
fn whitespace_only_content_does_not_shadow_earlier_real_text() {
let messages = vec![
user("explore"),
assistant("Found Cargo.toml and src/lib.rs."),
assistant(" \n \t "), // whitespace-only — must be skipped
];
assert_eq!(
recover_last_assistant_text(&messages),
Some("Found Cargo.toml and src/lib.rs.".to_string())
);
}

/// Returned text must come back trimmed. The dispatcher uses the
/// return value verbatim as the sub-agent's tool-result string;
/// leading/trailing whitespace would render oddly in the parent's
/// transcript and burn tokens for nothing.
#[test]
fn returned_text_is_trimmed() {
let messages = vec![
user(""),
assistant(" final answer with surrounding whitespace. "),
];
assert_eq!(
recover_last_assistant_text(&messages),
Some("final answer with surrounding whitespace.".to_string())
);
}

/// User and system turns with text are NOT recovery candidates.
/// Surfacing the user's prompt back to the parent as the sub-agent's
/// "answer" would be actively misleading — the parent might think
/// the sub-agent agreed with the prompt verbatim. System turns
/// (including the grace-turn warning at #1135) are scaffolding,
/// not content. Both must be filtered.
#[test]
fn ignores_user_and_system_messages() {
let messages = vec![
system("You are explorer. You MUST summarize at the end."),
user("please explore the repo and tell me what you find"),
// No assistant text at all.
];
assert_eq!(recover_last_assistant_text(&messages), None);
}

/// No assistant text anywhere → caller surfaces a structured
/// marker. The helper signals this case with `None`, NOT with an
/// empty `Some("")` (which would re-introduce the bug).
#[test]
fn returns_none_when_no_assistant_text_anywhere() {
let messages = vec![
system("explorer prompt"),
user("explore"),
assistant_no_content(),
assistant_no_content(),
];
assert_eq!(recover_last_assistant_text(&messages), None);
}

/// Empty input slice → `None`. Pin the obvious edge so a future
/// refactor that swaps `iter().rev()` for some indexed access
/// doesn't panic in the wild.
#[test]
fn returns_none_for_empty_messages() {
assert_eq!(recover_last_assistant_text(&[]), None);
}
}

#[cfg(test)]
mod child_agent_path_tests {
//! **#1325 Phase 4 (commit 2)** regression tests for