
fix(#1369): scan back through assistant history when sub-agent's final response is empty #1369

Merged
lijunzh merged 1 commit into main from fix/1369-empty-subagent-scan-back
May 10, 2026


@lijunzh (Owner) commented May 10, 2026

Fix #1369 — sub-agent empty final response no longer surfaces "(no output)" to the parent

Discovered during: #1366 phase-1 validation (sync sub-agent dispatch)
Pre-existing bug: latent since the (no output) fallback was added; #1366 made it user-visible
Independent: branched off main, mergeable on its own
Companion: #1370 (epic) for the longer-term gemini-cli complete_task pattern

What was happening

Real-world repro from the #1366 phase-1 validation: user typed "explore the codebase", code-puppy invoked the explore sub-agent three times in a row — each returning (no output) — before the parent's own model bailed empty. Then the user got the generic "Model produced an empty response after tool use" warning.

```text
● InvokeAgent explore
  🤖 Sub-agent: explore
  ● List . ● Read Cargo.toml ● Read README.md ...
  │ (no output)         ← bug

● InvokeAgent explore
  🤖 Sub-agent: explore
  ● List .
  │ (no output)         ← bug

● InvokeAgent explore
  🤖 Sub-agent: explore
  ● List .
  │ (no output)         ← bug

⚠ Model produced an empty response after tool use.
```

Root cause

koda-core/src/sub_agent_dispatch.rs:1727-1729 (pre-fix):

```rust
let result = response
    .content
    .unwrap_or_else(|| "(no output)".to_string());
```

When the final LLM response in a sub-agent loop has tool_calls.is_empty() && content.is_none() (or Some("")), the dispatcher returned the literal sentinel "(no output)". Pre-#1366 (bg-spawn era) this landed in the auto-drained completion mail and was largely invisible. Post-#1366 phase-1 (sync dispatch), the sentinel surfaces directly as ToolCallResult.output on the parent's InvokeAgent call — and the parent treats it as the agent's actual answer.

Not a Gemini-specific bug

Every provider's response normalization reaches LlmResponse { content: None, tool_calls: vec![] } for any model that emits a bare stop signal:

| Provider | File:line | Empty-content normalization |
| --- | --- | --- |
| Anthropic | anthropic.rs:507-511 | `if content_text.is_empty() { None } else { ... }` |
| OpenAI-compat | openai_compat.rs:425 | `choice.message.content.filter(\|c\| !c.is_empty())` |
| Gemini | gemini.rs:772-776 | `if content_text.is_empty() { None } else { ... }` |

Gemini hits it more often after long tool chains; Claude and GPT can produce it too. The parent inference loop already had an empty-response Warn handler at inference.rs:1184-1192 — the asymmetry with the sub-agent dispatch path IS the bug.
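For illustration, the shared normalization shape can be sketched as below. The struct and field names are stand-ins, not koda-core's actual types; the filter mirrors the openai_compat.rs line quoted in the table.

```rust
// Minimal sketch of empty-content normalization; `LlmResponse` and its
// fields are illustrative stand-ins, not koda-core's real types.
struct LlmResponse {
    content: Option<String>,
    tool_calls: Vec<String>, // stand-in for the real tool-call type
}

// A bare stop signal (empty text, no tool calls) normalizes to
// LlmResponse { content: None, tool_calls: [] } on every provider path.
fn normalize(raw_text: String, tool_calls: Vec<String>) -> LlmResponse {
    LlmResponse {
        content: Some(raw_text).filter(|c| !c.is_empty()),
        tool_calls,
    }
}

fn main() {
    // Bare stop: the exact state the dispatcher mishandled pre-fix.
    let bare_stop = normalize(String::new(), vec![]);
    assert!(bare_stop.content.is_none() && bare_stop.tool_calls.is_empty());

    // Non-empty text survives normalization unchanged.
    let normal = normalize("done".to_string(), vec![]);
    assert_eq!(normal.content.as_deref(), Some("done"));
}
```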

Fix — adopt claude_code_src's scan-back pattern

peers/claude_code_src/src/tools/AgentTool/agentToolUtils.ts::finalizeAgentTool (~line 297):

"If the final assistant message is a pure tool_use block (loop exited mid-turn), fall back to the most recent assistant message that has text content."

Translated to Rust as a free helper:

```rust
fn recover_last_assistant_text(messages: &[ChatMessage]) -> Option<String> {
    messages
        .iter()
        .rev()
        .filter(|m| m.role == "assistant")
        .find_map(|m| {
            m.content
                .as_deref()
                .map(str::trim)
                .filter(|t| !t.is_empty())
                .map(|t| t.to_string())
        })
}
```
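The helper's contract can be exercised in isolation. The sketch below uses a minimal stand-in for `ChatMessage` (`role: String`, `content: Option<String>`), which may not match koda-core's actual type:

```rust
// Minimal stand-in for koda-core's ChatMessage; fields are assumed.
struct ChatMessage {
    role: String,
    content: Option<String>,
}

fn recover_last_assistant_text(messages: &[ChatMessage]) -> Option<String> {
    messages
        .iter()
        .rev()
        .filter(|m| m.role == "assistant")
        .find_map(|m| {
            m.content
                .as_deref()
                .map(str::trim)
                .filter(|t| !t.is_empty())
                .map(|t| t.to_string())
        })
}

// Small constructor to keep the examples readable.
fn msg(role: &str, content: Option<&str>) -> ChatMessage {
    ChatMessage {
        role: role.to_string(),
        content: content.map(str::to_string),
    }
}

fn main() {
    // Most recent non-empty assistant text wins; whitespace-only and
    // content-less (tool_use-only) turns are skipped.
    let history = vec![
        msg("user", Some("explore the codebase")),
        msg("assistant", Some("Found 3 crates.")),
        msg("assistant", Some("   ")), // whitespace-only: skipped
        msg("assistant", None),        // tool-only final turn: skipped
    ];
    assert_eq!(
        recover_last_assistant_text(&history),
        Some("Found 3 crates.".to_string())
    );

    // No recoverable text anywhere: None, never Some("").
    let empty = vec![msg("user", Some("hi")), msg("assistant", None)];
    assert_eq!(recover_last_assistant_text(&empty), None);
}
```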

Dispatcher's empty-response branch becomes:

```rust
let result = match response.content.as_deref() {
    Some(text) if !text.trim().is_empty() => text.to_string(),
    _ => recover_last_assistant_text(&messages).unwrap_or_else(|| {
        format!(
            "[sub-agent '{agent_name}' finished after {iter} turn(s) without \
             producing any text response. The model may have hit a \
             provider-specific stop condition (e.g. Gemini's bare-stop after \
             long tool chains). Try rephrasing the prompt, simplifying the \
             task, or switching models.]"
        )
    }),
};
```

If the sub-agent ever said anything useful during the run, the parent sees it. The dispatcher falls through to a structured marker (naming the agent, turn count, and likely cause) only when no recoverable text exists anywhere.
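Putting the pieces together, the fallback chain behaves roughly as follows. This is a sketch with stand-in types, and `finalize` is a hypothetical wrapper, not a real koda-core function:

```rust
// Stand-in types; koda-core's real ChatMessage/LlmResponse differ.
struct ChatMessage {
    role: String,
    content: Option<String>,
}
struct LlmResponse {
    content: Option<String>,
}

fn recover_last_assistant_text(messages: &[ChatMessage]) -> Option<String> {
    messages
        .iter()
        .rev()
        .filter(|m| m.role == "assistant")
        .find_map(|m| {
            m.content
                .as_deref()
                .map(str::trim)
                .filter(|t| !t.is_empty())
                .map(|t| t.to_string())
        })
}

// Hypothetical wrapper around the dispatcher's empty-response branch.
fn finalize(
    agent_name: &str,
    iter: usize,
    response: &LlmResponse,
    messages: &[ChatMessage],
) -> String {
    match response.content.as_deref() {
        Some(text) if !text.trim().is_empty() => text.to_string(),
        _ => recover_last_assistant_text(messages).unwrap_or_else(|| {
            format!(
                "[sub-agent '{agent_name}' finished after {iter} turn(s) \
                 without producing any text response.]"
            )
        }),
    }
}

fn main() {
    let history = vec![
        ChatMessage {
            role: "assistant".into(),
            content: Some("Workspace has 3 crates.".into()),
        },
        ChatMessage {
            role: "assistant".into(),
            content: None, // final tool-only turn
        },
    ];
    let empty_final = LlmResponse { content: None };

    // Recovered text surfaces to the parent instead of "(no output)".
    assert_eq!(
        finalize("explore", 4, &empty_final, &history),
        "Workspace has 3 crates."
    );

    // With no recoverable text anywhere, the structured marker fires.
    let marker = finalize("explore", 1, &empty_final, &[]);
    assert!(marker.starts_with("[sub-agent 'explore'"));
}
```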

Considered alternatives

| Option | Status | Why |
| --- | --- | --- |
| Scan-back (this PR) | ✅ adopted | Zero LLM cost. Robust to mid-turn loop exits. Mirrors a known-good pattern. |
| gemini-cli's required `complete_task` tool + grace turn (packages/core/src/tools/complete-task.ts + agents/local-executor.ts:361-381) | ⏸ deferred → #1370 | Stricter and more elegant, but requires updating every built-in sub-agent prompt to mandate the call. A real epic on its own. |
| Synthetic re-prompt grace turn ("Please summarize your findings.") | ❌ rejected | Costs +1 LLM call per occurrence. Only helps when the model has more to say. Scan-back is strictly cheaper. |

Tests

| Suite | Tests | Coverage |
| --- | --- | --- |
| `sub_agent_dispatch::recover_last_assistant_text_tests` | +7 | Helper contract: happy path, content-less skip, whitespace-only filtering, trim guarantee, user/system role filtering, empty-input edge, no-text-anywhere → `None` |
| `e2e_agent_test::sub_agent_empty_final_response_falls_back_to_recovered_text` | +1 | Dispatcher integration end-to-end. Includes regression guard: `assert!(!invoke_output.contains("(no output)"))` — if a future refactor re-introduces the bare sentinel, this catches it before merge. |

Verification

| Check | Result |
| --- | --- |
| `cargo test -p koda-core --lib` | ✅ 1401 passed (+7), 1 ignored |
| `cargo test -p koda-core --tests` (all integration) | ✅ all green (+1 new) |
| `cargo clippy --workspace --lib --tests --all-features` | ✅ clean |
| `cargo fmt --check` | ✅ clean |
| pre-push hook | ✅ passed |

Diff stats

```text
 CHANGELOG.md                        |   4 +
 koda-core/src/sub_agent_dispatch.rs | 243 ++++++++++++++++++++++++++++++++++++-
 koda-core/tests/e2e_agent_test.rs   | 125 +++++++++++++++++++
 3 files changed, 369 insertions(+), 3 deletions(-)
```

(Most of the diff is comments + tests pinning the contract — actual logic change is ~10 lines.)

Risk

Low. The change is additive on the recovery side and replaces a bare unwrap_or_else sentinel with structured behavior. The regression guard in the e2e test ensures the literal "(no output)" string can never reach the parent again. The 7-test helper suite pins every documented edge of the scan-back contract.

Refs: #1369 #1366 #1370

fix(#1369): scan back through assistant history when sub-agent's final response is empty

Pre-fix, the sub-agent dispatch loop returned the literal sentinel
"(no output)" whenever the final LLM response had no tool calls
AND no text content. Pre-#1366 (bg-spawn era) the sentinel landed
in the auto-drained completion mail and was largely invisible;
post-#1366 phase 1 (sync dispatch) it surfaces directly as the
parent's InvokeAgent ToolCallResult, and the parent (a) treats it
as the agent's actual answer, (b) re-invokes hoping for a real
one, or (c) hits the same edge in its own loop.

Real-world reproduction (#1366 phase-1 validation): user asked
"explore the codebase", code-puppy invoked the explore agent
three times in a row — each returning '(no output)' — before
the parent's own model bailed empty.

Not a Gemini-specific bug. Every provider's response normalization
(providers/{anthropic,gemini,openai_compat}.rs) reaches
LlmResponse { content: None, tool_calls: vec![] } for any model
that emits a bare stop signal. Gemini hits it more often after
long tool chains; Claude and GPT can produce it too. The parent
inference loop already had an empty-response Warn handler at
inference.rs:1184-1192 — the asymmetry with the sub-agent
dispatch path IS the bug.

Fix mirrors claude_code_src/src/tools/AgentTool/agentToolUtils.ts
::finalizeAgentTool (~line 297, vendored at peers/): when the
final assistant turn is empty, walk backward through the sub-
agent's assistant history for the most recent non-empty trimmed
text and surface THAT as the answer. If the agent ever said
something useful in the run, the parent sees it. Falls through
to a structured marker (naming the agent, turn count, and the
likely cause) only when no recoverable text exists — which
means something is genuinely wrong, and the marker is actionable
rather than the bare sentinel.

Implementation
- New free fn recover_last_assistant_text(messages: &[ChatMessage])
  -> Option<String> next to workspace_provision_failure_marker.
  Filters out empty/whitespace-only content, trims output, returns
  None (not Some("")) when nothing recoverable exists.
- Dispatcher's empty-response branch now matches:
  match response.content.as_deref() {
    Some(text) if !text.trim().is_empty() => text.to_string(),
    _ => recover_last_assistant_text(&messages).unwrap_or_else(...
         structured marker ...),
  }

Considered alternatives (deferred or rejected)
- gemini-cli's required complete_task tool + grace turn pattern
  (packages/core/src/tools/complete-task.ts +
  agents/local-executor.ts:361-381). Stricter and more elegant
  but requires updating every built-in sub-agent prompt to
  mandate the call. Deferred to #1370.
- Synthetic re-prompt grace turn ("Please summarize your findings").
  Costs +1 LLM call per occurrence; only helps when model has
  more to say. Scan-back is strictly cheaper for the common case.

Tests
- 7 new unit tests in
  sub_agent_dispatch::recover_last_assistant_text_tests pin the
  helper's contract directly (happy path, content-less skip,
  whitespace-only filtering, trim guarantee, user/system filtering,
  empty-input edge, no-text-anywhere None return).
- 1 new e2e in
  e2e_agent_test.rs::sub_agent_empty_final_response_falls_back_to_recovered_text
  pins the dispatcher integration end-to-end including the
  regression guard that the literal "(no output)" string MUST
  NOT reach the parent.

Verified
- 1401 koda-core lib tests pass (+7 new)
- All koda-core integration suites pass (+1 new)
- cargo clippy --workspace --lib --tests --all-features clean
- cargo fmt --check clean

Refs: #1369 #1366 #1370
@lijunzh lijunzh merged commit 9bb1c58 into main May 10, 2026
16 checks passed
@lijunzh lijunzh deleted the fix/1369-empty-subagent-scan-back branch May 10, 2026 03:50