fix(core): recover from truncated tool calls via multi-turn continuation#3313
Conversation
…ion (#3049)

When large tool calls (e.g., WriteFile with big HTML) exceed the output token limit, the model's response gets truncated and required parameters like file_path are missing. Previously this surfaced as a confusing "params must have required property" error.

Three-layer defense:
1. Escalate to the model's actual output limit (not a fixed 64K). Models with 128K output (Claude Opus, GPT-5) now use their full capacity.
2. Multi-turn recovery: if the escalated response is still truncated, keep the partial response in history and inject a recovery message ("Resume directly — pick up mid-thought") so the model continues from where it left off. Up to 3 recovery attempts before falling back to the tool scheduler's guidance.
3. Stronger truncation guidance as fallback: "you MUST split" instead of "consider splitting".

Also fixes:
- Clear toolCallRequests on RETRY to prevent duplicate tool execution
- Add isContinuation flag to RETRY events so the UI preserves text buffers during recovery (continuation) but resets them during escalation (fresh restart)
- Catch errors during recovery to prevent dangling history entries
📋 Review Summary

This PR implements a robust three-layer defense against output token truncation when model responses exceed token limits, addressing issue #3049. The implementation is well-structured, follows existing patterns, and includes thoughtful error handling. The changes are focused and demonstrate good understanding of the streaming architecture.

🔍 General Feedback

🎯 Specific Feedback

🟡 High
// Before the recovery attempt, add:
if (recoveryCount > 0) {
const delayMs = Math.min(1000 * recoveryCount, 3000); // Cap at 3s
await new Promise(resolve => setTimeout(resolve, delayMs));
}
// Add guard before pushing recovery message:
if (
self.history.length === 0 ||
self.history[self.history.length - 1].role !== 'user'
) {
self.history.push(createUserContent([{ text: OUTPUT_RECOVERY_MESSAGE }]));
}

🟢 Medium
let escalatedFinishReason: string | undefined = undefined;
// Current code is fine since toolCallRequests is declared in scope,
// but adding a defensive check wouldn't hurt:
if (toolCallRequests && toolCallRequests.length > 0) {
toolCallRequests.length = 0;
}

🔵 Low
✅ Highlights
…hanism

Update the design doc to reflect:
- Escalation now targets the model's actual output limit (64K floor)
- Multi-turn recovery loop after escalation (up to 3 attempts)
- isContinuation flag on RETRY events
- Recovery error handling (pop dangling message, break)
- Updated constants table and model-specific escalation limits
- New design decision: why multi-turn recovery over progressive escalation
Code Coverage Summary

CLI Package - Full Text Report
Core Package - Full Text Report

For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
wenshao
left a comment
[Suggestion] The ACP cron path starts a background rewrite with this.messageRewriter.flushTurn(ac.signal), but unlike the normal prompt path it never waits for pending rewrites before returning. That can drop the final rewritten ACP progress update if the cron turn finishes immediately after scheduling the rewrite. Consider mirroring the normal prompt flow by awaiting this.messageRewriter.waitForPendingRewrites() before #executeCronPrompt() returns.
— gpt-5.4 via Qwen Code /review
tanzhenxin
left a comment
Review
Adds a well-structured 3-layer recovery for truncated model output. The recovery loop is cleanly bounded by MAX_OUTPUT_RECOVERY_ATTEMPTS, and the two bug fixes (duplicate tool execution on RETRY, text duplication on escalation) are real and correct. However, the continuation path has two correctness bugs.
Issues
1. Partial text lost from display during continuation. On a continuation RETRY, setPendingHistoryItem(null) clears the partial text. When handleContentEvent runs for the first continuation chunk, it sees a null pending item, creates a new one, and resets geminiMessageBuffer = eventValue — discarding the preserved buffer. The user loses the beginning of the recovered response. Suggestion: commit the partial text to the static item list via addItem before clearing, or don't clear pendingHistoryItemRef.current and let the continuation append to it.

2. Recovery on truncated tool-call turns corrupts conversation state. If the truncated turn already emitted a complete functionCall, the recovery injects a plain user message — creating model(functionCall) → user(text) with no functionResponse. This is an invalid sequence for the API. Suggestion: skip multi-turn recovery when the truncated turn contains functionCall parts, and let the existing tool scheduler fallback (layer 3) handle those cases.

3. Recovery errors swallowed after partial chunks emitted. A recovery attempt can yield chunks then fail. The catch block breaks without emitting an error event, leaving the UI with partial text and no failure indication. Suggestion: emit an error or warning event in the catch block.
Verdict
REQUEST_CHANGES — The text loss and conversation state corruption need to be fixed. Test coverage for the new recovery loop is also expected.
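The sequencing constraint behind issue 2 can be sketched as a small validator. The types and the `isValidFollowUp` function below are illustrative stand-ins, not part of the codebase or the API client:

```typescript
// Simplified turn types; real API types carry more fields.
interface Part {
  text?: string;
  functionCall?: object;
  functionResponse?: object;
}
interface Turn {
  role: 'user' | 'model';
  parts: Part[];
}

// A model turn containing a functionCall must be followed by a turn
// carrying a functionResponse, not plain text.
function isValidFollowUp(modelTurn: Turn, next: Turn): boolean {
  const hasCall = modelTurn.parts.some((p) => p.functionCall !== undefined);
  if (!hasCall) return true; // plain text turns accept any follow-up
  return next.parts.some((p) => p.functionResponse !== undefined);
}
```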
Thanks for the thorough review!

Re High #1 (backoff between recovery attempts):

Re High #2 (guard before pushing recovery message): This is already handled by the catch block. If a recovery attempt fails, the catch pops the dangling user message and breaks out of the loop. In the success path,

Re Medium #3-5: Acknowledged as style preferences.

Re Low #6-9: Good suggestions for future iterations. Keeping them out of this PR scope to minimize change surface.
Hi @wenshao, looks like your earlier response was addressing the github-actions bot review. Could you take a look at the three critical issues I flagged in my review above?
Happy to discuss if you disagree on any of these — (2) in particular would be good to confirm against real API behavior.
Three correctness fixes from @tanzhenxin's review:

1. Partial text lost during continuation (useGeminiStream.ts): On continuation RETRY, setPendingHistoryItem(null) cleared the pending gemini item. The next Content event then saw a null pending item, created a fresh one, and reset geminiMessageBuffer = eventValue — discarding the preserved partial text. Now the pending item AND buffers are kept on continuation, so the continuation appends.

2. Recovery on truncated tool-call turns (geminiChat.ts): When the truncated turn already contains a complete functionCall, appending a user recovery message produces model(functionCall) → user(text) with no intervening functionResponse — an invalid API sequence. Now recovery skips turns with functionCall parts and defers to the tool scheduler's layer-3 fallback.

3. Recovery errors swallowed after partial chunks (geminiChat.ts): If a recovery attempt yielded chunks then failed, the catch block broke without emitting any terminal signal, leaving the UI with partial text and no Finished event. Now emits a synthetic finishReason=STOP chunk in the catch so the UI gets a proper terminal signal.
@tanzhenxin You are absolutely right on all three — apologies for conflating your review with the bot review earlier. Addressed in f683830:

1. Partial text lost during continuation — Kept both the pending history item and the text buffers on continuation RETRY, so the continuation appends instead of resetting.

2. Recovery on truncated tool-call turns — Added a guard at the top of the recovery loop: if the last model entry in history contains any functionCall parts, recovery is skipped and the tool scheduler's layer-3 fallback takes over.

3. Recovery errors swallowed after partial chunks — In the catch block, now emit a synthetic chunk with finishReason=STOP so the UI gets a proper terminal signal.

Regarding (2), I verified by tracing the history structure after the guard fires: the truncated model turn with functionCall stays intact, and no recovery user message is appended.

Will add targeted tests for the recovery loop next.
Four targeted tests for the recovery mechanism introduced in the truncated-tool-call-recovery PR:

1. Recovery loop fires when escalated response is also truncated: initial MAX_TOKENS → escalation MAX_TOKENS → recovery STOP. Verifies two RETRY events (one escalation, one continuation) and three API calls.

2. Recovery is skipped when truncated turn contains a functionCall: prevents the invalid model(functionCall) → user(text) sequence. Verifies no continuation RETRY and history ends with the functionCall intact.

3. Recovery attempts are capped at MAX_OUTPUT_RECOVERY_ATTEMPTS (3): persistent MAX_TOKENS triggers exactly 5 API calls (1 initial + 1 escalation + 3 recovery).

4. Recovery catch block emits synthetic STOP chunk and pops dangling user message: when a recovery attempt fails (empty stream → InvalidStreamError), the UI gets a terminal signal and history ends on the model turn, not a dangling user recovery message.
Added 4 targeted recovery-loop tests in b9340c1 covering:

- the recovery loop firing when the escalated response is also truncated
- recovery being skipped when the truncated turn contains a functionCall
- the MAX_OUTPUT_RECOVERY_ATTEMPTS cap (3 attempts)
- the catch block emitting a synthetic STOP chunk and popping the dangling user message

Total: 198 tests pass (194 existing + 4 new).
Existing tests cover the functionCall guard when both initial and escalated responses have functionCall. This adds a test for the cross-iteration case: iter 1 returns text (recovery proceeds), iter 2 returns functionCall (recovery must break before iter 3). Verifies:
- API called exactly 4 times (1 initial + 1 escalation + 2 recovery)
- History ends with the functionCall model turn, not a dangling user recovery message
- Iter 3's user recovery message is never pushed (guard fires at top of loop before recoveryCount increment)
The object literal {candidates, content, parts} doesn't structurally
overlap enough with GenerateContentResponse for TypeScript to allow
a direct cast. Casting through 'unknown' is required per TS2352.
Build error from CI:
src/core/geminiChat.ts(651,24): error TS2352: Conversion of type '...'
to type 'GenerateContentResponse' may be a mistake because neither
type sufficiently overlaps with the other. If this was intentional,
convert the expression to 'unknown' first.
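A minimal reproduction of the pattern this commit describes. The response type here is a simplified stand-in (the real GenerateContentResponse is richer), but the compiler behavior is the same: a direct cast between types with no sufficient overlap is rejected as TS2352, and routing through `unknown` is the sanctioned escape hatch.

```typescript
// Simplified stand-in for the SDK's response type.
interface GenerateContentResponse {
  candidates: Array<{ content?: unknown }>;
}

// A synthetic chunk built from loose parts; its shape shares no
// properties with GenerateContentResponse.
const raw = { content: {}, parts: [] as unknown[] };

// `raw as GenerateContentResponse` would fail with TS2352 because the
// types don't sufficiently overlap; the double cast compiles:
const response = raw as unknown as GenerateContentResponse;
```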
Summary
Fixes #3049 — WriteFile/ReadFile tool calls fail with params must have required property 'file_path' when model output is truncated due to the max_tokens limit.

Root cause

When the model generates a large tool call (e.g., WriteFile with a big HTML file), the response can exceed the output token limit. The adaptive escalation (#2898) retries at 64K, but the escalated response can still be truncated, so required parameters end up missing (file_path missing) and the user sees a confusing validation error.

Recovery flow
Changes

1. Escalate to the model's actual output limit

Instead of a fixed 64K escalation target, use Math.max(ESCALATED_MAX_TOKENS, tokenLimit(model, 'output')).
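The target computation can be sketched as below. The model names and limit values are made-up placeholders, and `tokenLimit` here is a stand-in lookup rather than the project's real helper:

```typescript
const ESCALATED_MAX_TOKENS = 65_536; // the 64K floor

// Illustrative per-model output limits, not real catalog data.
const OUTPUT_LIMITS: Record<string, number> = {
  'big-output-model': 131_072, // e.g. a model with 128K output
  'small-output-model': 8_192,
};

function tokenLimit(model: string, _kind: 'output'): number {
  return OUTPUT_LIMITS[model] ?? ESCALATED_MAX_TOKENS;
}

// Escalation target: the larger of the 64K floor and the model's own limit.
function escalationTarget(model: string): number {
  return Math.max(ESCALATED_MAX_TOKENS, tokenLimit(model, 'output'));
}
```

Models whose output limit exceeds 64K get their full capacity; smaller models still escalate to the 64K floor.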
2. Multi-turn recovery after escalation

When the escalated response is still truncated (finishReason === MAX_TOKENS), instead of immediately failing with a tool validation error:
The recovery RETRY uses isContinuation: true so the UI preserves the text buffer (appending continuation text) rather than resetting it (as done for escalation).

If a recovery attempt fails (e.g., empty response, network error), the catch block pops the dangling recovery message to keep history valid, then exits the loop gracefully.
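The bounded loop described above can be sketched as follows. This is a minimal model under assumed names (`Content`, `runRecovery`, the `attempt` callback are all illustrative), not the actual geminiChat.ts implementation; it shows the functionCall guard, the recovery-message injection, and the dangling-message cleanup:

```typescript
interface Content {
  role: 'user' | 'model';
  parts: Array<{ text?: string; functionCall?: object }>;
}

const MAX_OUTPUT_RECOVERY_ATTEMPTS = 3;
const OUTPUT_RECOVERY_MESSAGE = 'Resume directly — pick up mid-thought.';

function runRecovery(
  history: Content[],
  attempt: (history: Content[]) => { text: string; truncated: boolean },
): void {
  for (let i = 0; i < MAX_OUTPUT_RECOVERY_ATTEMPTS; i++) {
    const last = history[history.length - 1];
    // Guard: never inject a plain user message after a model turn that
    // already holds a functionCall; that sequence is invalid for the API.
    if (
      last.role === 'model' &&
      last.parts.some((p) => p.functionCall !== undefined)
    ) {
      break;
    }
    history.push({ role: 'user', parts: [{ text: OUTPUT_RECOVERY_MESSAGE }] });
    try {
      const res = attempt(history);
      history.push({ role: 'model', parts: [{ text: res.text }] });
      if (!res.truncated) break; // a complete continuation was recovered
    } catch {
      history.pop(); // drop the dangling recovery message, keep history valid
      break;
    }
  }
}
```

The truncated partial model turn stays in history before the loop starts, so each continuation is appended as a fresh user/model pair.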
3. Stronger truncation guidance (fallback)

If recovery is exhausted, the tool scheduler's existing truncation rejection now uses more directive language:
Bug fixes

- toolCallRequests in useGeminiStream.ts was never cleared on RETRY events, causing tool calls from a truncated attempt to persist alongside the retried attempt's calls. Now cleared on every RETRY.
- geminiMessageBuffer was not reset on escalation RETRY, causing the 8K partial text to be concatenated with the full escalated response. Now reset when isContinuation is false.

Test plan