fix(cli): scale compaction pruning by model budget #9557
marius-kilocode wants to merge 3 commits into main from
Conversation
```ts
    }),
  }))
}
// kilocode_change end
```
why not keep this whole section in a separate file, and import it to keep the merge simpler?
markijbema left a comment:
It seems quite reasonable; how did you test this? Compacting too much or too little can both be impactful.
@markijbema I don't think it reliably works yet. Waiting for @chrarnoldus's opinion.
```ts
  ? { type: "text" as const, text: `[Attached ${part.mime}: ${part.filename ?? "file"}]` }
  : part
// kilocode_change start - shrink replayed overflow content before auto-continuing
const cleaned = cap ? sanitize({ part, budget: cap }) : part
```
**WARNING: Replay truncation uses the compaction model budget**

`cap` is computed from `model`, which can resolve to the hidden compaction agent's model. But this replayed turn is re-enqueued with `original.model` and sent on the next real request using that model. If the compaction agent is configured with a larger context window than the user's model, this sanitization can still leave the replay too large and immediately overflow again.
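A minimal sketch of the direction this comment suggests, with hypothetical `getModel`/`sanitize` helpers and an illustrative overflow ratio (none of this is the PR's actual API): derive the cap from the model the replay will actually be sent with, i.e. `original.model`.

```ts
interface ModelRef { providerID: string; modelID: string }
interface ModelInfo { limit: { context: number; output: number } }

declare function getModel(ref: ModelRef): Promise<ModelInfo>
declare function sanitize(args: { part: unknown; budget: number }): unknown

async function sanitizeReplay(part: unknown, original: { model: ModelRef }) {
  // Use original.model: the model the next real request will run with,
  // which may have a smaller context window than the compaction agent's.
  const replayModel = await getModel(original.model)
  const cap = Math.floor(replayModel.limit.context * 0.05) // illustrative ratio
  return cap > 0 ? sanitize({ part, budget: cap }) : part
}
```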
**Code Review Summary** — Status: 3 Issues Found | Recommendation: Address before merge
Reviewed by gpt-5.4-20260305 · 1,515,012 tokens
```ts
const BUDGET_NORMAL_RATIO = 0.2
const BUDGET_OVERFLOW_RATIO = 0.05
const BUDGET_PROMPT_RATIO = 0.1
const BUDGET_NORMAL_MIN = 8_000
```
**WARNING: The minimum budgets still overshoot small-window models**

`budget()` is meant to scale by model capacity, but these floors force `normal >= 8_000` and `overflow >= 2_000` even when `usable` is smaller than that. On 4k/8k-context models the compaction path can still keep more tool/text content than the model can fit, so overflow recovery can recurse instead of reliably making the summary request fit.
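A sketch of the clamp this comment implies. The ratios and `BUDGET_NORMAL_MIN` come from the diff above; `BUDGET_OVERFLOW_MIN = 2_000` is inferred from the comment, and the shape of `budget()` is assumed:

```ts
const BUDGET_OVERFLOW_MIN = 2_000 // inferred from the comment; not shown in the diff

function budgets(usable: number) {
  // Apply each floor, then clamp to `usable` so a 4k/8k-context model never
  // gets a pruning budget larger than the window it actually has.
  const normal = Math.min(usable, Math.max(BUDGET_NORMAL_MIN, usable * BUDGET_NORMAL_RATIO))
  const overflow = Math.min(usable, Math.max(BUDGET_OVERFLOW_MIN, usable * BUDGET_OVERFLOW_RATIO))
  return { normal: Math.floor(normal), overflow: Math.floor(overflow) }
}
```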
```ts
// kilocode_change start - scale protected tool-output window with the active model
const last = msgs.findLast((msg) => msg.info.role === "user")
const model = last?.info.role === "user"
  ? yield* provider.getModel(last.info.model.providerID, last.info.model.modelID)
  : undefined
```
**WARNING: Pruning now silently stops when the session model is no longer available**

`provider.getModel(...)` throws for deleted or renamed models. `prompt.ts` forks `compaction.prune(...).pipe(Effect.ignore)`, so this turns background pruning into a silent no-op and old sessions keep their large tool outputs indefinitely. Falling back to `PRUNE_PROTECT`/`PRUNE_MINIMUM` when lookup fails would preserve the previous behavior.
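A sketch of the suggested fallback, using Effect's `Effect.option` to turn a failed lookup into a recoverable absence instead of a failure that the upstream `Effect.ignore` swallows. The `provider` shape and the scaling factor are assumptions; `40_000` is the legacy fixed window the PR description mentions:

```ts
import { Effect, Option } from "effect"

declare const provider: {
  getModel: (providerID: string, modelID: string) => Effect.Effect<{ limit: { context: number } }, Error>
}
declare const last: { info: { model: { providerID: string; modelID: string } } }
const PRUNE_PROTECT = 40_000 // the legacy fixed tool-output window

const protectWindow = Effect.gen(function* () {
  const model = yield* provider
    .getModel(last.info.model.providerID, last.info.model.modelID)
    .pipe(Effect.option) // deleted/renamed model -> Option.none, not a thrown failure
  return Option.match(model, {
    onNone: () => PRUNE_PROTECT, // preserve the previous fixed behavior
    onSome: (m) => Math.floor(m.limit.context * 0.2), // model-scaled window (illustrative)
  })
})
```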
**Code Review Summary** — Status: 2 Issues Found (1 CRITICAL, 1 WARNING) | Recommendation: Address before merge
Reviewed by gpt-5.4-20260305 · 1,857,009 tokens
Is this a workaround for Vercel's upload limit? `FUNCTION_PAYLOAD_TOO_LARGE` looks like a Vercel error.
Summary
Design Decisions
The immediate failure mode was that auto-compaction could be triggered only after a session was already too large, and the compaction request itself then also exceeded the model limit. The previous background pruning path kept a fixed `40_000` estimated tokens of recent tool output and only ran after a turn completed, which made it both too late and too model-insensitive for overflow recovery.

The old extension handled this class of problem through context management rather than tool-output pruning. In `kilocode-legacy`, the effective cap was `allowedTokens = contextWindow * 0.9 - maxTokens`: reserve the model's output plus a 10% context-window buffer before condensing or sliding-window truncating. On hard context-window errors it forced a more aggressive reduction path using `FORCED_CONTEXT_REDUCTION_PERCENT = 75`. That worked because the old extension controlled the whole conversation window before the request was sent.

This PR keeps the same design principle but adapts it to current OpenCode's architecture. Current compaction has two distinct phases: background tool-output pruning and overflow summary recovery. For models with only a context limit, the new helper mirrors the legacy shape by reserving `maxOutputTokens` plus a 10% prompt/context overhead buffer. For models with a separate input limit, it preserves the existing `compaction.reserved` behavior while still reserving a 10% overhead buffer. That overhead is intentionally conservative because the final request can include system prompts, MCP/tool schemas, AGENTS instructions, reminders, and plugin-provided compaction context that are not represented by old tool outputs alone.

Normal pruning and overflow recovery now use separate budgets, as sketched below. Normal pruning preserves more recent tool context for quality, scaled to the model's usable budget. Overflow recovery is deliberately stricter, because the priority is making the compaction call fit at all. The overflow shrink step is in-memory only, so stored session history remains intact and can still be shown, imported, or revisited later.
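A minimal sketch of that derivation. The type and helper names are illustrative; only the legacy formula and the 10% overhead come from the description above, and the ratios/floors from the constants reviewed earlier:

```ts
interface ModelLimits {
  context: number // total context window, tokens
  input?: number // separate input limit, when the model has one
  output: number // maxOutputTokens reserved for the response
}

const OVERHEAD_RATIO = 0.1 // system prompts, MCP/tool schemas, reminders, plugins

function usableBudget(limits: ModelLimits): number {
  // Context-only models mirror the legacy shape
  // (allowedTokens = contextWindow * 0.9 - maxTokens); models with a separate
  // input limit keep the existing compaction.reserved behavior plus the buffer.
  const base = limits.input ?? limits.context - limits.output
  return Math.floor(base * (1 - OVERHEAD_RATIO))
}

// Separate budgets: normal pruning keeps more recent tool context for
// quality; overflow recovery is stricter so the summary request fits at all.
function pruneBudgets(limits: ModelLimits) {
  const usable = usableBudget(limits)
  return {
    normal: Math.max(8_000, Math.floor(usable * 0.2)),
    overflow: Math.max(2_000, Math.floor(usable * 0.05)),
  }
}
```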
Upstream Comparison
This PR covers the same failure class as upstream OpenCode issues #15849 and #17340: the original request overflows, then the recovery compaction request also overflows because it still includes too much context.

Upstream #14707 is already present in our codebase. It makes 413/context-overflow errors trigger auto-compaction, strips media during compaction, and stops when compaction itself overflows. That is necessary but not sufficient for this bug, because non-media context can still be too large: long tool outputs, synthetic text, repeated tool/question loops, system/reminder/plugin context, and MCP/tool schemas can exceed the model even after media stripping.

Upstream #20718 is the closest proposed fix. It pre-prunes overflow compaction input by keeping the most recent 40 messages, truncating completed tool outputs to 500 chars, and truncating synthetic text parts to 2,000 chars. This PR adopts that same placement in the pipeline: shrink the overflow compaction input before the summary model call, without mutating persisted session history.

The main difference from #20718 is that this PR makes the limits model-aware instead of fixed. The old fixed 40-message/500-char/2,000-char strategy helps, but it treats 32k, 128k, 200k, and 1M-context models the same. This PR derives the recovery budget from the active model's input/context limits, reserves output budget plus 10% prompt/context overhead, and then derives normal-prune and overflow-recovery budgets from that usable space. This is meant to preserve more useful context on large models while being more aggressive on smaller ones, as the rough comparison below illustrates.
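For contrast, a sketch of the two strategies side by side. The fixed numbers come from #20718; the chars-per-token factor and the budget split are assumptions of this illustration, not this PR's actual code:

```ts
// Upstream #20718: fixed limits regardless of model size.
const UPSTREAM_FIXED = { keepMessages: 40, toolOutputChars: 500, syntheticTextChars: 2_000 }

const CHARS_PER_TOKEN = 4 // rough heuristic, assumed for this sketch

// Model-aware: spend a token budget derived from the active model, so a
// 1M-context model keeps far more recent content than a 32k one.
function overflowCharLimits(overflowBudgetTokens: number) {
  const chars = overflowBudgetTokens * CHARS_PER_TOKEN
  return {
    toolOutputChars: Math.floor(chars * 0.5), // split between part kinds is illustrative
    textPartChars: Math.floor(chars * 0.25),
  }
}
```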
Upstream #20516 is broader hardening: circuit breaker, retries, output caps, post-compact budget accounting, and attachment-restoration filtering. This PR does not attempt to solve all of that; it is a focused input-side fix for the unrecoverable overflow-compaction request. It should compose with later breaker/retry/output-cap work, but it intentionally avoids adding a larger compaction policy system in this patch.

In short: #14707 lets OpenCode attempt recovery, #20718 shows the right input-side recovery point, #20516 proposes broader loop/output hardening, and this PR adapts the input-side recovery fix to Kilo with model-aware budgets and legacy-style context headroom.

Validation
- `bun test test/session/compaction.test.ts`
- `bun run typecheck`
- `bun run script/check-opencode-annotations.ts`

Local Reproduction Note
After testing against the local GPT 5.2 sessions in this worktree, the remaining `FUNCTION_PAYLOAD_TOO_LARGE` case was not dominated by tool output. The failing sessions contained repeated ordinary user text parts of around 400k characters each, totaling multiple megabytes. This PR therefore also truncates oversized regular text parts during overflow-compaction input shrinking and truncates replayed overflow user text before auto-continuing, while preserving the original stored session messages. A sketch of that truncation follows.
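This is what in-memory truncation of an oversized text part can look like under the constraint the note states (stored messages untouched); the helper name and marker format are assumptions:

```ts
function truncateTextPart(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text
  // Keep the head of the part and record how much was dropped; this runs on
  // the in-memory copy sent to the compaction model, never on stored history.
  return `${text.slice(0, maxChars)}\n[... ${text.length - maxChars} characters truncated for compaction ...]`
}
```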