
fix: add context window overflow protection #27

Open

efecnc wants to merge 2 commits into altaidevorg:main from efecnc:fix/context-window-overflow-protection

Conversation

@efecnc (Contributor) commented May 6, 2026

Summary

  • Prevents 400 "maximum context length exceeded" errors on long conversations
  • Trims oldest messages from context before calling the provider, preserving tool call/response pairs atomically
  • Inserts a marker message so the model knows context was trimmed
  • Default budget: 120k tokens (conservative for most models)

Root Cause

The reasoning loop fetched full context from memory and sent it directly to the provider with no size check. The existing compaction logic only triggered after a successful response, creating a deadlock: context too large → 400 error → no response → no compaction → permanent failure.
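
For illustration, a minimal sketch of where such a pre-flight guard sits in the loop. The names memory, fetch_context, provider, complete, and TOKEN_BUDGET are hypothetical stand-ins, not the crate's actual API; trim_context_to_budget is the function this PR adds (sketched further below):

// Hypothetical reasoning-loop excerpt: trim before every provider call so
// an oversized context can never reach the API and fail with a 400.
let mut context = memory.fetch_context(&session_id)?;
trim_context_to_budget(&mut context, TOKEN_BUDGET);
let response = provider.complete(&context).await?;
// Post-response compaction still runs afterwards, as before.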

Test plan

  • Verify cargo check passes (no new warnings)
  • Test with a very long conversation (100+ tool calls) — should trim and continue working
  • Verify trimmed conversations still produce coherent responses
  • Verify short conversations are unaffected

Long conversations could exceed the model's context window, causing an
immediate 400 error with no recovery path (compaction only ran after
successful responses, creating a deadlock).

Adds trim_context_to_budget() which estimates token count and removes
oldest messages from the front when over budget, preserving:
- The system message (index 0)
- Tool call/response pairs (removed atomically)

A marker message is inserted so the model knows earlier context was
trimmed. Default budget is 120k tokens (conservative for most models).
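
A minimal sketch of the approach, assuming a simplified stand-in for the real crate::utils::ChatMessage type (the actual type wraps content and tool calls in Options) and the same chars/4 token heuristic used in the PR; the marker text and TOKEN_BUDGET constant name are illustrative, not the actual implementation:

// Sketch under simplified assumptions; not the code in src/agent/mod.rs.
struct ChatMessage {
    role: String,
    content: String,
    has_tool_calls: bool,
}

const TOKEN_BUDGET: usize = 120_000; // conservative default from the PR

fn estimate_tokens(msgs: &[ChatMessage]) -> usize {
    // Rough heuristic: ~4 characters per token.
    msgs.iter().map(|m| m.content.len() / 4).sum()
}

fn trim_context_to_budget(context: &mut Vec<ChatMessage>, budget: usize) {
    let mut trimmed = false;
    // Index 0 is the system message and is never removed; trimming starts
    // at index 1, the oldest non-system message.
    while estimate_tokens(context) > budget && context.len() > 2 {
        // Tool call/response pairs are removed atomically: a message that
        // issued tool calls takes its following "tool" responses with it.
        let mut end = 2;
        if context[1].has_tool_calls {
            while end < context.len() && context[end].role == "tool" {
                end += 1;
            }
        }
        context.drain(1..end);
        trimmed = true;
    }
    if trimmed {
        // Marker so the model knows earlier context was dropped.
        context.insert(1, ChatMessage {
            role: "system".into(),
            content: "[Earlier messages were trimmed to fit the context window.]".into(),
            has_tool_calls: false,
        });
    }
}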

@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces a context trimming mechanism to ensure chat history stays within a defined token budget. It adds a trim_context_to_budget function that prunes older messages while preserving the system message and keeping tool call/response pairs intact. A review comment suggests optimizing this function to avoid O(N^2) complexity by updating the token estimate incrementally instead of recalculating it in every iteration of the trimming loop.

Comment thread on src/agent/mod.rs, lines +110 to +119 (Outdated)
let estimate_tokens = |msgs: &[crate::utils::ChatMessage]| -> usize {
    msgs.iter()
        .map(|m| {
            // Rough heuristic: ~4 characters per token, counting both
            // message text and serialized tool-call arguments.
            m.content.as_ref().map_or(0, |c| c.text_content().len()) / 4
                + m.tool_calls
                    .as_ref()
                    .map_or(0, |tcs| tcs.iter().map(|t| t.function.arguments.len() / 4).sum())
        })
        .sum()
};

Severity: medium

The estimate_tokens closure is re-calculated in every iteration of the while loop, leading to O(N^2) complexity. While acceptable for small contexts, it could be optimized by calculating the initial token count and updating it incrementally as messages are removed.

@efecnc (Contributor, Author) commented May 6, 2026

Review fixes applied:

  • O(N²) → O(N): Token estimate tracked incrementally via a remaining variable instead of re-scanning the entire context each iteration (see the sketch below)
  • Marker bug fixed: the old condition trim_end > 1 || context.len() > 2 was always true for 3+ messages; replaced with a trimmed boolean so the marker is only inserted when messages were actually removed
  • Dead variable removed: trim_end (never changed from 1)
  • All agent::tests pass (9/9)

Ready for re-review.
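
For illustration, the incremental variant could look like this sketch, reusing the simplified ChatMessage stand-in from the sketch above (not the crate's real code); msg_tokens is a hypothetical per-message helper:

fn trim_context_to_budget(context: &mut Vec<ChatMessage>, budget: usize) {
    // Per-message cost, same chars/4 heuristic as before.
    fn msg_tokens(m: &ChatMessage) -> usize {
        m.content.len() / 4
    }

    // Total computed once, then decremented as messages are dropped,
    // making the whole pass O(N) instead of O(N^2).
    let mut remaining: usize = context.iter().map(msg_tokens).sum();
    let mut trimmed = false;
    while remaining > budget && context.len() > 2 {
        let mut end = 2;
        if context[1].has_tool_calls {
            while end < context.len() && context[end].role == "tool" {
                end += 1;
            }
        }
        // Subtract the cost of everything being dropped before draining.
        remaining -= context[1..end].iter().map(msg_tokens).sum::<usize>();
        context.drain(1..end);
        trimmed = true;
    }
    if trimmed {
        // Marker insertion gated on the trimmed boolean, so short
        // conversations are left untouched.
        context.insert(1, ChatMessage {
            role: "system".into(),
            content: "[Earlier messages were trimmed to fit the context window.]".into(),
            has_tool_calls: false,
        });
    }
}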
