Conversation
Implements a capability that intercepts tool return values via the after_tool_execute hook and truncates or summarizes large outputs to prevent context window blowup. Supports head, tail, and head+tail truncation strategies, per-tool limits and strategies, and an optional summarize_fn with truncation as a safety net. Closes #82 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…putManagement Addresses maintainer audit findings from PR #131: - Line-count limits: `max_output_lines` and `per_tool_line_limits` parameters truncate by line count. When both char and line limits are set, whichever fires first wins. - Spill-to-file: `spill_to_file` and `spill_dir` parameters. When enabled, oversized output is written to a temp file and the model receives a pointer plus a truncated preview. - Binary detection: `bytes`, `bytearray`, and `memoryview` results are immediately replaced with `[Binary data, N bytes]` instead of being stringified and truncated. 48 new tests (75 total for this capability, 81 in the full suite). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ally-uncalled fn - Add test for the case where both line and char limits are exceeded, triggering double truncation (line 209) - Add `# pragma: no cover` to summarize_fn that's intentionally never called in the under-limit test Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
| """Return True if *text* exceeds either the char or line limit.""" | ||
| if len(text) > char_limit: | ||
| return True | ||
| if line_limit is not None and text.count('\n') + 1 > line_limit: |
There was a problem hiding this comment.
🟡 Line counting inconsistency between _exceeds_limits and _truncate_by_lines causes ineffective line truncation
_exceeds_limits at line 246 counts lines as text.count('\n') + 1, but _truncate_by_lines at line 107 counts lines as len(text.splitlines(keepends=True)). These methods disagree for any text with a trailing newline (extremely common for command/tool output). For example, "line1\nline2\nline3\n" gives 4 by count+1 but 3 by splitlines. This means _exceeds_limits can trigger the truncation path (thinking the text has more lines than allowed), but _truncate_by_lines then returns the text unchanged (since by its counting, the limit isn't exceeded). The same inconsistency exists in _apply_truncation at line 255 which also uses count('\n') + 1.
Concrete example showing the bug
With max_output_lines=3 and text "line1\nline2\nline3\n" (3 content lines + trailing newline):
_exceeds_limits:count('\n') + 1 = 4 > 3→ True, enters truncation_truncate_by_lines(text, 3, ...):splitlinesgives 3 elements →3 <= 3→ returns text unchanged- Text passes through un-truncated despite
_exceeds_limitsflagging it
Prompt for agents
The line counting method in _exceeds_limits (and _apply_truncation at line 255) uses text.count('\n') + 1, while _truncate_by_lines uses len(text.splitlines(keepends=True)). These give different results for text with trailing newlines: count+1 produces a higher number. To fix the inconsistency, both should use the same method. The simplest fix is to change _exceeds_limits and _apply_truncation to also use len(text.splitlines(keepends=True)), or extract a shared _count_lines(text) helper used by all three locations (lines 246, 255, and 107). Alternatively, if the intent is that a trailing newline should count as an extra line, then _truncate_by_lines should also use count+1 based counting.
Was this helpful? React with 👍 or 👎 to provide feedback.
| if not self._exceeds_limits(text, char_limit, line_limit): | ||
| # If we stripped ANSI, return the cleaned text so the model | ||
| # never sees escape codes. Otherwise return the original value. | ||
| return text if stripped else result |
There was a problem hiding this comment.
🚩 strip_ansi=True (default) converts non-string results to strings even when under the limit
When strip_ansi is True (the default) and the result is a non-string type (e.g., a dict or list) that's under the size limit, after_tool_execute returns text (the stringified version) at line 309 rather than the original result. This means {'key': 'value'} becomes the string "{'key': 'value'}". When strip_ansi=False, the original object is preserved (return result). This type coercion is arguably correct since the model always receives strings, but it's a subtle behavioral difference: other after_tool_execute hooks downstream would see a string instead of the original type. The framework's own default implementation returns the original result unchanged, so this is a semantic change for non-string results even when no ANSI codes are present.
Was this helpful? React with 👍 or 👎 to provide feedback.
Audit vs prior art: ToolOutputManagementWorth adding now:
Follow-up opportunities:
|
Summary
ToolOutputManagementcapability that intercepts tool return values viaafter_tool_executeand truncates/summarizes large outputs to prevent context window blowuphead,tail,head_tail), per-tool limits and strategies, and an optionalsummarize_fnwith truncation as safety netToolOutputManagementandTruncationStrategyfrompydantic_harnessCloses #82
Design decisions
Per maintainer feedback, this is implemented as a configurable capability using the existing
after_tool_executehook, rather than as framework infrastructure baked into the agent loop.The capability deliberately does not implement spill-to-file, token-based limits, or model-aware scaling (see
PLAN.mdfor rationale on each).Test plan
_stringify,_truncate,_head_tail_default_splithelpersafter_tool_executecovering all strategies, per-tool overrides, sync/async summarize_fn, safety net truncation, boundary conditionsAI generated code
🤖 Generated with Claude Code