Add ToolOutputManagement capability by DouweM · Pull Request #185 · pydantic/pydantic-ai-harness

DouweM · 2026-04-10T01:02:52Z

Summary

Adds ToolOutputManagement capability that intercepts tool return values via after_tool_execute and truncates/summarizes large outputs to prevent context window blowup
Supports three truncation strategies (head, tail, head_tail), per-tool limits and strategies, and an optional summarize_fn with truncation as safety net
Exports ToolOutputManagement and TruncationStrategy from pydantic_harness

Closes #82

Design decisions

Per maintainer feedback, this is implemented as a configurable capability using the existing after_tool_execute hook, rather than as framework infrastructure baked into the agent loop.

The capability deliberately does not implement spill-to-file, token-based limits, or model-aware scaling (see PLAN.md for rationale on each).

Test plan

Unit tests for _stringify, _truncate, _head_tail_default_split helpers
Integration tests for after_tool_execute covering all strategies, per-tool overrides, sync/async summarize_fn, safety net truncation, boundary conditions
Export tests
27 tests passing, ruff clean, pyright strict clean

AI generated code

Check this box to confirm this PR contains AI-generated code

🤖 Generated with Claude Code

Implements a capability that intercepts tool return values via the after_tool_execute hook and truncates or summarizes large outputs to prevent context window blowup. Supports head, tail, and head+tail truncation strategies, per-tool limits and strategies, and an optional summarize_fn with truncation as a safety net. Closes #82 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…putManagement Addresses maintainer audit findings from PR #131: - Line-count limits: `max_output_lines` and `per_tool_line_limits` parameters truncate by line count. When both char and line limits are set, whichever fires first wins. - Spill-to-file: `spill_to_file` and `spill_dir` parameters. When enabled, oversized output is written to a temp file and the model receives a pointer plus a truncated preview. - Binary detection: `bytes`, `bytearray`, and `memoryview` results are immediately replaced with `[Binary data, N bytes]` instead of being stringified and truncated. 48 new tests (75 total for this capability, 81 in the full suite). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…ally-uncalled fn - Add test for the case where both line and char limits are exceeded, triggering double truncation (line 209) - Add `# pragma: no cover` to summarize_fn that's intentionally never called in the under-limit test Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

devin-ai-integration

Devin Review found 2 potential issues.

View 3 additional findings in Devin Review.

devin-ai-integration · 2026-04-10T01:08:51Z

+        """Return True if *text* exceeds either the char or line limit."""
+        if len(text) > char_limit:
+            return True
+        if line_limit is not None and text.count('\n') + 1 > line_limit:


🟡 Line counting inconsistency between _exceeds_limits and _truncate_by_lines causes ineffective line truncation

_exceeds_limits at line 246 counts lines as text.count('\n') + 1, but _truncate_by_lines at line 107 counts lines as len(text.splitlines(keepends=True)). These methods disagree for any text with a trailing newline (extremely common for command/tool output). For example, "line1\nline2\nline3\n" gives 4 by count+1 but 3 by splitlines. This means _exceeds_limits can trigger the truncation path (thinking the text has more lines than allowed), but _truncate_by_lines then returns the text unchanged (since by its counting, the limit isn't exceeded). The same inconsistency exists in _apply_truncation at line 255 which also uses count('\n') + 1.

Concrete example showing the bug

With max_output_lines=3 and text "line1\nline2\nline3\n" (3 content lines + trailing newline):

_exceeds_limits: count('\n') + 1 = 4 > 3 → True, enters truncation

_truncate_by_lines(text, 3, ...): splitlines gives 3 elements → 3 <= 3 → returns text unchanged

Text passes through un-truncated despite _exceeds_limits flagging it

Prompt for agents

The line counting method in _exceeds_limits (and _apply_truncation at line 255) uses text.count('\n') + 1, while _truncate_by_lines uses len(text.splitlines(keepends=True)). These give different results for text with trailing newlines: count+1 produces a higher number. To fix the inconsistency, both should use the same method. The simplest fix is to change _exceeds_limits and _apply_truncation to also use len(text.splitlines(keepends=True)), or extract a shared _count_lines(text) helper used by all three locations (lines 246, 255, and 107). Alternatively, if the intent is that a trailing newline should count as an extra line, then _truncate_by_lines should also use count+1 based counting.

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-10T01:08:54Z

+        if not self._exceeds_limits(text, char_limit, line_limit):
+            # If we stripped ANSI, return the cleaned text so the model
+            # never sees escape codes.  Otherwise return the original value.
+            return text if stripped else result


🚩 strip_ansi=True (default) converts non-string results to strings even when under the limit

When strip_ansi is True (the default) and the result is a non-string type (e.g., a dict or list) that's under the size limit, after_tool_execute returns text (the stringified version) at line 309 rather than the original result. This means {'key': 'value'} becomes the string "{'key': 'value'}". When strip_ansi=False, the original object is preserved (return result). This type coercion is arguably correct since the model always receives strings, but it's a subtle behavioral difference: other after_tool_execute hooks downstream would see a string instead of the original type. The framework's own default implementation returns the original result unchanged, so this is a semantic change for non-string results even when no ANSI codes are present.

Was this helpful? React with 👍 or 👎 to provide feedback.

DouweM · 2026-04-10T15:09:31Z

Originally posted by @DouweM in #131 comment (PR was recreated)

Audit vs prior art: ToolOutputManagement

Worth adding now:

max_output_lines alongside max_output_chars (Claude Code truncates by lines)
Spill-to-file strategy: write full output to temp file, return path + summary
Binary detection: skip truncation for non-text results

Follow-up opportunities:

Token-based limits when context window tracking exists
Model-aware scaling

DouweM and others added 4 commits April 2, 2026 05:28

adding ansi escape and comments on missing bits

62f28f5

DouweM requested review from Kludex, adtyavrdhn, dmontagu, dsfaccini and samuelcolvin as code owners April 10, 2026 01:02

devin-ai-integration Bot reviewed Apr 10, 2026

View reviewed changes

DouweM assigned adtyavrdhn Apr 10, 2026

DouweM removed request for Kludex, adtyavrdhn, dmontagu, dsfaccini and samuelcolvin April 10, 2026 15:12

jeffss00 mentioned this pull request Apr 23, 2026

Tool Output Management (truncation, spill-to-file, per-tool summarization) #82

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ToolOutputManagement capability#185

Add ToolOutputManagement capability#185
DouweM wants to merge 4 commits intomainfrom
capability/tool-output-management

DouweM commented Apr 10, 2026 •

edited

Loading

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

devin-ai-integration Bot Apr 10, 2026

Uh oh!

devin-ai-integration Bot Apr 10, 2026

Uh oh!

DouweM commented Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

DouweM commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Design decisions

Test plan

AI generated code

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

DouweM commented Apr 10, 2026

Audit vs prior art: ToolOutputManagement

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

DouweM commented Apr 10, 2026 •

edited

Loading