feat: add built-in repair_orphaned_tool_parts history processor #5090

Closed
anmolg1997 wants to merge 6 commits into pydantic:main from anmolg1997:feat/repair-orphaned-tool-parts

anmolg1997 commented Apr 15, 2026

Summary

  • Adds pydantic_ai.history_processors.repair_orphaned_tool_parts, a ready-to-use history processor that removes structurally invalid tool call/return pairs from message history
  • Prevents 400 errors from providers (especially Anthropic) that reject orphaned tool references after streaming timeouts, deferred tool drops, or history trimming
  • Ships with 11 unit tests covering the edge cases listed under Test plan

Problem

Multi-turn conversations with tools can accumulate broken message history: tool calls without matching results, or results referencing calls that don't exist. This is structurally invalid (a tool call without a result before the next turn doesn't make sense), and providers rightfully reject it. Anthropic is the strictest about enforcement (you get a 400), but even providers that silently tolerate it produce worse results from garbled history.

Common causes:

  • A streamed response containing a ToolCallPart is persisted, but the agent times out before the matching ToolReturnPart arrives
  • Deferred tool results are dropped, or arrive with a different tool_call_id
  • History processors (trimming, summarization) remove one side of a pair
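To make the third cause concrete, here is a minimal sketch of how naive trimming can orphan a tool return. It uses plain `(kind, tool_call_id)` tuples as a hypothetical stand-in for real message parts, and `orphaned_ids` is an illustrative helper, not part of pydantic_ai.

```python
# Hypothetical, simplified history: each entry is (kind, tool_call_id).
history = [
    ("user_prompt", None),
    ("tool_call", "call_1"),
    ("tool_return", "call_1"),
    ("text", None),
]


def orphaned_ids(messages):
    """Return (calls_without_return, returns_without_call) as sets of ids."""
    calls = {tid for kind, tid in messages if kind == "tool_call"}
    returns = {tid for kind, tid in messages if kind == "tool_return"}
    return calls - returns, returns - calls


# Naive "keep the last two entries" trimming cuts between the call and its return.
trimmed = history[-2:]

print(orphaned_ids(history))  # no orphans in the full history
print(orphaned_ids(trimmed))  # the return for "call_1" has lost its call
```

The full history has no orphans, while the trimmed one keeps the return for `call_1` without the call that produced it.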

Approach

Two-pass repair:

  1. Orphaned returns/retries: ToolReturnPart or RetryPromptPart (with tool_name) whose tool_call_id has no matching ToolCallPart → removed
  2. Orphaned calls: ToolCallPart whose tool_call_id has no matching ToolReturnPart or RetryPromptPart → removed

Output-validation RetryPromptParts (tool_name=None) are preserved since they're not tied to tool calls.

Empty messages (all parts removed) are dropped entirely.
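The two passes described above can be sketched roughly as follows. This is not the code from this PR: the dataclasses are simplified stand-ins for pydantic_ai's part types (only the fields the repair logic reads are modelled), and `Message` and `repair` are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


# Simplified stand-ins (assumption): the real pydantic_ai classes carry more fields.
@dataclass
class TextPart:
    content: str


@dataclass
class ToolCallPart:
    tool_name: str
    tool_call_id: str


@dataclass
class ToolReturnPart:
    tool_name: str
    tool_call_id: str


@dataclass
class RetryPromptPart:
    content: str
    tool_name: Optional[str] = None  # None means an output-validation retry
    tool_call_id: str = ""


@dataclass
class Message:
    parts: list = field(default_factory=list)


def is_tool_result(part) -> bool:
    """True for tool returns and for retries that are tied to a tool call."""
    if isinstance(part, ToolReturnPart):
        return True
    return isinstance(part, RetryPromptPart) and part.tool_name is not None


def repair(messages):
    # Pass 1: drop returns/retries whose tool_call_id has no matching call.
    call_ids = {
        p.tool_call_id for m in messages for p in m.parts if isinstance(p, ToolCallPart)
    }
    pass1 = [
        Message([p for p in m.parts if not (is_tool_result(p) and p.tool_call_id not in call_ids)])
        for m in messages
    ]

    # Pass 2: drop calls whose tool_call_id has no matching return or retry.
    result_ids = {p.tool_call_id for m in pass1 for p in m.parts if is_tool_result(p)}
    pass2 = [
        Message([p for p in m.parts if not (isinstance(p, ToolCallPart) and p.tool_call_id not in result_ids)])
        for m in pass1
    ]

    # Messages left with no parts are dropped entirely.
    return [m for m in pass2 if m.parts]


history = [
    Message([TextPart("hi")]),
    Message([ToolCallPart("a", "1"), ToolCallPart("b", "2")]),  # call "2" never returns
    Message([ToolReturnPart("a", "1"), ToolReturnPart("c", "3"), RetryPromptPart("bad output")]),
]
repaired = repair(history)
```

In this sketch the call with id `2` and the return with id `3` are removed, while the output-validation retry (`tool_name=None`) survives because it is never treated as a tool result.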

Usage

```python
from pydantic_ai import Agent
from pydantic_ai.history_processors import repair_orphaned_tool_parts

agent = Agent('openai:gpt-4o', history_processors=[repair_orphaned_tool_parts])
```

Changes

| File | Change |
| --- | --- |
| pydantic_ai_slim/pydantic_ai/history_processors.py | New module with repair_orphaned_tool_parts function |
| tests/test_history_processors.py | 11 tests covering matched pairs, orphaned calls, orphaned returns, mixed, empty history, text-only, output validation retries |

Test plan

  • 11/11 new tests passing
  • No existing tests affected
  • Handles edge cases: empty history, text-only conversations, output validation retries (tool_name=None), parallel tool calls, mixed orphans

Closes #4728

The judge agents' system prompts now explicitly instruct the model to
keep the `reason` field to a concise 1-2 sentence summary, preventing
reasoning/thinking text from leaking into the public reason. The
GradingOutput.reason field also gains a description that reinforces
this constraint via the JSON schema.

This makes `reason` stable and suitable for use in ModelRetry feedback
loops, where verbose or self-contradictory reasoning text would
otherwise degrade retry quality.

Fixes pydantic#5034

Adds a ready-to-use history processor that removes structurally invalid
tool call/return pairs from message history. This prevents 400 errors
from providers (especially Anthropic) that reject orphaned tool
references after streaming timeouts, deferred tool drops, or history
trimming.

Two-pass repair:
1. Remove ToolReturnPart/RetryPromptPart whose tool_call_id has no
   matching ToolCallPart
2. Remove ToolCallPart whose tool_call_id has no matching return

Output-validation RetryPromptParts (tool_name=None) are preserved since
they are not tied to tool calls.

Closes pydantic#4728
@github-actions github-actions Bot added size: M Medium PR (101-500 weighted lines) feature New feature request, or PR implementing a feature (enhancement) labels Apr 15, 2026

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 2 potential issues.

View 4 additional findings in Devin Review.

Comment thread pydantic_evals/pydantic_evals/evaluators/llm_as_a_judge.py
Comment thread pydantic_ai_slim/pydantic_ai/history_processors.py Outdated
- Remove unused `import pytest` from test file
- Fix formatting (pre-commit auto-format compliance)
- Remove redundant `tool_call_id` truthiness guards (always set by default)
- Add `pragma: no branch` for exhaustive ModelMessage union branch
- Achieves 100% branch coverage

Split repair_orphaned_tool_parts into focused helpers:
- _collect_tool_call_ids / _collect_tool_return_ids
- _is_orphaned_request_part
- _repair_request / _repair_response
- _rebuild_or_drop

All 11 tests pass, 100% branch coverage, ruff clean.

- Fix "possibly unbound" by using if/else instead of if/elif for
  exhaustive ModelMessage union
- Fix list invariance errors by inlining rebuild logic into
  _repair_request and _repair_response with proper return types
- Remove _rebuild_or_drop helper and its type: ignore comment

Locally verified: pyright 0 errors, ruff clean, 100% branch coverage.
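
The if/elif versus if/else point in the commit message above can be illustrated with a small sketch. The PR's actual code is not shown here, so `Request`, `Response`, and both functions are hypothetical stand-ins for a two-member union like ModelMessage. Whether a checker actually reports the first form depends on the checker and on how the variable's type is declared or inferred.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Request:
    parts: list


@dataclass
class Response:
    parts: list


Message = Union[Request, Response]  # hypothetical two-member union


def part_count_elif(message: Message) -> int:
    if isinstance(message, Request):
        count = len(message.parts)
    elif isinstance(message, Response):
        count = len(message.parts)
    # With no else branch, a type checker may consider `count` possibly unbound.
    return count


def part_count_else(message: Message) -> int:
    if isinstance(message, Request):
        count = len(message.parts)
    else:  # the union has only two members, so this branch handles Response
        count = len(message.parts)
    # `count` is assigned on every path, so there is nothing left to flag.
    return count
```

Both functions behave the same at runtime for the two union members; the second form simply makes the exhaustiveness visible to static analysis.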
@adtyavrdhn
Member

Hey @anmolg1997

Thanks for this, though with the changes in place we are aiming to introduce this as a capability in the harness instead: pydantic/pydantic-ai-harness#184

Feel free to discuss this with Douwe there :)

Closing this for now.

@adtyavrdhn adtyavrdhn closed this Apr 15, 2026

anmolg1997 commented Apr 17, 2026

Thanks @adtyavrdhn and @DouweM, makes sense to land this as a harness capability (ToolOrphanRepair) rather than in core. The harness is a better home for opt-in message sanitization.

For anyone finding this later: pydantic/pydantic-ai-harness#184 is the canonical impl. Same orphaning scenarios (orphaned calls -> synthetic returns, orphaned returns/retries -> strip), plus edge cases I hadn't covered (trailing responses, empty requests after repair, built-in tool parts).

Closing in favor of the harness PR. Happy to help review there if useful.
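
For readers comparing the two strategies, the "orphaned calls -> synthetic returns" idea mentioned above could look roughly like the sketch below. It is not the harness implementation: the dataclasses, `add_synthetic_returns`, and the placeholder text are all hypothetical, and leaving a trailing response untouched is an assumption made here for simplicity.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins; the real message and part types differ.
@dataclass
class ToolCall:
    tool_name: str
    tool_call_id: str


@dataclass
class ToolReturn:
    tool_name: str
    tool_call_id: str
    content: str


@dataclass
class Response:
    parts: list = field(default_factory=list)


@dataclass
class Request:
    parts: list = field(default_factory=list)


def add_synthetic_returns(messages, placeholder="Tool call was interrupted; no result available."):
    """Keep orphaned calls and answer each one with a placeholder return."""
    returned = {
        p.tool_call_id
        for m in messages
        if isinstance(m, Request)
        for p in m.parts
        if isinstance(p, ToolReturn)
    }
    repaired = []
    for index, message in enumerate(messages):
        repaired.append(message)
        is_last = index == len(messages) - 1
        if not isinstance(message, Response) or is_last:
            # Assumption: a trailing response may still be waiting for its results.
            continue
        missing = [
            ToolReturn(p.tool_name, p.tool_call_id, placeholder)
            for p in message.parts
            if isinstance(p, ToolCall) and p.tool_call_id not in returned
        ]
        if missing:
            # Simplification: add a new request rather than merging into the next one.
            repaired.append(Request(missing))
    return repaired


messages = [
    Response([ToolCall("search", "1")]),  # never answered
    Request([]),
    Response([ToolCall("search", "2")]),  # trailing, left as-is
]
patched = add_synthetic_returns(messages)
```

Compared with removing the call, this keeps the record that the model attempted the tool call, at the cost of inserting content that no tool actually produced.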

Development

Successfully merging this pull request may close these issues.

Built-in HistoryProcessor for orphaned tool call/result repair

2 participants