
Add ToolOrphanRepair capability #184

Open
DouweM wants to merge 4 commits into main from capability/tool-orphan-repair


@DouweM
Contributor

DouweM commented Apr 10, 2026

Summary

  • Implements `ToolOrphanRepair`, a capability that sanitizes message history before each model request, fixing orphaned tool calls and orphaned tool results
  • Injects synthetic `ToolReturnPart` / `BuiltinToolReturnPart` parts for calls without matching results, strips `ToolReturnPart` / `RetryPromptPart` parts whose `tool_call_id` doesn't match any call, and handles the trailing-response and empty-request edge cases
  • Exported at the package top level: `from pydantic_harness import ToolOrphanRepair`
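To make the core repair concrete, here is a minimal sketch of the pairing logic, assuming simplified, hypothetical stand-in dataclasses rather than pydantic-ai's real message classes (which carry more fields such as `args` and timestamps). `repair_orphans` and `SYNTHETIC_CONTENT` are illustrative names, not the capability's actual API. It only covers `ToolReturnPart`; `RetryPromptPart` would be filtered the same way, and builtin calls, consecutive responses, and trailing responses are left out.

```python
from dataclasses import dataclass, field


# Simplified, hypothetical stand-ins for pydantic-ai's message classes.
@dataclass
class ToolCallPart:
    tool_name: str
    tool_call_id: str


@dataclass
class ToolReturnPart:
    tool_name: str
    content: str
    tool_call_id: str


@dataclass
class ModelResponse:
    parts: list = field(default_factory=list)


@dataclass
class ModelRequest:
    parts: list = field(default_factory=list)


# Placeholder text for injected returns (hypothetical wording).
SYNTHETIC_CONTENT = "Tool call was not executed; no result is available."


def repair_orphans(messages: list) -> list:
    """Pair every tool call with exactly one return in the next request."""
    repaired: list = []
    pending: dict[str, ToolCallPart] = {}  # calls from the previous response
    for msg in messages:
        if isinstance(msg, ModelResponse):
            repaired.append(msg)
            pending = {p.tool_call_id: p for p in msg.parts if isinstance(p, ToolCallPart)}
            continue
        # Strip returns whose tool_call_id matches no call in the previous response.
        kept = [
            p for p in msg.parts
            if not isinstance(p, ToolReturnPart) or p.tool_call_id in pending
        ]
        answered = {p.tool_call_id for p in kept if isinstance(p, ToolReturnPart)}
        # Inject a synthetic return for every call that never got one
        # (putting them first is an assumption of this sketch).
        synthetic = [
            ToolReturnPart(call.tool_name, SYNTHETIC_CONTENT, call_id)
            for call_id, call in pending.items()
            if call_id not in answered
        ]
        repaired.append(ModelRequest(parts=synthetic + kept))
        pending = {}
    return repaired
```

Run on a history whose first request holds a stale return and whose second request answers only one of two calls, the first request comes back empty (the empty-request case) and the second gains a synthetic return for the unanswered call.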

Refs: pydantic/pydantic-ai#4728

Test plan

  • 25 tests covering all repair scenarios: orphaned calls, orphaned returns, orphaned builtin calls, trailing responses, empty requests, warnings, multi-turn conversations, and no-op passthrough
  • pyright strict mode: 0 errors
  • ruff lint + format: clean

🤖 Generated with Claude Code

DouweM and others added 4 commits April 2, 2026 05:27
…sults

Implements a capability that hooks into before_model_request to repair
structurally invalid message history caused by orphaned tool calls and
results in multi-turn conversations. This prevents providers (especially
Anthropic) from rejecting poisoned conversation history with 400 errors.

Repairs: orphaned ToolCallPart (injects synthetic ToolReturnPart),
orphaned BuiltinToolCallPart (injects BuiltinToolReturnPart in same
response), and orphaned ToolReturnPart/RetryPromptPart (strips them).
Also handles trailing responses and empty request edge cases.
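The exact rules for the trailing-response and empty-request cases aren't spelled out here, so the following is only one possible reading, using hypothetical stand-in classes and a hypothetical `drop_edge_cases` helper: requests emptied by orphan stripping are removed, and the history is assumed to need to end with a request, so trailing responses are dropped.

```python
from dataclasses import dataclass, field


# Hypothetical minimal stand-ins for pydantic-ai's ModelRequest/ModelResponse.
@dataclass
class ModelRequest:
    parts: list = field(default_factory=list)


@dataclass
class ModelResponse:
    parts: list = field(default_factory=list)


def drop_edge_cases(messages: list) -> list:
    """Clean up what orphan stripping can leave behind (a sketch)."""
    # A request whose only parts were orphaned returns ends up empty; drop it.
    cleaned = [m for m in messages if not (isinstance(m, ModelRequest) and not m.parts)]
    # Assumption: history sent to the model should end with a request,
    # so any trailing responses are dropped.
    while cleaned and isinstance(cleaned[-1], ModelResponse):
        cleaned.pop()
    return cleaned
```

Dropping an empty request can leave two responses adjacent, which is one reason a real implementation would run these steps in a deliberate order.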

Refs: pydantic/pydantic-ai#4728

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Each repair site now emits a `logging.debug()` message describing the
specific action taken (synthetic return injected, orphaned return
stripped, trailing response dropped, etc.), complementing the existing
summary UserWarning.
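A rough sketch of the two reporting levels this commit describes: one `DEBUG` log line per repair plus a single summary `UserWarning`. The logger name, the `RepairReport` class, and the message wording are hypothetical; only the standard-library `logging` and `warnings` calls are real.

```python
import logging
import warnings

# Hypothetical logger name; the commit mentions logging.debug() but not the logger.
logger = logging.getLogger("tool_orphan_repair")


class RepairReport:
    """Collects individual repairs, logging each one at DEBUG level."""

    def __init__(self) -> None:
        self.actions: list[str] = []

    def record(self, action: str) -> None:
        logger.debug("ToolOrphanRepair: %s", action)
        self.actions.append(action)

    def warn_summary(self) -> None:
        # Emit one summary warning per pass, and only if something was repaired.
        if self.actions:
            warnings.warn(
                f"ToolOrphanRepair made {len(self.actions)} repair(s) to message history",
                UserWarning,
                stacklevel=2,
            )
```

With `logging.basicConfig(level=logging.DEBUG)` the per-repair lines become visible, while the warning alone is enough to notice that history was modified.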

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…onses

- Test the `before_model_request` capability hook directly
- Test consecutive ModelResponse messages (no interleaved request)
- Mark defensive Phase 6 code as `# pragma: no cover` (unreachable)
- Mark unused `extra_parts` helper branch as `# pragma: no cover`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

devin-ai-integration (Bot) left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.


@DouweM
Contributor Author

DouweM commented Apr 10, 2026

Originally posted by @DouweM in #132 comment (PR was recreated)

Audit vs prior art: ToolOrphanRepair

Worth adding now:

  • `tool_call_id` deduplication
  • Debug-level logging of specific repairs (not just a count)
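The detection half of deduplication could look roughly like this; `find_duplicate_call_ids` is a hypothetical helper, and what to do with the duplicates once found (rename, drop, or just warn) is left open.

```python
import warnings
from collections import Counter


def find_duplicate_call_ids(call_ids: list[str]) -> list[str]:
    """Return tool_call_ids that occur more than once, warning if any do."""
    counts = Counter(call_ids)
    duplicates = sorted(cid for cid, n in counts.items() if n > 1)
    if duplicates:
        warnings.warn(
            f"Duplicate tool_call_id(s) in message history: {duplicates}",
            UserWarning,
            stacklevel=2,
        )
    return duplicates
```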

Follow-up opportunities:

  • Integration with Compaction to catch orphans created by summarization

@DouweM
Contributor Author

DouweM commented Apr 10, 2026

Originally posted by @adtyavrdhn in #132 comment (PR was recreated)

Notes from comparing with Hermes, Pi-mono, and Mastra

Looked at how other frameworks handle the same problem. A few things worth noting:

Synthetic returns should be marked as errors
Pi-mono marks injected results with `isError: true` so the model knows the tool didn't actually succeed: it's not a normal result but a "this never ran" signal. We can't do this yet because `ToolReturnPart` doesn't have an `is_error` field; that's tracked in pydantic/pydantic-ai#4363. Once that lands, the synthetic returns here should set `is_error=True`.
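A sketch of what that could look like once the field exists. The `is_error` field below is an assumption about its eventual shape (it is not in pydantic-ai today), the stand-in `ToolReturnPart` is hypothetical, and the mapping to an Anthropic `tool_result` block, which does accept an optional `is_error` flag, is only illustrative of how a provider adapter might forward it.

```python
from dataclasses import dataclass


@dataclass
class ToolReturnPart:
    """Hypothetical stand-in; pydantic-ai's ToolReturnPart has no is_error yet."""

    tool_name: str
    content: str
    tool_call_id: str
    is_error: bool = False  # assumed shape of the field tracked in #4363


def synthetic_return(tool_name: str, tool_call_id: str) -> ToolReturnPart:
    # Mark injected results as errors: "this never ran", not a normal result.
    return ToolReturnPart(
        tool_name=tool_name,
        content="This tool call was never executed.",
        tool_call_id=tool_call_id,
        is_error=True,
    )


def to_anthropic_tool_result(part: ToolReturnPart) -> dict:
    # Anthropic Messages API tool_result block; is_error is optional there.
    block = {"type": "tool_result", "tool_use_id": part.tool_call_id, "content": part.content}
    if part.is_error:
        block["is_error"] = True
    return block
```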

Duplicate tool_call_id handling
Already flagged in the audit comment. With the current set-based matching, if two calls share an ID (e.g. from a provider bug or frontend-generated IDs), only one synthetic return gets created and the other call stays orphaned. Worth adding detection plus a warning at minimum.
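The difference between set-based and multiset-based matching is easy to show in isolation. Both helpers are hypothetical and only compute which call IDs still need a synthetic return.

```python
from collections import Counter


def missing_returns_set(call_ids: list[str], return_ids: list[str]) -> list[str]:
    # Set-based: two calls sharing an ID collapse into a single entry.
    return sorted(set(call_ids) - set(return_ids))


def missing_returns_multiset(call_ids: list[str], return_ids: list[str]) -> list[str]:
    # Multiset-based: each call needs its own return; Counter subtraction
    # keeps only the positive shortfall per ID.
    shortfall = Counter(call_ids) - Counter(return_ids)
    return sorted(shortfall.elements())


calls = ["dup", "dup"]
print(missing_returns_set(calls, []))        # ['dup'] -> only one synthetic return
print(missing_returns_multiset(calls, []))   # ['dup', 'dup']
print(missing_returns_set(calls, ["dup"]))   # [] -> second call stays orphaned
print(missing_returns_multiset(calls, ["dup"]))  # ['dup']
```

Counting fixes the number of synthetic returns, but it still can't say which real return belongs to which duplicate call, so the warning remains worthwhile.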

Tool ID sanitization is not this PR's job
Hermes has an explicit `_sanitize_tool_id()` that replaces characters outside `[a-zA-Z0-9_-]` for Anthropic compliance. That's a provider adapter concern: it belongs in pydantic-ai's Anthropic model, not in history repair. Mentioning it here only because it causes the same symptom (Anthropic 400s).
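For reference, the kind of sanitization described could be as small as the sketch below. Hermes's actual function isn't shown here; `sanitize_tool_id` is a hypothetical equivalent, and replacing with an underscore is an assumption.

```python
import re

# Any character outside the allowed tool ID alphabet [a-zA-Z0-9_-].
_INVALID_ID_CHARS = re.compile(r"[^a-zA-Z0-9_-]")


def sanitize_tool_id(tool_call_id: str) -> str:
    """Replace each disallowed character with an underscore (assumed replacement)."""
    return _INVALID_ID_CHARS.sub("_", tool_call_id)
```

Two caveats that tie back to this PR: the same mapping has to be applied to both the call and its return, or sanitization itself creates orphans, and distinct IDs can collide (`"a.b"` and `"a:b"` both become `"a_b"`), which is another source of duplicate `tool_call_id`s.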

Errored/aborted turns — framework handles it (mostly)
Pi-mono skips entire responses with `stopReason === "error"`. Pydantic-ai already handles `finish_reason='length'` (raises `IncompleteToolCall`) and auto-generates tool call IDs if they're missing, so partial tool calls aren't a concern for this capability. Responses with `finish_reason='error'` do silently enter history, though; that's a framework-level gap, not something for here.
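For comparison only (per the note above, this isn't something the capability should do), Pi-mono's approach applied to pydantic-ai-style history might look like this, with hypothetical stand-in classes and a hypothetical `skip_errored_responses` helper.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-ins; pydantic-ai's ModelResponse has more fields.
@dataclass
class ModelRequest:
    parts: list = field(default_factory=list)


@dataclass
class ModelResponse:
    parts: list = field(default_factory=list)
    finish_reason: Optional[str] = None


def skip_errored_responses(messages: list) -> list:
    """Drop whole responses that ended with finish_reason='error' (Pi-mono style)."""
    return [
        m for m in messages
        if not (isinstance(m, ModelResponse) and m.finish_reason == "error")
    ]
```

Dropping a response this way can orphan any returns answering its tool calls, which is exactly what orphan repair would then have to strip, so the two concerns interact even if they live in different layers.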
