feat(code_mode): resolve deferred/approval-required tool calls via `HandleDeferredToolCalls` by DouweM · Pull Request #220 · pydantic/pydantic-ai-harness

DouweM · 2026-04-24T20:01:20Z

Summary

- Remove the td.defer exclusion so external/approval-required tools stay in the CodeMode sandbox instead of bouncing out as native tools
- Point the sandbox UserError at the HandleDeferredToolCalls capability so users know how to resolve deferrals inline
- Drop the now-unused native_fallbacks return value and deferred-tool warning
- Update the deferred-tools tests: the old test expected promotion to native, the new one asserts the tool is sandboxed and the error message is updated

Background

Depends on pydantic/pydantic-ai#5142, which adds the HandleDeferredToolCalls capability. Before that PR, any tool that raised ApprovalRequired/CallDeferred inside CodeMode had to bubble out — there was no inline resolver, so the sandbox intentionally hid those tools and surfaced them as native tools instead. With a handler capability, the inline flow works, so the hide-and-promote workaround is no longer needed.

The inline positive test was removed from this PR because the types it references don't exist in any released pydantic-ai-slim yet, which would break pyright. Once a version with HandleDeferredToolCalls ships, we should:

- Bump pydantic-ai-slim >= to that version in pyproject.toml
- Add back an inline-resolution test (an approval-required tool sandboxed inside CodeMode, resolved by a HandleDeferredToolCalls handler, returning the tool's value to the sandbox)

CI on this PR will go red until that release lands — opening now so we don't forget the follow-up.

Test plan

- CI passes locally once the pydantic-ai-slim release ships
- Manually verify an agent with capabilities=[CodeMode[None](), HandleDeferredToolCalls(handler=...)] lets a tool that raises ApprovalRequired resolve inside the sandbox (no native fallback, handler approves, tool returns value to the model)
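
The inline-resolution flow in the test plan can be pictured with a small self-contained sketch. It does not use pydantic-ai: `ApprovalRequired`, `UserError`, `Decision`, `call_in_sandbox` and `delete_file` are all hypothetical stand-ins that only model the control flow described above (tool raises, handler approves, tool returns its value to the sandbox).

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


class ApprovalRequired(Exception):
    """Stand-in for the exception a tool raises when it needs approval."""


class UserError(Exception):
    """Stand-in for the configuration error raised when no handler exists."""


@dataclass
class Decision:
    """Hypothetical handler result: approve or deny, with an optional message."""
    approved: bool
    message: str = ''


def call_in_sandbox(
    tool: Callable[..., Any],
    args: dict,
    handler: Optional[Callable[[str, dict], Decision]] = None,
) -> Any:
    """Call a tool from sandboxed code, resolving an approval request inline."""
    try:
        return tool(approved=False, **args)
    except ApprovalRequired:
        if handler is None:
            # Mirrors the updated error: point the user at the capability.
            raise UserError(
                'Tool requires approval; add a HandleDeferredToolCalls '
                'capability to resolve it inline.'
            )
        decision = handler(tool.__name__, args)
        if not decision.approved:
            raise RuntimeError(f'Tool call denied: {decision.message}')
        # Approved: re-run the tool and hand its value back to the script.
        return tool(approved=True, **args)


def delete_file(path: str, approved: bool) -> str:
    """Example tool that refuses to run until it has been approved."""
    if not approved:
        raise ApprovalRequired
    return f'deleted {path}'
```

With an approving handler such as `lambda name, args: Decision(True)`, the call returns the tool's value directly; with no handler, the stand-in `UserError` is raised instead of falling back to a native tool call.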

🤖 Generated with Claude Code

Tools with `kind='external'` or `'unapproved'` (and tools that raise ApprovalRequired/CallDeferred at runtime) are no longer excluded from the sandbox and promoted back to native tools. They now take the normal sandboxed path, and a HandleDeferredToolCalls capability on the agent can resolve them inline, so the model sees the resolved return value instead of having the deferral bounce out as a separate native tool call.

- Remove the td.defer filter in _partition_callable_tools (no more native fallback for deferred tools).
- Drop the native_fallbacks return value and the corresponding deferred-tool warning.
- Update the sandbox UserError message when no handler is configured to point users at HandleDeferredToolCalls.
- Update the deferred_execution test to assert sandbox inclusion and the approval-retry test to match the new error message.

Depends on pydantic/pydantic-ai#5142 landing and being released; once it does, bump the pydantic-ai-slim lower bound.
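
A rough before/after sketch of the partition change, under stated assumptions: `ToolDef`, `partition_old` and `partition_new` are hypothetical stand-ins, and the real `_partition_callable_tools` may filter on other criteria that are not shown here. The only point illustrated is that deferred tools no longer end up in a separate native-fallback list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToolDef:
    """Stand-in for a tool definition; only the fields needed here."""
    name: str
    kind: str = 'function'  # 'function', 'external', or 'unapproved'

    @property
    def defer(self) -> bool:
        # A tool is deferred when it is external or needs approval.
        return self.kind in ('external', 'unapproved')


def partition_old(tools: List[ToolDef]) -> Tuple[List[ToolDef], List[ToolDef]]:
    """Old behaviour: deferred tools are hidden from the sandbox."""
    sandboxed = [t for t in tools if not t.defer]
    native_fallbacks = [t for t in tools if t.defer]
    return sandboxed, native_fallbacks


def partition_new(tools: List[ToolDef]) -> List[ToolDef]:
    """New behaviour: no defer filter, so every tool is sandboxed."""
    return list(tools)


tools = [ToolDef('search'), ToolDef('pay', kind='unapproved')]
old_sandboxed, old_native = partition_old(tools)
new_sandboxed = partition_new(tools)
```

In this sketch `pay` moves from the old native-fallback list into the sandboxed list, which is the behaviour the updated deferred_execution test asserts.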

devin-ai-integration

Devin Review found 1 potential issue.

devin-ai-integration · 2026-04-24T20:05:46Z

🚩 Class docstring now describes behavior the PR removed

The CodeModeToolset class docstring at pydantic_ai_harness/code_mode/_toolset.py:170-171 still says "Tools that require deferred execution (kind external/unapproved) cannot be called from inside the sandbox and are dropped with a one-time UserWarning." This is now factually incorrect — the entire point of this PR is to sandbox those tools instead. The docstring was not updated because it falls in unchanged context lines, but it will be misleading to anyone reading the class documentation.

…er denials

When the `HandleDeferredToolCalls` handler denies a tool call, `handle_call` now raises `ToolDeniedError` (on pydantic-ai-slim once released) instead of returning the denial message as a plain string. CodeMode catches it, records a `ToolReturnPart(outcome='denied')` in `nested_returns` so message history reflects the denial correctly, and re-raises so the sandbox surfaces the denial as an exception rather than as what would look like a successful tool return.

The `ToolDeniedError` import is gated behind a compat shim so this module still loads against the currently released pydantic-ai-slim (which lacks the exception); the shim resolves to a placeholder class that never matches a real exception, leaving the except clause inert until a release ships `ToolDeniedError`.

Depends on pydantic/pydantic-ai#5142.
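
The compat-shim pattern in this commit message might look roughly like the sketch below. The import path `hypothetical_slim.exceptions` is an assumption made for illustration (the text does not say where `ToolDeniedError` lives), and `call_and_record` with its dict-based records is a simplified stand-in for the real `nested_returns` bookkeeping.

```python
from typing import Any, Callable, List

try:
    # Hypothetical import path; the real module is not named in the text.
    from hypothetical_slim.exceptions import ToolDeniedError
except ImportError:
    class ToolDeniedError(Exception):
        """Placeholder that nothing raises, so the except clause stays inert."""


def call_and_record(call: Callable[[], Any], nested_returns: List[dict]) -> Any:
    """Run a tool call, recording a denial before re-raising it."""
    try:
        result = call()
    except ToolDeniedError as exc:
        # Record the denial so message history shows it, then re-raise so the
        # sandbox sees an exception rather than a normal-looking return value.
        nested_returns.append({'outcome': 'denied', 'content': str(exc)})
        raise
    nested_returns.append({'outcome': 'success', 'content': result})
    return result
```

Because the placeholder class is defined locally, unrelated exceptions pass straight through the `except` clause, and the module imports cleanly whether or not the real exception exists.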

devin-ai-integration

Devin Review found 3 new potential issues.

devin-ai-integration · 2026-04-24T21:45:20Z

🟡 Stale docstring claims deferred-execution tools are excluded from sandbox

The CodeModeToolset class docstring at lines 188-189 states: "Tools that require deferred execution (kind external/unapproved) cannot be called from inside the sandbox and are dropped with a one-time UserWarning." This PR specifically removes that behavior — deferred-execution tools are now sandboxed like any other tool. The docstring was not updated to match, leaving incorrect documentation that contradicts the implementation.

`ToolManager.handle_call` no longer raises a (now-removed) `ToolDeniedError` on handler denial — it returns the `ToolDenied` value the handler produced. Drop the compat shim, import `ToolDenied` directly, and switch the dispatch to inspect the return value: record the denial as `outcome='denied'` on the nested `ToolReturnPart` and raise a `RuntimeError` inside the sandbox so the script can't mistake the denial message for a regular string return.
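
The switch from catching an exception to inspecting the return value can be sketched as follows. `ToolDenied` here is a local stand-in dataclass, not the library class, and `dispatch_result` with its dict records is a hypothetical simplification of how the nested `ToolReturnPart` is recorded.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class ToolDenied:
    """Stand-in for the value a handler returns when it denies a call."""
    message: str


def dispatch_result(result: Any, nested_returns: List[dict]) -> Any:
    """Turn a denial value into a sandbox exception; pass other values through."""
    if isinstance(result, ToolDenied):
        nested_returns.append({'outcome': 'denied', 'content': result.message})
        # Raise so the script cannot mistake the denial text for a string return.
        raise RuntimeError(result.message)
    nested_returns.append({'outcome': 'success', 'content': result})
    return result
```

Raising inside the sandbox keeps the two cases distinguishable for the script: a plain string is always a real tool return, and a denial is always an exception.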

Now that the slim PR has merged to main, refresh the lockfile to pick up the `HandleDeferredToolCalls` capability and `handle_call`'s `ToolDenied` return value. Add a denial test that asserts the denied-call flow surfaces as `ModelRetry` with the original denial message preserved in the trace.

Notes on the test:

- The handler returns `ToolDenied('nope')`; the harness records `outcome='denied'` on the nested `ToolReturnPart` and raises `RuntimeError` inside the sandbox.
- The script doesn't catch the RuntimeError, so Monty surfaces it as `MontyRuntimeError`, which the harness converts back to `ModelRetry`. The retry message preserves the denial message so the model knows what went wrong.
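
The exception chain described in the test notes can be modelled with stand-in classes. Nothing below is the real Monty or pydantic-ai API: `MontyRuntimeError`, `ModelRetry`, `run_script` and `run_code_tool` are hypothetical names used only to show how the denial message survives each conversion.

```python
from typing import Any, Callable


class MontyRuntimeError(Exception):
    """Stand-in for the error the sandbox runtime reports for uncaught exceptions."""


class ModelRetry(Exception):
    """Stand-in for the exception that asks the model to try again."""


def run_script(script: Callable[[], Any]) -> Any:
    """Sandbox layer: an uncaught script exception becomes a runtime error."""
    try:
        return script()
    except Exception as exc:
        raise MontyRuntimeError(f'{type(exc).__name__}: {exc}') from exc


def run_code_tool(script: Callable[[], Any]) -> Any:
    """Harness layer: a sandbox runtime error becomes a retry for the model."""
    try:
        return run_script(script)
    except MontyRuntimeError as exc:
        # Keep the original text so the model can see why the call failed.
        raise ModelRetry(str(exc)) from exc


def denied_script() -> None:
    """Script that hits a denied tool call and does not catch the error."""
    raise RuntimeError('nope')
```

Calling `run_code_tool(denied_script)` in this sketch raises the stand-in `ModelRetry` whose message still contains `nope`, which is the property the denial test checks.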

devin-ai-integration

Devin Review found 0 new potential issues.

The default `test` matrix uses the `[tool.uv.sources]` override pinning slim to its main branch, so it never exercises the published-PyPI install path. Add a `test-floor` job that overrides slim to the lowest version declared in `pyproject.toml` (>=1.80.0) and runs the test suite, so we catch any accidental dependency on unreleased slim features in code paths that should be backward-compatible. Gate the new HandleDeferredToolCalls denial test with `pytest.skip` when the capability isn't importable — currently the only test that requires a post-1.80.0 slim, but the pattern can be reused if more land later.

devin-ai-integration

Devin Review found 0 new potential issues.

The `except ImportError → pytest.skip` branch only fires when running against the slim floor (1.80.0) where `HandleDeferredToolCalls` doesn't exist yet. The default test matrix runs against slim main, so coverage counted those two lines as uncovered. Mark the branch `# pragma: no cover` since it's an explicit skip path that the floor-slim CI job exercises but isn't included in the coverage report (the floor job doesn't gate on coverage by design).
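
A minimal sketch of the skip-on-missing-import gate with the coverage pragma, assuming pytest is installed. The helper name `require_attr` and the module names passed to it are hypothetical; the real test imports the capability directly rather than through a helper.

```python
import importlib
from typing import Any

import pytest


def require_attr(module_name: str, attr: str) -> Any:
    """Return `module_name.attr`, or skip the current test if it is unavailable."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr)
    except (ImportError, AttributeError):  # pragma: no cover
        # Only reached against an older dependency version, which the default
        # coverage run never installs, hence the exclusion from coverage.
        pytest.skip(f'{module_name}.{attr} is not available in this install')


def test_uses_optional_feature() -> None:
    # `math.sqrt` always exists, so this test runs rather than skips.
    sqrt = require_attr('math', 'sqrt')
    assert sqrt(9) == 3.0
```

`pytest.skip` raises an exception that pytest reports as a skip, so a test that calls the helper against a missing name stops at that point instead of failing.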

devin-ai-integration

Devin Review found 0 new potential issues.

devin-ai-integration Bot reviewed Apr 24, 2026

This comment was marked as resolved.

devin-ai-integration Bot reviewed Apr 25, 2026

DouweM changed the title ~~feat(code_mode): resolve deferred tool calls via HandleDeferredToolCalls~~ feat(code_mode): resolve deferred/approval-required tool calls via HandleDeferredToolCalls Apr 25, 2026

DouweM merged commit fe9a587 into main Apr 25, 2026
19 of 20 checks passed

devin-ai-integration Bot reviewed Apr 25, 2026

DouweM merged 6 commits into main from handle-deferred-in-code-mode