Tools with `kind='external'` or `'unapproved'` (and tools that raise `ApprovalRequired`/`CallDeferred` at runtime) are no longer excluded from the sandbox and promoted back to native tools. They now take the normal sandboxed path, and a `HandleDeferredToolCalls` capability on the agent can resolve them inline, so the model sees the resolved return value instead of having the deferral bounce out as a separate native tool call.
- Remove the `td.defer` filter in `_partition_callable_tools` (no more native fallback for deferred tools).
- Drop the `native_fallbacks` return value and the corresponding deferred-tool warning.
- Update the sandbox `UserError` message shown when no handler is configured so it points users at `HandleDeferredToolCalls`.
- Update the `deferred_execution` test to assert sandbox inclusion, and the approval-retry test to match the new error message.
Depends on pydantic/pydantic-ai#5142 landing and being released; once it does, bump the `pydantic-ai-slim` lower bound.
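The partition change can be pictured with a small stand-in. `ToolDef`, `partition_before`, and `partition_after` are hypothetical names, not the harness's real `_partition_callable_tools` signature; the `defer` property assumes "deferred" means `kind` is `'external'` or `'unapproved'`, as the description states.

```python
from dataclasses import dataclass
from typing import List, Literal, Tuple


@dataclass
class ToolDef:
    """Hypothetical stand-in for a tool definition (only the fields this sketch needs)."""
    name: str
    kind: Literal['function', 'external', 'unapproved'] = 'function'

    @property
    def defer(self) -> bool:
        # Assumption: external/unapproved tools are the ones needing deferred execution.
        return self.kind in ('external', 'unapproved')


def partition_before(tools: List[ToolDef]) -> Tuple[List[ToolDef], List[ToolDef]]:
    """Old behaviour: deferred tools were hidden from the sandbox and returned as native fallbacks."""
    sandboxed = [t for t in tools if not t.defer]
    native_fallbacks = [t for t in tools if t.defer]
    return sandboxed, native_fallbacks


def partition_after(tools: List[ToolDef]) -> List[ToolDef]:
    """New behaviour: no `defer` filter, so every callable tool goes into the sandbox."""
    return list(tools)
```

With `[ToolDef('search'), ToolDef('send_email', 'unapproved')]`, the old split leaves only `search` in the sandbox, while the new one keeps both and there is no fallback list left to return.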
🚩 Class docstring now describes behavior the PR removed
The CodeModeToolset class docstring at pydantic_ai_harness/code_mode/_toolset.py:170-171 still says "Tools that require deferred execution (kind external/unapproved) cannot be called from inside the sandbox and are dropped with a one-time UserWarning." This is now factually incorrect — the entire point of this PR is to sandbox those tools instead. The docstring was not updated because it falls in unchanged context lines, but it will be misleading to anyone reading the class documentation.
…er denials
When the `HandleDeferredToolCalls` handler denies a tool call, `handle_call` now raises `ToolDeniedError` (on pydantic-ai-slim once released) instead of returning the denial message as a plain string. CodeMode catches it, records a `ToolReturnPart(outcome='denied')` in `nested_returns` so message history reflects the denial correctly, and re-raises so the sandbox surfaces the denial as an exception rather than as what would look like a successful tool return. The `ToolDeniedError` import is gated behind a compat shim so this module still loads against the currently released pydantic-ai-slim (which lacks the exception); the shim resolves to a placeholder class that never matches a real exception, leaving the except clause inert until a release ships `ToolDeniedError`. Depends on pydantic/pydantic-ai#5142.
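A minimal sketch of the compat-shim pattern described in this commit. The import path `pydantic_ai.exceptions` is an assumption, and `call_tool` plus the dict entries in `nested_returns` are simplified hypothetical stand-ins for CodeMode's dispatch and `ToolReturnPart`.

```python
from typing import Any, Callable

# Compat shim: use the real exception if the installed pydantic-ai-slim has it.
# Assumption: the module path below is illustrative.
try:
    from pydantic_ai.exceptions import ToolDeniedError  # type: ignore[attr-defined]
except ImportError:

    class ToolDeniedError(Exception):  # type: ignore[no-redef]
        """Placeholder that nothing in the library raises, so the except clause below stays inert."""


def call_tool(fn: Callable[[], Any], nested_returns: list) -> Any:
    """Hypothetical dispatch wrapper around a single tool call."""
    try:
        return fn()
    except ToolDeniedError as exc:
        # Record the denial so message history shows it as denied, not as a normal return.
        nested_returns.append({'outcome': 'denied', 'content': str(exc)})
        # Re-raise so the sandbox sees an exception rather than a successful return.
        raise
```

Because `from module import name` raises `ImportError` both when the package is missing and when it exists but lacks the name, one `except` branch covers both older-slim cases.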
🟡 Stale docstring claims deferred-execution tools are excluded from sandbox
The CodeModeToolset class docstring at lines 188-189 states: "Tools that require deferred execution (kind external/unapproved) cannot be called from inside the sandbox and are dropped with a one-time UserWarning." This PR specifically removes that behavior — deferred-execution tools are now sandboxed like any other tool. The docstring was not updated to match, leaving incorrect documentation that contradicts the implementation.
`ToolManager.handle_call` no longer raises a (now-removed) `ToolDeniedError` on handler denial — it returns the `ToolDenied` value the handler produced. Drop the compat shim, import `ToolDenied` directly, and switch the dispatch to inspect the return value: record the denial as `outcome='denied'` on the nested `ToolReturnPart` and raise a `RuntimeError` inside the sandbox so the script can't mistake the denial message for a regular string return.
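The return-value dispatch can be sketched as follows. `ToolDenied` and `ToolReturnPart` here are simplified stand-ins (the real classes have more fields; `ToolDenied` is assumed to carry a `message`), and `dispatch` is a hypothetical name for the step that runs after `handle_call` returns.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ToolDenied:
    """Stand-in for the value a handler returns to deny a call (assumed to carry a message)."""
    message: str


@dataclass
class ToolReturnPart:
    """Simplified stand-in: only the fields this sketch records."""
    tool_name: str
    content: Any
    outcome: Optional[str] = None


def dispatch(tool_name: str, result: Any, nested_returns: List[ToolReturnPart]) -> Any:
    """Hypothetical post-call step: inspect what handle_call returned."""
    if isinstance(result, ToolDenied):
        # Message history records the denial explicitly.
        nested_returns.append(ToolReturnPart(tool_name, result.message, outcome='denied'))
        # Raise inside the sandbox so the script can't treat the message as a string return.
        raise RuntimeError(result.message)
    nested_returns.append(ToolReturnPart(tool_name, result))
    return result
```

Checking the returned value (rather than catching an exception) means a denial and a plain string result are distinguished by type, not by content.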
Now that the slim PR has merged to main, refresh the lockfile to pick up
the `HandleDeferredToolCalls` capability and `handle_call`'s `ToolDenied`
return value. Add a denial test that asserts the denied-call flow
surfaces as `ModelRetry` with the original denial message preserved in
the trace.
Notes on the test:
- The handler returns `ToolDenied('nope')`; the harness records
`outcome='denied'` on the nested `ToolReturnPart` and raises
`RuntimeError` inside the sandbox.
- The script doesn't catch the RuntimeError, so Monty surfaces it as
`MontyRuntimeError`, which the harness converts back to `ModelRetry`.
The retry message preserves the denial message so the model knows
what went wrong.
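The error chain in these notes can be modeled with stand-in exceptions. `MontyRuntimeError` and `ModelRetry` below are local placeholders for the Monty and pydantic-ai classes, and `run_script`/`run_code_mode` are hypothetical names for the sandbox runner and the harness wrapper.

```python
from typing import Any, Callable


class MontyRuntimeError(Exception):
    """Stand-in for the error Monty raises when a sandboxed script fails."""


class ModelRetry(Exception):
    """Stand-in for pydantic-ai's ModelRetry."""


def run_script(script: Callable[[], Any]) -> Any:
    """Hypothetical sandbox runner: an uncaught error in the script surfaces as MontyRuntimeError."""
    try:
        return script()
    except Exception as exc:
        raise MontyRuntimeError(f'{type(exc).__name__}: {exc}') from exc


def run_code_mode(script: Callable[[], Any]) -> Any:
    """Hypothetical harness wrapper: convert sandbox failures into a retry for the model."""
    try:
        return run_script(script)
    except MontyRuntimeError as exc:
        # Keep the original message so the model knows what went wrong.
        raise ModelRetry(str(exc)) from exc
```

A script that hits a denied call and doesn't catch the resulting `RuntimeError('nope')` therefore ends as `ModelRetry('RuntimeError: nope')`, with the denial text intact.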
The default `test` matrix uses the `[tool.uv.sources]` override pinning slim to its main branch, so it never exercises the published-PyPI install path. Add a `test-floor` job that overrides slim to the lowest version declared in `pyproject.toml` (>=1.80.0) and runs the test suite, so we catch any accidental dependency on unreleased slim features in code paths that should be backward-compatible. Gate the new HandleDeferredToolCalls denial test with `pytest.skip` when the capability isn't importable — currently the only test that requires a post-1.80.0 slim, but the pattern can be reused if more land later.
HandleDeferredToolCalls
The `except ImportError → pytest.skip` branch only fires when running against the slim floor (1.80.0) where `HandleDeferredToolCalls` doesn't exist yet. The default test matrix runs against slim main, so coverage counted those two lines as uncovered. Mark the branch `# pragma: no cover` since it's an explicit skip path that the floor-slim CI job exercises but isn't included in the coverage report (the floor job doesn't gate on coverage by design).
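A sketch of the gated test from the last two commits. The import path `pydantic_ai.capabilities` is an assumption about where slim exports the capability, and the test name is hypothetical. Putting `# pragma: no cover` on the `except` line makes coverage.py exclude that whole clause, which is the skip branch only the floor job reaches.

```python
import pytest


def test_handler_denial_surfaces_as_model_retry():
    try:
        # Assumed import path; adjust to wherever slim actually exports the capability.
        from pydantic_ai.capabilities import HandleDeferredToolCalls  # noqa: F401
    except ImportError:  # pragma: no cover
        # Only reached on the slim-floor job, which doesn't gate on coverage.
        pytest.skip('HandleDeferredToolCalls requires a newer pydantic-ai-slim')
    # The real test would build an agent with CodeMode and the handler here.
    assert HandleDeferredToolCalls is not None
```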
Summary
- Remove the `td.defer` exclusion so external/approval-required tools stay in the CodeMode sandbox instead of bouncing out as native tools
- Point the `UserError` at the `HandleDeferredToolCalls` capability so users know how to resolve deferrals inline
- Drop the `native_fallbacks` return value and deferred-tool warning
Background
Depends on pydantic/pydantic-ai#5142, which adds the `HandleDeferredToolCalls` capability. Before that PR, any tool that raised `ApprovalRequired`/`CallDeferred` inside CodeMode had to bubble out: there was no inline resolver, so the sandbox intentionally hid those tools and surfaced them as native tools instead. With a handler capability, the inline flow works, so the hide-and-promote workaround is no longer needed.
The inline positive test was removed from this PR because the types it references don't exist in any released pydantic-ai-slim yet, which would break pyright. Once a version with `HandleDeferredToolCalls` ships, we should:
- Bump `pydantic-ai-slim >=` to that version in `pyproject.toml`
- Re-add the inline positive test (… `HandleDeferredToolCalls` handler, returning the tool's value to the sandbox)
CI on this PR will go red until that release lands; opening now so we don't forget the follow-up.
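The before/after flow in the Background can be modeled in plain Python. Everything here is a hypothetical stand-in: the `approved` flag approximates how a tool learns it was approved, and the handler signature (tool name in, `True` or `ToolDenied` out) is not the real `HandleDeferredToolCalls(handler=...)` API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union


class ApprovalRequired(Exception):
    """Stand-in: raised by a tool that needs approval before it runs."""


class UserError(Exception):
    """Stand-in for the configuration error raised when no handler is set."""


@dataclass
class ToolDenied:
    message: str


# Hypothetical handler shape: return True to approve, or a ToolDenied to deny.
Handler = Callable[[str], Union[bool, ToolDenied]]


def call_in_sandbox(name: str, tool: Callable[..., Any], handler: Optional[Handler]) -> Any:
    try:
        return tool(approved=False)
    except ApprovalRequired:
        if handler is None:
            # Without a handler there is no way to resolve the deferral inline.
            raise UserError(
                f'Tool {name!r} requires approval; add HandleDeferredToolCalls to resolve it inline.'
            ) from None
        decision = handler(name)
        if decision is True:
            # Approved: re-run the tool and hand its value back to the sandbox.
            return tool(approved=True)
        # Anything else is treated as a denial and handed back to the caller.
        return decision if isinstance(decision, ToolDenied) else ToolDenied('The tool call was denied.')
```

The key difference from the old hide-and-promote approach is that the approved value returns to the script in the same sandbox run instead of ending the run with a native tool call.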
Test plan
- `capabilities=[CodeMode[None](), HandleDeferredToolCalls(handler=...)]` lets a tool that raises `ApprovalRequired` resolve inside the sandbox (no native fallback, handler approves, tool returns value to the model)
🤖 Generated with Claude Code