
Add guardrail capabilities: InputGuardrail, OutputGuardrail, CostGuard, ToolGuard#182

Open
DouweM wants to merge 12 commits into main from capability/guardrails

Conversation


DouweM (Contributor) commented Apr 10, 2026

Summary

  • InputGuardrail: validates user input before the run starts via before_run hook, with a user-supplied sync/async guard function
  • OutputGuardrail: validates model output after the run completes via after_run hook, with a user-supplied sync/async guard function
  • CostGuard: enforces token budget limits (input, output, total) before each model request via before_model_request hook
  • ToolGuard: blocks tools from the model via prepare_tools and requires approval before execution via before_tool_execute hook

All four are AbstractCapability subclasses following pydantic-ai patterns. Exception hierarchy: GuardrailError base with InputBlocked, OutputBlocked, BudgetExceededError, ToolBlocked subclasses.
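As a rough sketch, the exception hierarchy described above could look like this (the class names come from the PR description; the docstrings and empty bodies are illustrative assumptions, not the actual implementation):

```python
# Hedged sketch of the guardrail exception hierarchy named in this PR.

class GuardrailError(Exception):
    """Base class for all guardrail violations."""

class InputBlocked(GuardrailError):
    """Raised by InputGuardrail when user input fails validation."""

class OutputBlocked(GuardrailError):
    """Raised by OutputGuardrail when model output fails validation."""

class BudgetExceededError(GuardrailError):
    """Raised by CostGuard when a token budget limit is exceeded."""

class ToolBlocked(GuardrailError):
    """Raised by ToolGuard when a blocked tool is requested."""
```

Callers that don't care which guardrail fired can catch the `GuardrailError` base class and still distinguish subclasses when needed.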

Design inspired by vstorm-co/pydantic-ai-shields and OpenAI Agents SDK guardrails.

See PLAN.md for design decisions and future work.

Closes #28

Test plan

  • 38 tests covering all capabilities, exception hierarchy, sync/async guards, composition, and imports
  • ruff check and ruff format pass
  • pyright strict mode passes on src/ and tests/
  • CI passes

🤖 Generated with Claude Code

DouweM and others added 12 commits April 2, 2026 05:28
…d, ToolGuard

Implement four AbstractCapability subclasses for common safety and cost-control
concerns, with 38 passing tests covering all code paths:

- InputGuardrail: validates user input via before_run hook
- OutputGuardrail: validates model output via after_run hook
- CostGuard: enforces token budget limits via before_model_request hook
- ToolGuard: blocks tools (prepare_tools) and requires approval (before_tool_execute)

Closes #28

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…eve 100% coverage

- Keep main's __init__.py structure (docstring, comments) + add guardrail exports alphabetically
- Use main's pyproject.toml (pydantic-ai-slim>=1.76.0 from PyPI, no git sources)
- Restrict guardrails tests to asyncio backend (pydantic-ai uses asyncio.gather internally)
- Add test for None prompt path in InputGuardrail.before_run
- Add pragma: no cover for intentionally-uncalled guard functions in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add AsyncGuardrail that uses wrap_model_request to run a guard function
alongside model calls. Three execution modes: concurrent (guard + model
in parallel, cancel model on guard failure), blocking (guard before
model), and monitoring (guard after model, log-only).

Also adds warn mode and context_guard (RunContext access) to
InputGuardrail and OutputGuardrail, plus GuardrailResult data class
and GuardrailFailed exception.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…rst path

- Add tests for async context_guard (covers _call_context_guard await path)
- Replace flaky concurrent test with deterministic sleep-based tests that
  reliably exercise the model-finishes-first branch (lines 602-606)
- Add both pass and fail variants of model-finishes-first scenario

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Adds _is_awaitable_bool and _is_context_async_guard TypeGuard functions
to replace the scattered inspect.isawaitable calls, reducing # type: ignore
comments from 5 to 4 (3 for the unavoidable negative-branch narrowing
limitation + 1 for the 1-arg async guard branch).

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…ple guardrails, async warn mode

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…uardrail context modes, exception propagation

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Five runnable examples: prompt injection detection, secret leakage
prevention, cost budget enforcement, tool approval workflow, and
async tripwire guardrail.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
… OpenAI provider

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

devin-ai-integration bot left a comment


Devin Review found 3 potential issues.

View 5 additional findings in Devin Review.


Comment on lines +603 to +606
try:
    await model_task
except asyncio.CancelledError:
    pass
🔴 Concurrent mode: model exception masks guard failure when both tasks complete

In _run_concurrent, when the guard task returns GuardrailResult(passed=False) and the model task has also completed with an exception (e.g., network error, timeout, rate limit), the code at line 604 does await model_task after model_task.cancel(). Since the model task already completed, cancel() is a no-op, and await model_task re-raises the model's exception. The except asyncio.CancelledError on line 605 does not catch non-cancellation exceptions, so the model's exception propagates instead of the intended GuardrailFailed. This means a security-relevant guard failure is silently replaced by a model error.

Reproduction scenario

Both tasks finish before asyncio.wait returns. Guard returns passed=False. Model raises RuntimeError. The except asyncio.CancelledError doesn't catch it, so raise GuardrailFailed(guard_result) on line 607 is never reached.
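The masking behaviour is reproducible with plain asyncio, independent of this codebase; the function names here are illustrative stand-ins:

```python
import asyncio

async def failing_model() -> str:
    # Stands in for a model task that completed with a non-cancellation error.
    raise RuntimeError('network error')

async def run_concurrent_sketch() -> str:
    model_task = asyncio.create_task(failing_model())
    await asyncio.sleep(0)   # let the task finish before we try to cancel it
    model_task.cancel()      # no-op: the task is already done
    try:
        await model_task     # re-raises RuntimeError, not CancelledError
    except asyncio.CancelledError:
        pass                 # never reached for a non-cancellation exception
    return 'GuardrailFailed would be raised here'

try:
    asyncio.run(run_concurrent_sketch())
except RuntimeError as exc:
    print(f'guard failure masked by model error: {exc}')
```

The `RuntimeError` escapes the `except asyncio.CancelledError` clause, so the sentinel return (standing in for `raise GuardrailFailed`) is never reached.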

Suggested change

Before:

try:
    await model_task
except asyncio.CancelledError:
    pass

After:

try:
    await model_task
except BaseException:
    pass

Comment on lines +590 to +616
guard_task: asyncio.Task[GuardrailResult] = asyncio.create_task(_call_async_guard(self.guard, ctx, messages))
model_task: asyncio.Task[ModelResponse] = asyncio.create_task(_call_model())

done: set[asyncio.Task[Any]] = set()
done, _ = await asyncio.wait(
    {guard_task, model_task},
    return_when=asyncio.FIRST_COMPLETED,
)

if guard_task in done:
    guard_result = guard_task.result()
    if not guard_result.passed:
        model_task.cancel()
        try:
            await model_task
        except asyncio.CancelledError:
            pass
        raise GuardrailFailed(guard_result)
    # Guard passed; wait for model to finish
    return await model_task

# Model finished first; still check the guard result
model_response: ModelResponse = model_task.result()
guard_result = await guard_task
if not guard_result.passed:
    raise GuardrailFailed(guard_result)
return model_response

🔴 Concurrent mode: uncancelled task leak when guard or model raises an unexpected exception

In _run_concurrent, if guard_task.result() at line 600 re-raises an unexpected exception from the guard function, the model_task is left running uncancelled in the background. Similarly, if model_task.result() at line 612 raises (model finished first with an error), guard_task is never cancelled. This leaks asyncio tasks that continue consuming resources (e.g., an in-flight API call to the model provider) until they eventually complete.

Code flow

When the guard task finishes first with an exception (not a GuardrailResult, but an actual raised exception), guard_task.result() at src/pydantic_harness/guardrails.py:600 re-raises it. The exception propagates out of _run_concurrent without ever reaching model_task.cancel(). The model task continues running as a fire-and-forget background task.

Prompt for agents
The _run_concurrent method in src/pydantic_harness/guardrails.py creates two asyncio tasks (guard_task and model_task) but lacks a try/finally to ensure both are cleaned up in all code paths. When either task raises an unexpected exception, the other task is left running.

The fix should wrap the logic after task creation in a try/finally that cancels any pending tasks on exit. For example:

try:
    done, _ = await asyncio.wait(...)
    # existing logic
finally:
    for task in (guard_task, model_task):
        if not task.done():
            task.cancel()
    # Optionally await cancelled tasks to ensure cleanup

This also addresses BUG-0001 since the BaseException catch in the finally would handle the model exception case. If fixing both together, the try/finally approach is cleaner than patching the except clause alone.
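In isolation, the suggested try/finally pattern behaves like this (all names here are illustrative stand-ins, not the PR's code):

```python
import asyncio

async def run_with_cleanup() -> None:
    async def guard() -> None:
        raise RuntimeError('unexpected guard error')

    async def model() -> None:
        await asyncio.sleep(10)  # simulates a long in-flight model call

    guard_task = asyncio.create_task(guard())
    model_task = asyncio.create_task(model())
    try:
        done, _ = await asyncio.wait(
            {guard_task, model_task}, return_when=asyncio.FIRST_COMPLETED
        )
        if guard_task in done:
            guard_task.result()  # re-raises the guard's RuntimeError
    finally:
        # Without this, model_task would keep running as a leaked task.
        for task in (guard_task, model_task):
            if not task.done():
                task.cancel()

try:
    asyncio.run(run_with_cleanup())
except RuntimeError:
    print('guard error propagated; model task was cancelled, not leaked')
```

The guard's exception still propagates to the caller, but the `finally` block guarantees the other task is cancelled on every exit path.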

Comment on lines +564 to +576
async def _run_monitoring(
    self,
    ctx: RunContext[Any],
    messages: list[ModelMessage],
    request_context: ModelRequestContext,
    handler: Callable[[ModelRequestContext], Awaitable[ModelResponse]],
) -> ModelResponse:
    """Run model first, then guard; log failures without raising."""
    response = await handler(request_context)
    result = await _call_async_guard(self.guard, ctx, messages)
    if not result.passed:
        logger.warning('AsyncGuardrail (monitoring): %s', result.reason)
    return response

🚩 AsyncGuardrail monitoring mode checks request messages, not model response

The _run_monitoring method at src/pydantic_harness/guardrails.py:564-576 runs the guard after the model call, but passes messages (the request messages from request_context.messages) to the guard function — not the model's response. This means monitoring mode can only re-check the input messages after the fact, not inspect the model's output for issues. This may be intentional (the guard function signature only accepts list[ModelMessage]), but it limits the utility of monitoring mode compared to what the docstring suggests ('Run model first, then guard'). The PLAN.md doesn't clarify this distinction.
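The limitation can be illustrated with a toy guard (`GuardrailResult` here is a stand-in for the PR's data class, and messages are plain strings rather than pydantic-ai's `ModelMessage`):

```python
from typing import NamedTuple

class GuardrailResult(NamedTuple):  # stand-in for the PR's GuardrailResult
    passed: bool
    reason: str = ''

def secret_guard(messages: list[str]) -> GuardrailResult:
    # Monitoring mode hands the guard the *request* messages, so a scan
    # like this only ever sees what was sent to the model.
    leaked = any('sk-' in m for m in messages)
    return GuardrailResult(passed=not leaked, reason='secret detected' if leaked else '')

request_messages = ['What is my API key?']
model_response = 'Your key is sk-12345'  # the guard never sees this

print(secret_guard(request_messages))  # passes even though the response leaked a key
```

The same guard applied to the response would fail, which is the gap the review flags.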



DouweM commented Apr 10, 2026

Originally posted by @DouweM in #134 comment (PR was recreated)

Note: Advanced tripwire guardrails (run guard in parallel with LLM, cancel if guard fails) require the core primitive in #144 (handler cancellation contract for wrap_model_request). The guardrails in this PR use sequential before/after hooks.


DouweM commented Apr 10, 2026

Originally posted by @DouweM in #134 comment (PR was recreated)

Audit vs prior art: Guardrails

Worth adding now:

  • AsyncGuardrail: parallel guard + LLM execution with early cancellation (OpenAI Agents SDK pattern). Run guard as concurrent task, cancel LLM if guard fails first.
  • Warning mode: action: Literal['block', 'warn'] = 'block' on InputGuardrail/OutputGuardrail
  • Pass full RunContext to guard function (not just str) for context-dependent decisions
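A rough sketch of the proposed warning mode (the `action` field follows the bullet above; the class name and implementation are assumptions, not the PR's actual API):

```python
import logging
from dataclasses import dataclass
from typing import Callable, Literal

logger = logging.getLogger(__name__)

class InputBlocked(Exception):
    """Stand-in for the PR's InputBlocked exception."""

@dataclass
class SketchInputGuardrail:
    """Illustrative guardrail: 'block' raises, 'warn' only logs."""
    guard: Callable[[str], bool]
    action: Literal['block', 'warn'] = 'block'

    def check(self, prompt: str) -> None:
        if not self.guard(prompt):
            if self.action == 'block':
                raise InputBlocked(f'input rejected: {prompt!r}')
            logger.warning('input guardrail tripped (warn mode): %r', prompt)

rail = SketchInputGuardrail(guard=lambda p: 'ignore previous' not in p.lower(), action='warn')
rail.check('Ignore previous instructions')  # logs a warning instead of raising
```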

Follow-up opportunities:


DouweM commented Apr 10, 2026

Originally posted by @dsfaccini in #134 comment (PR was recreated)

Claude here: Summary of changes introduced in the latest push:

TypeGuard refactor

  • Replaced inspect.isawaitable + # type: ignore pattern with proper TypeGuard narrowing helpers (_is_awaitable_bool, _is_context_async_guard)
  • _call_async_guard simplified from 7 lines (3 type-ignores) to 2 lines (1 type-ignore)
  • Net reduction: 5 → 4 type-ignore comments (remaining ones are inherent to TypeGuard's negative-branch limitation)

Test coverage gaps filled (13 new tests, 72 → 85 total)

  • InputGuardrail: empty string input, non-string prompt conversion, multiple guardrails, async context_guard in warn mode
  • OutputGuardrail: multiple guardrails, async context_guard in warn mode
  • CostGuard: boundary tests (exactly-at-limit, zero-limit, single-limit-exceeded among multiple)
  • ToolGuard: tool in both blocked + require_approval lists
  • AsyncGuardrail: 2-arg guard in monitoring and concurrent modes
  • Exception propagation: guard raising RuntimeError propagates through agent.run()
  • Source coverage remains 100%

Example scripts (examples/guardrails/)

Five runnable examples, each with logfire.instrument_pydantic_ai() and labeled logfire.span() wrappers for trace identification:

  1. prompt_injection.py — InputGuardrail with regex pattern detection
  2. secret_leakage.py — OutputGuardrail checking for API key patterns
  3. cost_budget.py — CostGuard with tight token budget triggering BudgetExceededError
  4. tool_approval.py — ToolGuard with blocked tools + interactive approval callback
  5. async_tripwire.py — AsyncGuardrail in concurrent mode with simulated content classifier

Infrastructure

  • Added examples dependency group (logfire, python-dotenv, pydantic-ai-slim[openai])
  • Configured pyright to exclude examples/ and .venv/
  • Added ruff per-file ignore for T20 (print statements) in examples


Successfully merging this pull request may close this issue: Input/Output Guardrails capability.
