
Add guardrail capabilities: InputGuardrail, OutputGuardrail, CostGuard, ToolGuard#182

Open
DouweM wants to merge 12 commits into main from capability/guardrails

Conversation


DouweM (Contributor) commented Apr 10, 2026

Summary

  • InputGuardrail: validates user input before the run starts via before_run hook, with a user-supplied sync/async guard function
  • OutputGuardrail: validates model output after the run completes via after_run hook, with a user-supplied sync/async guard function
  • CostGuard: enforces token budget limits (input, output, total) before each model request via before_model_request hook
  • ToolGuard: blocks tools from the model via prepare_tools and requires approval before execution via before_tool_execute hook

All four are AbstractCapability subclasses following pydantic-ai patterns. Exception hierarchy: GuardrailError base with InputBlocked, OutputBlocked, BudgetExceededError, ToolBlocked subclasses.
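As a rough sketch, the exception hierarchy described above could look like this (the class names come from the PR description; the docstrings and empty bodies are illustrative assumptions, not the actual implementation):

```python
# Hedged sketch of the guardrail exception hierarchy named in this PR.

class GuardrailError(Exception):
    """Base class for all guardrail violations."""

class InputBlocked(GuardrailError):
    """Raised by InputGuardrail when user input fails validation."""

class OutputBlocked(GuardrailError):
    """Raised by OutputGuardrail when model output fails validation."""

class BudgetExceededError(GuardrailError):
    """Raised by CostGuard when a token budget limit is exceeded."""

class ToolBlocked(GuardrailError):
    """Raised by ToolGuard when a blocked tool is requested."""
```

Callers that don't care which guardrail fired can catch the `GuardrailError` base class and still distinguish subclasses when needed.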

Design inspired by vstorm-co/pydantic-ai-shields and OpenAI Agents SDK guardrails.

See PLAN.md for design decisions and future work.

Closes #28

Test plan

  • 38 tests covering all capabilities, exception hierarchy, sync/async guards, composition, and imports
  • ruff check and ruff format pass
  • pyright strict mode passes on src/ and tests/
  • CI passes

🤖 Generated with Claude Code

DouweM and others added 12 commits April 2, 2026 05:28
…d, ToolGuard

Implement four AbstractCapability subclasses for common safety and cost-control
concerns, with 38 passing tests covering all code paths:

- InputGuardrail: validates user input via before_run hook
- OutputGuardrail: validates model output via after_run hook
- CostGuard: enforces token budget limits via before_model_request hook
- ToolGuard: blocks tools (prepare_tools) and requires approval (before_tool_execute)

Closes #28

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…eve 100% coverage

- Keep main's __init__.py structure (docstring, comments) + add guardrail exports alphabetically
- Use main's pyproject.toml (pydantic-ai-slim>=1.76.0 from PyPI, no git sources)
- Restrict guardrails tests to asyncio backend (pydantic-ai uses asyncio.gather internally)
- Add test for None prompt path in InputGuardrail.before_run
- Add pragma: no cover for intentionally-uncalled guard functions in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add AsyncGuardrail that uses wrap_model_request to run a guard function
alongside model calls. Three execution modes: concurrent (guard + model
in parallel, cancel model on guard failure), blocking (guard before
model), and monitoring (guard after model, log-only).

Also adds warn mode and context_guard (RunContext access) to
InputGuardrail and OutputGuardrail, plus GuardrailResult data class
and GuardrailFailed exception.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…rst path

- Add tests for async context_guard (covers _call_context_guard await path)
- Replace flaky concurrent test with deterministic sleep-based tests that
  reliably exercise the model-finishes-first branch (lines 602-606)
- Add both pass and fail variants of model-finishes-first scenario

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Adds _is_awaitable_bool and _is_context_async_guard TypeGuard functions
to replace the scattered inspect.isawaitable calls, reducing # type: ignore
comments from 5 to 4 (3 for the unavoidable negative-branch narrowing
limitation + 1 for the 1-arg async guard branch).

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…ple guardrails, async warn mode

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…uardrail context modes, exception propagation

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Five runnable examples: prompt injection detection, secret leakage
prevention, cost budget enforcement, tool approval workflow, and
async tripwire guardrail.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
… OpenAI provider

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

devin-ai-integration bot left a comment


Devin Review found 3 potential issues.

View 5 additional findings in Devin Review.


Comment on lines +603 to +606
try:
    await model_task
except asyncio.CancelledError:
    pass
🔴 Concurrent mode: model exception masks guard failure when both tasks complete

In _run_concurrent, when the guard task returns GuardrailResult(passed=False) and the model task has also completed with an exception (e.g., network error, timeout, rate limit), the code at line 604 does await model_task after model_task.cancel(). Since the model task already completed, cancel() is a no-op, and await model_task re-raises the model's exception. The except asyncio.CancelledError on line 605 does not catch non-cancellation exceptions, so the model's exception propagates instead of the intended GuardrailFailed. This means a security-relevant guard failure is silently replaced by a model error.

Reproduction scenario

Both tasks finish before asyncio.wait returns. Guard returns passed=False. Model raises RuntimeError. The except asyncio.CancelledError doesn't catch it, so raise GuardrailFailed(guard_result) on line 607 is never reached.
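The masking behaviour is reproducible with plain asyncio, independent of this codebase; the function names here are illustrative stand-ins:

```python
import asyncio

async def failing_model() -> str:
    # Stands in for a model task that completed with a non-cancellation error.
    raise RuntimeError('network error')

async def run_concurrent_sketch() -> str:
    model_task = asyncio.create_task(failing_model())
    await asyncio.sleep(0)   # let the task finish before we try to cancel it
    model_task.cancel()      # no-op: the task is already done
    try:
        await model_task     # re-raises RuntimeError, not CancelledError
    except asyncio.CancelledError:
        pass                 # never reached for a non-cancellation exception
    return 'GuardrailFailed would be raised here'

try:
    asyncio.run(run_concurrent_sketch())
except RuntimeError as exc:
    print(f'guard failure masked by model error: {exc}')
```

The `RuntimeError` escapes the `except asyncio.CancelledError` clause, so the sentinel return (standing in for `raise GuardrailFailed`) is never reached.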

Suggested change

Before:

try:
    await model_task
except asyncio.CancelledError:
    pass

After:

try:
    await model_task
except BaseException:
    pass

Comment on lines +590 to +616
guard_task: asyncio.Task[GuardrailResult] = asyncio.create_task(_call_async_guard(self.guard, ctx, messages))
model_task: asyncio.Task[ModelResponse] = asyncio.create_task(_call_model())

done: set[asyncio.Task[Any]] = set()
done, _ = await asyncio.wait(
    {guard_task, model_task},
    return_when=asyncio.FIRST_COMPLETED,
)

if guard_task in done:
    guard_result = guard_task.result()
    if not guard_result.passed:
        model_task.cancel()
        try:
            await model_task
        except asyncio.CancelledError:
            pass
        raise GuardrailFailed(guard_result)
    # Guard passed; wait for model to finish
    return await model_task

# Model finished first; still check the guard result
model_response: ModelResponse = model_task.result()
guard_result = await guard_task
if not guard_result.passed:
    raise GuardrailFailed(guard_result)
return model_response

🔴 Concurrent mode: uncancelled task leak when guard or model raises an unexpected exception

In _run_concurrent, if guard_task.result() at line 600 re-raises an unexpected exception from the guard function, the model_task is left running uncancelled in the background. Similarly, if model_task.result() at line 612 raises (model finished first with an error), guard_task is never cancelled. This leaks asyncio tasks that continue consuming resources (e.g., an in-flight API call to the model provider) until they eventually complete.

Code flow

When the guard task finishes first with an exception (not a GuardrailResult, but an actual raised exception), guard_task.result() at src/pydantic_harness/guardrails.py:600 re-raises it. The exception propagates out of _run_concurrent without ever reaching model_task.cancel(). The model task continues running as a fire-and-forget background task.

Prompt for agents
The _run_concurrent method in src/pydantic_harness/guardrails.py creates two asyncio tasks (guard_task and model_task) but lacks a try/finally to ensure both are cleaned up in all code paths. When either task raises an unexpected exception, the other task is left running.

The fix should wrap the logic after task creation in a try/finally that cancels any pending tasks on exit. For example:

try:
    done, _ = await asyncio.wait(...)
    # existing logic
finally:
    for task in (guard_task, model_task):
        if not task.done():
            task.cancel()
    # Optionally await cancelled tasks to ensure cleanup

This also addresses BUG-0001 since the BaseException catch in the finally would handle the model exception case. If fixing both together, the try/finally approach is cleaner than patching the except clause alone.
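In isolation, the suggested try/finally pattern behaves like this (all names here are illustrative stand-ins, not the PR's code):

```python
import asyncio

async def run_with_cleanup() -> None:
    async def guard() -> None:
        raise RuntimeError('unexpected guard error')

    async def model() -> None:
        await asyncio.sleep(10)  # simulates a long in-flight model call

    guard_task = asyncio.create_task(guard())
    model_task = asyncio.create_task(model())
    try:
        done, _ = await asyncio.wait(
            {guard_task, model_task}, return_when=asyncio.FIRST_COMPLETED
        )
        if guard_task in done:
            guard_task.result()  # re-raises the guard's RuntimeError
    finally:
        # Without this, model_task would keep running as a leaked task.
        for task in (guard_task, model_task):
            if not task.done():
                task.cancel()

try:
    asyncio.run(run_with_cleanup())
except RuntimeError:
    print('guard error propagated; model task was cancelled, not leaked')
```

The guard's exception still propagates to the caller, but the `finally` block guarantees the other task is cancelled on every exit path.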

Comment on lines +564 to +576
async def _run_monitoring(
    self,
    ctx: RunContext[Any],
    messages: list[ModelMessage],
    request_context: ModelRequestContext,
    handler: Callable[[ModelRequestContext], Awaitable[ModelResponse]],
) -> ModelResponse:
    """Run model first, then guard; log failures without raising."""
    response = await handler(request_context)
    result = await _call_async_guard(self.guard, ctx, messages)
    if not result.passed:
        logger.warning('AsyncGuardrail (monitoring): %s', result.reason)
    return response

🚩 AsyncGuardrail monitoring mode checks request messages, not model response

The _run_monitoring method at src/pydantic_harness/guardrails.py:564-576 runs the guard after the model call, but passes messages (the request messages from request_context.messages) to the guard function — not the model's response. This means monitoring mode can only re-check the input messages after the fact, not inspect the model's output for issues. This may be intentional (the guard function signature only accepts list[ModelMessage]), but it limits the utility of monitoring mode compared to what the docstring suggests ('Run model first, then guard'). The PLAN.md doesn't clarify this distinction.
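The limitation can be illustrated with a toy guard (`GuardrailResult` here is a stand-in for the PR's data class, and messages are plain strings rather than pydantic-ai's `ModelMessage`):

```python
from typing import NamedTuple

class GuardrailResult(NamedTuple):  # stand-in for the PR's GuardrailResult
    passed: bool
    reason: str = ''

def secret_guard(messages: list[str]) -> GuardrailResult:
    # Monitoring mode hands the guard the *request* messages, so a scan
    # like this only ever sees what was sent to the model.
    leaked = any('sk-' in m for m in messages)
    return GuardrailResult(passed=not leaked, reason='secret detected' if leaked else '')

request_messages = ['What is my API key?']
model_response = 'Your key is sk-12345'  # the guard never sees this

print(secret_guard(request_messages))  # passes even though the response leaked a key
```

The same guard applied to the response would fail, which is the gap the review flags.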



DouweM commented Apr 10, 2026

Originally posted by @DouweM in #134 comment (PR was recreated)

Note: Advanced tripwire guardrails (run guard in parallel with LLM, cancel if guard fails) require the core primitive in #144 (handler cancellation contract for wrap_model_request). The guardrails in this PR use sequential before/after hooks.


DouweM commented Apr 10, 2026

Originally posted by @DouweM in #134 comment (PR was recreated)

Audit vs prior art: Guardrails

Worth adding now:

  • AsyncGuardrail: parallel guard + LLM execution with early cancellation (OpenAI Agents SDK pattern). Run guard as concurrent task, cancel LLM if guard fails first.
  • Warning mode: action: Literal['block', 'warn'] = 'block' on InputGuardrail/OutputGuardrail
  • Pass full RunContext to guard function (not just str) for context-dependent decisions
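A rough sketch of the proposed warning mode (the `action` field follows the bullet above; the class name and implementation are assumptions, not the PR's actual API):

```python
import logging
from dataclasses import dataclass
from typing import Callable, Literal

logger = logging.getLogger(__name__)

class InputBlocked(Exception):
    """Stand-in for the PR's InputBlocked exception."""

@dataclass
class SketchInputGuardrail:
    """Illustrative guardrail: 'block' raises, 'warn' only logs."""
    guard: Callable[[str], bool]
    action: Literal['block', 'warn'] = 'block'

    def check(self, prompt: str) -> None:
        if not self.guard(prompt):
            if self.action == 'block':
                raise InputBlocked(f'input rejected: {prompt!r}')
            logger.warning('input guardrail tripped (warn mode): %r', prompt)

rail = SketchInputGuardrail(guard=lambda p: 'ignore previous' not in p.lower(), action='warn')
rail.check('Ignore previous instructions')  # logs a warning instead of raising
```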

Follow-up opportunities:


DouweM commented Apr 10, 2026

Originally posted by @dsfaccini in #134 comment (PR was recreated)

Claude here: Summary of changes introduced in the latest push:

TypeGuard refactor

  • Replaced inspect.isawaitable + # type: ignore pattern with proper TypeGuard narrowing helpers (_is_awaitable_bool, _is_context_async_guard)
  • _call_async_guard simplified from 7 lines (3 type-ignores) to 2 lines (1 type-ignore)
  • Net reduction: 5 → 4 type-ignore comments (remaining ones are inherent to TypeGuard's negative-branch limitation)

Test coverage gaps filled (13 new tests, 72 → 85 total)

  • InputGuardrail: empty string input, non-string prompt conversion, multiple guardrails, async context_guard in warn mode
  • OutputGuardrail: multiple guardrails, async context_guard in warn mode
  • CostGuard: boundary tests (exactly-at-limit, zero-limit, single-limit-exceeded among multiple)
  • ToolGuard: tool in both blocked + require_approval lists
  • AsyncGuardrail: 2-arg guard in monitoring and concurrent modes
  • Exception propagation: guard raising RuntimeError propagates through agent.run()
  • Source coverage remains 100%

Example scripts (examples/guardrails/)

Five runnable examples, each with logfire.instrument_pydantic_ai() and labeled logfire.span() wrappers for trace identification:

  1. prompt_injection.py — InputGuardrail with regex pattern detection
  2. secret_leakage.py — OutputGuardrail checking for API key patterns
  3. cost_budget.py — CostGuard with tight token budget triggering BudgetExceededError
  4. tool_approval.py — ToolGuard with blocked tools + interactive approval callback
  5. async_tripwire.py — AsyncGuardrail in concurrent mode with simulated content classifier

Infrastructure

  • Added examples dependency group (logfire, python-dotenv, pydantic-ai-slim[openai])
  • Configured pyright to exclude examples/ and .venv/
  • Added ruff per-file ignore for T20 (print statements) in examples


Successfully merging this pull request may close this issue: Input/Output Guardrails capability.
