Add guardrail capabilities: InputGuardrail, OutputGuardrail, CostGuard, ToolGuard#182
Add guardrail capabilities: InputGuardrail, OutputGuardrail, CostGuard, ToolGuard#182
Conversation
…d, ToolGuard Implement four AbstractCapability subclasses for common safety and cost-control concerns, with 38 passing tests covering all code paths: - InputGuardrail: validates user input via before_run hook - OutputGuardrail: validates model output via after_run hook - CostGuard: enforces token budget limits via before_model_request hook - ToolGuard: blocks tools (prepare_tools) and requires approval (before_tool_execute) Closes #28 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…eve 100% coverage - Keep main's __init__.py structure (docstring, comments) + add guardrail exports alphabetically - Use main's pyproject.toml (pydantic-ai-slim>=1.76.0 from PyPI, no git sources) - Restrict guardrails tests to asyncio backend (pydantic-ai uses asyncio.gather internally) - Add test for None prompt path in InputGuardrail.before_run - Add pragma: no cover for intentionally-uncalled guard functions in tests Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add AsyncGuardrail that uses wrap_model_request to run a guard function alongside model calls. Three execution modes: concurrent (guard + model in parallel, cancel model on guard failure), blocking (guard before model), and monitoring (guard after model, log-only). Also adds warn mode and context_guard (RunContext access) to InputGuardrail and OutputGuardrail, plus GuardrailResult data class and GuardrailFailed exception. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…rst path - Add tests for async context_guard (covers _call_context_guard await path) - Replace flaky concurrent test with deterministic sleep-based tests that reliably exercise the model-finishes-first branch (lines 602-606) - Add both pass and fail variants of model-finishes-first scenario Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Adds _is_awaitable_bool and _is_context_async_guard TypeGuard functions to replace the scattered inspect.isawaitable calls, reducing # type: ignore comments from 5 to 4 (3 for the unavoidable negative-branch narrowing limitation + 1 for the 1-arg async guard branch). Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…ple guardrails, async warn mode Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…d warn mode Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…uardrail context modes, exception propagation Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Five runnable examples: prompt injection detection, secret leakage prevention, cost budget enforcement, tool approval workflow, and async tripwire guardrail. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
… OpenAI provider Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
| try: | ||
| await model_task | ||
| except asyncio.CancelledError: | ||
| pass |
There was a problem hiding this comment.
🔴 Concurrent mode: model exception masks guard failure when both tasks complete
In _run_concurrent, when the guard task returns GuardrailResult(passed=False) and the model task has also completed with an exception (e.g., network error, timeout, rate limit), the code at line 604 does await model_task after model_task.cancel(). Since the model task already completed, cancel() is a no-op, and await model_task re-raises the model's exception. The except asyncio.CancelledError on line 605 does not catch non-cancellation exceptions, so the model's exception propagates instead of the intended GuardrailFailed. This means a security-relevant guard failure is silently replaced by a model error.
Reproduction scenario
Both tasks finish before asyncio.wait returns. Guard returns passed=False. Model raises RuntimeError. The except asyncio.CancelledError doesn't catch it, so raise GuardrailFailed(guard_result) on line 607 is never reached.
| try: | |
| await model_task | |
| except asyncio.CancelledError: | |
| pass | |
| try: | |
| await model_task | |
| except BaseException: | |
| pass |
Was this helpful? React with 👍 or 👎 to provide feedback.
| guard_task: asyncio.Task[GuardrailResult] = asyncio.create_task(_call_async_guard(self.guard, ctx, messages)) | ||
| model_task: asyncio.Task[ModelResponse] = asyncio.create_task(_call_model()) | ||
|
|
||
| done: set[asyncio.Task[Any]] = set() | ||
| done, _ = await asyncio.wait( | ||
| {guard_task, model_task}, | ||
| return_when=asyncio.FIRST_COMPLETED, | ||
| ) | ||
|
|
||
| if guard_task in done: | ||
| guard_result = guard_task.result() | ||
| if not guard_result.passed: | ||
| model_task.cancel() | ||
| try: | ||
| await model_task | ||
| except asyncio.CancelledError: | ||
| pass | ||
| raise GuardrailFailed(guard_result) | ||
| # Guard passed; wait for model to finish | ||
| return await model_task | ||
|
|
||
| # Model finished first; still check the guard result | ||
| model_response: ModelResponse = model_task.result() | ||
| guard_result = await guard_task | ||
| if not guard_result.passed: | ||
| raise GuardrailFailed(guard_result) | ||
| return model_response |
There was a problem hiding this comment.
🔴 Concurrent mode: uncancelled task leak when guard or model raises an unexpected exception
In _run_concurrent, if guard_task.result() at line 600 re-raises an unexpected exception from the guard function, the model_task is left running uncancelled in the background. Similarly, if model_task.result() at line 612 raises (model finished first with an error), guard_task is never cancelled. This leaks asyncio tasks that continue consuming resources (e.g., an in-flight API call to the model provider) until they eventually complete.
Code flow
When the guard task finishes first with an exception (not a GuardrailResult, but an actual raised exception), guard_task.result() at src/pydantic_harness/guardrails.py:600 re-raises it. The exception propagates out of _run_concurrent without ever reaching model_task.cancel(). The model task continues running as a fire-and-forget background task.
Prompt for agents
The _run_concurrent method in src/pydantic_harness/guardrails.py creates two asyncio tasks (guard_task and model_task) but lacks a try/finally to ensure both are cleaned up in all code paths. When either task raises an unexpected exception, the other task is left running.
The fix should wrap the logic after task creation in a try/finally that cancels any pending tasks on exit. For example:
try:
done, _ = await asyncio.wait(...)
# existing logic
finally:
for task in (guard_task, model_task):
if not task.done():
task.cancel()
# Optionally await cancelled tasks to ensure cleanup
This also addresses BUG-0001 since the BaseException catch in the finally would handle the model exception case. If fixing both together, the try/finally approach is cleaner than patching the except clause alone.
Was this helpful? React with 👍 or 👎 to provide feedback.
| async def _run_monitoring( | ||
| self, | ||
| ctx: RunContext[Any], | ||
| messages: list[ModelMessage], | ||
| request_context: ModelRequestContext, | ||
| handler: Callable[[ModelRequestContext], Awaitable[ModelResponse]], | ||
| ) -> ModelResponse: | ||
| """Run model first, then guard; log failures without raising.""" | ||
| response = await handler(request_context) | ||
| result = await _call_async_guard(self.guard, ctx, messages) | ||
| if not result.passed: | ||
| logger.warning('AsyncGuardrail (monitoring): %s', result.reason) | ||
| return response |
There was a problem hiding this comment.
🚩 AsyncGuardrail monitoring mode checks request messages, not model response
The _run_monitoring method at src/pydantic_harness/guardrails.py:564-576 runs the guard after the model call, but passes messages (the request messages from request_context.messages) to the guard function — not the model's response. This means monitoring mode can only re-check the input messages after the fact, not inspect the model's output for issues. This may be intentional (the guard function signature only accepts list[ModelMessage]), but it limits the utility of monitoring mode compared to what the docstring suggests ('Run model first, then guard'). The PLAN.md doesn't clarify this distinction.
Was this helpful? React with 👍 or 👎 to provide feedback.
Note: Advanced tripwire guardrails (run guard in parallel with LLM, cancel if guard fails) require the core primitive in #144 (handler cancellation contract for wrap_model_request). The guardrails in this PR use sequential before/after hooks. |
Audit vs prior art: GuardrailsWorth adding now:
Follow-up opportunities:
|
Claude here: Summary of changes introduced in the latest push: TypeGuard refactor
Test coverage gaps filled (13 new tests, 72 → 85 total)
Example scripts (
|
Summary
before_runhook, with a user-supplied sync/async guard functionafter_runhook, with a user-supplied sync/async guard functionbefore_model_requesthookprepare_toolsand requires approval before execution viabefore_tool_executehookAll four are
AbstractCapabilitysubclasses following pydantic-ai patterns. Exception hierarchy:GuardrailErrorbase withInputBlocked,OutputBlocked,BudgetExceededError,ToolBlockedsubclasses.Design inspired by vstorm-co/pydantic-ai-shields and OpenAI Agents SDK guardrails.
See
PLAN.mdfor design decisions and future work.Closes #28
Test plan
ruff checkandruff formatpasspyrightstrict mode passes on src/ and tests/🤖 Generated with Claude Code