
feat: add input and output guardrails#219

Open
DEENUU1 wants to merge 9 commits into pydantic:main from vstorm-co:vstorm/guardrails

Conversation

@DEENUU1 DEENUU1 commented Apr 24, 2026

Summary

Adds two guardrail capabilities with a minimal, callable-based API:

  • InputGuard(guard, parallel=False, block_message=...) — runs before each model request. A guard that returns False triggers a graceful refusal via SkipModelRequest (a canned ModelResponse becomes the step output); a guard that raises propagates as a hard failure.
  • OutputGuard(guard, block_message=...) — runs in after_run against the final agent output. A guard that returns False raises OutputBlocked.
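
The contract above can be sketched as follows. This is a hypothetical stand-in for illustration, not the PR's implementation: the PR's `InputGuard`/`OutputGuard` are capability classes, while here the halt semantics are reduced to two plain functions (`run_input_guard`, `run_output_guard` are names invented for this sketch).

```python
# Hypothetical sketch of the callable-based guard contract described above.
# InputGuard/OutputGuard in the PR are capabilities; these are minimal stand-ins.
import asyncio
from typing import Awaitable, Callable, Union

GuardrailFunc = Callable[[str], Union[bool, Awaitable[bool]]]

class OutputBlocked(Exception):
    """Raised when an output guard returns False."""

async def _call(guard: GuardrailFunc, text: str) -> bool:
    result = guard(text)  # accept sync or async callables
    if asyncio.iscoroutine(result):
        result = await result
    return bool(result)

async def run_input_guard(
    guard: GuardrailFunc, prompt: str, block_message: str = "Request blocked."
) -> "str | None":
    # Input fail is graceful: return a canned response instead of calling the model.
    if not await _call(guard, prompt):
        return block_message
    return None  # guard passed; proceed with the model request

async def run_output_guard(guard: GuardrailFunc, output: str) -> str:
    # Output fail is hard: tokens are already spent, so raise.
    if not await _call(guard, output):
        raise OutputBlocked(output)
    return output
```

The asymmetry is visible in the return types: the input path substitutes a canned message, the output path can only raise.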

Both accept sync or async callables. parallel=True on InputGuard races the guard against the model call via wrap_model_request and cancels the handler as soon as the guard trips, saving tokens when the guard is slower than the provider round-trip.
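
The parallel race can be sketched as below; a hypothetical reduction of the `wrap_model_request` wiring, with `race`, `slow_model`, and `fast_tripping_guard` as invented names for illustration.

```python
# Hypothetical sketch of parallel=True: the guard and the model call start
# together, and a tripped guard cancels the (slower) model call, saving tokens.
import asyncio

class InputBlocked(Exception):
    """Stand-in for the guard-tripped signal."""

async def race(guard_coro, handler_coro):
    guard_task = asyncio.ensure_future(guard_coro)
    handler_task = asyncio.ensure_future(handler_coro)
    try:
        done, _ = await asyncio.wait(
            {guard_task, handler_task}, return_when=asyncio.FIRST_COMPLETED
        )
        if guard_task in done:
            await guard_task           # re-raises InputBlocked if the guard tripped
            return await handler_task  # guard passed first; wait for the model
        response = await handler_task  # model finished first...
        await guard_task               # ...but the guard still gets the last word
        return response
    finally:
        for task in (guard_task, handler_task):
            if not task.done():
                task.cancel()          # fail-fast: cancel the losing sibling
        await asyncio.gather(guard_task, handler_task, return_exceptions=True)

async def slow_model():
    await asyncio.sleep(0.2)
    return "model response"

async def fast_tripping_guard():
    await asyncio.sleep(0.01)
    raise InputBlocked("blocked")
```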

Design notes:

  • Asymmetric halt semantics are intentional: input fail → graceful (no tokens spent, conversation continues), output fail → exception (tokens already spent, caller must decide).
  • parallel is a flag on InputGuard rather than a separate AsyncGuardrail wrapper — keeps the API surface small and matches @Josh-blythe's feedback on #28 (Input/Output Guardrails capability): fast guards don't need the cancellation wiring; let the capability declare whether it needs parallel.
  • Cancellation on trip is always on in parallel mode; it's a fail-fast primitive and a configurable option was redundant.

Out of scope for this PR (tracked as follow-ups):

  • Tool guardrails — belong in the tool-approval layer (#19: expand hooks before_tool_call / after_tool_call on AbstractCapability).
  • transform / warn actions — MVP is halt-only.
  • Per-step output validation and per-token streaming validation — after_run + wrap_run_event_stream are the follow-up hooks.
  • Content shields (prompt-injection detectors, PII scrubbers, etc.) — stay in pydantic-ai-shields on top of these primitives.

Files added:

pydantic_ai_harness/guardrails/{__init__,_capability,_exceptions}.py
pydantic_ai_harness/guardrails/README.md
tests/_guardrails/{__init__,test_input_guard,test_output_guard}.py

Top-level pydantic_ai_harness/__init__.py updated to export InputGuard, OutputGuard, GuardrailFunc, GuardrailError, InputBlocked, OutputBlocked.

Linked Issue

Fixes #28

Checklist

  • Linked issue exists and is referenced above
  • Tests added/updated for new behavior
  • make lint && make typecheck && make test passes locally (don't stress about CI -- we'll help)
  • No changes to pyproject.toml or uv.lock (dependency changes require a separate issue)
  • Docstrings use single backticks (not RST double backticks)

devin-ai-integration[bot]

This comment was marked as resolved.


Comment thread: pydantic_ai_harness/guardrails/_capability.py (outdated)

```python
try:
    done, _ = await asyncio.wait(
        [guard_task, handler_task],
        return_when=asyncio.FIRST_COMPLETED,
```
Member:
We can simplify it :)

Suggested change:

```python
try:
    done, _ = await asyncio.wait(
        {guard_task, handler_task},
        return_when=asyncio.FIRST_COMPLETED,
    )
    if guard_task in done:
        await guard_task
        return await handler_task
    response = await handler_task
    await guard_task
    return response
finally:
    for task in (guard_task, handler_task):
        if not task.done():
            task.cancel()
    await asyncio.gather(guard_task, handler_task, return_exceptions=True)
```

Sorry for the ugly snippet written in the GitHub editor, but I hope it makes sense.

Member:

Getting back to this because I was wondering whether we are actually using this in a semantically correct way.

We pass return_when=asyncio.FIRST_COMPLETED, but we don't use it in a way that warrants it. We always have to wait for the sibling task to finish anyway, so aren't we better off using a task group in that case?

We might lose the exceptions, but I think we can unwrap them; worth checking for sure. I'd love for this to be as readable as possible.

Author:

I kept wait(FIRST_COMPLETED), though: asyncio.gather(return_exceptions=True) waits for all tasks.

TaskGroup would be ideal, but it's 3.11+. So wait(FIRST_COMPLETED) is the simplest 3.10-compatible form that still gives us fail-fast. The "wait for the sibling" only kicks in on the success path.

Member:

What about the anyio equivalent? Is that also 3.11+?

Member:

Even in cancel paths, wouldn't we need to cancel the sibling?

If the guard fails, the handler needs to stop.

If the handler fails, the guard has nothing to do.

Author:

CC told me that:

The tradeoff is that it wraps exceptions in an ExceptionGroup, so a SkipModelRequest from the guard comes out as ExceptionGroup([SkipModelRequest]), which _agent_graph.py's bare except SkipModelRequest won't catch. We'd have to manually unwrap before re-raising (there's no except* on 3.10 either). Net more code than the current wait + finally.

We would have to check it out

Member:

Yeah, that makes sense. except* would have been grand; the snippet could have collapsed to a fraction of its size.

We should check whether we can reasonably unwrap correctly; otherwise this is fine with me.

```python
async def run_handler() -> ModelResponse:
    return await handler(request_context)

guard_task: asyncio.Task[None] = asyncio.create_task(self._run_guard(prompt))
```
Member @adtyavrdhn commented Apr 25, 2026:

I'm on the fence about whether we should call it PromptGuard, since we only run this for the first user-prompt step and not for subsequent model requests. What do you think?

Author:

To me it looks clear, considering that we also have OutputBlocked, and it's similar to what exists in the OpenAI Agents SDK: https://github.com/openai/openai-agents-python/blob/9a207b6938699d87d2d17dd67dd628ca3af0232d/src/agents/guardrail.py#L72
