A Python toolkit for analyzing AI agent trajectories against policy constraints. The toolkit generates policy guards from policy documents, creates history-aware wrappers, and analyzes agent interactions to detect policy violations.
This repository demonstrates the methodology described in:
Near-Miss: Latent Policy Failure Detection in Agentic Workflows
Ella Rabinovich, David Boaz, Naama Zwerdling, Ateret Anaby-Tavor
arXiv preprint arXiv:2603.29665, 2026
https://arxiv.org/abs/2603.29665
@article{rabinovich2026nearmiss,
title={Near-Miss: Latent Policy Failure Detection in Agentic Workflows},
author={Rabinovich, Ella and Boaz, David and Zwerdling, Naama and Anaby-Tavor, Ateret},
journal={arXiv preprint arXiv:2603.29665},
year={2026}
}
Policy Violation Analysis provides three main tools that work together in a sequential workflow:
- Guards Generator - Generates policy guard code from policy documents and OpenAPI specifications
- Wrapper Generator - Creates history-aware wrappers that integrate with the generated guards
- Trajectory Analyzer - Analyzes AI agent trajectories against the guards and generates detailed reports
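Because each tool consumes the previous tool's output, the order matters. The following is a minimal sketch of chaining the three steps from Python. It reuses the module names, flags, and demo paths from the commands in this README; the helper names `build_pipeline` and `run_pipeline` are hypothetical, and it calls the current interpreter directly rather than `uv run`. By default it only prints the commands.

```python
import shlex
import subprocess
import sys


def build_pipeline(policy, oas, trajectory, out_dir="generated", results_dir="results"):
    """Return the three CLI invocations in the order they must run."""
    wrapper = f"{out_dir}/api_wrapper.py"
    base = [sys.executable, "-m"]
    return [
        # 1. Guards Generator: policy document + OpenAPI spec -> guard code
        base + ["policy_violation.guards", "--policy-path", policy,
                "--oas-path", oas, "--output-dir", out_dir, "--app-name", "api"],
        # 2. Wrapper Generator: guard code -> history-aware wrapper
        base + ["policy_violation.wrappers", "--guards-dir", out_dir,
                "--wrapper-file", wrapper, "--class-name", "ApiWrapper"],
        # 3. Trajectory Analyzer: trajectory + guards + wrapper -> report
        base + ["policy_violation.analysis", trajectory, "--guards-dir", out_dir,
                "--results-dir", results_dir, "--wrapper-file", wrapper],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


commands = build_pipeline(
    policy="demos/tau2_airline/policy.md",
    oas="demos/tau2_airline/open_api.json",
    trajectory="demos/tau2_airline/traj_claude-4-sonnet.json",
)
run_pipeline(commands)  # dry run: prints the three commands
```

Passing `dry_run=False` runs the steps for real; `check=True` aborts the chain if a step exits with a non-zero status, so the analyzer never runs against missing guards.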
With uv (recommended):
git clone https://github.com/AgentToolkit/toolguard-violation-analysis.git
cd toolguard-violation-analysis
uv sync --extra dev
With pip:
git clone https://github.com/AgentToolkit/toolguard-violation-analysis.git
cd toolguard-violation-analysis
pip install -e ".[dev]"
The typical workflow involves three steps:
Rename the .env.example file to .env and update its values.
Generate guard code from your policy document and API specification:
uv run python -m policy_violation.guards \
--policy-path demos/tau2_airline/policy.md \
--oas-path demos/tau2_airline/open_api.json \
--output-dir generated \
--app-name api
This creates guard implementations that enforce your policy constraints.
For detailed options and examples, see Guards CLI Documentation
Create a history-aware wrapper that uses the generated guards:
uv run python -m policy_violation.wrappers \
--guards-dir generated \
--wrapper-file generated/api_wrapper.py \
--class-name "ApiWrapper"
The wrapper checks conversation history before making API calls, reducing unnecessary calls and improving efficiency.
For detailed options and examples, see Wrappers CLI Documentation
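To illustrate the idea of a history-aware wrapper, here is a conceptual sketch. It is not the code the generator emits: the class `HistoryAwareWrapper`, the guard function, the tool name `get_reservation`, and the basic-economy rule are all hypothetical, chosen only to show how a guard can reuse data the agent already fetched instead of calling the API again.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class PolicyViolation(Exception):
    """Raised by a guard when a call would break a policy rule."""


@dataclass
class HistoryAwareWrapper:
    api: dict[str, Callable[..., Any]]  # tool name -> real API function
    history: list[tuple[str, dict, Any]] = field(default_factory=list)
    api_calls: int = 0

    def call(self, tool: str, **args: Any) -> Any:
        # Reuse a past result if the same call already appears in the history.
        for past_tool, past_args, past_result in reversed(self.history):
            if past_tool == tool and past_args == args:
                return past_result
        # Otherwise hit the API and record the result.
        self.api_calls += 1
        result = self.api[tool](**args)
        self.history.append((tool, args, result))
        return result


def guard_cancel_reservation(wrapper: HistoryAwareWrapper, reservation_id: str) -> None:
    """Hypothetical guard: basic-economy reservations may not be cancelled."""
    reservation = wrapper.call("get_reservation", reservation_id=reservation_id)
    if reservation["cabin"] == "basic_economy":
        raise PolicyViolation(f"{reservation_id}: basic economy cannot be cancelled")


# Stand-in API for the demonstration.
fake_api = {
    "get_reservation": lambda reservation_id: {"id": reservation_id, "cabin": "basic_economy"},
}
wrapper = HistoryAwareWrapper(api=fake_api)
wrapper.call("get_reservation", reservation_id="R1")  # the agent fetched this earlier

try:
    guard_cancel_reservation(wrapper, "R1")
    blocked = False
except PolicyViolation:
    blocked = True

print(blocked, wrapper.api_calls)
```

The guard's lookup is served from the recorded history, so the stand-in API is hit only once even though the reservation is read twice.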
Analyze AI agent trajectories against your policy guards:
uv run python -m policy_violation.analysis \
demos/tau2_airline/traj_claude-4-sonnet.json \
--guards-dir generated \
--results-dir results \
--wrapper-file generated/api_wrapper.py
This generates a CSV report with detailed analysis of policy violations and guard evaluations.
For detailed options and examples, see Analyze CLI Documentation
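Since the report is a CSV file, it can be summarized with the standard library. The sketch below assumes a report with `tool` and `violation` columns; those column names and the sample rows are hypothetical, so check the header of your actual report (for example via `csv.DictReader(...).fieldnames`) and adjust accordingly.

```python
import csv
import io
from collections import Counter


def count_by(rows, column):
    """Count rows per distinct value of the given column."""
    return Counter(row[column] for row in rows)


# Hypothetical sample standing in for a file under the results directory.
# For a real report, use: open(path, newline="") instead of io.StringIO.
SAMPLE = """tool,violation
book_flight,False
cancel_reservation,True
cancel_reservation,True
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
violations = [row for row in rows if row["violation"] == "True"]
per_tool = count_by(violations, "tool")

print(per_tool.most_common())
```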
Here's a complete end-to-end example:
# 1. Generate guards from policy
uv run python -m policy_violation.guards \
--policy-path demos/tau2_airline/policy.md \
--oas-path demos/tau2_airline/open_api.json \
--output-dir generated \
--app-name api \
--verbose
# 2. Generate wrapper
uv run python -m policy_violation.wrappers \
--guards-dir generated \
--wrapper-file generated/api_wrapper.py \
--class-name "ApiWrapper" \
--verbose
# 3. Analyze trajectories
uv run python -m policy_violation.analysis \
demos/tau2_airline/traj_claude-4-sonnet.json \
--guards-dir generated \
--results-dir results \
--wrapper-file generated/api_wrapper.py \
--verbose
Create a .env file in the project root with your LLM configuration:
# Example
MODEL_NAME="claude-sonnet-4-6"
LLM_PROVIDER="openai"
LLM_API_KEY="..."
LLM_API_BASE="https://litellm.something.com"
See .env.example for a template.
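Before running the tools, it can help to confirm that the configuration is complete. The following sketch parses `.env`-style text with the standard library and reports keys that are absent or still set to the `...` placeholder. Treating all four variables from the example as required is an assumption, and the helpers `parse_env` and `missing_keys` are hypothetical; how the toolkit itself loads the file is not shown here.

```python
REQUIRED = ("MODEL_NAME", "LLM_PROVIDER", "LLM_API_KEY", "LLM_API_BASE")


def parse_env(text):
    """Parse simple KEY="value" lines, skipping blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED):
    """Return required keys that are absent, empty, or left as '...'."""
    return [key for key in required if not values.get(key) or values[key] == "..."]


# Sample text; for a real check, read it with: open(".env").read()
sample = '''# Example
MODEL_NAME="claude-sonnet-4-6"
LLM_PROVIDER="openai"
LLM_API_KEY="..."
'''

values = parse_env(sample)
print(missing_keys(values))  # → ['LLM_API_KEY', 'LLM_API_BASE']
```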
For detailed documentation on each tool:
- Guards CLI - Policy guard generation options and examples
- Wrappers CLI - Wrapper generation options and examples
- Analyze CLI - Trajectory analysis options and examples
The project includes VS Code launch configurations for debugging. See .vscode/launch.json for:
- Gen Guards - Debug guard generation
- Gen Wrapper - Debug wrapper generation
- Analyze - Debug trajectory analysis
- Python 3.10+
- LLM API access (OpenAI, Anthropic, or compatible provider)
- Policy document in Markdown format
- OpenAPI specification in JSON format
- Trajectory data in canonical JSON format
Licensed under the terms in LICENSE.