fatihkutlar/AgentMonitor

AgentMonitor

Production-ready observability platform for Python-based AI agents.

AgentMonitor helps engineering teams monitor agent behavior in production with a unified SDK, collector API, alert engine, TimescaleDB aggregates, and Grafana dashboards.

Highlights

  • Python SDK with @monitor_agent decorator and AgentMonitor context manager
  • Non-blocking metric export with in-memory queue (max 10k), retries, and backoff
  • FastAPI collector API (/ingest) with batch ingestion (up to 1000 events)
  • TimescaleDB hypertable + continuous aggregates (1min, 1hour)
  • PostgreSQL trace/rule/agent registry storage
  • Rule-based alerting every 30s (cost, latency spike, error rate, quality degradation)
  • Notification actions: Slack, Email, PagerDuty, Auto-disable
  • Auto-disable enforcement in SDK (AgentDisabledException)
  • Claude Haiku evaluator integration with Redis cache (24h)
  • Pre-provisioned Grafana dashboards

Repository Layout

agentmonitor/
├── sdk/
├── server/
├── storage/
├── dashboard/
├── grafana/
├── examples/
├── tests/
├── docs/
├── docker-compose.yml
├── requirements.txt
└── .env.example

Architecture

1) SDK (Instrumentation)

The SDK emits one AgentEvent per agent call, with the following fields:

  • identity: event_id, agent_name, run_id, tenant_id
  • timing: timestamp_start, timestamp_end, latency_ms
  • model economics: model, input_tokens, output_tokens, cost_usd
  • result state: status, error_type
  • quality: output_quality_score
  • context: metadata
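
As an illustration of the field groups above, here is a minimal sketch of what such an event could look like as a Python dataclass. The field names come from the list above; the types, defaults, the "success" status value, and the `to_payload` helper are assumptions made for this sketch and may differ from the SDK's actual model.

```python
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


@dataclass
class AgentEvent:
    # identity
    agent_name: str
    tenant_id: str
    run_id: str
    # timing
    timestamp_start: datetime
    timestamp_end: datetime
    # model economics (types and defaults are assumptions)
    model: Optional[str] = None
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    # result state ("success" is an assumed status value)
    status: str = "success"
    error_type: Optional[str] = None
    # quality
    output_quality_score: Optional[float] = None
    # context
    metadata: Dict[str, Any] = field(default_factory=dict)
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    @property
    def latency_ms(self) -> float:
        # Derived from the two timestamps rather than stored separately.
        return (self.timestamp_end - self.timestamp_start).total_seconds() * 1000.0

    def to_payload(self) -> Dict[str, Any]:
        # Hypothetical helper: produce a JSON-friendly dict.
        payload = asdict(self)
        payload["timestamp_start"] = self.timestamp_start.isoformat()
        payload["timestamp_end"] = self.timestamp_end.isoformat()
        payload["latency_ms"] = self.latency_ms
        return payload


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
event = AgentEvent(
    agent_name="email_classifier",
    tenant_id="acme",
    run_id="run-1",
    timestamp_start=start,
    timestamp_end=start + timedelta(milliseconds=250),
    input_tokens=20,
    output_tokens=9,
)
```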

2) Collector API

POST /ingest:

  • validates incoming events
  • writes metric rows to TimescaleDB (agent_metrics)
  • writes full traces to PostgreSQL (agent_traces)
  • schedules async alert evaluation

3) Alerting Engine

The engine runs every 30 seconds and evaluates each enabled rule over a recent time window.

Rule types:

  • cost_threshold
  • latency_spike
  • error_rate
  • quality_degradation

Actions:

  • slack
  • email
  • pagerduty
  • auto_disable
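
To make the rule/action model concrete, the following sketch evaluates an error_rate rule over the statuses seen in one window. It is not the engine's implementation: the `Rule` dataclass mirrors the fields of the JSON rule payload shown later in this README, while the "error" status value and the strict greater-than comparison are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Rule:
    rule_type: str
    agent_name: str
    window_minutes: int
    threshold: float
    action: str
    enabled: bool = True


def error_rate(statuses: Iterable[str]) -> float:
    """Fraction of events in the window whose status is "error" (assumed value)."""
    items = list(statuses)
    if not items:
        return 0.0
    return sum(1 for s in items if s == "error") / len(items)


def should_fire(rule: Rule, statuses: Iterable[str]) -> bool:
    """Return True when an enabled error_rate rule exceeds its threshold."""
    if not rule.enabled or rule.rule_type != "error_rate":
        return False
    return error_rate(statuses) > rule.threshold


rule = Rule(
    rule_type="error_rate",
    agent_name="email_classifier",
    window_minutes=5,
    threshold=0.25,
    action="slack",
)
window = ["success", "error", "success", "error"]
fired = should_fire(rule, window)
```

In a real deployment the statuses would come from a windowed query against the metrics store, and a fired rule would dispatch its configured action.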

4) Storage

TimescaleDB:

  • hypertable: agent_metrics (1-day chunking)
  • continuous aggregates:
    • agent_metrics_1min
    • agent_metrics_1hour

PostgreSQL:

  • agent_traces
  • alert_rules
  • alerts
  • agent_registry

5) Dashboards

Grafana dashboards are provisioned automatically:

  • AgentMonitor Overview
  • AgentMonitor Agent Detail

Quick Start (Local)

Prerequisites

  • Docker + Docker Compose
  • Python 3.10+

1. Start stack

cd agentmonitor
docker compose up -d

Services:

  • API: http://localhost:8000
  • Grafana: http://localhost:3000 (admin/admin)
  • TimescaleDB: localhost:5432
  • Redis: localhost:6379

2. Install dependencies

python -m pip install -r requirements.txt

3. Configure SDK env

export AGENTMONITOR_URL=http://localhost:8000
export AGENTMONITOR_TENANT_ID=default
export AGENTMONITOR_API_KEY=local-dev-key

PowerShell:

$env:AGENTMONITOR_URL="http://localhost:8000"
$env:AGENTMONITOR_TENANT_ID="default"
$env:AGENTMONITOR_API_KEY="local-dev-key"

4. Emit sample events

python examples/basic_usage.py

5. Validate data

curl "http://localhost:8000/metrics/summary?tenant_id=default&window=24h"

SDK Usage

Decorator

from agentmonitor import configure, monitor_agent

configure(
    collector_url="http://localhost:8000",
    tenant_id="acme",
    api_key="local-dev-key",
)

@monitor_agent(name="email_classifier", tenant_id="acme", model="gpt-4o-mini")
def classify_email(email: str) -> dict:
    return {
        "label": "billing",
        "usage": {"prompt_tokens": 20, "completion_tokens": 9},
    }

Context manager

from agentmonitor import AgentMonitor

with AgentMonitor(name="research_agent", tenant_id="acme") as monitor:
    output = "result"
    monitor.set_token_counts(input_tokens=100, output_tokens=45)
    monitor.set_quality_score(0.91)
    monitor.set_input_output("query", output)

Agent disable handling

If an agent is disabled in the registry, the SDK raises AgentDisabledException before the wrapped function executes.
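
Callers will usually want to catch this exception and degrade gracefully. The sketch below shows one possible pattern. To stay runnable without the SDK installed, it defines a local stand-in class named AgentDisabledException; in real code you would import the exception from the SDK instead (its import path is not documented in this README). The `call_with_fallback` helper is hypothetical.

```python
class AgentDisabledException(Exception):
    """Local stand-in for the SDK exception of the same name."""


def call_with_fallback(agent_fn, fallback, *args, **kwargs):
    """Run agent_fn; return fallback if the agent has been disabled."""
    try:
        return agent_fn(*args, **kwargs)
    except AgentDisabledException:
        # The agent was auto-disabled (for example by an alert rule),
        # so skip the call and return a safe default.
        return fallback


def disabled_classifier(email: str) -> dict:
    # Simulates what a monitored function does when its agent is disabled.
    raise AgentDisabledException("email_classifier is disabled")


result = call_with_fallback(disabled_classifier, {"label": "unknown"}, "hello")
```

Once the underlying issue is resolved, the agent can be re-enabled through POST /agents/{name}/enable.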

API Endpoints

Collector:

  • POST /ingest

Rules:

  • POST /rules
  • GET /rules
  • PATCH /rules/{rule_id}

Metrics/Traces/Alerts:

  • GET /metrics/summary?tenant_id=&window=1h|24h|7d
  • GET /metrics/agent/{name}?tenant_id=&window=
  • GET /traces?tenant_id=&agent_name=&status=&limit=50
  • GET /alerts?tenant_id=&status=active|resolved

Agent control:

  • GET /agents/{name}/status?tenant_id=
  • POST /agents/{name}/enable?tenant_id=

Alert Rule Example

curl -X POST "http://localhost:8000/rules" \
  -H "Content-Type: application/json" \
  -H "x-agentmonitor-api-key: local-dev-key" \
  -d '{
    "tenant_id": "default",
    "rule_type": "cost_threshold",
    "agent_name": "email_classifier",
    "window_minutes": 5,
    "threshold": 0.000001,
    "action": "slack",
    "enabled": true
  }'
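
The same request can be built from Python with only the standard library. This sketch mirrors the curl command above (URL, headers, and payload are taken from it); `build_rule_request` is a hypothetical helper name, and the request is constructed but not sent.

```python
import json
import urllib.request


def build_rule_request(base_url: str, api_key: str, rule: dict) -> urllib.request.Request:
    """Build a POST /rules request equivalent to the curl example."""
    return urllib.request.Request(
        url=f"{base_url}/rules",
        data=json.dumps(rule).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-agentmonitor-api-key": api_key,
        },
        method="POST",
    )


rule = {
    "tenant_id": "default",
    "rule_type": "cost_threshold",
    "agent_name": "email_classifier",
    "window_minutes": 5,
    "threshold": 0.000001,
    "action": "slack",
    "enabled": True,
}
req = build_rule_request("http://localhost:8000", "local-dev-key", rule)

# To actually send it against a running collector:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```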

Quality Evaluator (Claude Haiku)

server/evaluators/llm_evaluator.py uses claude-haiku-4-5-20251001 and caches evaluation results in Redis for 24 hours.

Set in .env:

  • AGENTMONITOR_ANTHROPIC_API_KEY
  • REDIS_URL
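
The caching idea can be sketched as follows, assuming results are keyed by a hash of the evaluated content and expire after 24 hours. This is an illustration only: the key format and the in-memory `TTLCache` stand-in are assumptions, and the real evaluator stores its cache in Redis rather than in a dict.

```python
import hashlib
import json
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, as stated above


def cache_key(agent_name: str, agent_input: str, agent_output: str) -> str:
    """Derive a stable key so identical evaluations hit the cache (assumed scheme)."""
    raw = json.dumps([agent_name, agent_input, agent_output])
    return "eval:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(self, ttl: float = CACHE_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._store = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: evict it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (value, self._clock() + self._ttl)


cache = TTLCache()
key = cache_key("research_agent", "query", "result")
if cache.get(key) is None:
    # A real evaluator would call the model here; 0.91 is a placeholder score.
    cache.set(key, 0.91)
score = cache.get(key)
```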

Testing

python -m pytest -q

Current suite:

  • tests/test_sdk.py
  • tests/test_alerting.py
  • tests/test_storage.py (skips if DB DSN not provided)

Grafana Provisioning

Auto-provisioned files:

  • grafana/provisioning/datasources/datasource.yml
  • grafana/provisioning/dashboards/dashboard.yml
  • grafana/dashboards/overview.json
  • grafana/dashboards/agent_detail.json

No manual Grafana setup is required.

Production Notes

  • Keep API key authentication enabled in the collector (AGENTMONITOR_API_KEY).
  • Run regular database backups for agent_traces, alert_rules, alerts, and agent_registry.
  • Tune metric retention with AGENTMONITOR_RETENTION_DAYS.
  • The SDK exporter is intentionally non-blocking; monitor queue pressure if the collector is unavailable.
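
To illustrate what queue pressure means for a non-blocking exporter, here is a minimal sketch of a bounded in-memory buffer. The 10k queue limit and the 1000-event batch size come from the Highlights section; the class name, the drop-newest-on-full policy, and the backoff parameters are assumptions and may not match the SDK's internals.

```python
import queue
from typing import Any, List


class BoundedEventBuffer:
    """Bounded buffer that never blocks the caller (hypothetical sketch)."""

    def __init__(self, max_size: int = 10_000):
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=max_size)
        self.dropped = 0

    def offer(self, event: Any) -> bool:
        """Enqueue without blocking; count and drop the event if the buffer is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, max_batch: int = 1000) -> List[Any]:
        """Pull up to max_batch events for one batch export."""
        batch: List[Any] = []
        while len(batch) < max_batch:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch

    def pressure(self) -> float:
        """Approximate fill ratio in [0, 1]; values near 1 mean events are at risk."""
        return self._queue.qsize() / self._queue.maxsize


def backoff_delays(retries: int = 3, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Exponential backoff schedule in seconds, capped at `cap` (assumed values)."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


buf = BoundedEventBuffer(max_size=2)
accepted = [buf.offer({"n": i}) for i in range(3)]
```

Exposing the fill ratio and the dropped-event count as application metrics is one way to notice a collector outage before events are lost.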

Additional Docs

  • docs/quickstart.md
  • examples/basic_usage.py
  • examples/langchain_integration.py
  • examples/openai_integration.py
