CortexOps

Reliability infrastructure for AI agents.
Evaluate · Observe · Operate — for LangGraph, CrewAI, and AutoGen.

Website · PyPI · Docs

The problem

You deployed an agent. You have no idea if it regressed overnight.

No standard eval format. No failure traces. No CI gate before the next prompt change ships.
CortexOps fixes that.

Quickstart

pip install cortexops

from cortexops import CortexTracer, EvalSuite

# Wrap your LangGraph app — zero refactor required
tracer = CortexTracer(project="payments-agent")
graph  = tracer.wrap(your_langgraph_app)

# Run evaluations against a golden dataset
results = EvalSuite.run(
    dataset="golden_v1.yaml",
    agent=graph,
)

print(results.summary())
# CortexOps eval — payments-agent
#   Cases           : 9  (7 passed, 2 failed)
#   Task completion : 91.4%
#   Tool accuracy   : 97.0/100
#   Latency p50/p95 : 42ms / 187ms
#   Failed cases:
#     - escalation_router: tool_call_mismatch (score 41)
#     ...
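The summary reports latency as p50/p95 percentiles. The sketch below illustrates what those two numbers mean by computing a nearest-rank percentile over per-case latencies. It is not CortexOps's aggregation code, which this README does not document and which may use a different method (for example, interpolation). The `percentile` helper and the latency values are made up for illustration.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of values <= it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Hypothetical per-case latencies in milliseconds
latencies_ms = [38, 40, 41, 42, 45, 60, 95, 150, 187]
print(f"p50/p95: {percentile(latencies_ms, 50)}ms / {percentile(latencies_ms, 95)}ms")
```

With only a handful of cases, p95 is effectively the slowest case, so a single outlier can dominate it.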

Hosted API (Pro)

The free tier runs evals locally. The Pro tier adds hosted trace storage, a live dashboard, Slack alerts, and LLM-as-judge scoring.

Get an API key at getcortexops.com

Then point the SDK at the hosted API:

from cortexops import CortexTracer, EvalSuite

tracer = CortexTracer(
    project="my-agent",
    api_key="cxo-your-key-here",
    api_url="https://api.getcortexops.com"
)

graph = tracer.wrap(your_langgraph_app)

Traces are stored for 90 days. View them at api.getcortexops.com/docs.
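To avoid committing the API key to source control, you can read it from an environment variable. This is a minimal sketch that assumes only the `CortexTracer` keyword arguments shown above. The variable name `CORTEXOPS_API_KEY` and the helper `hosted_tracer_kwargs` are hypothetical and not part of the SDK.

```python
import os

# Hypothetical variable name; the SDK may document a different one.
API_KEY_ENV = "CORTEXOPS_API_KEY"


def hosted_tracer_kwargs(project: str, api_url: str = "https://api.getcortexops.com") -> dict:
    """Build CortexTracer keyword arguments without hard-coding the API key."""
    api_key = os.environ.get(API_KEY_ENV)
    if not api_key:
        raise RuntimeError(f"Set {API_KEY_ENV} to your CortexOps API key")
    return {"project": project, "api_key": api_key, "api_url": api_url}


# Usage (requires the cortexops package):
#   tracer = CortexTracer(**hosted_tracer_kwargs("my-agent"))
```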

Pricing

Tier	Price	What you get
Free	$0	Local evals, CLI, GitHub Actions gate
Pro	$49/seat/mo	Hosted traces, Slack alerts, dashboard, LLM judge
Enterprise	Custom	VPC, SSO, SLA

Start a Pro trial: getcortexops.com/#pricing

Golden dataset format

Define test cases in YAML. Run them locally or in CI.

# golden_v1.yaml
version: 1
project: payments-agent

cases:
  - id: refund_lookup_01
    input: "What is the status of refund REF-8821?"
    expected_tool_calls: [lookup_refund]
    expected_output_contains: ["approved", "REF-8821"]
    max_latency_ms: 3000

  - id: dispute_escalation_01
    input: "I was charged twice — this is unauthorized"
    expected_tool_calls: [classify_dispute, route_escalation]
    expected_output_contains: ["escalated"]
    max_latency_ms: 5000
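The following sketch shows how a runner might check a single case against one agent run, which makes the fields above more concrete. It is illustrative only and is not `EvalSuite`'s logic. The `AgentRun` record and the `check_case` function are hypothetical. The case is a plain dict of the kind `yaml.safe_load` would return. Expected tools are compared as a set, ignoring order, and the real runner may not do this.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """Hypothetical record of one agent invocation."""
    output: str
    tool_calls: list[str]
    latency_ms: float


def check_case(case: dict, run: AgentRun) -> dict[str, bool]:
    """Return pass/fail for each check the golden case defines."""
    results = {}
    if "expected_tool_calls" in case:
        # Every expected tool must be called at least once; order is ignored here.
        results["tools"] = set(case["expected_tool_calls"]) <= set(run.tool_calls)
    if "expected_output_contains" in case:
        results["output"] = all(s in run.output for s in case["expected_output_contains"])
    if "max_latency_ms" in case:
        results["latency"] = run.latency_ms <= case["max_latency_ms"]
    return results


case = {
    "id": "refund_lookup_01",
    "expected_tool_calls": ["lookup_refund"],
    "expected_output_contains": ["approved", "REF-8821"],
    "max_latency_ms": 3000,
}
run = AgentRun(output="Refund REF-8821 was approved.", tool_calls=["lookup_refund"], latency_ms=412)
print(check_case(case, run))  # {'tools': True, 'output': True, 'latency': True}
```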

CI eval gate

Add this step to a job in .github/workflows/eval.yml, after the steps that check out the repository and install the SDK:

- name: CortexOps eval gate
  run: |
    python examples/langgraph_payments/run_eval.py \
      --dataset examples/langgraph_payments/golden_v1.yaml \
      --fail-on "task_completion < 0.90"

If the score falls below the threshold, the script exits non-zero and the job fails. When the check is marked as required in branch protection, the PR is blocked from merging.
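The `--fail-on` flag accepts a comparison such as `task_completion < 0.90`. The sketch below shows one way such a gate could work: it parses `metric op threshold`, evaluates the condition against the computed metrics, and returns 1 when the condition holds. It is hypothetical, because `run_eval.py` may parse the expression differently, and `parse_condition` and `gate_exit_code` are not part of CortexOps.

```python
import operator
import re

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}
# Two-character operators come first so "<=" is not matched as "<".
_PATTERN = re.compile(r"^\s*(\w+)\s*(<=|>=|<|>)\s*([0-9]*\.?[0-9]+)\s*$")


def parse_condition(expr: str) -> tuple[str, str, float]:
    """Split 'task_completion < 0.90' into ('task_completion', '<', 0.9)."""
    match = _PATTERN.match(expr)
    if match is None:
        raise ValueError(f"cannot parse fail condition: {expr!r}")
    name, op, threshold = match.groups()
    return name, op, float(threshold)


def gate_exit_code(metrics: dict[str, float], expr: str) -> int:
    """Return 1 (fail the CI job) if the failure condition holds, else 0."""
    name, op, threshold = parse_condition(expr)
    if name not in metrics:
        raise KeyError(f"unknown metric: {name}")
    return 1 if _OPS[op](metrics[name], threshold) else 0


print(gate_exit_code({"task_completion": 0.914}, "task_completion < 0.90"))  # 0
# In a CI script (hypothetical): raise SystemExit(gate_exit_code(metrics, args.fail_on))
```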

Repo structure

cortexops/
├── sdk/                        # pip install cortexops
│   ├── cortexops/
│   │   ├── tracer.py           # CortexTracer — wraps LangGraph / CrewAI
│   │   ├── eval.py             # EvalSuite — golden dataset runner
│   │   ├── metrics.py          # task_completion, tool_accuracy, latency, hallucination
│   │   ├── models.py           # Pydantic data models
│   │   └── client.py           # HTTP client for hosted API
│   └── tests/
├── backend/                    # FastAPI + Celery + SQLite/Postgres
│   ├── app/
│   │   ├── main.py
│   │   ├── routers/            # /v1/evals, /v1/traces
│   │   ├── models/             # DB records + API schemas
│   │   └── worker/             # Celery async eval tasks
│   └── Dockerfile
├── frontend/                   # React + TypeScript dashboard
├── examples/
│   └── langgraph_payments/     # Full runnable demo
│       ├── agent.py
│       ├── golden_v1.yaml
│       └── run_eval.py
└── docker-compose.yml

Run the full stack locally

git clone https://github.com/ashishodu2023/cortexops
cd cortexops

# Start API + worker + Redis
docker compose up --build

# In another terminal — run the demo eval
cd examples/langgraph_payments
pip install -e ../../sdk/
python run_eval.py

# API docs at http://localhost:8000/docs
# Dashboard at http://localhost:3000

Supported frameworks

Framework	Status
LangGraph	Stable
CrewAI	Stable
AutoGen	Beta
LlamaIndex agents	Coming soon
Custom callables	Supported via `CortexTracer.wrap()`
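The sketch below illustrates the general idea behind wrapping a plain callable for tracing. For each call, it records the input, the output or error, and the wall-clock latency, using only the standard library. It is not how `CortexTracer.wrap()` is implemented. The `traced` and `TraceRecord` names are hypothetical, and the example assumes a single-argument agent callable.

```python
import functools
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class TraceRecord:
    """Hypothetical per-call trace entry."""
    name: str
    input: Any
    output: Any
    error: Optional[str]
    latency_ms: float


def traced(fn: Callable[[Any], Any], sink: list) -> Callable[[Any], Any]:
    """Wrap a single-argument agent callable so each call appends a TraceRecord to sink."""
    @functools.wraps(fn)
    def wrapper(agent_input: Any) -> Any:
        start = time.perf_counter()
        output, error = None, None
        try:
            output = fn(agent_input)
            return output
        except Exception as exc:
            error = repr(exc)
            raise  # tracing must not swallow the agent's errors
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            sink.append(TraceRecord(fn.__name__, agent_input, output, error, elapsed_ms))
    return wrapper


def echo_agent(text: str) -> str:
    return f"echo: {text}"


traces: list = []
agent = traced(echo_agent, traces)
agent("What is the status of refund REF-8821?")
print(traces[0].output, f"{traces[0].latency_ms:.2f}ms")
```

Recording in `finally` ensures that failed calls are traced as well as successful ones, which is usually where failure traces matter most.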

Built-in metrics

Metric	What it checks
`task_completion`	Agent produced a valid, non-error output
`tool_accuracy`	Expected tool calls were actually made
`latency`	Response within `max_latency_ms` budget
`hallucination`	Detects fabrication signals in output

Add custom metrics by subclassing `cortexops.Metric`.
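This README does not document the interface of `cortexops.Metric`. The sketch below therefore defines a stand-in abstract `Metric`, with an assumed `name` attribute and an assumed `score(case, output)` method, and implements a custom metric against it. Check the SDK's `metrics.py` for the real method names and signature before you adapt it. `ForbiddenPhraseMetric` is a hypothetical example.

```python
from abc import ABC, abstractmethod


class Metric(ABC):
    """Stand-in for cortexops.Metric; the real base class may differ."""
    name: str

    @abstractmethod
    def score(self, case: dict, output: str) -> float:
        """Return a score in [0.0, 1.0] for one case."""


class ForbiddenPhraseMetric(Metric):
    """Hypothetical custom metric: the fraction of forbidden phrases absent from the output."""
    name = "forbidden_phrase"

    def __init__(self, phrases: list[str]) -> None:
        self.phrases = [p.lower() for p in phrases]

    def score(self, case: dict, output: str) -> float:
        text = output.lower()
        hits = sum(1 for p in self.phrases if p in text)
        return 1.0 - hits / len(self.phrases) if self.phrases else 1.0


metric = ForbiddenPhraseMetric(["card number", "cvv"])
print(metric.score({"id": "refund_lookup_01"}, "Refund REF-8821 is approved."))  # 1.0
```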

Contributing

git clone https://github.com/ashishodu2023/cortexops
cd cortexops/sdk
pip install -e ".[dev]"
pytest tests/ -v

See CONTRIBUTING.md. Issues labeled `good first issue` are a great place to start.

Citation

@software{cortexops2025,
  author  = {Ashish and others},
  title   = {CortexOps: Reliability Infrastructure for AI Agents},
  year    = {2025},
  url     = {https://github.com/ashishodu2023/cortexops},
}

License

MIT — see LICENSE.

cortexops.ai · Issues · Discussions