ClinicOps Copilot

Multi-agent AI system for clinic operations over a FHIR R4 database, with first-class observability and a deployable CLI.

What This Is

ClinicOps Copilot is a working multi-agent system for clinic operations. It operates over a synthetic FHIR R4 patient database and handles real edge cases — new-patient intake, booking conflicts, expired coverage, code-switched Spanish intents. Every tool call is traced to a Streamlit observability dashboard.

The patient always talks to a single ClinicOps Assistant. Behind the scenes, the assistant delegates specialized work to sub-agents (Onboarding, Scheduler, Eligibility) via delegate_to_<name> tool calls — an agents-as-tools architecture where sub-agents are invoked as stateless subroutines and the handoff is invisible to the user.

Sub-agents live in an agent registry. Built-ins register at startup, and new workflows are added by dropping a single .py file into plugins/ — the assistant automatically picks it up as another delegate tool. A single CLI (clinicops) handles seeding, interactive chat, dashboard, and evals.

Quick Start

# Clone
git clone https://github.com/deepmind11/clinic-ops-copilot.git
cd clinic-ops-copilot

# Install
uv sync --all-extras

# Configure — copy the example and fill in your API key
cp .env.example .env
# Edit .env and set LLM_API_KEY (see .env.example for provider options)

# Start Postgres + load synthetic data
docker compose up -d
uv run clinicops seed --patients 1000

# Start an interactive session
uv run clinicops

# Open the observability dashboard
uv run clinicops dashboard

# Run the eval harness
uv run clinicops eval

See docs/QUICKSTART.md for the full walkthrough.

Architecture

  +-------------------------------------+
  |       CLI (clinicops)               |   REPL, seed, eval, dashboard, logs
  |  streaming output · prompt history  |
  +------------------+------------------+
                     |
                     v
  +-------------------------------------+
  |       ClinicOps Assistant           |   single user-facing agent
  |          (master / triage)          |   owns conversation state
  +------------------+------------------+
                     |  delegate_to_<name>(request)  (internal tool calls)
     +---------------+---------------+
     |               |               |
     v               v               v
  +-------+      +---------+      +-----------+
  |Onboard|      |Scheduler|      |Eligibility|       each is a sub-agent
  |  ing  |      |         |      |           |       with its own tool
  +---+---+      +----+----+      +-----+-----+       loop + system prompt
      |               |                 |
      +---------------+-----------------+
                      |
                      v  (raw SQL via psycopg)
          +-----------------------+       +--------------+
          |      Postgres         |       |  SQLite      |
          |     (FHIR R4)         |       |  Events      |
          |  Patient              |       |  Store       |
          |  Appointment          |       |  per-call    |
          |  Coverage             |       |  metrics     |
          |  Practitioner         |       +------+-------+
          |  ProviderSlot · Claim |              |
          +-----------------------+              v
                                         +--------------+
                                         |  Streamlit   |
                                         | Observability|
                                         |  Dashboard   |
                                         +--------------+

The Assistant is the only thing the patient sees. It owns the conversation, decides when to delegate, and streams its responses token-by-token.
Delegate calls propagate the master's trace_id via a ContextVar so every sub-agent event lands under the same trace in the events store — the dashboard shows the whole chain as one interaction.
Plugins register in the same registry and automatically become new delegate_to_<plugin_name> tools on the assistant.

See docs/ARCHITECTURE.md for the full design decisions.

Built-in Agents

The patient always talks to a single ClinicOps Assistant (the master). Behind the scenes, the master delegates to specialized sub-agents via delegate_to_<name> tools — the patient never sees the handoff.

Sub-agent	Role	Tool Surface
Onboarding	Registers new patients into the FHIR database. Duplicate-checks by phone before insert.	`lookup_patient`, `register_patient`
Scheduler	Books, reschedules, and cancels appointments. Handles double-bookings, slot conflicts, provider availability.	`find_open_slots`, `book_appointment`, `cancel_appointment`, `lookup_patient`
Eligibility	Checks insurance coverage from the FHIR Coverage resource. Flags expired plans, missing prior auth, ineligible services.	`lookup_coverage`, `check_active_period`, `get_payor_rules`

A Billing/RCM sub-agent is planned for Phase 2. See ROADMAP.md.

Extending ClinicOps

Every clinic is different. Add your own workflow by dropping a single .py file into plugins/:

# plugins/prior_auth.py
AGENT_NAME = "prior_auth"
AGENT_DESCRIPTION = "Checks prior authorization requirements and status."

def build_agent():
    from clinic_ops_copilot.agents.base import Agent
    return Agent(name=AGENT_NAME, system_prompt=SYSTEM_PROMPT, tools=TOOLS, tool_funcs=TOOL_FUNCS)

On next startup the assistant automatically picks up the plugin as a new delegate_to_prior_auth tool and starts routing relevant requests to it. No changes to core code required.

See plugins/README.md for the full contract and a reference implementation (plugins/_prior_auth_example.py).

Tech Stack

Language: Python 3.11+
Agent framework: OpenAI Python SDK with a custom tool-use loop (no LangChain, no LlamaIndex, no abstraction tax). Works with any OpenAI-compatible provider — OpenRouter (default, anthropic/claude-sonnet-4.5), OpenAI, or Ollama locally. Configure via LLM_API_KEY, LLM_BASE_URL, LLM_MODEL.
Data layer: PostgreSQL with FHIR R4 schema via fhir.resources Pydantic models
Synthetic data: Synthea (open-source synthetic patient generator)
Observability: SQLite events store + Streamlit dashboard
Eval harness: Plain Python + 20 golden test cases (booking conflicts, coverage edge cases, Spanish code-switching)
CI/CD: GitHub Actions (lint, type check, tests, eval harness on every PR)
Packaging: uv

Eval Harness

The golden test suite runs in two modes per case:

Deterministic — calls a tool function directly with known inputs and asserts on the return value. Runs in CI with no API keys or database.
Agent — invokes the real LLM-backed agent over its tool surface. Requires LLM_API_KEY and a seeded Postgres; skipped gracefully if the prerequisites are missing.

The suite covers:

Scheduling: booking flow, reschedule, cancel, provider-unavailable fallbacks, same-day requests
Eligibility: payor rules (Aetna, Cigna, Blue Cross, Medicare, unknown payor)
Triage (classifier): English, Spanish, English-Spanish code-switching mid-utterance, urgency detection
Triage (agent): assistant routes through delegate_to_* tools
Onboarding: new-patient registration flow

Run them with uv run clinicops eval. Pass/fail metrics are written to the events store and surfaced in the dashboard.

Why FHIR?

FHIR R4 is the wire format every modern healthcare integration touches. Most LLM demo projects use generic SQL or made-up schemas. This one uses the standard, so the integration patterns shown here transfer directly to real EHR work (Epic, Cerner, Athenahealth, OpenEMR).

The data is synthetic, generated by Synthea, which is the open-source standard for synthetic patient data used by HL7, ONC, and most healthcare AI vendors during development.

Why Observability Is First-Class

Most agent demos have no telemetry. In production, observability is the first thing customers ask about after initial deployment. This project ships a Streamlit dashboard out of the box — not a wishlist feature for v2.

The dashboard surfaces:

Per-agent call counts
p50/p95 latency
Tool-call error rate
Sample of recent decisions with full trace (prompt, tool calls, tool responses, final answer)
Eval harness pass/fail trend

Tradeoffs and What I Would Do With More Time

Multi-tenant row-level security with Postgres RLS so multiple clinics can share infra safely
Real EHR integration via OpenEMR FHIR API or a Mirth Connect integration channel
Eval harness on production traces -- replay real anonymized customer traffic against new prompts before promotion
Latency-aware routing -- fall back to a cheaper model for high-confidence intents
PHI redaction layer -- Presidio or a custom NER model in front of every prompt

Project Structure

clinic-ops-copilot/
├── README.md                  # this file
├── ROADMAP.md                 # phased delivery plan
├── CHANGELOG.md               # what shipped in each release
├── CONTRIBUTING.md            # dev setup, tests, plugin guide
├── docs/
│   ├── ARCHITECTURE.md        # design decisions
│   └── QUICKSTART.md          # full walkthrough
├── plugins/                   # drop .py files here to add new workflows
│   ├── README.md              # plugin contract and how-to
│   └── _prior_auth_example.py # reference implementation (prefixed _ = inactive)
├── src/clinic_ops_copilot/
│   ├── agents/                # base, registry, triage (master), onboarding,
│   │                          # scheduler, eligibility
│   ├── tools/                 # tool surfaces for each sub-agent
│   ├── storage/               # Postgres queries, events store, database helpers
│   ├── observability/         # logging, tracing, Streamlit dashboard
│   ├── eval/                  # eval harness runner
│   └── cli/                   # clinicops CLI (REPL + subcommands)
├── tests/
│   └── unit/                  # smoke tests for agents, tools, registry, evals
├── evals/golden/cases.json    # golden test cases
├── scripts/
│   ├── init.sql               # Postgres schema
│   └── seed.py                # synthetic FHIR data generator
└── data/synthea/              # generated synthetic data (gitignored)

Roadmap

See ROADMAP.md for the phased plan.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ClinicOps Copilot

What This Is

Quick Start

Architecture

Built-in Agents

Extending ClinicOps

Tech Stack

Eval Harness

Why FHIR?

Why Observability Is First-Class

Tradeoffs and What I Would Do With More Time

Project Structure

Roadmap

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
.github		.github
docs		docs
evals/golden		evals/golden
plugins		plugins
scripts		scripts
src/clinic_ops_copilot		src/clinic_ops_copilot
tests		tests
.env.example		.env.example
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
ROADMAP.md		ROADMAP.md
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

ClinicOps Copilot

What This Is

Quick Start

Architecture

Built-in Agents

Extending ClinicOps

Tech Stack

Eval Harness

Why FHIR?

Why Observability Is First-Class

Tradeoffs and What I Would Do With More Time

Project Structure

Roadmap

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages