QueryReceipts

Provably faster. Query tuning where every fix ships with receipts: an equivalence proof, a plan diff, and a benchmark delta — a certificate a DBA can take to a change-advisory board.

Short name: Receipts (CLI: receipts).

Install

pipx install "git+https://github.com/debabsah/query-receipts"   # or pip
receipts --help

Runtime is pure Python stdlib (≥3.10) — zero dependencies, zero supply-chain surface.

As a Claude Code plugin (the conversational skin that drives the CLI):

/plugin marketplace add debabsah/query-receipts
/plugin install query-receipts@receipts-plugins

Quickstart

receipts init my-case --engine sqlserver --database Sales --symptom "30 min instead of 2"
cp slow_query.sql my-case/original.sql
receipts prescribe diagnostics --case my-case     # → run in SSMS, save where it says
receipts add my-case/runs/baseline/diagnostics.txt --kind stats_io \
  --transport courier --environment production --runner me --case my-case
receipts parse ev-0001 --case my-case             # 1 KB summary, not a 50 KB dump
# …draft optimized/optimized_v1.sql, then prove → measure → certify:
receipts prescribe validation --rewrite optimized/optimized_v1.sql --natural-key OrderID --case my-case
receipts grade ev-0002 --case my-case
receipts prescribe benchmark  --rewrite optimized/optimized_v1.sql --case my-case
receipts certify --validation ev-0002 --benchmark ev-0003 \
  --rewrite optimized/optimized_v1.sql --case my-case

See the worked example — real certificates from reproducible workloads this repo's CI re-proves on every change: SQL Server (reads −99.8%, elapsed −91.3%, 16/16 equivalence checks) and PostgreSQL via the driver transport (elapsed 4,732 ms → 25 ms, buffer reads 988,853 → 369, 15/15 checks).

The thesis

LLMs made query-tuning advice free. What stays scarce is:

Evidence acquisition — getting the right captures out of locked-down, messy production systems.
Verification — proving a proposed rewrite returns identical results and is actually faster.
Trust & audit — showing a skeptical reviewer exactly why a change is safe, with provenance for every claim.

QueryReceipts owns the scarce part. The model suggests; the product proves.

The core asset

A proof engine, built once and reused everywhere:

Equivalence certification (row counts, bidirectional EXCEPT, per-column distributions, grain reconciliation on a natural key).
Plan/trace diffing (est-vs-actual skew, spills, conversions, operator deltas).
Benchmark attestation (before/after IO and time, warm/cold cache).

Wrapped in capture packs per engine: SQL Server first, then Postgres, Snowflake, BigQuery.

The transport invariant

The proof engine never connects to anything. It consumes evidence artifacts — files with provenance (who ran it, where, when, how it traveled, content hash) — and emits capture prescriptions and certificates. Transports fulfill prescriptions:

Human courier (air-gapped): user runs the prescribed script in SSMS/psql, saves output to the prescribed path.
Approve-each-script: agent proposes, human executes or one-click approves.
MCP-connected: an MCP server fulfills capture prescriptions automatically.
Direct driver (ODBC/JDBC): full read access, zero-friction loop.
CI runner: prevention mode, captures against a stats-clone.

Air-gapped is the base case, not the degraded case. Every connected mode is the same loop with a faster courier. A feature that cannot work air-gapped gets redesigned or explicitly marked transport-dependent — never silently assumed connected.

Composable dimensions

Orthogonal axes, combinable by design; every evidence artifact records its coordinates in the case file:

Engine pack: SQL Server / Postgres / MySQL / Snowflake / BigQuery …
Transport: courier / approve-each / MCP / driver / CI.
Permission tier: read-only / VIEW DATABASE STATE / sysadmin — capture menus are keyed to what the tier can actually see.
Data sensitivity: metadata-only / sampled rows / full rows. Strictest cell (air-gapped + metadata-only) must remain fully usable.
Workload shape: single SELECT / view / stored proc / DML-ETL / batch.
Environment: production / staging / stats-clone.
Mode: cure (episodic investigation) / prevention (CI gate).

Unreachable cells are represented honestly: the engine prescribes the missing permission or capture instead of pretending.

Sequencing

Cure (now): episodic verified-fix engine; cloud-warehouse cost savings as the quantifiable-ROI wedge.
Air-gap posture (early, cheap): the audit trail proving the AI never touched the system — the enterprise differentiator.
Prevention (later): CI gate — equivalence + plan-shape regression check on every PR that touches SQL. Same proof engine, habitual usage.
Platform (earned, not started): generalize the harness only after one domain is won.

Non-negotiables

Every claim cites its capture, or names the missing capture instead of guessing.
The investigation persists as a replayable case file — the compounding, org-specific asset.
No suggestion ships without its receipts.

Development

python3 -m venv .venv && .venv/bin/pip install -e '.[dev]'
.venv/bin/python -m pytest -q

Engine code is stdlib-only by policy: a proof tool should carry zero supply-chain surface. pytest is the only dev dependency.

Integration tests drive the full cure loop against a live SQL Server 2022 container (FleetDB, a deterministic synthetic workload mirroring a real production pathology):

bash scripts/fleetdb_up.sh        # needs Docker; first run takes minutes
.venv/bin/python -m pytest -m integration
bash scripts/fleetdb_down.sh

Engines and transports

Engines: SQL Server (STATISTICS IO/TIME, showplan XML, plan diff) and PostgreSQL (EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)), behind one pack registry — --engine sqlserver|postgres at receipts init.
Courier (air-gapped, the base case): prescriptions tell the human exactly what to run and where to save it.
Driver: receipts run <prescription> --runner-cmd 'psql -X -q -d db -f {sql}' executes captures itself and registers them; only the command's first token ever enters the ledger.
MCP: receipts mcp-serve exposes the whole toolset as a stdio MCP server for any MCP client.
CI: receipts verify is a tamper gate for certified cases; see docs/ci-recipe.md. This repo's own CI re-proves both engines' cure loops on every push.

Known limitations (tracked, not hidden)

Single-statement SELECT workloads only (CTEs fully supported on both engines); stored procs and DML-ETL validation are future work.
SQL Server validation staging uses fixed global temp names — don't run two validations concurrently against the same server.
Postgres reports no CPU time; certificates show elapsed and buffer reads only (never an invented number).

Lineage

Conceived 2026-06-11 from a first-principles review of the sql-query-tuner skill (sibling folder), whose equivalence-validation harness and evidence-citation rules are the kernel this product industrializes.

Philosophical imports from the analytics-office plugin (sibling), adapted not copied:

Comparability gate first (from prove-my-parity): before any old-vs-new comparison — same parameters, same data as-of, same session semantics — or the proof is declared invalid, not run anyway.
Stratified comparison over matching totals: aggregate equality with offsetting per-segment errors is a FAIL, not a PASS.
Verdicts carry provenance and age: a certificate is stamped against a data snapshot, schema version, and stats state; it expires on drift.
Pre-registered success criteria (from explore-my-data): benchmark protocol and the headline metric are pinned before measuring, killing cherry-picked warm-cache wins.
Three-valued honesty: PROVEN / REFUTED / UNVERIFIED-with-named-missing- capture. Never "safe beyond the evidence"; UNKNOWN stays UNKNOWN.
Blast-radius grading (from audit-my-assumptions): findings and escalations are prioritized by what they can silently corrupt, not by loudness.

What deliberately does not port: analytics-office is constitutionally read-only and never produces the deliverable. QueryReceipts must produce it (rewrite + certificate) and, in connected modes, execute captures. The read-only law is therefore demoted to (a) the air-gapped mode and (b) a standing posture: never write to the target system; proofs use session-local scratch only.

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
.claude-plugin		.claude-plugin
.github/workflows		.github/workflows
docs		docs
examples		examples
scripts		scripts
skills/query-receipts		skills/query-receipts
src/queryreceipts		src/queryreceipts
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

QueryReceipts

Install

Quickstart

The thesis

The core asset

The transport invariant

Composable dimensions

Sequencing

Non-negotiables

Development

Engines and transports

Known limitations (tracked, not hidden)

Lineage

About

Uh oh!

Releases 3

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

QueryReceipts

Install

Quickstart

The thesis

The core asset

The transport invariant

Composable dimensions

Sequencing

Non-negotiables

Development

Engines and transports

Known limitations (tracked, not hidden)

Lineage

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages