11 changes: 11 additions & 0 deletions README.md
@@ -102,6 +102,17 @@ cloudflared tunnel --url http://localhost:9332

See [self-hosting guide](docs/self-hosting.md) for full production setup.

## Research Export

For offline fee-forecasting work, the repo now includes a local JSONL export path that emits the same merged row shape used by the companion benchmark importer.

```powershell
$env:PYTHONPATH='src'
python scripts/export_fee_forecast_benchmark.py data/fee-forecast-benchmark.jsonl --hours 168 --interval-minutes 10
```

The export joins local `fee_history` observations to the next `1-6` confirmed blocks from the research tables in `data/bitcoin_api.db`. After migration `012_add_research_tables.sql`, the background fee collector also fills `block_confirmations` on each detected new block and logs fee estimates every cycle, so a normal local API run can build this export without extra manual seeding. Very recent observations without six future block outcomes are skipped automatically.
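As a sketch of the row shape each JSONL line carries (the four top-level field names come from the operations docs; every concrete value and feature key below is invented for illustration):

```python
import json

# One hypothetical exported row. The top-level keys match the documented
# export shape; the feature keys, bin labels, and values are made up.
row = {
    "observation_id": "obs-000123",
    "observed_at": "2026-04-24T12:10:00Z",
    "features": {"core_est_6": 4.2, "mempool_vsize": 18_500_000},
    "clearing_fee_bin_by_horizon": {str(h): "3-5" for h in range(1, 7)},
}

line = json.dumps(row)  # the exporter writes one such line per usable observation
decoded = json.loads(line)
print(sorted(decoded["clearing_fee_bin_by_horizon"]))  # horizons "1" through "6"
```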

## Contributing

Issues and PRs welcome. Run the test suite before submitting:
22 changes: 18 additions & 4 deletions docs/OPERATIONS.md
@@ -148,6 +148,22 @@ Requires the Fee Observatory to be collecting data (`bitcoin-fee-observatory` re

**Dashboard:** `GET /fee-observatory` — branded page with iframe to Streamlit dashboard (port 8505).

### Export fee forecast benchmark rows

Use the local benchmark export when you want importer-compatible JSONL for offline forecasting work or the companion `bitcoin-fee-forecast-bench` repo.

```powershell
$env:PYTHONPATH='src'
python scripts/export_fee_forecast_benchmark.py data/fee-forecast-benchmark.jsonl --hours 168 --interval-minutes 10
```

Notes:
- Reads `fee_history` observations from the main API DB and joins them to the next `1-6` `block_confirmations`
- Emits one JSONL row per usable observation with `observation_id`, `observed_at`, `features`, and `clearing_fee_bin_by_horizon`
- Skips observations that do not yet have six future confirmed blocks
- The research tables come from migration `012_add_research_tables.sql`
- On a normal local API run, the background fee collector fills those research tables automatically as new blocks arrive

### x402 Stablecoin Micropayments (optional)

Enables pay-per-call via the x402 protocol (USDC on Base). Requires the `bitcoin-api-x402` package.
@@ -388,11 +404,9 @@ Replace `YOUR_KEY` with the value from your `.env` `ADMIN_API_KEY`.

The background fee collector thread automatically prunes old data once per 24 hours:
- Usage logs older than 90 days are deleted
- Fee history older than 30 days is downsampled to hourly averages
- Fee history older than 365 days is deleted
- Research data (block_confirmations, fee_estimates_log) older than 365 days is deleted
- Fee history older than 30 days is deleted

The fee collector also logs multi-source fee estimates every 5 minutes (Core 8 targets, mempool.space 4 targets, local mempool 1 target) and captures block confirmation feerate percentiles on each new block.
The fee collector also logs Core fee estimates for targets `1`, `6`, and `144` every 5 minutes, adds mempool.space estimates for `1`, `3`, `6`, and `144` when that public API is reachable, and captures block confirmation feerate percentiles on each detected new block after the collector has seen a prior tip.
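One logging cycle as described above reduces to a batch of `(source, target, feerate)` tuples, the shape `record_fee_estimates_batch()` in `db.py` accepts. A sketch, with made-up feerates and assumed source labels:

```python
# Illustrative single cycle: Core targets 1/6/144 always; mempool.space
# targets 1/3/6/144 only when the public API answered. All values invented.
core = {1: 5.1, 6: 3.2, 144: 1.1}                    # sat/vB from estimatesmartfee
mempool_space = {1: 5.0, 3: 4.0, 6: 3.0, 144: 1.0}   # empty dict if unreachable

entries = [("core", target, feerate) for target, feerate in core.items()]
entries += [("mempool_space", target, feerate) for target, feerate in mempool_space.items()]

# entries now has the list[tuple[str, int, float]] shape that
# record_fee_estimates_batch() inserts in a single executemany() call
assert len(entries) == 7
```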

Check API logs for `Auto-prune:` messages to confirm it's running.

25 changes: 19 additions & 6 deletions docs/SCOPE_OF_WORK.md
@@ -1,7 +1,7 @@
# Satoshi API -- Scope of Work

**Version:** 0.3.4
**Date:** 2026-03-08
**Date:** 2026-04-24
**Author:** Bortlesboat
**Status:** Live -- https://bitcoinsapi.com

@@ -50,19 +50,19 @@ Bitcoin Core RPC (port 8332, localhost only)
| `main.py` | App creation, lifespan, router registration (~177 lines) | Composition root |
| `middleware.py` | Security headers, CORS, auth + rate limiting middleware, gzip compression | Middleware chain |
| `exceptions.py` | RPC, validation, HTTP, and generic exception handlers; RFC 7807 `type` URIs | Exception handler registry |
| `jobs.py` | Background fee collector thread lifecycle | Background worker |
| `jobs.py` | Background fee collector thread lifecycle, fee estimate logging, and block confirmation capture for research tables | Background worker |
| `static_routes.py` | Landing page, robots.txt, sitemap, decision pages | Static file serving |
| `usage_buffer.py` | Batch usage logging (flush at 50 rows or 30s) | Write-behind buffer |
| `migrations/` | SQL migration files + runner, tracked in `schema_migrations` | Sequential migrations |
| `auth.py` | API key validation, tier resolution | Strategy (tier-based) |
| `rate_limit.py` | Per-minute sliding window (in-memory or Upstash Redis) + daily limits | Token bucket / sliding window |
| `notifications.py` | Transactional email (Resend) + analytics events (PostHog) | Fire-and-forget side effects |
| `cache.py` | TTL caching with reorg-safe depth awareness, stale fallback for graceful degradation, `get_cached_node_info()` helper for non-RPC contexts | Cache-aside with lock-per-cache + stale-while-error |
| `db.py` | SQLite (WAL mode), usage logging, key storage | Repository pattern |
| `db.py` | SQLite (WAL mode), fee history, self-populating fee research tables, usage logging, key storage | Repository pattern |
| `config.py` | 12-factor env var config via Pydantic | Settings singleton |
| `dependencies.py` | Lazy singleton RPC connection | Dependency injection |
| `models.py` | Response envelope, typed data models | DTO / envelope pattern |
| `services/` | Business logic: fee analysis, tx broadcast, exchange comparison, serializers | Service layer (pure functions) |
| `services/` | Business logic: fee analysis, benchmark export, tx broadcast, exchange comparison, serializers | Service layer (pure functions) |
| `routers/` | 28 thin HTTP routers (25 core + 3 indexer) — parameter validation, auth, response envelope | RESTful resource routing |

### 2.3 Design Principles Applied
Expand Down Expand Up @@ -427,6 +427,9 @@ Errors follow the same structure:
39. **Pro checkout dead end** -- "Upgrade to Pro" button returned 503; changed to "Contact for Pro" mailto link
40. **Watchdog stale code** -- `API_DIR` resolved relative to script location (broke when Task Scheduler ran old release copy); now uses `releases/bitcoin-api-current` symlink

**Benchmark Export Self-Sufficiency (Apr 24):**
41. **Clean exporter branch could not bootstrap its own research data** -- The background fee collector now writes `block_confirmations` on detected new blocks and logs fee estimates into `fee_estimates_log`, so fresh installs can produce real benchmark export rows after migration `012_add_research_tables.sql`.

### 5.3 Known Limitations (Acceptable for v0.1)

| Limitation | Impact | When to Address |
@@ -439,6 +442,7 @@ Errors follow the same structure:
| ~~No webhook support~~ | ~~Clients must poll~~ | **RESOLVED** -- WebSocket `/api/v1/ws` with pub/sub |
| No address transaction history | Cannot provide `/address/{addr}/txs` | Deliberate -- Bitcoin Core RPC has no `getaddresshistory`. Requires external indexer (Electrs, Fulcrum). We offer `scantxoutset` via POST `/address/utxos` for UTXO lookup by address. Adding Electrs increases deployment complexity significantly. |
| Email delivery depends on Resend | Welcome email fails silently if Resend is down | Graceful degradation -- registration succeeds regardless, key always returned in response |
| Fee benchmark export needs six future confirmed blocks per observation | Very recent fee-history rows are skipped until enough blocks confirm | Acceptable for offline research export; full `1-6` block outcomes matter more than max recency |

---

@@ -499,7 +503,7 @@ Errors follow the same structure:
- `src/bitcoin_api/indexer/routers/` -- indexed_address, indexed_tx, indexer_status
- `src/bitcoin_api/indexer/migrations/` -- 001_initial_schema.sql

**Tests (23 test files + 2 support files):**
**Tests (current repo test files + support files):**
- `tests/test_health.py` -- 11 tests (health, root, status, healthz, docs, visualizer)
- `tests/test_blocks.py` -- 18 tests (block-related endpoints)
- `tests/test_fees.py` -- 45 tests (fee endpoints + fee research infrastructure)
@@ -525,6 +529,8 @@ Errors follow the same structure:
- `tests/test_indexer_services.py` -- 12 tests (address balance/history, transaction detail)
- `tests/test_price_service.py` -- 13 tests (price service provider fallback, caching, error handling)
- `tests/test_observatory.py` -- 13 tests (Fee Observatory endpoints: scoreboard, block-stats, estimates, 503 fallback, static page)
- `tests/test_fee_benchmark_export.py` -- 2 tests (benchmark export row builder + CLI writer)
- `tests/test_jobs.py` -- 2 tests (single-iteration fee collector coverage for research table population)
- `tests/test_x402_stats.py` -- 6 tests (x402 payment analytics)
- `tests/test_e2e.py` -- 21 e2e tests (against live node)
- `tests/locustfile.py` -- Load test (8 weighted endpoints)
@@ -548,8 +554,9 @@ Errors follow the same structure:
**Project config (1 file):**
- `CLAUDE.md` -- Project instructions for AI-assisted development

**Scripts (14 files):**
**Scripts (15 files):**
- `scripts/create_api_key.py`, `scripts/seed_db.py`
- `scripts/export_fee_forecast_benchmark.py` (writes benchmark-ready JSONL from local fee research data)
- `scripts/security_check.sh` (requires `SATOSHI_API_KEY` env var for POST tests)
- `scripts/security_audit.py` (10 automated security checks)
- `scripts/staging-check.sh` (pre-deploy validation: starts staging server, checks CSP/headers/docs/endpoints)
@@ -564,6 +571,12 @@
- `scripts/smoke-test-api.sh` (5-point health check for cron monitoring; supports --quiet)
- `scripts/doc_consistency.py` (CI-enforced doc consistency checks)

**Research export surfaces (5 files):**
- `src/bitcoin_api/services/benchmark_export.py` (joins fee history to future block outcomes for offline benchmark export)
- `src/bitcoin_api/benchmark_export_cli.py` (CLI entrypoint for benchmark-ready JSONL export)
- `src/bitcoin_api/migrations/012_add_research_tables.sql` (fee research tables for block confirmations and estimate logs)
- `src/bitcoin_api/jobs.py` + `src/bitcoin_api/db.py` (background collector now populates those research tables during normal API operation)

**Legal (3 files):**
- `static/terms.html` -- Terms of Service (FL governing law, liability limitation, acceptable use)
- `static/privacy.html` -- Privacy Policy (data collection, retention, third-party services)
7 changes: 7 additions & 0 deletions scripts/export_fee_forecast_benchmark.py
@@ -0,0 +1,7 @@
"""Local wrapper for exporting benchmark-ready fee forecast rows."""

from bitcoin_api.benchmark_export_cli import main


if __name__ == "__main__":
    raise SystemExit(main())
50 changes: 50 additions & 0 deletions src/bitcoin_api/benchmark_export_cli.py
@@ -0,0 +1,50 @@
"""CLI for exporting benchmark-ready fee forecast datasets."""

from __future__ import annotations

import argparse
from pathlib import Path

from .services.benchmark_export import write_fee_forecast_benchmark_export


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Export fee research tables into benchmark-ready JSONL rows.",
    )
    parser.add_argument("output_path", type=Path, help="Destination JSONL path")
    parser.add_argument(
        "--hours",
        type=int,
        default=168,
        help="How many recent hours of fee history to inspect (default: 168)",
    )
    parser.add_argument(
        "--interval-minutes",
        type=int,
        default=10,
        help="Fee history downsampling interval in minutes (default: 10)",
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=None,
        help="Optional cap on exported examples (keeps the most recent rows)",
    )
    return parser


def main(argv: list[str] | None = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    write_fee_forecast_benchmark_export(
        args.output_path,
        hours=args.hours,
        interval_minutes=args.interval_minutes,
        limit=args.limit,
    )
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
62 changes: 62 additions & 0 deletions src/bitcoin_api/db.py
@@ -110,6 +110,68 @@ def record_fee_snapshot(
conn.commit()


def record_block_confirmation(
    block_height: int,
    block_hash: str,
    block_time: str,
    tx_count: int,
    total_fees_sat: int,
    min_feerate: float,
    max_feerate: float,
    p10_feerate: float,
    p25_feerate: float,
    p50_feerate: float,
    p75_feerate: float,
    p90_feerate: float,
    core_est_1: float | None = None,
    core_est_6: float | None = None,
    core_est_144: float | None = None,
    mempool_local_est: float | None = None,
    mempool_space_est: float | None = None,
) -> None:
    conn = get_db()
    conn.execute(
        "INSERT OR REPLACE INTO block_confirmations "
        "(block_height, block_hash, block_time, tx_count, total_fees_sat, "
        "min_feerate, max_feerate, p10_feerate, p25_feerate, p50_feerate, "
        "p75_feerate, p90_feerate, core_est_1, core_est_6, core_est_144, "
        "mempool_local_est, mempool_space_est) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            block_height,
            block_hash,
            block_time,
            tx_count,
            total_fees_sat,
            min_feerate,
            max_feerate,
            p10_feerate,
            p25_feerate,
            p50_feerate,
            p75_feerate,
            p90_feerate,
            core_est_1,
            core_est_6,
            core_est_144,
            mempool_local_est,
            mempool_space_est,
        ),
    )
    conn.commit()


def record_fee_estimates_batch(entries: list[tuple[str, int, float]]) -> None:
    if not entries:
        return

    conn = get_db()
    conn.executemany(
        "INSERT INTO fee_estimates_log (source, target, feerate) VALUES (?, ?, ?)",
        entries,
    )
    conn.commit()


def get_fee_history(hours: int = 24, interval_minutes: int = 10) -> list[dict]:
    conn = get_db()
    rows = conn.execute(