Merged
Changes from 23 commits
39 commits
b15ef8a
build(deps): bump the bsv-workspace group with 32 updates
dependabot[bot] Jun 22, 2026
f005f07
build(deps): bump the infra-deps group across 8 directories with 7 up…
dependabot[bot] Jun 22, 2026
e7a333e
docs(infra): OpenTelemetry + structured logging design spec
sirdeggen Jun 22, 2026
e29441f
feat(overlay-server): emit OpenTelemetry traces, metrics, logs
sirdeggen Jun 22, 2026
d8b73e0
feat(wallet-infra): emit OpenTelemetry traces, metrics, logs
sirdeggen Jun 22, 2026
22e5d57
feat(message-box-server): emit OpenTelemetry traces, metrics, logs
sirdeggen Jun 22, 2026
645e456
feat(chaintracks-server): emit OpenTelemetry traces, metrics, logs
sirdeggen Jun 22, 2026
5cf724c
feat(uhrp-server-cloud-bucket): emit OpenTelemetry traces, metrics, logs
sirdeggen Jun 22, 2026
eeca4e8
feat(uhrp-server-basic): emit OpenTelemetry traces, metrics, logs
sirdeggen Jun 22, 2026
6ebe455
feat(wab): emit OpenTelemetry traces, metrics, logs
sirdeggen Jun 22, 2026
4f86a74
chore(infra): unify telemetry bootstrap across components
sirdeggen Jun 22, 2026
f8e8e60
docs(infra): add OpenTelemetry observability runbook
sirdeggen Jun 22, 2026
3d62128
refactor(wallet-infra): replace console.* with structured log.*
sirdeggen Jun 22, 2026
893fdab
refactor(message-box-server): replace raw console.* with structured l…
sirdeggen Jun 22, 2026
ef139fa
refactor(chaintracks-server): replace console.* with structured log.*
sirdeggen Jun 22, 2026
79682ec
refactor(uhrp-server-cloud-bucket): replace console.* with structured…
sirdeggen Jun 22, 2026
d4c981f
refactor(uhrp-server-basic): replace console.* with structured log.*
sirdeggen Jun 22, 2026
9c2cc24
refactor(wab): replace console.* with structured log.*
sirdeggen Jun 22, 2026
98268a5
fix(infra): redact PII/credentials from structured logs before egress
sirdeggen Jun 22, 2026
942abd4
fix(infra): boot overlay-server + repair ESM telemetry loader hook
sirdeggen Jun 22, 2026
6908481
feat(infra): unified local stack with Traefik hostname routing
sirdeggen Jun 22, 2026
07d3705
fix(infra): make the full local stack boot without crashing
sirdeggen Jun 22, 2026
9a9d9e6
change(wallet-infra): default network to mainnet, not mock/test
sirdeggen Jun 23, 2026
098f0ce
fix(docs): add missing frontmatter to infra-opentelemetry design spec
sirdeggen Jun 23, 2026
0594fc6
fix(docs): add infra to allowed domain values in page schema
sirdeggen Jun 23, 2026
236e3fe
Merge remote-tracking branch 'origin/dependabot/npm_and_yarn/bsv-work…
sirdeggen Jun 23, 2026
83a38c2
chore(infra): merge dependabot PRs #222 and #223 into feat/infra-open…
sirdeggen Jun 23, 2026
693adb7
fix(infra): address Copilot review findings on PR #226
sirdeggen Jun 23, 2026
58fc3dc
fix(docs-site): include .md/.mdx in React plugin to resolve jsx-runtime
sirdeggen Jun 23, 2026
1d108d9
fix(docs-site): alias react/jsx-runtime so MDX-compiled docs resolve
sirdeggen Jun 24, 2026
125d43d
fix(docs-site): pin react-router-dom to v6 for vite-react-ssg compat
sirdeggen Jun 24, 2026
03e4555
fix(ts-p2p): cast gossipsub service factory after libp2p type drift
sirdeggen Jun 24, 2026
56f1aba
fix(tests): make bsv-wallet-helper tests hermetic; drop live storage dep
sirdeggen Jun 24, 2026
fc0be52
fix(wallet-toolbox): pin chalk to v4 so jest can parse createAction2 …
sirdeggen Jun 24, 2026
54070e2
Merge origin/main and resolve version conflicts
Copilot Jun 24, 2026
d06b40e
fix(infra): resolve wab + wallet-infra TypeScript build errors
BraydenLangley Jun 24, 2026
a1c50d5
fix(infra): sync wab + chaintracks lockfiles to @bsv/wallet-toolbox 2…
BraydenLangley Jun 24, 2026
df8a85c
fix(ci): repair pnpm-lock.yaml broken entry for wallet-toolbox-client
sirdeggen Jun 24, 2026
13052cd
test(ts-paymail): mock DNS + DoH in dnsResolver tests to remove CI flake
sirdeggen Jun 25, 2026
2 changes: 1 addition & 1 deletion docs/_schemas/page.schema.json
@@ -7,7 +7,7 @@
"id": { "type": "string", "description": "Stable slug, never changes" },
"title": { "type": "string" },
"kind": { "type": "string", "enum": ["package", "infra", "spec", "guide", "conformance", "reference", "meta"] },
- "domain": { "type": ["string", "null"], "enum": ["sdk", "wallet", "network", "overlays", "messaging", "middleware", "helpers", null] },
+ "domain": { "type": ["string", "null"], "enum": ["sdk", "wallet", "network", "overlays", "messaging", "middleware", "helpers", "infra", null] },
"version": { "type": "string" },
"source_repo": { "type": "string" },
"source_commit": { "type": "string" },
95 changes: 95 additions & 0 deletions docs/superpowers/specs/2026-06-22-infra-opentelemetry-design.md
@@ -0,0 +1,95 @@
---
id: infra-opentelemetry-design
title: Infra OpenTelemetry & Structured Logging — Design
kind: spec
domain: infra
version: 1.0.0
last_updated: "2026-06-22"
last_verified: "2026-06-22"
status: experimental
tags:
- opentelemetry
- observability
- infra
- logging
---

# Infra OpenTelemetry & Structured Logging — Design

**Date:** 2026-06-22
**Goal:** Every infra component in the stack emits OpenTelemetry traces, metrics and logs to improve observability, specifically to find bugs faster and to diagnose memory leaks and resource issues.

## Scope

Seven standalone infra components (each its own npm project — own `package-lock.json`, **not** in the pnpm workspace):

| Component | Pkg name | Module | Build dir | Entry | Notes |
|---|---|---|---|---|---|
| overlay-server | `@bsv/overlay-express-examples` | CJS | `dist/` | `dist/index.js` | Express owned by `@bsv/overlay-express`; Mongo + MySQL/Knex. **Reference impl.** |
| wallet-infra | `@bsv/wallet-infra` | **ESM** | `out/` | `out/src/index.js` | Express; nginx front |
| message-box-server | `@bsv/messagebox-server` | **ESM** | `out/` | `out/src/index.js` | Express + auth/payment middleware; nginx |
| chaintracks-server | `chaintracks-server` | CJS | `dist/` | `dist/server.js` | Express |
| uhrp-server-cloud-bucket | `@bsv/uhrp-storage-server` | CJS | `out/` | `out/src/index.js` | Express + Bugsnag; notifier sidecar |
| uhrp-server-basic | `@bsv/uhrp-lite` | CJS | `out/` | `out/src/index.js` | Express; **no Dockerfile** |
| wab | `@bsv/wab-server` | CJS | `dist/` | `dist/server.js` | Express + rate-limit |

Rollout order (fixed by user): **overlay-server → wallet-infra → message-box-server → chaintracks-server → uhrp-server-cloud-bucket → uhrp-server-basic → wab**.

## Decisions (locked)

- **Exporter:** OTLP/HTTP, all config from `OTEL_*` env. No vendor hardcoding (backend is OTLP-compatible, e.g. Coralogix collector). Endpoint unset → console exporters so boot never breaks in dev.
- **Signals:** Traces + Metrics + Logs.
- **Load:** Preload before app code. CJS → `node --require ./<out>/telemetry.js`; ESM → `node --import ./<out>/telemetry.js` (`tsc` compiles `src/telemetry.ts` to `.js` for both module types). Preloading lets auto-instrumentation patch modules before they are imported.
- **Duplication:** Each component owns its `src/telemetry.ts` (identical content, compiled by existing `tsc`). No generator.
- **Structured logging:** Adopt **pino** as the structured logger, replacing ad-hoc `console.log`. `@opentelemetry/instrumentation-pino` auto-injects `trace_id`/`span_id` so logs correlate to spans. A console→OTel log shim stays as a fallback for un-converted call sites.
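
The **Load** decision reduces to one Node flag per module type. A minimal sketch: `preloadCommand` is a hypothetical helper for illustration only (the components write the command directly into their Dockerfile `CMD` or npm scripts), and the example paths are the build-dir paths listed for chaintracks-server and wallet-infra.

```ts
// Hypothetical helper: builds the argv that preloads the compiled telemetry
// bootstrap ahead of the app entry point. Not taken from the components.
type ModuleKind = 'cjs' | 'esm'

function preloadCommand (kind: ModuleKind, telemetryFile: string, entry: string): string[] {
  // CJS preloads via --require; ESM preloads via --import.
  const flag = kind === 'cjs' ? '--require' : '--import'
  return ['node', flag, telemetryFile, entry]
}

// chaintracks-server (CJS, dist/)
console.log(preloadCommand('cjs', './dist/telemetry.js', 'dist/server.js').join(' '))
// node --require ./dist/telemetry.js dist/server.js

// wallet-infra (ESM, out/)
console.log(preloadCommand('esm', './out/src/telemetry.js', 'out/src/index.js').join(' '))
// node --import ./out/src/telemetry.js out/src/index.js
```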

## Architecture

### Per-component telemetry bootstrap (`src/telemetry.ts`)

Starts a `NodeSDK` (`@opentelemetry/sdk-node`) with:

- **Resource**: `service.name` (= package name, overridable via `OTEL_SERVICE_NAME`), `service.version` (= package version), `deployment.environment` (from `DEPLOY_ENV`/`NODE_ENV`, default `development`). These defaults hold even with no environment variables set.
- **Auto-instrumentation**: `getNodeAutoInstrumentations()` — HTTP, Express, Mongo/Mongoose, MySQL2, DNS, net, pino. Filesystem instrumentation disabled (noise).
- **Runtime metrics**: `@opentelemetry/instrumentation-runtime-node` — heap used/total, GC pause/count, event-loop lag, active handles. **This is the primary memory-leak signal.**
- **Exporters** (chosen at runtime by presence of `OTEL_EXPORTER_OTLP_ENDPOINT`):
  - set → OTLP/HTTP trace + metric (`PeriodicExportingMetricReader`) + logs exporters.
  - unset → `ConsoleSpanExporter` / console metric + log exporters.
- **Logs**: `LoggerProvider` with OTLP (or console) `BatchLogRecordProcessor`; console→OTel shim patches `console.*` to also emit log records at mapped severities.
- **Graceful shutdown**: `SIGTERM`/`SIGINT` → `sdk.shutdown()` to flush before exit.
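
The resource and exporter rules above are pure functions of the environment, so they can be sketched without the SDK. This is illustrative only: `resolveTelemetryConfig` is a hypothetical name, the real `telemetry.ts` feeds these values into `NodeSDK` rather than returning them, and giving `DEPLOY_ENV` precedence over `NODE_ENV` is an assumption. Treating an empty endpoint as unset matters because compose passes `${OTEL_EXPORTER_OTLP_ENDPOINT:-}` through as an empty string.

```ts
// Hypothetical sketch of the env-driven rules; not the components' telemetry.ts.
interface PackageInfo { name: string, version: string }

interface TelemetryConfig {
  serviceName: string
  serviceVersion: string
  deploymentEnvironment: string
  exporter: 'otlp' | 'console'
  endpoint?: string
}

type Env = Record<string, string | undefined>

function resolveTelemetryConfig (env: Env, pkg: PackageInfo): TelemetryConfig {
  // Empty or whitespace-only counts as unset: compose's `${VAR:-}` yields "".
  const endpoint = env.OTEL_EXPORTER_OTLP_ENDPOINT?.trim() || undefined
  return {
    serviceName: env.OTEL_SERVICE_NAME || pkg.name,
    serviceVersion: pkg.version,
    // Assumed precedence: DEPLOY_ENV, then NODE_ENV, then 'development'.
    deploymentEnvironment: env.DEPLOY_ENV || env.NODE_ENV || 'development',
    // Endpoint set -> OTLP/HTTP exporters; unset -> console exporters.
    exporter: endpoint !== undefined ? 'otlp' : 'console',
    endpoint
  }
}

// Example values only.
const cfg = resolveTelemetryConfig(
  { OTEL_EXPORTER_OTLP_ENDPOINT: '' },
  { name: '@bsv/wab-server', version: '0.0.0' }
)
console.log(cfg.exporter, cfg.serviceName, cfg.deploymentEnvironment)
// console @bsv/wab-server development
```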

### Deps added per component (`@opentelemetry/…`)

`sdk-node`, `auto-instrumentations-node`, `instrumentation-runtime-node`, `exporter-trace-otlp-http`, `exporter-metrics-otlp-http`, `exporter-logs-otlp-http`, `resources`, `semantic-conventions`, `api-logs`, plus `pino`.

### Dockerfile / compose changes

- `CMD` gains the preload flag (`--require`/`--import` per module type).
- `docker-compose.yml` passes through `OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_EXPORTER_OTLP_HEADERS`, `OTEL_SERVICE_NAME`, `OTEL_RESOURCE_ATTRIBUTES`, `DEPLOY_ENV`.
- uhrp-server-basic has no Dockerfile → preload added to `start`/`dev` scripts via `NODE_OPTIONS` or `--require`.

## Per-component phases

Each component goes through three phases; the depth of phases B and C scales with the size of the component:

- **Phase A — Bootstrap:** add deps, `telemetry.ts`, preload wiring, compose env. Signals flow from auto-instrumentation + runtime metrics. Build verifies clean.
- **Phase B — Structured logging:** audit existing log sites, replace `console.*` with a pino logger emitting leveled, structured events with **stable field names** (`service`, `operation`, `duration_ms`, plus domain fields like `tx_id`, `topic`, `host`). Drop noisy/duplicate logs; promote silent failures to logged events.
- **Phase C — Domain spans/metrics:** wrap the operations that matter (overlay submit/lookup, wallet storage calls, message send/ack, header sync) in spans with attributes, and add a few custom counters/histograms where a bug or leak would show up.

overlay-server (the reference) gets all of A, B and C, establishing the template; later components reuse its `telemetry.ts` verbatim and apply B and C in proportion to their surface area.

## Field-name conventions (structured logs)

Stable keys so queries work across services: `service`, `env`, `operation`, `outcome` (`ok`|`error`), `duration_ms`, `error.type`, `error.msg`, plus OTel-injected `trace_id`/`span_id`. Domain keys namespaced per component.
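
One way to keep these keys stable is to emit them from a single helper instead of at each call site. A sketch under stated assumptions: `withOperation`, `LogSink` and `OperationEvent` are hypothetical names, the sink stands in for a pino logger call, and `trace_id`/`span_id` are left to `@opentelemetry/instrumentation-pino` rather than added here. Stable keys are spread after the domain fields so a domain field cannot overwrite them.

```ts
// Hypothetical helper; the sink stands in for a pino logger.
type Outcome = 'ok' | 'error'

interface OperationEvent {
  service: string
  env: string
  operation: string
  outcome: Outcome
  duration_ms: number
  'error.type'?: string
  'error.msg'?: string
  [domainKey: string]: unknown // namespaced domain keys, e.g. a topic or tx id
}

type LogSink = (record: OperationEvent) => void

async function withOperation<T> (
  sink: LogSink,
  base: { service: string, env: string },
  operation: string,
  domain: Record<string, unknown>,
  fn: () => Promise<T>,
  now: () => number = Date.now
): Promise<T> {
  const start = now()
  try {
    const result = await fn()
    // Domain fields first, so stable keys always win on collision.
    sink({ ...domain, ...base, operation, outcome: 'ok', duration_ms: now() - start })
    return result
  } catch (err) {
    const e = err instanceof Error ? err : new Error(String(err))
    sink({
      ...domain,
      ...base,
      operation,
      outcome: 'error',
      duration_ms: now() - start,
      'error.type': e.name,
      'error.msg': e.message
    })
    throw err // the caller still sees the failure
  }
}

// Example: one lookup, logged as a single JSON line.
void withOperation(
  r => console.log(JSON.stringify(r)),
  { service: 'overlay-server', env: 'development' },
  'lookup',
  { topic: 'tm_example' }, // hypothetical domain field
  async () => 'done'
)
```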

## Testing / verification

- Each component: `npm run build` clean; boot locally with `OTEL_EXPORTER_OTLP_ENDPOINT` unset → console spans/metrics/logs visible; boot with a local OTLP collector → spans/metrics/logs received.
- No new lint errors. Confirm the memory-leak signal by observing heap (`v8js.memory.heap.*`) and GC metrics in the console output or the collector.
- Per the release flow: patch-bump only the component's own `version` field; do not run sync-versions; the user builds and tests the Docker image locally before any push.

## Out of scope

- Choosing/standing up the collector or backend (env-driven; user supplies endpoint).
- Distributed-trace context propagation across components beyond what auto-instrumentation provides via HTTP headers (W3C tracecontext is on by default).
- Dashboards/alerts (separate effort; Coralogix CLI skills available later).
58 changes: 58 additions & 0 deletions infra/LOCAL_STACK.md
@@ -0,0 +1,58 @@
# Local infra stack

Runs the BSV infra components together behind a Traefik reverse proxy that routes
by hostname, so you can hit each service at `<name>.localhost` in the browser.

```sh
docker compose -f infra/docker-compose.yaml up --build
```

| URL | Component |
|---|---|
| http://overlay.localhost | overlay-server |
| http://wallet.localhost | wallet-infra |
| http://messagebox.localhost | message-box-server |
| http://chaintracks.localhost | chaintracks-server |
| http://wab.localhost | wab |
| http://uhrp.localhost | uhrp-server-basic |
| http://localhost:8080/dashboard/ | Traefik dashboard |

(`uhrp-server-cloud-bucket` is intentionally excluded — it needs a real GCP bucket
+ service-account credentials and can't run locally.)

## Hostname resolution

Chromium-based browsers and Firefox resolve `*.localhost` to `127.0.0.1`
automatically. **Safari and `curl` on macOS do not** — add the hosts once:

```sh
echo "127.0.0.1 overlay.localhost wallet.localhost messagebox.localhost chaintracks.localhost wab.localhost uhrp.localhost traefik.localhost" | sudo tee -a /etc/hosts
```

Quick check without editing hosts:

```sh
curl -H 'Host: chaintracks.localhost' http://127.0.0.1/
```

## What runs

- **traefik** — fronts `:80`, routes by `Host` header using the file provider
(`local/traefik/dynamic.yml`); dashboard on `:8080`. (File provider, not the
docker provider: the local daemon rejects Traefik's docker API calls with a 400.)
- **mysql** (shared) — one container, four databases created on first boot
(`appdb`, `wallet_storage`, `messagebox-backend`, `app`); host port `3307`.
- **mongo** (shared) — for overlay-server; host port `27018`.
- the six app components, built from their own directories.
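
Traefik's job here is a lookup from the request's `Host` header to a backend. The toy function below only illustrates that idea; it is not Traefik's code or the contents of `dynamic.yml`. The table reuses the hostnames listed above and maps them to component names, since backend ports are not given here; `routeByHost` is a hypothetical name.

```ts
// Toy Host-header router; an illustration only, not Traefik's implementation.
const routes: Record<string, string> = {
  'overlay.localhost': 'overlay-server',
  'wallet.localhost': 'wallet-infra',
  'messagebox.localhost': 'message-box-server',
  'chaintracks.localhost': 'chaintracks-server',
  'wab.localhost': 'wab',
  'uhrp.localhost': 'uhrp-server-basic'
}

function routeByHost (hostHeader: string | undefined): string | undefined {
  if (hostHeader === undefined) return undefined
  // Drop an optional ":port" suffix; hostnames compare case-insensitively.
  // (IPv6 literals are not handled in this sketch.)
  const host = hostHeader.split(':')[0].trim().toLowerCase()
  // Own-property check so names like "constructor" don't hit Object.prototype.
  return Object.prototype.hasOwnProperty.call(routes, host) ? routes[host] : undefined
}

// Same idea as the curl check above: the Host header, not the URL, picks the backend.
console.log(routeByHost('chaintracks.localhost')) // chaintracks-server
console.log(routeByHost('WAB.localhost:80'))      // wab
console.log(routeByHost('unknown.localhost'))     // undefined
```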

## Notes / caveats

- Keys and passwords in the compose file are **throwaway local-dev values only**.
- `wallet-infra` runs with `BSV_NETWORK=mock` (no external chain services needed).
- `overlay-server`, `wab`, and `uhrp-server-basic` reach out to external BSV
services (wallet storage, ARC) at runtime; some operations need network access
or real backends to fully succeed. Routing + telemetry still work regardless.
- Telemetry: set `OTEL_EXPORTER_OTLP_ENDPOINT` (+ `OTEL_EXPORTER_OTLP_HEADERS`)
in your environment before `up` to ship traces/metrics/logs to your collector;
unset falls back to console exporters. See `infra/OBSERVABILITY.md`.
- First `up` builds six images and runs `npm ci` in each — expect a few minutes.
88 changes: 88 additions & 0 deletions infra/OBSERVABILITY.md
@@ -0,0 +1,88 @@
# Infra Observability (OpenTelemetry)

Every infra component emits OpenTelemetry **traces, metrics and logs**. Each
component has a self-contained bootstrap (`src/telemetry.ts`) that is preloaded
before application code so auto-instrumentation can patch modules before they
are imported.

## Components

| Component | Module | Preload |
|---|---|---|
| overlay-server | CJS | `node --require ./dist/telemetry.js dist/index.js` |
| chaintracks-server | CJS | `node --require ./dist/telemetry.js dist/server.js` |
| wab | CJS | `node --require ./dist/telemetry.js dist/server.js` |
| uhrp-server-cloud-bucket | CJS | `node --require ./out/src/telemetry.js … out/src/index.js` |
| uhrp-server-basic | CJS | `ts-node -r ./src/telemetry.ts src/index.ts` / `start:prod` |
| wallet-infra | ESM | `node --import ./out/src/telemetry.js out/src/index.js` |
| message-box-server | ESM | `node --import ./out/src/telemetry.js out/src/index.js` |

ESM components (wallet-infra, message-box-server) deliberately do
**not** register the `import-in-the-middle` loader hook. That hook rebuilds the
named exports of CJS packages imported as ESM and drops some of them (e.g.
`@bsv/sdk`'s `PushDrop`), crashing the app at import time. The libraries we
actually instrument (http, express, mongodb, mysql2, pino) are loaded through CJS
dependency chains (overlay-express, wallet-toolbox, authsocket) and remain patched
by `require-in-the-middle`, so auto-instrumentation coverage is retained.

## Configuration

All wiring is driven by standard `OTEL_*` environment variables. The Dockerfiles
and `docker-compose.yml` files pass these through.

| Variable | Purpose | Default |
|---|---|---|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP/HTTP collector base URL. **Unset → console exporters** (dev-safe). | — |
| `OTEL_EXPORTER_OTLP_HEADERS` | Comma-separated headers, e.g. auth for Coralogix. | — |
| `OTEL_SERVICE_NAME` | Overrides `service.name` (defaults to the package name). | package name |
| `OTEL_RESOURCE_ATTRIBUTES` | Extra resource attributes. | — |
| `DEPLOY_ENV` / `NODE_ENV` | Becomes `deployment.environment`. | `development` |
| `OTEL_METRIC_EXPORT_INTERVAL` | Metric export interval (ms). | `60000` |
| `OTEL_DIAG` | `true` enables OTel internal diagnostic logging. | off |
| `LOG_LEVEL` | pino log level. | `info` |

Point the whole stack at a collector by exporting once, e.g.:

```sh
export OTEL_EXPORTER_OTLP_ENDPOINT="https://ingress.<region>.coralogix.com"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <key>"
docker compose up
```

With the endpoint **unset**, each service prints spans/metrics/logs to the
console — useful for verifying instrumentation locally without a backend.
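
Both `OTEL_EXPORTER_OTLP_HEADERS` and `OTEL_RESOURCE_ATTRIBUTES` use a comma-separated `key=value` list. The OTel SDK parses these variables itself, so components need no code for this; the sketch below only shows how the format is read. Percent-decoding keys and values is an assumption based on the W3C Baggage-style encoding the OTel specification references, and `parseKeyValueList` is a hypothetical name.

```ts
// Hypothetical parser for `key1=value1,key2=value2` env values.
function parseKeyValueList (raw: string | undefined): Record<string, string> {
  const out: Record<string, string> = {}
  if (raw === undefined) return out
  for (const entry of raw.split(',')) {
    // Split on the first '=' only, so values (e.g. base64 tokens) may contain '='.
    const sep = entry.indexOf('=')
    if (sep === -1) continue // no '=': malformed, skip
    // decodeURIComponent throws URIError on malformed escapes such as a lone '%'.
    const key = decodeURIComponent(entry.slice(0, sep).trim())
    if (key === '') continue
    out[key] = decodeURIComponent(entry.slice(sep + 1).trim())
  }
  return out
}

console.log(parseKeyValueList('Authorization=Bearer%20abc123'))
// { Authorization: 'Bearer abc123' }
```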

## Signals

- **Traces** — HTTP, Express, MongoDB, MySQL/Knex, DNS auto-instrumentation, plus
a `*.bootstrap` span per service wrapping startup.
- **Metrics** — HTTP server/client metrics, and **runtime metrics**
(`nodejs.eventloop.*`, `v8js.memory.heap.*`, GC) via
`@opentelemetry/instrumentation-runtime-node`. These are the primary signal for
**memory-leak and event-loop diagnosis**.
- **Logs** — structured JSON via **pino** (`src/logger.ts`), with `trace_id` /
`span_id` injected by `@opentelemetry/instrumentation-pino` so logs correlate to
traces, shipped over OTLP. Stray `console.*` calls are also bridged to OTel logs
during the migration to structured logging.
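
The console bridge amounts to wrapping each `console` method so it still prints and also emits a log record at a mapped severity. A sketch under stated assumptions: the method-to-severity table is illustrative rather than the components' actual mapping, `emit` stands in for the OTel logger call, and all names are hypothetical. The numbers are the OTel log data model's `SeverityNumber` values (DEBUG=5, INFO=9, WARN=13, ERROR=17).

```ts
// Hypothetical console -> log-record bridge.
type ConsoleMethod = 'debug' | 'log' | 'info' | 'warn' | 'error'
type ConsoleLike = Record<ConsoleMethod, (...args: unknown[]) => void>

// OTel SeverityNumber values; the method mapping itself is an assumption.
const SEVERITY: Record<ConsoleMethod, { num: number, text: string }> = {
  debug: { num: 5, text: 'DEBUG' },
  log: { num: 9, text: 'INFO' },
  info: { num: 9, text: 'INFO' },
  warn: { num: 13, text: 'WARN' },
  error: { num: 17, text: 'ERROR' }
}

interface BridgedRecord { severityNumber: number, severityText: string, body: string }

// Patches `target` in place and returns a function that undoes the patch.
function bridgeConsole (target: ConsoleLike, emit: (r: BridgedRecord) => void): () => void {
  const methods = Object.keys(SEVERITY) as ConsoleMethod[]
  const originals = {} as ConsoleLike
  for (const method of methods) {
    const original = target[method]
    originals[method] = original
    target[method] = (...args: unknown[]) => {
      original.apply(target, args) // keep the normal console output
      const { num, text } = SEVERITY[method]
      // Naive body; a real bridge would format arguments the way console does.
      emit({ severityNumber: num, severityText: text, body: args.map(String).join(' ') })
    }
  }
  return () => {
    for (const method of methods) target[method] = originals[method]
  }
}
```

In a real bootstrap, `emit` could forward to a `Logger` from `@opentelemetry/api-logs`, whose `LogRecord` also carries `severityNumber`, `severityText` and `body`.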

### Structured logging conventions

Use stable field names so queries work across services:
`service`, `env`, `operation`, `outcome` (`ok` | `error`), `duration_ms`, `err`,
plus domain-specific keys. Example:

```ts
import { log } from './logger'
log.info({ operation: 'listen', outcome: 'ok', port }, 'server listening')
```

## Notes

- Telemetry shutdown flushes the SDK on `SIGTERM`/`SIGINT` and only force-exits
when the app has no signal handler of its own (e.g. chaintracks owns its
lifecycle), so it never preempts application cleanup.
- Adding telemetry introduced no new dependency CVEs; pre-existing transitive
advisories (e.g. message-box `firebase-admin → @google-cloud/storage`) are
unrelated.
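
The shutdown rule in the first note can be sketched against a minimal process-like interface, so it is testable without sending real signals. In Node, installing a listener for `SIGTERM` or `SIGINT` removes the default exit-on-signal behaviour, which is why something must call `exit()` when the app has no handler of its own. Assumptions: all names are hypothetical, exiting with code `0` is a guess, and "the app has its own handler" is approximated by counting listeners for the signal beyond this one.

```ts
// Hypothetical sketch of the telemetry shutdown policy.
type ShutdownSignal = 'SIGTERM' | 'SIGINT'

interface ProcessLike {
  on: (signal: ShutdownSignal, listener: () => void) => unknown
  listenerCount: (signal: ShutdownSignal) => number
  exit: (code?: number) => void
}

function installTelemetryShutdown (proc: ProcessLike, shutdown: () => Promise<void>): void {
  const signals: ShutdownSignal[] = ['SIGTERM', 'SIGINT']
  for (const signal of signals) {
    proc.on(signal, () => {
      // More than one listener means the app registered its own handler and
      // owns the exit; telemetry then only flushes and never calls exit().
      const appOwnsExit = proc.listenerCount(signal) > 1
      shutdown()
        .catch(() => { /* flushing is best-effort; a failed flush must not block exit */ })
        .finally(() => {
          if (!appOwnsExit) proc.exit(0)
        })
    })
  }
}

// Usage in a bootstrap (hypothetical): installTelemetryShutdown(process, () => sdk.shutdown())
```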

See the design spec: `docs/superpowers/specs/2026-06-22-infra-opentelemetry-design.md`.
5 changes: 3 additions & 2 deletions infra/chaintracks-server/Dockerfile
@@ -55,5 +55,6 @@ USER node
# 3012 - CDN Server (bulk headers)
EXPOSE 3011 3012

-# Run the application
-CMD ["node", "dist/server.js"]
+# Run the application with the OpenTelemetry bootstrap preloaded so
+# auto-instrumentation patches modules before app code is imported.
+CMD ["node", "--require", "./dist/telemetry.js", "dist/server.js"]
7 changes: 7 additions & 0 deletions infra/chaintracks-server/docker-compose.yml
@@ -51,6 +51,13 @@ services:
# Logging
- LOG_LEVEL=${LOG_LEVEL:-info}

# OpenTelemetry — point at any OTLP/HTTP collector. Unset => console exporters.
- OTEL_EXPORTER_OTLP_ENDPOINT=${OTEL_EXPORTER_OTLP_ENDPOINT:-}
- OTEL_EXPORTER_OTLP_HEADERS=${OTEL_EXPORTER_OTLP_HEADERS:-}
- OTEL_SERVICE_NAME=${OTEL_SERVICE_NAME:-chaintracks-server}
- OTEL_RESOURCE_ATTRIBUTES=${OTEL_RESOURCE_ATTRIBUTES:-}
- DEPLOY_ENV=${DEPLOY_ENV:-production}

volumes:
# Persist bulk headers across container restarts
- bulk-headers:/app/public/headers