aisix-obs: report per-exporter delivery health to the control plane + on-demand probe delivery

## Problem

A misconfigured observability exporter fails silently from the operator's point of view. Delivery failures (unset `OBJSTORE_CRED_*` / `SLS_CRED_*` / `DD_CRED_*` env vars, wrong endpoint, revoked key) surface only in DP logs; the dashboard shows the exporter as `enabled` forever. The docs promise "the sink reports unhealthy" — but that state never leaves the DP process.

## What exists today (origin/main)

- `crates/aisix-obs/src/pipeline.rs` — `SinkStatsSnapshot { sent, dropped, retries, failed_batches, last_error }` is already tracked per sink in-process, with masked `last_error` (≤200 chars). Never transmitted anywhere.
- `crates/aisix-obs/src/sink/mod.rs` — `ObservabilitySink::healthcheck() -> SinkHealth { healthy, detail }` exists, but is a stub (always healthy) for `otlp_http` / `sls` / `datadog`; only `object_store` has a real connectivity probe.
- `crates/aisix-server/src/telemetry.rs` — a periodic mTLS reporting loop to the CP already exists (usage events, flush at 100 events / 5s to `/dp/telemetry`), and a separate heartbeat worker POSTs `/dp/heartbeat` with `{dp_id, uptime_seconds, version, rejected_resources}`.
- Config arrives via the kine/etcd watch (`/aisix/<env>/observability_exporters/<id>`), so the DP already watches a CP→DP channel that can also carry commands.

## Proposed design

1. **Passive health reporting (the core).** Extend the heartbeat payload with a per-exporter block derived from `SinkStatsSnapshot`:
   ```json
   "exporter_health": [
     { "exporter_id": "…", "healthy": true, "sent": 1234, "dropped": 0,
       "failed_batches": 0, "last_error": null, "last_success_at": "…" }
   ]
   ```
   Health is derived from delivery outcomes (e.g. unhealthy when the most recent batch failed permanently or N consecutive batches failed) — no extra network traffic against the customer's target.
2. **On-demand probe ("Send test event").** CP writes a probe request under a kine prefix the DP already watches (e.g. `/aisix/<env>/observability_probes/<probe_id>` carrying `exporter_id`); the DP executes ONE synthetic delivery through the real sink (resolving `credential_ref` locally as usual) and reports `{probe_id, ok, error}` in the next heartbeat. Probe records are short-lived (CP deletes after terminal state).
3. Keep `healthcheck()` stubs as-is or implement them via the probe path — a separate always-on prober is NOT needed once 1 + 2 exist.
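
To make the passive health rule in item 1 concrete, here is a minimal sketch of deriving `healthy` from delivery outcomes alone. `BatchOutcome`, `HealthTracker`, and the threshold parameter are hypothetical names for illustration; they are not existing types in `aisix-obs`, and the real value of N is still to be decided.

```rust
/// Hypothetical classification of one batch delivery attempt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BatchOutcome {
    /// The target accepted the batch.
    Delivered,
    /// The batch failed but may succeed on retry (e.g. a timeout).
    TransientFailure,
    /// The batch failed and will not be retried (e.g. rejected credentials).
    PermanentFailure,
}

/// Hypothetical per-exporter tracker; assumes one instance per sink.
#[derive(Debug)]
pub struct HealthTracker {
    consecutive_failures: u32,
    last_was_permanent: bool,
    /// The "N" from the proposal: failures in a row that flip the sink unhealthy.
    max_consecutive_failures: u32,
}

impl HealthTracker {
    pub fn new(max_consecutive_failures: u32) -> Self {
        Self {
            consecutive_failures: 0,
            last_was_permanent: false,
            max_consecutive_failures,
        }
    }

    /// Record the outcome of the most recent batch.
    pub fn record(&mut self, outcome: BatchOutcome) {
        match outcome {
            BatchOutcome::Delivered => {
                // Any success resets the failure streak.
                self.consecutive_failures = 0;
                self.last_was_permanent = false;
            }
            BatchOutcome::TransientFailure => {
                self.consecutive_failures = self.consecutive_failures.saturating_add(1);
                self.last_was_permanent = false;
            }
            BatchOutcome::PermanentFailure => {
                self.consecutive_failures = self.consecutive_failures.saturating_add(1);
                self.last_was_permanent = true;
            }
        }
    }

    /// Unhealthy when the most recent batch failed permanently or when
    /// N consecutive batches failed; healthy otherwise.
    pub fn healthy(&self) -> bool {
        !self.last_was_permanent
            && self.consecutive_failures < self.max_consecutive_failures
    }
}
```

Because the tracker only consumes outcomes the pipeline already observes, it adds no traffic against the customer's target. A single successful batch restores `healthy`, which may or may not be the desired recovery behaviour.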

## Security constraints (unchanged invariants)

- The CP never connects to the customer's telemetry target; only the DP delivers (probe included).
- No credential material ever leaves the DP: `last_error` stays masked, the probe result carries no request/credential detail.
- The synthetic probe event must contain no end-user prompt/response content.
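
The masking invariant could be enforced by a small helper along these lines. This is a sketch, not the existing masking code in `pipeline.rs`: `mask_error`, its `secrets` parameter, and the `***` placeholder are hypothetical. Only the 200-character cap comes from the description above.

```rust
/// Upper bound on the reported error length, per the `last_error` contract.
const MAX_ERROR_CHARS: usize = 200;

/// Hypothetical helper: redact every known secret value from an error
/// message, then cap the result at `MAX_ERROR_CHARS` characters.
///
/// `secrets` is assumed to hold the resolved credential values for the
/// exporter; env var NAMES are not secrets and are left untouched.
pub fn mask_error(raw: &str, secrets: &[&str]) -> String {
    let mut masked = raw.to_string();
    for &secret in secrets {
        // An empty pattern would match between every character, so skip it.
        if !secret.is_empty() {
            masked = masked.replace(secret, "***");
        }
    }
    // Truncate by characters (not bytes) so multi-byte text is never split.
    // Redaction runs first, so truncation cannot leave a partial secret.
    masked.chars().take(MAX_ERROR_CHARS).collect()
}
```

For example, `mask_error("auth failed for key sk-abc123", &["sk-abc123"])` yields `"auth failed for key ***"`, while a message such as `"DD_CRED_API_KEY is not set"` passes through unchanged.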

## Out of scope

- CP-side persistence / dashboard UI (tracked in the AISIX-Cloud counterpart issue, linked below).
- Prometheus / OTLP-metrics egress for the DP itself.

## Acceptance criteria

- [ ] Heartbeat carries `exporter_health` for every configured exporter (all four kinds), with masked `last_error`.
- [ ] A probe record written to kine triggers exactly one synthetic delivery and exactly one result report; handling is idempotent per `probe_id`, so a record that is observed again does not trigger a second delivery.
- [ ] An exporter with a missing credential env var reports `healthy=false` with an actionable, masked error (e.g. names the missing env var — the var NAME is not a secret).
- [ ] e2e: mock-edge test pins heartbeat payload shape + probe round-trip for at least `object_store` and `datadog`.
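
One way the "exactly one delivery, exactly one report" criterion might be met on the DP side is sketched below. `ProbeLedger`, `ProbeResult`, and the `deliver` closure are hypothetical; the closure stands in for the real synthetic delivery through the sink, and its error string is assumed to be already masked.

```rust
use std::collections::HashSet;

/// Result reported to the CP in the next heartbeat: `{probe_id, ok, error}`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProbeResult {
    pub probe_id: String,
    pub ok: bool,
    pub error: Option<String>,
}

/// Hypothetical bookkeeping for probe records seen on the kine watch.
#[derive(Debug, Default)]
pub struct ProbeLedger {
    /// Probe ids already executed. A real implementation would prune
    /// entries once the CP deletes the record.
    seen: HashSet<String>,
    /// Results waiting for the next heartbeat.
    pending: Vec<ProbeResult>,
}

impl ProbeLedger {
    /// Handle one observed probe record. `deliver` runs at most once per
    /// `probe_id`; repeated observations of the same record are ignored.
    pub fn on_probe_record<F>(&mut self, probe_id: &str, deliver: F)
    where
        F: FnOnce() -> Result<(), String>,
    {
        // `insert` returns false when the id was already present.
        if !self.seen.insert(probe_id.to_string()) {
            return;
        }
        let outcome = deliver();
        self.pending.push(ProbeResult {
            probe_id: probe_id.to_string(),
            ok: outcome.is_ok(),
            error: outcome.err(),
        });
    }

    /// Take all pending results for the next heartbeat, leaving none
    /// behind, so each result is reported exactly once.
    pub fn drain_for_heartbeat(&mut self) -> Vec<ProbeResult> {
        std::mem::take(&mut self.pending)
    }
}
```

The ledger is in-memory only, so a DP restart while a probe record still exists would re-run that probe. Whether that is acceptable, or whether the CP should guard against it, is left open here.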

