shareAI-lab · Gui-Yue · Apr 1, 2026 · Apr 4, 2026 · Apr 4, 2026
diff --git a/.gitignore b/.gitignore
@@ -32,3 +32,4 @@ tests/.tmp/
 *.log
 *.txt
 .kode/
+.kode-observability-http/
diff --git a/README.md b/README.md
@@ -90,6 +90,48 @@ export OPEN_SANDBOX_ENDPOINT=http://127.0.0.1:8080  # optional
 export OPEN_SANDBOX_IMAGE=ubuntu                     # optional
 ```
 
+## Observability
+
+KODE treats observability primarily as an SDK capability:
+
+- runtime metrics via `agent.getMetricsSnapshot()`
+- runtime observations via `agent.getObservationReader()` / `agent.subscribeObservations()`
+- optional OTEL bridge via `observability.otel`
+- optional persisted observation query via `observability.persistence`
+
+Minimal persisted-observation example:
+
+```typescript
+import {
+  Agent,
+  JSONStoreObservationBackend,
+  createStoreBackedObservationReader,
+} from '@shareai-lab/kode-sdk';
+
+const storeDir = './.kode';
+const observationBackend = new JSONStoreObservationBackend(storeDir);
+
+// `deps` is your existing Agent dependency bundle, created elsewhere.
+const agent = await Agent.create({
+  templateId: 'assistant',
+  observability: {
+    persistence: {
+      backend: observationBackend,
+    },
+  },
+}, deps);
+
+const runtimeSnapshot = agent.getMetricsSnapshot();
+const runtimeObservations = agent.getObservationReader().listObservations();
+
+const persistedReader = createStoreBackedObservationReader(observationBackend);
+const persistedObservations = await persistedReader.listObservations({ limit: 50 });
+```
+
+If you want to expose these metrics or observations over HTTP, build that in your application on top of the readers and backends rather than inside `Agent` itself. `examples/08-observability-http.ts` is an application-layer example, not an SDK-owned HTTP feature.
+
+Run the full example locally with `npm run example:observability-http`.
+
 ## Architecture for Scale
 
 For production deployments serving many users, we recommend the **Worker Microservice Pattern**:
@@ -150,6 +192,7 @@ See [docs/en/guides/architecture.md](./docs/en/guides/architecture.md) for detai
 | [Concepts](./docs/en/getting-started/concepts.md) | Core concepts explained |
 | **Guides** | |
 | [Events](./docs/en/guides/events.md) | Three-channel event system |
+| [Observability](./docs/en/guides/observability.md) | Metrics, observations, persistence, and app-layer exposure |
 | [Tools](./docs/en/guides/tools.md) | Built-in tools & custom tools |
 | [E2B Sandbox](./docs/en/guides/e2b-sandbox.md) | E2B cloud sandbox integration |
 | [OpenSandbox](./docs/en/guides/opensandbox-sandbox.md) | OpenSandbox self-hosted sandbox integration |

diff --git a/README.zh-CN.md b/README.zh-CN.md
@@ -90,6 +90,48 @@ export OPEN_SANDBOX_ENDPOINT=http://127.0.0.1:8080  # optional
 export OPEN_SANDBOX_IMAGE=ubuntu                     # optional
 ```
 
+## Observability
+
+KODE exposes observability primarily as an SDK capability:
+
+- Runtime metrics: `agent.getMetricsSnapshot()`
+- Runtime observations: `agent.getObservationReader()` / `agent.subscribeObservations()`
+- Optional OTEL bridge: `observability.otel`
+- Optional persisted observation queries: `observability.persistence`
+
+Minimal persisted-observation example:
+
+```typescript
+import {
+  Agent,
+  JSONStoreObservationBackend,
+  createStoreBackedObservationReader,
+} from '@shareai-lab/kode-sdk';
+
+const storeDir = './.kode';
+const observationBackend = new JSONStoreObservationBackend(storeDir);
+
+// `deps` is your existing Agent dependency bundle, created elsewhere.
+const agent = await Agent.create({
+  templateId: 'assistant',
+  observability: {
+    persistence: {
+      backend: observationBackend,
+    },
+  },
+}, deps);
+
+const runtimeSnapshot = agent.getMetricsSnapshot();
+const runtimeObservations = agent.getObservationReader().listObservations();
+
+const persistedReader = createStoreBackedObservationReader(observationBackend);
+const persistedObservations = await persistedReader.listObservations({ limit: 50 });
+```
+
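+If you want to expose these metrics or observations over HTTP, wrap the readers/backends in your application layer instead of having `Agent` listen on a port itself. `examples/08-observability-http.ts` is only an application-layer example, not a built-in SDK HTTP capability.
+
+Run the full example locally with `npm run example:observability-http`.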
+
 ## Supported Providers
 
 | Provider | Streaming | Tool Calls | Reasoning | Files |
@@ -110,6 +152,7 @@ export OPEN_SANDBOX_IMAGE=ubuntu                     # optional
 | [Core Concepts](./docs/zh-CN/getting-started/concepts.md) | Core concepts explained |
 | **Guides** | |
 | [Event System](./docs/zh-CN/guides/events.md) | Three-channel event system |
+| [Observability](./docs/zh-CN/guides/observability.md) | Metrics, observations, persistence, and application-layer exposure |
 | [Tool System](./docs/zh-CN/guides/tools.md) | Built-in and custom tools |
 | [E2B Sandbox](./docs/zh-CN/guides/e2b-sandbox.md) | E2B cloud sandbox integration |
 | [OpenSandbox Sandbox](./docs/zh-CN/guides/opensandbox-sandbox.md) | OpenSandbox self-hosted sandbox integration |

diff --git a/docs/en/examples/playbooks.md b/docs/en/examples/playbooks.md
@@ -153,7 +153,33 @@ const stats = await store.aggregateStats(agent.agentId);
 
 ---
 
-## 6. Combined: Approval + Collaboration + Scheduling
+## 6. Observability Readers + Application HTTP Wrapper
+
+- **Goal**: Read runtime/persisted observations from the SDK and optionally expose them through your own app-layer HTTP service.
+- **Example**: `examples/08-observability-http.ts`
+- **Run**: `npm run example:observability-http`
+- **Key Steps**:
+  1. Read point-in-time metrics with `agent.getMetricsSnapshot()`.
+  2. Read live in-memory observations with `agent.getObservationReader()` or `agent.subscribeObservations()`.
+  3. Configure `observability.persistence.backend` and query history with `createStoreBackedObservationReader(...)`.
+  4. Map your own routes, auth, tenant checks, and response shaping in application code.
+- **Considerations**:
+  - Prefer the runtime reader for "what is happening now" and the persisted reader for audit/history views.
+  - Treat `metadata.__debug` as internal/debug-only data; do not expose it blindly to external consumers.
+  - Keep HTTP, auth, rate limiting, and dashboard concerns outside SDK core.
+
+```typescript
+import { createStoreBackedObservationReader } from '@shareai-lab/kode-sdk';
+
+// Assumes `agent` and `observationBackend` were created during setup.
+const metrics = agent.getMetricsSnapshot();
+const runtimeReader = agent.getObservationReader();
+const persistedReader = createStoreBackedObservationReader(observationBackend);
+
+const runtime = runtimeReader.listObservations({ limit: 20 });
+const persisted = await persistedReader.listObservations({ agentIds: [agent.agentId], limit: 50 });
+```
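Step 4 leaves tenant checks to application code. A minimal sketch of one such guard follows; `scopeQueryToTenant` is a hypothetical helper of your own, and the query shape is modeled on the `listObservations({ agentIds, limit })` call above (the SDK's actual query type may have more fields).

```typescript
// Query shape modeled on the `listObservations(...)` calls in this playbook;
// this is an assumption, not the SDK's exported type.
interface ObservationQuery {
  agentIds?: string[];
  kinds?: string[];
  limit?: number;
}

// Hypothetical app-layer guard: keep only agent IDs the caller's tenant owns and
// cap the page size. Returns null when nothing is left, so the caller can reply
// 403 instead of sending an empty `agentIds` filter that a backend might treat
// as "no filter".
function scopeQueryToTenant(
  query: ObservationQuery,
  ownedAgentIds: ReadonlySet<string>,
  maxLimit = 100,
): ObservationQuery | null {
  const requested = query.agentIds ?? Array.from(ownedAgentIds);
  const agentIds = requested.filter((id) => ownedAgentIds.has(id));
  if (agentIds.length === 0) return null;
  const limit = Math.min(Math.max(query.limit ?? maxLimit, 1), maxLimit);
  return { ...query, agentIds, limit };
}
```

In a request handler you would scope the incoming query first and only call `persistedReader.listObservations(...)` with the result when it is not `null`.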
+
+---
+
+## 7. Combined: Approval + Collaboration + Scheduling
 
 - **Scenario**: Code review bot, Planner splits tasks and assigns to Specialists, tool operations need approval, scheduled reminders ensure SLA.
 - **Implementation**:
@@ -184,12 +210,13 @@ const stats = await store.aggregateStats(agent.agentId);
 
 - [Getting Started](../getting-started/quickstart.md)
 - [Events Guide](../guides/events.md)
+- [Observability Guide](../guides/observability.md)
 - [Multi-Agent Systems](../advanced/multi-agent.md)
 - [Database Guide](../guides/database.md)
 
 ---
 
-## 7. CLI Agent Application
+## 8. CLI Agent Application
 
 Build command-line AI assistants like Claude Code or Cursor.
 

diff --git a/docs/en/guides/observability.md b/docs/en/guides/observability.md
@@ -0,0 +1,166 @@
+# Observability Guide
+
+KODE exposes observability as SDK capabilities first, not as an application server.
+
+That means the SDK gives you structured metrics, observations, persistence hooks, and OTEL bridging. Your application decides whether to expose them through HTTP, dashboards, alerting, or internal admin tools.
+
+---
+
+## What KODE Includes
+
+- Runtime metrics via `agent.getMetricsSnapshot()`
+- Runtime observation reads via `agent.getObservationReader()`
+- Runtime observation streaming via `agent.subscribeObservations()`
+- Optional persisted observation queries via `observability.persistence`
+- Optional OTEL export via `observability.otel`
+
+## What KODE Deliberately Does Not Include
+
+- Built-in HTTP server lifecycle
+- Built-in auth, tenant isolation, or rate limiting
+- Built-in observability dashboard UI
+- Opinionated public API contracts for app delivery
+
+Those concerns belong in your application layer.
+
+---
+
+## Runtime Metrics and Observations
+
+Use runtime readers when you want to inspect the current agent process without waiting for external exports.
+
+```typescript
+const metrics = agent.getMetricsSnapshot();
+const reader = agent.getObservationReader();
+
+const latest = reader.listObservations({
+  kinds: ['generation', 'tool'],
+  limit: 20,
+});
+
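+// Runs for as long as the subscription stays open; consume it in a background task or `break` when done.
+for await (const envelope of agent.subscribeObservations({ runId: metrics.currentRunId })) {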
+  console.log(envelope.observation.kind, envelope.observation.name);
+}
+```
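The `for await` loop above keeps going for as long as the subscription yields. If you only need a bounded sample, a helper like the hypothetical `collectUntil` below stops after a limit or a predicate. This is a sketch: a fake generator stands in for `agent.subscribeObservations()`, the envelope type is an assumption modeled on the fields the snippet reads, and stopping on `kind === 'agent_run'` is only a guess at how run completion shows up.

```typescript
// Assumed envelope shape, modeled on the `envelope.observation.kind` and
// `envelope.observation.name` fields read in the snippet above.
interface ObservationEnvelope {
  observation: { kind: string; name: string };
}

// Consume an async iterable until `limit` envelopes arrive or `shouldStop`
// matches one. Breaking out of `for await` calls the iterator's `return()`,
// giving the source a chance to clean up.
async function collectUntil(
  source: AsyncIterable<ObservationEnvelope>,
  limit: number,
  shouldStop: (envelope: ObservationEnvelope) => boolean = () => false,
): Promise<ObservationEnvelope[]> {
  const collected: ObservationEnvelope[] = [];
  if (limit <= 0) return collected;
  for await (const envelope of source) {
    collected.push(envelope);
    if (collected.length >= limit || shouldStop(envelope)) break;
  }
  return collected;
}

// Stand-in for `agent.subscribeObservations(...)`, for illustration only.
async function* fakeSubscription(): AsyncGenerator<ObservationEnvelope> {
  yield { observation: { kind: 'generation', name: 'model-call' } };
  yield { observation: { kind: 'tool', name: 'example-tool' } };
  yield { observation: { kind: 'agent_run', name: 'run-complete' } };
  yield { observation: { kind: 'generation', name: 'not-collected' } };
}
```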
+
+Typical runtime uses:
+
+- show "live now" generation/tool activity in an admin panel
+- inspect approval waits, tool errors, and compression events
+- derive counters without polling raw event buses
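For the last use above, deriving counters can be as simple as tallying a reader's results by a key. A minimal, shape-agnostic sketch; the `countBy` helper and the sample envelope shape are illustrative assumptions.

```typescript
// Tally items by a derived key. Generic on purpose: it does not depend on the
// exact shape the SDK reader returns.
function countBy<T>(items: Iterable<T>, keyOf: (item: T) => string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const item of items) {
    const key = keyOf(item);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Illustrative envelopes; the real objects returned by `listObservations` may differ.
const sample = [
  { observation: { kind: 'tool' } },
  { observation: { kind: 'generation' } },
  { observation: { kind: 'tool' } },
];
const byKind = countBy(sample, (envelope) => envelope.observation.kind);
```

Against the runtime reader this might become `countBy(reader.listObservations({ limit: 200 }), (e) => e.observation.kind)`, assuming the reader returns envelopes shaped like those yielded by `subscribeObservations()`.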
+
+---
+
+## Persisted Observations
+
+Use persisted readers when you need history, audit views, or process-restart durability.
+
+```typescript
+import {
+  Agent,
+  JSONStoreObservationBackend,
+  createStoreBackedObservationReader,
+} from '@shareai-lab/kode-sdk';
+
+const observationBackend = new JSONStoreObservationBackend('./.kode-observability');
+
+const agent = await Agent.create({
+  templateId: 'assistant',
+  observability: {
+    persistence: {
+      backend: observationBackend,
+    },
+  },
+}, deps);
+
+const persistedReader = createStoreBackedObservationReader(observationBackend);
+const history = await persistedReader.listObservations({
+  agentIds: [agent.agentId],
+  kinds: ['agent_run', 'generation', 'tool'],
+  limit: 50,
+});
+```
+
+Use persisted storage for:
+
+- audit timelines
+- run replay pages
+- offline analytics jobs
+- debugging after process restart
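For audit timelines and run replay pages, a common first step is grouping persisted observations per run and ordering them in time. A sketch, assuming hypothetical `runId` and `startTime` (epoch milliseconds) fields on each observation; check the SDK's observation type for the actual field names.

```typescript
// Hypothetical persisted-observation shape; field names are assumptions.
interface PersistedObservation {
  runId: string;
  kind: string;
  startTime: number; // epoch milliseconds
}

// Group observations into per-run timelines, each sorted oldest-first.
// The input array is not reordered; each run gets its own new array.
function groupIntoRunTimelines(
  observations: readonly PersistedObservation[],
): Map<string, PersistedObservation[]> {
  const byRun = new Map<string, PersistedObservation[]>();
  for (const obs of observations) {
    const bucket = byRun.get(obs.runId);
    if (bucket) bucket.push(obs);
    else byRun.set(obs.runId, [obs]);
  }
  for (const timeline of byRun.values()) {
    timeline.sort((a, b) => a.startTime - b.startTime);
  }
  return byRun;
}
```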
+
+---
+
+## OTEL Bridge
+
+If your platform already standardizes on OpenTelemetry, enable the bridge and ship translated spans to your collector.
+
+```typescript
+const agent = await Agent.create({
+  templateId: 'assistant',
+  observability: {
+    otel: {
+      enabled: true,
+      serviceName: 'kode-agent',
+      exporter: {
+        protocol: 'http/json',
+        endpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT!,
+      },
+    },
+  },
+}, deps);
+```
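The non-null assertion in `process.env.OTEL_EXPORTER_OTLP_ENDPOINT!` only silences the compiler; if the variable is unset, `endpoint` is still `undefined` at runtime. One way to keep the bridge opt-in is to build the config only when an endpoint exists. A sketch, with the config type assumed from the snippet above rather than taken from the SDK:

```typescript
// Shape mirrors the `observability.otel` block above; treat it as an
// assumption about the config type, not its full definition.
interface OtelBridgeConfig {
  enabled: boolean;
  serviceName: string;
  exporter: { protocol: string; endpoint: string };
}

// Build the OTEL bridge config only when an OTLP endpoint is configured.
function buildOtelConfig(
  env: Record<string, string | undefined>,
  serviceName = 'kode-agent',
): OtelBridgeConfig | undefined {
  const endpoint = env.OTEL_EXPORTER_OTLP_ENDPOINT?.trim();
  if (!endpoint) return undefined; // no collector configured: leave the bridge off
  return {
    enabled: true,
    serviceName,
    exporter: { protocol: 'http/json', endpoint },
  };
}
```

You could then add it conditionally, e.g. `...(otel ? { otel } : {})` inside `observability`, which avoids assuming the SDK accepts `otel: undefined`.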
+
+Keep KODE's native observation model as your source of truth. OTEL is best treated as an interoperability/export path.
+
+---
+
+## Data Safety and Capture Boundaries
+
+KODE supports configurable capture levels through `observability.capture`:
+
+- `off`
+- `summary`
+- `full`
+- `redacted`
+
+Prefer `summary` or `redacted` for production unless you have a clear compliance reason to store more detail.
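One way to encode that guidance in application code is a small policy function. This is a hypothetical sketch: the four level names come from the list above, but whether `observability.capture` takes a single level or a richer object is not shown here, so check the SDK's types before wiring it in. Using `full` outside production is an assumption made for local debugging.

```typescript
// The four capture levels listed above.
type CaptureLevel = 'off' | 'summary' | 'full' | 'redacted';

interface CapturePolicyInput {
  isProduction: boolean;
  // Set only after an explicit compliance decision to retain full detail.
  fullCaptureApproved?: boolean;
  // Which of the two recommended production levels to use by default.
  productionDefault?: 'summary' | 'redacted';
}

// Hypothetical policy: full detail locally; in production, `summary` or
// `redacted` unless full capture has been explicitly approved.
function chooseCaptureLevel({
  isProduction,
  fullCaptureApproved = false,
  productionDefault = 'redacted',
}: CapturePolicyInput): CaptureLevel {
  if (!isProduction) return 'full';
  return fullCaptureApproved ? 'full' : productionDefault;
}
```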
+
+Also note:
+
+- provider-specific raw payloads are not part of the public observation schema
+- debug-only extensions may appear under `metadata.__debug`
+- `metadata.__debug` should be treated as internal/private and filtered before external exposure
+
+This keeps the public observation model safer and more stable.
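Filtering `metadata.__debug` before external exposure can be a small copy-and-delete step. A sketch, assuming observations carry an optional `metadata` object as the notes above imply; `stripDebugMetadata` is a hypothetical helper, not an SDK export.

```typescript
// Assumed minimal shape: only the optional `metadata` object matters here.
interface HasMetadata {
  metadata?: Record<string, unknown>;
}

// Return a copy without `metadata.__debug`. The input object is not mutated,
// so in-process consumers keep the debug details.
function stripDebugMetadata<T extends HasMetadata>(observation: T): T {
  if (!observation.metadata || !('__debug' in observation.metadata)) {
    return observation;
  }
  const publicMetadata: Record<string, unknown> = { ...observation.metadata };
  delete publicMetadata.__debug;
  return { ...observation, metadata: publicMetadata };
}
```

Apply it with `observations.map(stripDebugMetadata)` in whichever layer serializes responses, logs, or exports.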
+
+---
+
+## Exposing Observability over HTTP
+
+If you need HTTP endpoints, build them in your app on top of the SDK readers/backends.
+
+Reference example:
+
+- `examples/08-observability-http.ts`
+- run with `npm run example:observability-http`
+
+That example demonstrates:
+
+- a normal app-owned HTTP server
+- `POST /agents/demo/send` to drive an agent run
+- `GET /api/observability/.../metrics` for runtime metrics
+- `GET /api/observability/.../observations/runtime` for live observation reads
+- `GET /api/observability/.../observations/persisted` for persisted history
+
+This boundary is intentional: the SDK provides observability primitives, while the app owns transport, auth, and presentation.
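The app-owned layer can stay thin. Below is a framework-free router sketch over hypothetical reader callbacks. The `/internal/agents/:agentId/...` paths are illustrative only (the example's real route paths are elided above), and `canRead` stands in for whatever auth and tenant check your app uses. You would call it from `node:http` or your framework with the request method, URL, and the caller's tenant ID, then write `status` and `body` as JSON.

```typescript
// Hypothetical callbacks your app builds on top of the SDK readers/backends.
interface ObservabilityReaders {
  getMetrics(agentId: string): unknown;
  listRuntime(agentId: string, limit: number): unknown[];
  listPersisted(agentId: string, limit: number): Promise<unknown[]>;
  canRead(tenantId: string, agentId: string): boolean; // your auth/tenant check
}

interface RouteResult {
  status: number;
  body: unknown;
}

// Illustrative paths only; not the routes used by examples/08-observability-http.ts.
const ROUTE = /^\/internal\/agents\/([^/]+)\/(metrics|observations\/runtime|observations\/persisted)$/;
const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 200;

// Read `limit` from the query string, clamped to [1, MAX_LIMIT].
function parseLimit(query: string): number {
  const match = /(?:^|&)limit=(\d+)(?:&|$)/.exec(query);
  if (!match) return DEFAULT_LIMIT;
  return Math.min(Math.max(Number(match[1]), 1), MAX_LIMIT);
}

async function handleObservabilityRequest(
  method: string,
  url: string,
  tenantId: string,
  readers: ObservabilityReaders,
): Promise<RouteResult> {
  if (method !== 'GET') return { status: 405, body: { error: 'method not allowed' } };
  const [pathname, query = ''] = url.split('?', 2);
  const route = ROUTE.exec(pathname);
  if (!route) return { status: 404, body: { error: 'not found' } };
  let agentId: string;
  try {
    agentId = decodeURIComponent(route[1]);
  } catch {
    return { status: 400, body: { error: 'malformed agent id' } };
  }
  if (!readers.canRead(tenantId, agentId)) return { status: 403, body: { error: 'forbidden' } };
  const limit = parseLimit(query);
  switch (route[2]) {
    case 'metrics':
      return { status: 200, body: readers.getMetrics(agentId) };
    case 'observations/runtime':
      return { status: 200, body: readers.listRuntime(agentId, limit) };
    default:
      return { status: 200, body: await readers.listPersisted(agentId, limit) };
  }
}
```

Keeping routing separate from the HTTP server makes the auth and limit logic testable without opening a port, which matches the guide's point that transport belongs to the app.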
+
+---
+
+## Recommended Rollout
+
+1. Start with runtime metrics and runtime observation readers.
+2. Add persisted observation storage for auditability.
+3. Add OTEL export only if your platform needs centralized telemetry.
+4. Add app-layer HTTP or UI only after the data model and filtering policy are clear.
+
+This order keeps the SDK integration stable and avoids prematurely coupling KODE to one delivery surface.
diff --git a/docs/zh-CN/examples/playbooks.md b/docs/zh-CN/examples/playbooks.md
@@ -153,7 +153,33 @@ const stats = await store.aggregateStats(agent.agentId);
 
 ---
 
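-## 6. Combined: Approval + Collaboration + Scheduling
+## 6. Observability Reads and Application-Layer HTTP Wrapping
+
+- **Goal**: Read runtime/persisted observations from the SDK, and decide, based on your own application boundaries, whether to expose them over HTTP.
+- **Example**: `examples/08-observability-http.ts`
+- **Run**: `npm run example:observability-http`
+- **Key Steps**:
+  1. Read the current metrics snapshot with `agent.getMetricsSnapshot()`.
+  2. Read runtime observations with `agent.getObservationReader()` or `agent.subscribeObservations()`.
+  3. Configure a backend for `observability.persistence.backend` and query historical data with `createStoreBackedObservationReader(...)`.
+  4. Define your own routes, authentication, tenant isolation, and response trimming in application code.
+- **Considerations**:
+  - The runtime reader is better suited to "what is happening now"; the persisted reader is better suited to audit and history views.
+  - Treat `metadata.__debug` strictly as internal debug data; do not expose it externally as-is.
+  - HTTP, authentication, rate limiting, and dashboards should all stay outside the SDK.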
+
+```typescript
+import { createStoreBackedObservationReader } from '@shareai-lab/kode-sdk';
+
+// Assumes `agent` and `observationBackend` were created during setup.
+const metrics = agent.getMetricsSnapshot();
+const runtimeReader = agent.getObservationReader();
+const persistedReader = createStoreBackedObservationReader(observationBackend);
+
+const runtime = runtimeReader.listObservations({ limit: 20 });
+const persisted = await persistedReader.listObservations({ agentIds: [agent.agentId], limit: 50 });
+```
+
+---
+
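+## 7. Combined: Approval + Collaboration + Scheduling
 
 - **Scenario**: A code review bot: the Planner splits tasks and assigns them to different Specialists, tool operations require approval, and scheduled reminders ensure the SLA.
 - **Implementation**: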
@@ -184,5 +210,6 @@ const stats = await store.aggregateStats(agent.agentId);
 
 - [Quick Start](../getting-started/quickstart.md)
 - [Events Guide](../guides/events.md)
+- [Observability Guide](../guides/observability.md)
 - [Multi-Agent Systems](../advanced/multi-agent.md)
 - [Database Guide](../guides/database.md)