Forward OpenClaw model-usage diagnostics to Langfuse.
The plugin registers a background service that subscribes to OpenClaw's internal diagnostics bus and reconstructs each turn as a nested Langfuse trace: one root per turn (grouped by the W3C trace id OpenClaw stamps on every event), with a child observation for every model call, tool call, and retrieval step in it. Traces are tagged with the OpenClaw session id so a conversation's turns group together in Langfuse's Sessions view.
Crucially, tool calls and RAG/retrieval steps appear as their own
observations (tool and retriever types) nested under the run — so you can
see the retrieval that fed a generation instead of the context just appearing in
the next prompt out of nowhere.
Built on the Langfuse v5 SDK (OpenTelemetry-based), so traces appear in
Langfuse's new observations-first ("fast") UI. The Langfuse SpanProcessor runs
on a dedicated, isolated OTel TracerProvider (setLangfuseTracerProvider) so
it never touches the global OpenTelemetry state OpenClaw's bundled
diagnostics-otel service owns.
It subscribes via the public onInternalDiagnosticEvent SDK export. (The
ctx.internalDiagnostics.onEvent capability is privileged — the runtime injects
it only for the bundled diagnostics-otel/diagnostics-prometheus services, so
third-party plugins never receive it.) The public listener delivers the
run.*, model.usage, tool.execution.*, context.assembled, and
model.call.error event bodies this bridge maps (minus private message content
— see below).
    openclaw plugins install openclaw-x-langfuse-plugin

OpenClaw resolves through ClawHub first and falls back to npm. For local
development, install from a path with --link:

    openclaw plugins install ./openclaw-x-langfuse-plugin --link

Enable the plugin and provide Langfuse credentials in openclaw.json:
    {
      "plugins": {
        "allow": ["langfuse-bridge"],
        "entries": {
          "langfuse-bridge": {
            "enabled": true,
            "config": {
              "publicKey": "pk-lf-...",
              "secretKey": "sk-lf-...",
              "baseUrl": "https://cloud.langfuse.com"
            }
          }
        }
      }
    }

Credentials may also be supplied via environment variables, which take effect when the corresponding config field is absent:
| Config field | Environment fallback | Default |
|---|---|---|
| `publicKey` | `LANGFUSE_PUBLIC_KEY` | (none) |
| `secretKey` | `LANGFUSE_SECRET_KEY` | (none) |
| `baseUrl` | `LANGFUSE_BASE_URL` | `https://cloud.langfuse.com` |
Then restart the gateway:

    openclaw gateway restart

If publicKey/secretKey are missing, the service logs a warning and does not
start; it never blocks the gateway.
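The precedence described above (config field first, then environment variable, then default) can be sketched as a small pure function. This is only an illustration: `resolveLangfuseConfig`, its types, and the `null` return convention are hypothetical names chosen here, not the plugin's actual code. In a real service the `env` argument would typically be `process.env`.

```typescript
// Hypothetical helper: the field and variable names follow the table above;
// the function name and the null-on-missing convention are assumptions.
interface LangfuseConfig {
  publicKey?: string;
  secretKey?: string;
  baseUrl?: string;
}

interface ResolvedConfig {
  publicKey: string;
  secretKey: string;
  baseUrl: string;
}

type Env = Record<string, string | undefined>;

const DEFAULT_BASE_URL = "https://cloud.langfuse.com";

function resolveLangfuseConfig(config: LangfuseConfig, env: Env): ResolvedConfig | null {
  // A config field wins; the environment variable applies only when it is absent.
  const publicKey = config.publicKey ?? env["LANGFUSE_PUBLIC_KEY"];
  const secretKey = config.secretKey ?? env["LANGFUSE_SECRET_KEY"];
  const baseUrl = config.baseUrl ?? env["LANGFUSE_BASE_URL"] ?? DEFAULT_BASE_URL;

  // Missing credentials: return null instead of throwing, so the caller can
  // log a warning and skip startup without blocking anything else.
  if (!publicKey || !secretKey) return null;
  return { publicKey, secretKey, baseUrl };
}
```

A caller would treat a `null` result as "log a warning and return early from `start`", which matches the non-blocking behaviour described above.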
Each OpenClaw turn becomes one Langfuse trace (keyed by the shared W3C trace id),
named after the channel, with session.id set to the OpenClaw session id so a
conversation's turns group in the Sessions view. Under that root:
- Turn root (`agent`): anchored by `run.started`/`run.completed`, with `outcome` and `durationMs`. Its trace-level input/output mirror the turn's prompt and final response. (OpenClaw's per-event span parents are inconsistent: `model.usage` hangs off the harness span while tools hang off the run span, so children are attached directly to this one root rather than reconstructing that internal chain.)
- Generation: built from `model.usage`: `model`, `usageDetails` (`input`, `output`, `cache_read`, `cache_write`, `total`), `costDetails.totalCost` (USD), timing, plus provider metadata and the turn's prompt/response text as input/output.
- Tool / Retriever: one observation per `tool.execution.*`, named after the tool. Retrieval/search tools (vector search, RAG, grep, web fetch, memory recall, …) are classified as Langfuse `retriever` observations; everything else is a `tool`. Carries `toolSource`, `paramsSummary`, duration, and, when recoverable, the tool's arguments and result as input/output.
- Context: `context.assembled` becomes a short span with message/prompt size metadata.
- Errors: `model.call.error` becomes an ERROR observation with the failure category/kind.
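The tool-versus-retriever split above is not spelled out in detail here, so the following is only a minimal sketch of one way such a classifier could work, assuming a keyword match on the tool's name. The function name `classifyTool` and the keyword list are hypothetical, derived loosely from the examples in the list (vector search, RAG, grep, web fetch, memory recall); the plugin's real rule may differ.

```typescript
type ToolObservationType = "tool" | "retriever";

// Assumed keyword set, for illustration only.
const RETRIEVAL_TOKENS = new Set([
  "search", "retrieve", "retrieval", "retriever", "rag", "grep", "fetch", "recall",
]);

// Split names like "vector_search", "web-fetch" or "webFetch" into lowercase tokens.
function tokenizeToolName(name: string): string[] {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // break camelCase boundaries
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

function classifyTool(name: string): ToolObservationType {
  // Whole-token matching avoids false positives such as "storage" matching "rag".
  const isRetrieval = tokenizeToolName(name).some((token) => RETRIEVAL_TOKENS.has(token));
  return isRetrieval ? "retriever" : "tool";
}
```

Matching whole tokens rather than substrings is a deliberate choice in this sketch: it keeps a tool named `storage_write` from being mistaken for a RAG step.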
Message content (prompts, responses, tool arguments and results) is not
delivered to third-party plugins — OpenClaw hands it only to bundled diagnostics
services. The bridge recovers it best-effort from the per-session trajectory
transcript
(<stateDir>/agents/<agentId>/sessions/<sessionId>.trajectory.jsonl). If the
transcript is unavailable, observations are still forwarded with empty
input/output; the structure (which step ran, when, how long) is always present.
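A best-effort reader for that transcript might look like the sketch below. It only builds the path from the pattern given above and parses JSONL tolerantly; the record schema inside the file is not described here, so records are left as `unknown`. Both function names are hypothetical, and reading the file from disk is left to the caller, who would pass `undefined` when the file cannot be read.

```typescript
// Builds the transcript path from the pattern quoted above.
function trajectoryPath(stateDir: string, agentId: string, sessionId: string): string {
  return `${stateDir}/agents/${agentId}/sessions/${sessionId}.trajectory.jsonl`;
}

// Parses JSONL text line by line. Blank, truncated, or corrupt lines are
// skipped rather than failing the whole read, and a missing transcript
// (undefined) yields an empty list so observations can still be forwarded
// with empty input/output.
function parseJsonlBestEffort(text: string | undefined): unknown[] {
  if (!text) return [];
  const records: unknown[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      records.push(JSON.parse(trimmed));
    } catch {
      // A partially written last line is expected while a session is live.
    }
  }
  return records;
}
```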
OpenClaw delivers tool.execution.* and model.call.* events asynchronously
(they're queued and can be dropped under heavy load), while run.* and
model.usage are synchronous — so run.completed reaches the bridge before
its own tool events, and model.usage arrives after it. The engine handles
this by soft-ending the turn root (fixing its duration) while keeping it
resolvable, so late-arriving children still attach to it, and an idle reaper
closes any observation orphaned by a dropped terminal event.
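The soft-end and idle-reaper behaviour can be illustrated with a small in-memory registry. This is a simplified sketch under stated assumptions, not the engine's implementation: `createTurnRegistry` and its methods are hypothetical names, timestamps are passed in explicitly as milliseconds, and only turn roots are tracked (the real engine also has to close orphaned child observations).

```typescript
interface TurnRoot {
  startedAt: number;
  endedAt?: number;        // set once on soft-end; fixes the reported duration
  lastActivityAt: number;  // refreshed by every event that touches the turn
  children: string[];
}

function createTurnRegistry(idleMs: number) {
  const turns = new Map<string, TurnRoot>();
  return {
    start(traceId: string, now: number): void {
      turns.set(traceId, { startedAt: now, lastActivityAt: now, children: [] });
    },
    // Soft-end: record the end time but keep the root resolvable by trace id.
    softEnd(traceId: string, now: number): void {
      const root = turns.get(traceId);
      if (root && root.endedAt === undefined) {
        root.endedAt = now;
        root.lastActivityAt = now;
      }
    },
    // Late children still attach as long as the root has not been reaped.
    attachChild(traceId: string, name: string, now: number): boolean {
      const root = turns.get(traceId);
      if (!root) return false;
      root.children.push(name);
      root.lastActivityAt = now;
      return true;
    },
    durationMs(traceId: string): number | undefined {
      const root = turns.get(traceId);
      if (!root || root.endedAt === undefined) return undefined;
      return root.endedAt - root.startedAt;
    },
    // Idle reaper: drop any root that has seen no activity for idleMs.
    sweep(now: number): string[] {
      const reaped: string[] = [];
      turns.forEach((root, traceId) => {
        if (now - root.lastActivityAt >= idleMs) {
          turns.delete(traceId);
          reaped.push(traceId);
        }
      });
      return reaped;
    },
  };
}
```

In this sketch a tool event that arrives after `run.completed` still attaches to its turn, and the turn's duration stays at the value fixed by the soft-end.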
    import { onInternalDiagnosticEvent } from "openclaw/plugin-sdk/diagnostic-runtime";
    import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
    import { LangfuseSpanProcessor } from "@langfuse/otel";
    import { startObservation, setLangfuseTracerProvider } from "@langfuse/tracing";
    import { createTraceEngine } from "./tracer.js";

    api.registerService({
      id: "langfuse-bridge",
      start(ctx) {
        // publicKey, secretKey, baseUrl: resolved from plugin config or environment.
        // Isolated OTel pipeline -> never touches OpenClaw's global tracer provider.
        const provider = new NodeTracerProvider({
          spanProcessors: [new LangfuseSpanProcessor({ publicKey, secretKey, baseUrl })],
        });
        setLangfuseTracerProvider(provider);

        // The engine groups observations into one trace per turn, keyed by the W3C
        // trace id OpenClaw stamps on every event, and attaches model.usage /
        // tool.execution.* / context.assembled as children of that turn root.
        const engine = createTraceEngine({ startObservation }, { /* resolvers */ });
        const unsubscribe = onInternalDiagnosticEvent((evt) => engine.handle(evt));
        setInterval(() => engine.sweep(), 60_000).unref(); // reap orphans
      },
    });

Note: these events are emitted on OpenClaw's reply/delivery path (channel messages, webchat/TUI turns), not on direct `openclaw agent` CLI runs, which use the embedded runner and don't emit them.
MIT