Commit ac66848

cijothomas and lalitb authored
Add design docs (#2657)
Co-authored-by: Lalit Kumar Bhasin <[email protected]>
1 parent 1aca212 commit ac66848

3 files changed (+314, -0 lines)

docs/design/logs.md

+302
# OpenTelemetry Rust Logs Design

Status:
[Development](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/document-status.md)

## Overview

[OpenTelemetry (OTel)
Logs](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/README.md)
support differs from Metrics and Traces in that it does not introduce a new
logging API for end users. Instead, OTel recommends leveraging existing logging
libraries such as [log](https://crates.io/crates/log) and
[tracing](https://crates.io/crates/tracing), while providing bridges (appenders)
to route logs through OpenTelemetry.

OTel took this approach because of the long history of existing logging
solutions. In Rust, these are [log](https://crates.io/crates/log) and
[tracing](https://crates.io/crates/tracing), both of which have been embraced by
the community for some time. OTel Rust maintains appenders for these libraries,
allowing users to integrate with OpenTelemetry without changing their existing
logging instrumentation.

The `tracing` appender is particularly optimized for performance because of
`tracing`'s widespread adoption and because `tracing` itself has a bridge from
the `log` crate. Notably, OpenTelemetry Rust itself uses `tracing` for its
internal logs. Additionally, when OTel began supporting logging as a signal, the
`log` crate lacked structured logging support, which reinforced the decision to
prioritize `tracing`.

## Benefits of OpenTelemetry Logs

- **Unified configuration** across Traces, Metrics, and Logs.
- **Automatic correlation** with Traces.
- **Consistent Resource attributes** across signals.
- **Multiple destinations support**: Logs can continue flowing to existing
  destinations such as stdout while also being sent to an
  OpenTelemetry-capable backend, typically via an OTLP Exporter or via exporters
  that target operating system native facilities like `Windows ETW` or
  `Linux user_events`.
- **Standalone logging support** for applications that use OpenTelemetry as
  their primary logging mechanism.

## Key Design Principles

- High performance - no locks/contention in the hot path, with minimal or no
  heap allocation where possible.
- Capped resource (memory) usage - well-defined behavior when overloaded.
- Self-observable - exposes telemetry about itself to aid in troubleshooting.
- Robust error handling - returns `Result` where possible instead of panicking.
- Minimal public API - items are exposed only when there is a need for them.

## Architecture Overview

```mermaid
graph TD
    subgraph Application
        A1[Application Code]
    end
    subgraph Logging Libraries
        B1[log crate]
        B2[tracing crate]
    end
    subgraph OpenTelemetry
        C1[OpenTelemetry Appender for log]
        C2[OpenTelemetry Appender for tracing]
        C3[OpenTelemetry Logs API]
        C4[OpenTelemetry Logs SDK]
        C5[OTLP Exporter]
    end
    subgraph Observability Backend
        D1[OTLP-Compatible Backend]
    end
    A1 --> |Emits Logs| B1
    A1 --> |Emits Logs| B2
    B1 --> |Bridged by| C1
    B2 --> |Bridged by| C2
    C1 --> |Sends to| C3
    C2 --> |Sends to| C3
    C3 --> |Processes with| C4
    C4 --> |Exports via| C5
    C5 --> |Sends to| D1
```

## Logs API

The Logs API is part of the
[opentelemetry](https://crates.io/crates/opentelemetry) crate.

The OTel Logs API is not intended for direct end-user usage. Instead, it is
designed for appender/bridge authors to integrate existing logging libraries
with OpenTelemetry. However, nothing prevents end users from using it directly.

### API Components

1. **Key-Value Structs**: Used in `LogRecord`. The `Key` struct is shared
   across signals, but the `Value` type differs from the one used by Metrics
   and Traces, because values in Logs can contain more complex structures than
   those in Traces and Metrics.
2. **Traits**:
   - `LoggerProvider` - provides methods to obtain a `Logger`.
   - `Logger` - provides methods to create a `LogRecord` and emit it.
   - `LogRecord` - provides methods to populate the record.
3. **No-Op Implementations**: By default, the API performs no operations until
   an SDK is attached.
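
To make the difference between log values and trace/metric attribute values concrete, here is a minimal, stdlib-only sketch. `FlatValue`, `LogValue`, and `depth` are hypothetical names and not the `opentelemetry` crate's types. The only point it illustrates is that a log value can nest lists and maps recursively, while a flat attribute value cannot.

```rust
use std::collections::BTreeMap;

/// Hypothetical flat value, roughly what a span or metric attribute can hold.
#[derive(Debug, Clone, PartialEq)]
enum FlatValue {
    Bool(bool),
    I64(i64),
    F64(f64),
    Str(String),
}

/// Hypothetical log value: like `FlatValue`, but it can nest lists and maps.
#[derive(Debug, Clone, PartialEq)]
enum LogValue {
    Scalar(FlatValue),
    Bytes(Vec<u8>),
    List(Vec<LogValue>),
    Map(BTreeMap<String, LogValue>),
}

impl LogValue {
    /// Nesting depth: scalars and bytes count as 1; each container adds one level.
    fn depth(&self) -> usize {
        match self {
            LogValue::Scalar(_) | LogValue::Bytes(_) => 1,
            LogValue::List(items) => 1 + items.iter().map(LogValue::depth).max().unwrap_or(0),
            LogValue::Map(entries) => 1 + entries.values().map(LogValue::depth).max().unwrap_or(0),
        }
    }
}

fn main() {
    // Structured body: {"user": {"id": 42, "roles": ["admin", "dev"]}}
    let roles = LogValue::List(vec![
        LogValue::Scalar(FlatValue::Str("admin".to_string())),
        LogValue::Scalar(FlatValue::Str("dev".to_string())),
    ]);
    let mut user = BTreeMap::new();
    user.insert("id".to_string(), LogValue::Scalar(FlatValue::I64(42)));
    user.insert("roles".to_string(), roles);
    let mut body = BTreeMap::new();
    body.insert("user".to_string(), LogValue::Map(user));
    let body = LogValue::Map(body);
    println!("depth = {}", body.depth());
}
```

The body built in `main` has a nesting depth of 4. A flat attribute value could only represent it by serializing it to a string first.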

### Logs Flow

1. Obtain a `LoggerProvider` implementation.
2. Use the `LoggerProvider` to create `Logger` instances, specifying a scope
   name (the module/component emitting logs). An optional version and
   attributes are also supported.
3. Use the `Logger` to create an empty `LogRecord` instance.
4. Populate the `LogRecord` with body, timestamp, attributes, etc.
5. Call `Logger.emit(LogRecord)` to process and export the log.

If only the Logs API is used (without an SDK), all the above steps result in no
operations, following OpenTelemetry’s philosophy of separating the API from the
SDK. The official Logs SDK provides real implementations to process and export
logs. Users or vendors can also provide alternative SDK implementations.
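
The five steps can be sketched with simplified, hypothetical traits that mirror the roles described above. The method names loosely echo the API, but the signatures here (for example, `String` bodies and `&'static str` keys) are assumptions for illustration and not the `opentelemetry` crate's actual definitions. `NoopProvider` models the API-only case, and `ToyProvider` is a hypothetical stand-in for an SDK.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical, simplified stand-in for the API's `LogRecord` trait.
trait LogRecord {
    fn set_body(&mut self, body: String);
    fn set_timestamp(&mut self, unix_nanos: u64);
    fn add_attribute(&mut self, key: &'static str, value: String);
}

/// Hypothetical stand-in for the API's `Logger` trait.
trait Logger {
    type Record: LogRecord;
    fn create_log_record(&self) -> Self::Record;
    fn emit(&self, record: Self::Record);
}

/// Hypothetical stand-in for the API's `LoggerProvider` trait.
trait LoggerProvider {
    type Logger: Logger;
    fn logger(&self, scope_name: &'static str) -> Self::Logger;
}

// No-op implementations: what you get with the API alone.
struct NoopRecord;
impl LogRecord for NoopRecord {
    fn set_body(&mut self, _body: String) {}
    fn set_timestamp(&mut self, _unix_nanos: u64) {}
    fn add_attribute(&mut self, _key: &'static str, _value: String) {}
}
struct NoopLogger;
impl Logger for NoopLogger {
    type Record = NoopRecord;
    fn create_log_record(&self) -> NoopRecord { NoopRecord }
    fn emit(&self, _record: NoopRecord) {}
}
struct NoopProvider;
impl LoggerProvider for NoopProvider {
    type Logger = NoopLogger;
    fn logger(&self, _scope_name: &'static str) -> NoopLogger { NoopLogger }
}

// A toy "SDK" that prints and counts emitted records.
#[derive(Default)]
struct ToyRecord { body: String, timestamp: u64, attributes: Vec<(&'static str, String)> }
impl LogRecord for ToyRecord {
    fn set_body(&mut self, body: String) { self.body = body; }
    fn set_timestamp(&mut self, unix_nanos: u64) { self.timestamp = unix_nanos; }
    fn add_attribute(&mut self, key: &'static str, value: String) { self.attributes.push((key, value)); }
}
struct ToyLogger { scope: &'static str, emitted: Rc<Cell<usize>> }
impl Logger for ToyLogger {
    type Record = ToyRecord;
    fn create_log_record(&self) -> ToyRecord { ToyRecord::default() }
    fn emit(&self, record: ToyRecord) {
        // A real SDK would run its processors here instead of printing.
        println!("[{}] t={} {} {:?}", self.scope, record.timestamp, record.body, record.attributes);
        self.emitted.set(self.emitted.get() + 1);
    }
}
struct ToyProvider { emitted: Rc<Cell<usize>> }
impl LoggerProvider for ToyProvider {
    type Logger = ToyLogger;
    fn logger(&self, scope_name: &'static str) -> ToyLogger {
        ToyLogger { scope: scope_name, emitted: Rc::clone(&self.emitted) }
    }
}

/// Steps 2-5, written once against the traits (step 1 is the caller's choice of provider).
fn log_request<P: LoggerProvider>(provider: &P) {
    let logger = provider.logger("my-component"); // step 2
    let mut record = logger.create_log_record(); // step 3
    record.set_body("request handled".to_string()); // step 4
    record.set_timestamp(1_700_000_000_000_000_000);
    record.add_attribute("http.status_code", "200".to_string());
    logger.emit(record); // step 5
}

fn main() {
    log_request(&NoopProvider); // API only: every step is a no-op
    let sdk = ToyProvider { emitted: Rc::new(Cell::new(0)) };
    log_request(&sdk); // with a stand-in "SDK": the record is printed and counted
    println!("emitted = {}", sdk.emitted.get());
}
```

`log_request` depends only on the traits, so the same instrumentation runs unchanged whether a no-op or a real implementation is plugged in. That is the practical benefit of separating the API from the SDK.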

## Logs SDK

The Logs SDK is part of the
[opentelemetry_sdk](https://crates.io/crates/opentelemetry_sdk) crate.

The OpenTelemetry Logs SDK provides an OTel specification-compliant
implementation of the Logs API, handling log processing and export.

### Core Components

#### `SdkLoggerProvider`

This is the SDK implementation of `LoggerProvider`. It deals with concerns such
as processing and exporting logs.

- Implements the `LoggerProvider` trait.
- Creates and manages `SdkLogger` instances.
- Holds logging configuration, including `Resource` and processors.
- Does not retain a list of created loggers. Instead, it passes an owned clone
  of itself to each logger it creates, so that every logger has access to the
  configuration (such as which processors to invoke).
- Uses an `Arc<LoggerProviderInner>` and delegates all configuration to
  `LoggerProviderInner`. This makes cloning cheap and ensures all clones point
  to the same underlying configuration.
- Because `SdkLoggerProvider` only holds an `Arc` of its inner state, its
  methods such as flush and shutdown can only take `&self`. Mutating state
  through `&self` requires interior mutability, which comes with runtime
  performance costs. Since shutdown usually needs to mutate internal state, the
  provider defers to components such as the exporter, which use interior
  mutability to handle shutdown. (More on this in the exporter section.)
- An alternative design was to let `SdkLogger` hold a `Weak` reference to the
  `SdkLoggerProvider`. That would require a `Weak` to `Arc` upgrade on every
  log emission, significantly reducing throughput.
- `LoggerProviderInner` implements `Drop`, triggering `shutdown()` when no
  references remain. However, in practice, loggers are often stored statically
  inside appenders (such as the `tracing` appender), so the last reference may
  never be dropped and explicit shutdown by the user is required.
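
A minimal, stdlib-only sketch of the `Arc<Inner>` pattern described above. `ProviderSketch`, `InnerSketch`, `ExporterSketch`, `LoggerSketch`, and the `String` error type are hypothetical. The sketch shows four things: clones share one configuration, each logger holds an owned clone of the provider, `shutdown(&self)` delegates to an exporter that uses interior mutability (an `AtomicBool`), and `Drop` on the inner type attempts shutdown when the last reference goes away.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical exporter: interior mutability lets shutdown take only `&self`.
struct ExporterSketch {
    is_shutdown: AtomicBool,
    exported: AtomicUsize,
}

impl ExporterSketch {
    fn export(&self, _body: &str) -> Result<(), String> {
        if self.is_shutdown.load(Ordering::Acquire) {
            return Err("exporter already shut down".to_string());
        }
        self.exported.fetch_add(1, Ordering::Relaxed);
        Ok(())
    }

    fn shutdown(&self) -> Result<(), String> {
        // `swap` returns the previous value, so only the first call succeeds.
        if self.is_shutdown.swap(true, Ordering::AcqRel) {
            Err("shutdown already called".to_string())
        } else {
            Ok(())
        }
    }
}

/// Hypothetical inner state shared by every clone of the provider.
struct InnerSketch {
    exporter: ExporterSketch,
}

impl Drop for InnerSketch {
    fn drop(&mut self) {
        // Best-effort shutdown when the last reference disappears. If the user
        // already shut down explicitly, this returns Err, which is ignored.
        let _ = self.exporter.shutdown();
    }
}

/// Hypothetical provider: only an `Arc`, so cloning is a reference-count bump.
#[derive(Clone)]
struct ProviderSketch {
    inner: Arc<InnerSketch>,
}

/// Hypothetical logger: owns a provider clone (no `Weak` upgrade per emit).
struct LoggerSketch {
    provider: ProviderSketch,
}

impl ProviderSketch {
    fn new() -> Self {
        let exporter = ExporterSketch { is_shutdown: AtomicBool::new(false), exported: AtomicUsize::new(0) };
        ProviderSketch { inner: Arc::new(InnerSketch { exporter }) }
    }

    fn logger(&self) -> LoggerSketch {
        LoggerSketch { provider: self.clone() }
    }

    fn shutdown(&self) -> Result<(), String> {
        self.inner.exporter.shutdown()
    }
}

impl LoggerSketch {
    fn emit(&self, body: &str) -> Result<(), String> {
        self.provider.inner.exporter.export(body)
    }
}

fn main() {
    let provider = ProviderSketch::new();
    let logger = provider.logger();
    // Both handles point at the same inner configuration.
    println!("shared: {}", Arc::ptr_eq(&provider.inner, &logger.provider.inner));
    println!("emit: {:?}", logger.emit("hello"));
    // Explicit shutdown, since a logger kept in a static would keep the inner alive.
    println!("shutdown: {:?}", provider.shutdown());
    println!("emit after shutdown: {:?}", logger.emit("late"));
}
```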

#### `SdkLogger`

This is the SDK implementation of `Logger`, and it contains the functionality to
create and emit logs.

- Implements the `Logger` trait.
- Creates `SdkLogRecord` instances and emits them.
- Calls `OnEmit()` on all registered processors when emitting logs.
- Passes a mutable reference to each processor (`&mut log_record`), i.e.,
  ownership is not passed to the processor. This lets the logger avoid
  cloning costs. Since a mutable reference is passed, processors can modify the
  log record, and the changes are visible to the next processor in the chain.
- Since a processor only gets a reference to the log record, it cannot store it
  beyond the `OnEmit()` call. If a processor needs to buffer logs, it must
  explicitly copy them to the heap.
- This design allows for stack-only log processing when exporting to operating
  system native facilities like `Windows ETW` or `Linux user_events`.
- OTLP exporting requires network calls (HTTP/gRPC) and batching of logs for
  efficiency. These exporters buffer log records by copying them to the heap.
  (More on this in the BatchLogProcessor section.)
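
The following stdlib-only sketch illustrates how `&mut` is passed to processors in registration order. `RecordSketch`, `ProcessorSketch`, `EnrichProcessor`, `BufferingProcessor`, and `LoggerSketch` are hypothetical names, and `on_emit` stands in for `OnEmit()`. The SDK's actual processor trait differs. An enriching processor mutates the record in place. A later buffering processor sees that change, but it must clone the record to keep it.

```rust
use std::sync::Mutex;

/// Hypothetical, simplified SDK log record.
#[derive(Debug, Clone, Default)]
struct RecordSketch {
    body: String,
    attributes: Vec<(String, String)>,
}

/// Hypothetical processor trait: receives `&mut`, so no ownership transfer or clone.
trait ProcessorSketch {
    fn on_emit(&self, record: &mut RecordSketch);
}

/// Enriches the record in place; later processors see the change.
struct EnrichProcessor;
impl ProcessorSketch for EnrichProcessor {
    fn on_emit(&self, record: &mut RecordSketch) {
        record.attributes.push(("service.region".to_string(), "eu-west".to_string()));
    }
}

/// Needs to keep records past `on_emit`, so it explicitly clones them to the heap.
#[derive(Default)]
struct BufferingProcessor {
    buffer: Mutex<Vec<RecordSketch>>,
}
impl ProcessorSketch for BufferingProcessor {
    fn on_emit(&self, record: &mut RecordSketch) {
        // `record` is only borrowed for the duration of this call.
        self.buffer.lock().unwrap().push(record.clone());
    }
}

/// Hypothetical logger: calls processors in registration order with the same `&mut` record.
struct LoggerSketch<'a> {
    processors: Vec<&'a dyn ProcessorSketch>,
}
impl LoggerSketch<'_> {
    fn emit(&self, mut record: RecordSketch) {
        for processor in &self.processors {
            processor.on_emit(&mut record);
        }
        // `record` is dropped here; only explicit copies survive.
    }
}

fn main() {
    let enrich = EnrichProcessor;
    let buffering = BufferingProcessor::default();
    let logger = LoggerSketch { processors: vec![&enrich, &buffering] };
    logger.emit(RecordSketch { body: "user logged in".to_string(), ..Default::default() });
    println!("{:?}", buffering.buffer.lock().unwrap()[0]);
}
```

Registration order matters here. If `buffering` were registered before `enrich`, its copy would not contain the added attribute.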

#### `LogRecord`

- Holds log data, including attributes.
- Stores up to 5 attributes in an inline array, avoiding heap allocation in the
  common case.
- Falls back to a heap-allocated `Vec` if more attributes are required.
- This layout is inspired by Go’s `slog` library, which uses it for efficiency.
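
A minimal sketch of the inline-array-with-`Vec`-fallback layout. To keep the example small, it assumes attributes are `(&'static str, i64)` pairs. The SDK's real key and value types are richer, and `AttributesSketch` is a hypothetical name.

```rust
/// Number of attributes stored inline before spilling to the heap.
const INLINE_CAPACITY: usize = 5;

/// Hypothetical attribute storage: a fixed inline array plus a heap `Vec` overflow.
struct AttributesSketch {
    inline: [Option<(&'static str, i64)>; INLINE_CAPACITY],
    inline_len: usize,
    overflow: Vec<(&'static str, i64)>, // `Vec::new()` does not allocate until first push
}

impl AttributesSketch {
    fn new() -> Self {
        AttributesSketch { inline: [None; INLINE_CAPACITY], inline_len: 0, overflow: Vec::new() }
    }

    fn add(&mut self, key: &'static str, value: i64) {
        if self.inline_len < INLINE_CAPACITY {
            self.inline[self.inline_len] = Some((key, value));
            self.inline_len += 1;
        } else {
            // Only the sixth and later attributes touch the heap.
            self.overflow.push((key, value));
        }
    }

    fn len(&self) -> usize {
        self.inline_len + self.overflow.len()
    }

    /// True once the overflow `Vec` has allocated.
    fn heap_allocated(&self) -> bool {
        self.overflow.capacity() > 0
    }

    /// Inline attributes first, then overflow, preserving insertion order.
    fn iter(&self) -> impl Iterator<Item = (&'static str, i64)> + '_ {
        self.inline[..self.inline_len]
            .iter()
            .flatten()
            .copied()
            .chain(self.overflow.iter().copied())
    }
}

fn main() {
    let mut attrs = AttributesSketch::new();
    for (i, key) in ["a", "b", "c", "d", "e"].into_iter().enumerate() {
        attrs.add(key, i as i64);
    }
    println!("{} attributes, heap allocated: {}", attrs.len(), attrs.heap_allocated());
    attrs.add("f", 5);
    println!("{} attributes, heap allocated: {}", attrs.len(), attrs.heap_allocated());
    println!("{:?}", attrs.iter().collect::<Vec<_>>());
}
```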

#### LogRecord Processors

`SdkLoggerProvider` can be configured with any number of LogProcessors, which
are called in the order of registration. Log records are passed to the
`OnEmit` method of each LogProcessor. LogProcessors can be used to process log
records, enrich them, filter them, and export them to destinations by
leveraging LogRecord Exporters.

The following built-in log processors are provided in the Logs SDK:

##### SimpleLogProcessor

This processor is designed for exporting. Export is handled by an Exporter
(which is a separate component). SimpleLogProcessor is "simple" in the sense
that it does not attempt to do any processing - it just calls the exporter and
passes the log record to it. To comply with the OTel specification, it
synchronizes calls to the `Export()` method, i.e., only one `Export()` call is
in progress at any given time.

SimpleLogProcessor is intended only for testing/learning purposes and is often
used along with a `stdout` exporter.
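
A minimal sketch of how a simple processor can serialize exports. It assumes a hypothetical `ExporterSketch` trait whose `export` takes `&mut self`. Wrapping the exporter in a `Mutex` guarantees that at most one `export` call is in progress, even when several threads emit at once. `SimpleProcessorSketch` and `CountingExporter` are illustrative names, not the SDK's API.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical exporter trait; `&mut self` forces callers to serialize access.
trait ExporterSketch: Send {
    fn export(&mut self, batch: &[String]);
}

/// Hypothetical exporter that only counts what it receives.
#[derive(Default)]
struct CountingExporter {
    exported: usize,
}
impl ExporterSketch for CountingExporter {
    fn export(&mut self, batch: &[String]) {
        self.exported += batch.len();
    }
}

/// Hypothetical simple processor: one export per log, guarded by a mutex so
/// that at most one `export` call runs at any time.
struct SimpleProcessorSketch<E: ExporterSketch> {
    exporter: Mutex<E>,
}

impl<E: ExporterSketch> SimpleProcessorSketch<E> {
    fn on_emit(&self, record: &str) {
        // Copy the borrowed record into a single-element batch for the exporter.
        let batch = [record.to_string()];
        self.exporter.lock().unwrap().export(&batch);
    }
}

fn main() {
    let processor = Arc::new(SimpleProcessorSketch { exporter: Mutex::new(CountingExporter::default()) });
    let handles: Vec<_> = (0..4)
        .map(|t| {
            let p = Arc::clone(&processor);
            thread::spawn(move || {
                for i in 0..25 {
                    p.on_emit(&format!("thread {t} log {i}"));
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // prints "exported = 100"
    println!("exported = {}", processor.exporter.lock().unwrap().exported);
}
```

One consequence of this design is that every emitting thread contends on the same lock and waits for the export to finish. This fits the testing/learning role described above.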

##### BatchLogProcessor

This is another "exporting" processor. As with SimpleLogProcessor, a separate
component named LogExporter handles the actual export logic. BatchLogProcessor
buffers/batches the logs it receives into an in-memory buffer. It invokes the
exporter every 1 second or when 512 items are in the batch (both customizable).
It uses a background thread to do the export, and communication between the
user thread (where logs are emitted) and the background thread occurs over
`mpsc` channels.

The buffer holds at most 2048 items (customizable). Once the limit is reached,
any *new* logs are dropped. It *does not* apply back-pressure to the user
thread; it drops logs instead.

As with SimpleLogProcessor, this component ensures that only one export is
active at a given time. A modified version of this behavior is required to
achieve higher throughput in some environments.

In this design, at most 2048 + 512 = 2560 logs can be in memory at any given
point. In other words, up to that many logs can be lost if the application
crashes.
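
The sketch below models the described behavior with `std::sync::mpsc::sync_channel`. It assumes a hypothetical `BatchSketch` type and an `OwnedLog` record that the user thread has already copied to the heap. `try_send` drops logs instead of blocking when the bounded queue is full. A single background thread exports when the batch fills or when the interval elapses. The queue capacity plus one in-progress batch bounds memory, in line with the 2048 + 512 reasoning above. This is a sketch, not the SDK's implementation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, SyncSender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Hypothetical owned copy of a log record (a buffering processor must own what it keeps).
#[derive(Debug, Clone)]
struct OwnedLog(String);

/// Messages from user threads to the background export thread.
enum Message {
    Log(OwnedLog),
    Shutdown,
}

/// Shared list of exported batches, standing in for a real exporter.
type ExportedBatches = Arc<Mutex<Vec<Vec<OwnedLog>>>>;

struct BatchSketch {
    sender: SyncSender<Message>,
    dropped: AtomicUsize,
    worker: Option<JoinHandle<()>>,
}

impl BatchSketch {
    fn new(max_queue: usize, max_batch: usize, interval: Duration, exported: ExportedBatches) -> Self {
        // Bounded channel: at most `max_queue` logs wait for the background thread.
        let (sender, receiver) = mpsc::sync_channel(max_queue);
        let worker = thread::spawn(move || run_worker(receiver, max_batch, interval, exported));
        BatchSketch { sender, dropped: AtomicUsize::new(0), worker: Some(worker) }
    }

    /// Runs on the user thread: never blocks, and drops the log if the queue is full.
    fn emit(&self, log: OwnedLog) {
        if self.sender.try_send(Message::Log(log)).is_err() {
            self.dropped.fetch_add(1, Ordering::Relaxed);
        }
    }

    /// Blocking send so the shutdown signal itself is never dropped; then wait for the final export.
    fn shutdown(&mut self) {
        let _ = self.sender.send(Message::Shutdown);
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

fn run_worker(rx: Receiver<Message>, max_batch: usize, interval: Duration, exported: ExportedBatches) {
    let mut batch = Vec::with_capacity(max_batch);
    let mut deadline = Instant::now() + interval;
    loop {
        match rx.recv_timeout(deadline.saturating_duration_since(Instant::now())) {
            Ok(Message::Log(log)) => {
                batch.push(log);
                if batch.len() >= max_batch {
                    export(&mut batch, &exported);
                    deadline = Instant::now() + interval;
                }
            }
            Err(RecvTimeoutError::Timeout) => {
                export(&mut batch, &exported);
                deadline = Instant::now() + interval;
            }
            Ok(Message::Shutdown) | Err(RecvTimeoutError::Disconnected) => {
                export(&mut batch, &exported);
                return;
            }
        }
    }
}

/// Only the background thread calls this, so at most one export is active at a time.
fn export(batch: &mut Vec<OwnedLog>, exported: &Mutex<Vec<Vec<OwnedLog>>>) {
    if !batch.is_empty() {
        exported.lock().unwrap().push(std::mem::take(batch));
    }
}

fn main() {
    let exported: ExportedBatches = Arc::new(Mutex::new(Vec::new()));
    // Deliberately tiny limits (the doc's defaults are 2048 / 512 / 1 s) so drops are likely.
    let mut processor = BatchSketch::new(8, 4, Duration::from_millis(50), Arc::clone(&exported));
    for i in 0..100 {
        processor.emit(OwnedLog(format!("log {i}")));
    }
    processor.shutdown();
    let exported_count: usize = exported.lock().unwrap().iter().map(Vec::len).sum();
    let dropped = processor.dropped.load(Ordering::Relaxed);
    // exported + dropped always equals 100; the split depends on thread timing.
    println!("exported={exported_count} dropped={dropped}");
}
```

Because `try_send` never blocks, the user thread pays at most one bounded-channel send per log. The trade-off is that logs emitted while the queue is full are lost, and the `dropped` counter is one way to make that loss observable.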

## LogExporters

LogExporters are responsible for exporting logs to a destination. Some of them
include:

1. **InMemoryExporter** - exports to an in-memory list, primarily for
   unit testing. It is used extensively in the repo itself, and external users
   are also encouraged to use it.
2. **Stdout exporter** - prints telemetry to stdout. Only for debugging/learning
   purposes. The output format is not defined and is not performance
   optimized. A production-recommended version with a standardized output
   format is planned.
3. **OTLP Exporter** - OTel's official exporter, which uses the OTLP protocol
   designed with the OTel data model in mind. Both HTTP- and gRPC-based
   exporting are offered.
4. **Exporters to OS kernel facilities** - These exporters are not maintained in
   the core repo but are listed for completeness. They export telemetry to
   Windows ETW or Linux user_events and are designed for high-performance
   workloads. Because they export synchronously, they do not require
   buffering/batching. This allows logs to be processed entirely on the stack
   and to scale easily with the number of CPU cores. (The kernel uses per-CPU
   buffers for the events, ensuring no contention.)
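
For the first item, here is a minimal sketch of the in-memory pattern used in unit tests. It assumes a hypothetical `LogExporterSketch` trait and `InMemoryExporterSketch` type. The SDK ships its own in-memory exporter with a different API. Clones share one `Arc<Mutex<Vec<_>>>`, so a test can keep one handle while the pipeline owns the other, and then assert on what was exported.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical exporter interface, simplified to owned string records.
trait LogExporterSketch {
    fn export(&self, batch: Vec<String>) -> Result<(), String>;
}

/// Hypothetical in-memory exporter: all clones share one list.
#[derive(Clone, Default)]
struct InMemoryExporterSketch {
    logs: Arc<Mutex<Vec<String>>>,
}

impl InMemoryExporterSketch {
    /// Snapshot of everything exported so far.
    fn emitted(&self) -> Vec<String> {
        self.logs.lock().unwrap().clone()
    }
}

impl LogExporterSketch for InMemoryExporterSketch {
    fn export(&self, batch: Vec<String>) -> Result<(), String> {
        self.logs.lock().unwrap().extend(batch);
        Ok(())
    }
}

/// Hypothetical code under test. In a real setup the exporter would sit
/// behind a processor and an appender rather than being called directly.
fn handle_login(user: &str, exporter: &dyn LogExporterSketch) -> Result<(), String> {
    exporter.export(vec![format!("user {user} logged in")])
}

fn main() {
    let exporter = InMemoryExporterSketch::default();
    let handle = exporter.clone(); // kept by the test to inspect results
    handle_login("alice", &exporter).unwrap();
    println!("{:?}", handle.emitted());
}
```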

## `tracing` Log Appender

The `tracing` appender is part of the
[opentelemetry-appender-tracing](https://crates.io/crates/opentelemetry-appender-tracing)
crate.

The `tracing` appender bridges `tracing` logs to OpenTelemetry. Logs emitted via
`tracing` macros (`info!`, `warn!`, etc.) are forwarded to OpenTelemetry through
this integration.

- `tracing` is designed for high performance, using *layers* or *subscribers*
  to handle emitted logs (events).
- The appender implements a `Layer`, receiving logs from `tracing`.
- It uses the OTel Logs API to create a `LogRecord`, populate it, and emit it
  via `Logger.emit(LogRecord)`.
- If no Logs SDK is present, the process is a no-op.

Note on terminology: Within OpenTelemetry, "tracing" refers to distributed
tracing (i.e., the creation of Spans), not in-process structured logging or
execution traces. The `tracing` crate has the notion of creating Spans as well
as Events. The Events from the `tracing` crate are what get converted to OTel
Logs when using this appender. Spans created using the `tracing` crate are not
handled by this crate.
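
A minimal sketch of the Event/Span distinction described above. `Level`, `TracingSignal`, `BridgedRecord`, and `bridge` are hypothetical stand-ins for what a `tracing` layer observes and what the appender produces. The real appender works through the `Layer` trait and the Logs API. The severity mapping is an assumption: each level is mapped to the first SeverityNumber of its range in the OTel logs data model.

```rust
/// Mirrors the five `tracing` levels.
#[derive(Debug, Clone, Copy)]
enum Level {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

/// Hypothetical, simplified view of what a `tracing` layer observes.
enum TracingSignal {
    Event { level: Level, message: String },
    SpanStart { name: &'static str },
}

/// Hypothetical OTel-style log record produced by the bridge.
#[derive(Debug, PartialEq)]
struct BridgedRecord {
    severity_number: u8,
    severity_text: &'static str,
    body: String,
}

/// Assumed mapping to the first number of each OTel severity range
/// (TRACE 1-4, DEBUG 5-8, INFO 9-12, WARN 13-16, ERROR 17-20).
fn severity(level: Level) -> (u8, &'static str) {
    match level {
        Level::Trace => (1, "TRACE"),
        Level::Debug => (5, "DEBUG"),
        Level::Info => (9, "INFO"),
        Level::Warn => (13, "WARN"),
        Level::Error => (17, "ERROR"),
    }
}

/// Events become log records; spans are ignored because this appender only handles events.
fn bridge(signal: &TracingSignal) -> Option<BridgedRecord> {
    match signal {
        TracingSignal::Event { level, message } => {
            let (severity_number, severity_text) = severity(*level);
            Some(BridgedRecord { severity_number, severity_text, body: message.clone() })
        }
        TracingSignal::SpanStart { .. } => None,
    }
}

fn main() {
    let signals = [
        TracingSignal::SpanStart { name: "handle_request" },
        TracingSignal::Event { level: Level::Warn, message: "slow response".to_string() },
    ];
    let records: Vec<BridgedRecord> = signals.iter().filter_map(bridge).collect();
    println!("{records:?}");
}
```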

## Performance

// Call out things done specifically for performance

### Perf test - benchmarks

// Share ~~ numbers

### Perf test - stress test

// Share ~~ numbers

## Summary

- OpenTelemetry Logs does not provide a user-facing logging API.
- Instead, it integrates with existing logging libraries (`log`, `tracing`).
- The Logs API defines key traits but performs no operations unless an SDK is
  installed.
- The Logs SDK enables log processing, transformation, and export.
- The Logs SDK is performance-optimized to minimize copying and heap
  allocation wherever feasible.
- The `tracing` appender efficiently routes logs to OpenTelemetry without
  modifying existing logging workflows.

docs/design/metrics.md

+6
# OpenTelemetry Rust Metrics Design

Status:
[Development](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/document-status.md)

TODO:

docs/design/traces.md

+6
# OpenTelemetry Rust Traces Design

Status:
[Development](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/document-status.md)

TODO:
