|
| 1 | +# OpenTelemetry Rust Logs Design |
| 2 | + |
| 3 | +Status: |
| 4 | +[Development](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/document-status.md) |
| 5 | + |
| 6 | +## Overview |
| 7 | + |
| 8 | +[OpenTelemetry (OTel) |
| 9 | +Logs](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/README.md) |
| 10 | +support differs from Metrics and Traces as it does not introduce a new logging |
| 11 | +API for end users. Instead, OTel recommends leveraging existing logging |
| 12 | +libraries such as [log](https://crates.io/crates/log) and |
| 13 | +[tracing](https://crates.io/crates/tracing), while providing bridges (appenders) |
| 14 | +to route logs through OpenTelemetry. |
| 15 | + |
| 16 | +OTel took this different approach due to the long history of existing logging |
| 17 | +solutions. In Rust, these are [log](https://crates.io/crates/log) and |
| 18 | +[tracing](https://crates.io/crates/tracing), both of which have been embraced by the |
| 19 | +community for some time. OTel Rust maintains appenders for these libraries, |
| 20 | +allowing users to seamlessly integrate with OpenTelemetry without changing their |
| 21 | +existing logging instrumentation. |
| 22 | + |
| 23 | +The `tracing` appender is particularly optimized for performance due to its |
| 24 | +widespread adoption and the fact that `tracing` itself has a bridge from the |
| 25 | +`log` crate. Notably, OpenTelemetry Rust itself is instrumented using `tracing` |
| 26 | +for internal logs. Additionally, when OTel began supporting logging as a signal, |
| 27 | +the `log` crate lacked structured logging support, reinforcing the decision to |
| 28 | +prioritize `tracing`. |
| 29 | + |
| 30 | +## Benefits of OpenTelemetry Logs |
| 31 | + |
| 32 | +- **Unified configuration** across Traces, Metrics, and Logs. |
| 33 | +- **Automatic correlation** with Traces. |
| 34 | +- **Consistent Resource attributes** across signals. |
| 35 | +- **Multiple destinations support**: Logs can continue flowing to existing |
| 36 | + destinations such as stdout while also being sent to an |
| 37 | + OpenTelemetry-capable backend, typically via an OTLP Exporter or exporters |
| 38 | + that export to operating system native systems like `Windows ETW` or `Linux |
| 39 | + user_events`. |
| 40 | +- **Standalone logging support** for applications that use OpenTelemetry as |
| 41 | + their primary logging mechanism. |
| 42 | + |
| 43 | +## Key Design Principles |
| 44 | + |
| 45 | +- High performance - no locks/contention in the hot path with minimal/no heap |
| 46 | + allocation where possible. |
| 47 | +- Capped resource (memory) usage - well-defined behavior when overloaded. |
| 48 | +- Self-observable - exposes telemetry about itself to aid in |
| 49 | + troubleshooting. |
| 50 | +- Robust error handling, returning `Result` where possible instead of panicking. |
| 51 | +- Minimal public API, exposing based on need only. |
| 52 | + |
| 53 | +## Architecture Overview |
| 54 | + |
| 55 | +```mermaid |
| 56 | +graph TD |
| 57 | + subgraph Application |
| 58 | + A1[Application Code] |
| 59 | + end |
| 60 | + subgraph Logging Libraries |
| 61 | + B1[log crate] |
| 62 | + B2[tracing crate] |
| 63 | + end |
| 64 | + subgraph OpenTelemetry |
| 65 | + C1[OpenTelemetry Appender for log] |
| 66 | + C2[OpenTelemetry Appender for tracing] |
| 67 | + C3[OpenTelemetry Logs API] |
| 68 | + C4[OpenTelemetry Logs SDK] |
| 69 | + C5[OTLP Exporter] |
| 70 | + end |
| 71 | + subgraph Observability Backend |
| 72 | + D1[OTLP-Compatible Backend] |
| 73 | + end |
| 74 | + A1 --> |Emits Logs| B1 |
| 75 | + A1 --> |Emits Logs| B2 |
| 76 | + B1 --> |Bridged by| C1 |
| 77 | + B2 --> |Bridged by| C2 |
| 78 | + C1 --> |Sends to| C3 |
| 79 | + C2 --> |Sends to| C3 |
| 80 | + C3 --> |Processes with| C4 |
| 81 | + C4 --> |Exports via| C5 |
| 82 | + C5 --> |Sends to| D1 |
| 83 | +``` |
| 84 | + |
| 85 | +## Logs API |
| 86 | + |
| 87 | +Logs API is part of the [opentelemetry](https://crates.io/crates/opentelemetry) |
| 88 | +crate. |
| 89 | + |
| 90 | +The OTel Logs API is not intended for direct end-user usage. Instead, it is |
| 91 | +designed for appender/bridge authors to integrate existing logging libraries |
| 92 | +with OpenTelemetry. However, there is nothing preventing it from being used by |
| 93 | +end-users. |
| 94 | + |
| 95 | +### API Components |
| 96 | + |
| 97 | +1. **Key-Value Structs**: Used in `LogRecord`, where the `Key` struct is shared |
| 98 | + across signals but the `Value` struct differs from Metrics and Traces. This is |
| 99 | + because values in Logs can contain more complex structures than those in |
| 100 | + Traces and Metrics. |
| 101 | +2. **Traits**: |
| 102 | + - `LoggerProvider` - provides methods to obtain Logger. |
| 103 | + - `Logger` - provides methods to create LogRecord and emit the created |
| 104 | + LogRecord. |
| 105 | + - `LogRecord` - provides methods to populate LogRecord. |
| 106 | +3. **No-Op Implementations**: By default, the API performs no operations until |
| 107 | + an SDK is attached. |
| 108 | + |
| 109 | +### Logs Flow |
| 110 | + |
| 111 | +1. Obtain a `LoggerProvider` implementation. |
| 112 | +2. Use the `LoggerProvider` to create `Logger` instances, specifying a scope |
| 113 | + name (module/component emitting logs). Optional attributes and version are |
| 114 | + also supported. |
| 115 | +3. Use the `Logger` to create an empty `LogRecord` instance. |
| 116 | +4. Populate the `LogRecord` with body, timestamp, attributes, etc. |
| 117 | +5. Call `Logger.emit(LogRecord)` to process and export the log. |
| 118 | + |
| 119 | +If only the Logs API is used (without an SDK), all the above steps result in no |
| 120 | +operations, following OpenTelemetry’s philosophy of separating API from SDK. The |
| 121 | +official Logs SDK provides real implementations to process and export logs. |
| 122 | +Users or vendors can also provide alternative SDK implementations. |
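
The flow above can be sketched with simplified stand-in traits. This is a minimal illustration, not the actual `opentelemetry` API: the trait shapes, the associated types, and the `emit_hello` helper are assumptions made for the example, and the real traits have richer signatures. It shows that code written against the API alone compiles and runs, but does nothing until a real SDK implementation is supplied.

```rust
// Simplified stand-ins for the API traits (hypothetical shapes, for illustration).
pub trait LogRecord {
    fn set_body(&mut self, body: &str);
    fn add_attribute(&mut self, key: &'static str, value: &str);
}

pub trait Logger {
    type Record: LogRecord;
    fn create_log_record(&self) -> Self::Record;
    fn emit(&self, record: Self::Record);
}

pub trait LoggerProvider {
    type Logger: Logger;
    fn logger(&self, scope_name: &'static str) -> Self::Logger;
}

// No-op implementations: every call compiles down to nothing.
pub struct NoopLogRecord;
impl LogRecord for NoopLogRecord {
    fn set_body(&mut self, _body: &str) {}
    fn add_attribute(&mut self, _key: &'static str, _value: &str) {}
}

pub struct NoopLogger;
impl Logger for NoopLogger {
    type Record = NoopLogRecord;
    fn create_log_record(&self) -> NoopLogRecord {
        NoopLogRecord
    }
    fn emit(&self, _record: NoopLogRecord) {}
}

pub struct NoopLoggerProvider;
impl LoggerProvider for NoopLoggerProvider {
    type Logger = NoopLogger;
    fn logger(&self, _scope_name: &'static str) -> NoopLogger {
        NoopLogger
    }
}

/// Steps 2 to 5 of the flow, written only against the traits.
pub fn emit_hello<P: LoggerProvider>(provider: &P) {
    let logger = provider.logger("my-component"); // step 2: obtain a Logger
    let mut record = logger.create_log_record(); // step 3: empty record
    record.set_body("hello"); // step 4: populate it
    record.add_attribute("user.id", "42");
    logger.emit(record); // step 5: emit
}
```

Calling `emit_hello(&NoopLoggerProvider)` runs every step and produces no output. An appender written this way would not need to change when an SDK-backed provider is passed in instead.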
| 123 | + |
| 124 | +## Logs SDK |
| 125 | + |
| 126 | +Logs SDK is part of the |
| 127 | +[opentelemetry_sdk](https://crates.io/crates/opentelemetry_sdk) crate. |
| 128 | + |
| 129 | +The OpenTelemetry Logs SDK provides an OTel specification-compliant |
| 130 | +implementation of the Logs API, handling log processing and export. |
| 131 | + |
| 132 | +### Core Components |
| 133 | + |
| 134 | +#### `SdkLoggerProvider` |
| 135 | + |
| 136 | +This is the implementation of the `LoggerProvider` and deals with concerns such |
| 137 | +as processing and exporting Logs. |
| 138 | + |
| 139 | +- Implements the `LoggerProvider` trait. |
| 140 | +- Creates and manages `SdkLogger` instances. |
| 141 | +- Holds logging configuration, including `Resource` and processors. |
| 142 | +- Does not retain a list of created loggers. Instead, it passes an owned clone |
| 143 | + of itself to each logger created. This is done so that loggers get a hold of |
| 144 | + the configuration (like which processor to invoke). |
| 145 | +- Uses an `Arc<LoggerProviderInner>` and delegates all configuration to |
| 146 | + `LoggerProviderInner`. This allows cheap cloning of itself and ensures all |
| 147 | + clones point to the same underlying configuration. |
| 148 | +- As `SdkLoggerProvider` only holds an `Arc` of its inner, it can only take |
| 149 | + `&self` in methods like flush and shutdown. Mutating its own state would |
| 150 | + require interior mutability, which comes with runtime performance costs. Since methods |
| 151 | + like shutdown usually need to mutate interior state, but this component can |
| 152 | + only take `&self`, it defers to components like exporter to use interior |
| 153 | + mutability to handle shutdown. (More on this in the exporter section) |
| 154 | +- An alternative design was to let `SdkLogger` hold a `Weak` reference to the |
| 155 | + `SdkLoggerProvider`. This would be a `weak->arc` upgrade in every log |
| 156 | + emission, significantly affecting throughput. |
| 157 | +- `LoggerProviderInner` implements `Drop`, triggering `shutdown()` when no |
| 158 | + references remain. However, in practice, loggers are often stored statically |
| 159 | + inside appenders (like the `tracing` appender), so explicit shutdown by the user is |
| 160 | + required. |
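
The ownership model described above can be sketched as follows. This is a minimal illustration using only the standard library. The names `Provider`, `ProviderInner`, `Logger`, `Processor`, and `CountingProcessor` are hypothetical stand-ins, not the SDK's types. The sketch shows a cheaply clonable provider wrapping an `Arc`, loggers holding an owned clone, `shutdown(&self)` delegating to a downstream component that uses interior mutability, and `Drop` firing only once the last clone is gone.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Stand-in for a processor/exporter that handles shutdown through `&self`.
pub trait Processor: Send + Sync {
    fn shutdown(&self);
}

/// Counts shutdown calls using interior mutability (an atomic).
pub struct CountingProcessor {
    pub shutdowns: Arc<AtomicUsize>,
}

impl Processor for CountingProcessor {
    fn shutdown(&self) {
        self.shutdowns.fetch_add(1, Ordering::SeqCst);
    }
}

struct ProviderInner {
    processors: Vec<Box<dyn Processor>>,
}

impl ProviderInner {
    fn shutdown(&self) {
        for processor in &self.processors {
            processor.shutdown();
        }
    }
}

impl Drop for ProviderInner {
    // Runs only when the last `Arc` clone is dropped.
    fn drop(&mut self) {
        self.shutdown();
    }
}

#[derive(Clone)]
pub struct Provider {
    inner: Arc<ProviderInner>,
}

impl Provider {
    pub fn new(processors: Vec<Box<dyn Processor>>) -> Self {
        Provider { inner: Arc::new(ProviderInner { processors }) }
    }

    /// Each logger gets an owned clone of the provider (an `Arc` bump).
    pub fn logger(&self) -> Logger {
        Logger { provider: self.clone() }
    }

    /// Explicit shutdown only needs `&self`.
    pub fn shutdown(&self) {
        self.inner.shutdown();
    }
}

pub struct Logger {
    provider: Provider,
}

impl Logger {
    pub fn processor_count(&self) -> usize {
        self.provider.inner.processors.len()
    }
}

/// Returns the shutdown count after dropping the provider, then the logger.
pub fn shutdowns_after_each_drop() -> (usize, usize) {
    let shutdowns = Arc::new(AtomicUsize::new(0));
    let processor: Box<dyn Processor> =
        Box::new(CountingProcessor { shutdowns: shutdowns.clone() });
    let provider = Provider::new(vec![processor]);
    let logger = provider.logger();
    drop(provider); // the logger still holds a clone, so nothing happens yet
    let after_provider_drop = shutdowns.load(Ordering::SeqCst);
    drop(logger); // last reference gone, `Drop` triggers shutdown
    let after_logger_drop = shutdowns.load(Ordering::SeqCst);
    (after_provider_drop, after_logger_drop)
}
```

In this sketch, dropping the provider handle alone does not trigger shutdown while a logger is still alive. This is the same reason a statically stored logger makes an explicit `shutdown()` call necessary.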
| 161 | + |
| 162 | +#### `SdkLogger` |
| 163 | + |
| 164 | +This is an implementation of the `Logger`, and contains functionality to create |
| 165 | +and emit logs. |
| 166 | + |
| 167 | +- Implements the `Logger` trait. |
| 168 | +- Creates `SdkLogRecord` instances and emits them. |
| 169 | +- Calls `OnEmit()` on all registered processors when emitting logs. |
| 170 | +- Passes mutable references to each processor (`&mut log_record`), i.e., |
| 171 | + ownership is not passed to the processor. This ensures that the logger avoids |
| 172 | + cloning costs. Since a mutable reference is passed, processors can modify the |
| 173 | + log, and it will be visible to the next processor in the chain. |
| 174 | +- Since the processor only gets a reference to the log, it cannot store it |
| 175 | + beyond the `OnEmit()`. If a processor needs to buffer logs, it must explicitly |
| 176 | + copy them to the heap. |
| 177 | +- This design allows for stack-only log processing when exporting to operating |
| 178 | + system native facilities like `Windows ETW` or `Linux user_events`. |
| 179 | +- OTLP Exporting requires network calls (HTTP/gRPC) and batching of logs for |
| 180 | + efficiency purposes. These exporters buffer log records by copying them to the |
| 181 | + heap. (More on this in the BatchLogProcessor section) |
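
The borrowing contract described above can be sketched as follows. This is a minimal illustration using only the standard library. `Record`, `Processor`, `Enricher`, and `Buffering` are hypothetical names, and the `on_emit` signature is an assumption standing in for `OnEmit()`. The record here uses a `Vec` for brevity, so the sketch demonstrates the borrowing rules rather than stack-only processing.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, Default)]
pub struct Record {
    pub body: String,
    pub attributes: Vec<(&'static str, String)>,
}

/// Processors only ever borrow the record mutably; they never own it.
pub trait Processor {
    fn on_emit(&self, record: &mut Record);
}

/// Mutates the record in place; the change is visible to later processors.
pub struct Enricher;

impl Processor for Enricher {
    fn on_emit(&self, record: &mut Record) {
        record.attributes.push(("host.name", "demo-host".to_string()));
    }
}

/// Keeps records beyond the call, so it has to make its own copy.
#[derive(Default)]
pub struct Buffering {
    buffered: Mutex<Vec<Record>>,
}

impl Processor for Buffering {
    fn on_emit(&self, record: &mut Record) {
        // The borrow ends when this method returns, so clone to retain it.
        self.buffered.lock().unwrap().push(record.clone());
    }
}

/// Runs one record through [Enricher, Buffering] and returns the attributes
/// the buffering processor ended up storing.
pub fn emit_through_chain(body: &str) -> Vec<(&'static str, String)> {
    let enricher = Enricher;
    let buffering = Buffering::default();
    let processors: [&dyn Processor; 2] = [&enricher, &buffering];

    let mut record = Record { body: body.to_string(), attributes: Vec::new() };
    for processor in processors.iter() {
        processor.on_emit(&mut record); // same record, no clone by the logger
    }

    let buffered = buffering.buffered.lock().unwrap();
    buffered[0].attributes.clone()
}
```

Because the enricher runs first, the copy stored by the buffering processor already contains the `host.name` attribute. A processor that exports synchronously would not need the clone at all.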
| 182 | + |
| 183 | +#### `LogRecord` |
| 184 | + |
| 185 | +- Holds log data, including attributes. |
| 186 | +- Uses an inline array for up to 5 attributes to optimize stack usage. |
| 187 | +- Falls back to a heap-allocated `Vec` if more attributes are required. |
| 188 | +- Inspired by Go’s `slog` library for efficiency. |
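
The inline-then-heap storage strategy can be sketched as follows. This is a minimal illustration using only the standard library, not the SDK's actual container. The `Attributes` type, the `(&'static str, i64)` attribute shape, and the method names are assumptions made for the example. Only the capacity of 5 comes from the text above.

```rust
const INLINE_CAPACITY: usize = 5;

/// Simplified attribute: a static key and an integer value.
type Attribute = (&'static str, i64);

#[derive(Debug, Default)]
pub struct Attributes {
    // Lives inside the struct itself, so no allocation for the first 5 entries.
    inline: [Option<Attribute>; INLINE_CAPACITY],
    inline_len: usize,
    // An empty `Vec` does not allocate; it only does so on the first push.
    overflow: Vec<Attribute>,
}

impl Attributes {
    pub fn push(&mut self, attribute: Attribute) {
        if self.inline_len < INLINE_CAPACITY {
            self.inline[self.inline_len] = Some(attribute);
            self.inline_len += 1;
        } else {
            self.overflow.push(attribute);
        }
    }

    pub fn len(&self) -> usize {
        self.inline_len + self.overflow.len()
    }

    /// True once the heap-allocated fallback has been used.
    pub fn spilled_to_heap(&self) -> bool {
        self.overflow.capacity() > 0
    }

    /// Iterates in insertion order: inline entries first, then overflow.
    pub fn iter(&self) -> impl Iterator<Item = &Attribute> + '_ {
        self.inline[..self.inline_len]
            .iter()
            .flatten()
            .chain(self.overflow.iter())
    }
}
```

With this layout, a record carrying five or fewer attributes never touches the allocator. The sixth `push` is the first one that allocates.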
| 189 | + |
| 190 | +#### LogRecord Processors |
| 191 | + |
| 192 | +`SdkLoggerProvider` allows being configured with any number of LogProcessors. |
| 193 | +They get called in the order of registration. Log records are passed to the |
| 194 | +`OnEmit` method of LogProcessor. LogProcessors can be used to process the log |
| 195 | +records, enrich them, filter them, and export to destinations by leveraging |
| 196 | +LogRecord Exporters. |
| 197 | + |
| 198 | +The following built-in log processors are provided in the Logs SDK: |
| 199 | + |
| 200 | +##### SimpleLogProcessor |
| 201 | + |
| 202 | +This processor is designed to be used for exporting purposes. Export is handled |
| 203 | +by an Exporter (which is a separate component). SimpleLogProcessor is "simple" |
| 204 | +in the sense that it does not attempt to do any processing - it just calls the |
| 205 | +exporter and passes the log record to it. To comply with OTel specification, it |
| 206 | +synchronizes calls to the `Export()` method, i.e., only one `Export()` call will |
| 207 | +be done at any given time. |
| 208 | + |
| 209 | +SimpleLogProcessor is intended only for test/learning purposes and is often used |
| 210 | +along with a `stdout` exporter. |
| 211 | + |
| 212 | +##### BatchLogProcessor |
| 213 | + |
| 214 | +This is another "exporting" processor. As with SimpleLogProcessor, a different |
| 215 | +component named LogExporter handles the actual export logic. BatchLogProcessor |
| 216 | +buffers/batches the logs it receives into an in-memory buffer. It invokes the |
| 217 | +exporter every 1 second or when 512 items are in the batch (customizable). It |
| 218 | +uses a background thread to do the export, and communication between the user |
| 219 | +thread (where logs are emitted) and the background thread occurs with `mpsc` |
| 220 | +channels. |
| 221 | + |
| 222 | +The maximum number of items the buffer holds is 2048 (customizable). Once the limit |
| 223 | +is reached, any *new* logs are dropped. It *does not* apply back-pressure to the |
| 224 | +user thread and instead drops logs. |
| 225 | + |
| 226 | +As with SimpleLogProcessor, this component also ensures only one export is |
| 227 | +active at a given time. A modified version of this is required to achieve higher |
| 228 | +throughput in some environments. |
| 229 | + |
| 230 | +In this design, at most 2048+512 logs can be in memory at any given point. In |
| 231 | +other words, that many logs can be lost if the app crashes in the middle. |
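
The batching behavior described in this section can be sketched as follows. This is a minimal illustration using only the standard library, not the SDK's implementation. `BatchSketch`, `run_worker`, and `demo_batches` are hypothetical names, records are plain `String`s, and "exporting" is reduced to recording the size of each batch. The sketch shows a bounded `mpsc` queue, a non-blocking send that drops on overflow, and a background thread that exports on a size or time trigger.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, RecvTimeoutError, SyncSender};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Background loop: export when the batch is full or the interval elapses.
fn run_worker(rx: Receiver<String>, max_batch: usize, interval: Duration) -> Vec<usize> {
    let mut exported_batch_sizes = Vec::new();
    let mut batch: Vec<String> = Vec::with_capacity(max_batch);
    let mut deadline = Instant::now() + interval;
    loop {
        let timeout = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(timeout) {
            Ok(record) => {
                batch.push(record);
                if batch.len() < max_batch {
                    continue; // keep filling the batch
                }
            }
            Err(RecvTimeoutError::Timeout) => {}
            Err(RecvTimeoutError::Disconnected) => {
                // All senders are gone: flush what is left and stop.
                if !batch.is_empty() {
                    exported_batch_sizes.push(batch.len());
                }
                return exported_batch_sizes;
            }
        }
        if !batch.is_empty() {
            exported_batch_sizes.push(batch.len()); // stand-in for an export call
            batch.clear();
        }
        deadline = Instant::now() + interval;
    }
}

pub struct BatchSketch {
    sender: SyncSender<String>,
    worker: JoinHandle<Vec<usize>>,
    dropped: AtomicUsize,
}

impl BatchSketch {
    /// `queue_capacity` must be greater than zero in this sketch.
    pub fn new(queue_capacity: usize, max_batch: usize, interval: Duration) -> Self {
        let (sender, rx) = sync_channel(queue_capacity);
        let worker = thread::spawn(move || run_worker(rx, max_batch, interval));
        BatchSketch { sender, worker, dropped: AtomicUsize::new(0) }
    }

    /// Never blocks the caller: a full queue means the new record is dropped.
    pub fn emit(&self, record: &str) {
        // The record is copied to the heap because it must outlive this call.
        if self.sender.try_send(record.to_owned()).is_err() {
            self.dropped.fetch_add(1, Ordering::Relaxed);
        }
    }

    /// Flushes remaining records; returns (exported batch sizes, dropped count).
    pub fn shutdown(self) -> (Vec<usize>, usize) {
        let BatchSketch { sender, worker, dropped } = self;
        drop(sender); // disconnects the channel so the worker can finish
        let sizes = worker.join().expect("worker thread panicked");
        (sizes, dropped.into_inner())
    }
}

/// Emits 5 records with a queue of 8 and a batch size of 2.
pub fn demo_batches() -> (Vec<usize>, usize) {
    let processor = BatchSketch::new(8, 2, Duration::from_secs(60));
    for i in 0..5 {
        processor.emit(&format!("log {}", i));
    }
    processor.shutdown()
}
```

In this sketch, at most `queue_capacity + max_batch` records are held in memory at once, which mirrors the 2048+512 bound above. Using `try_send` instead of `send` is what keeps the emitting thread from ever waiting on the exporter.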
| 232 | + |
| 233 | +## LogExporters |
| 234 | + |
| 235 | +LogExporters are responsible for exporting logs to a destination. Some of them |
| 236 | +include: |
| 237 | + |
| 238 | +1. **InMemoryExporter** - exports to an in-memory list, primarily for |
| 239 | + unit-testing. This is used extensively in the repo itself, and external users |
| 240 | + are also encouraged to use this. |
| 241 | +2. **Stdout exporter** - prints telemetry to stdout. Only for debugging/learning |
| 242 | + purposes. The output format is not defined and is not performance |
| 243 | + optimized. A production-recommended version with a standardized output format |
| 244 | + is planned. |
| 245 | +3. **OTLP Exporter** - OTel's official exporter which uses the OTLP protocol |
| 246 | + that is designed with the OTel data model in mind. Both HTTP-based and gRPC-based |
| 247 | + exporting are offered. |
| 248 | +4. **Exporters to OS Kernel facilities** - These exporters are not maintained in |
| 249 | + the core repo but listed for completeness. They export telemetry to Windows ETW |
| 250 | + or Linux user_events. They are designed for high-performance workloads. Due |
| 251 | + to their nature of synchronous exporting, they do not require |
| 252 | + buffering/batching. This allows logs to operate entirely on the stack and can |
| 253 | + scale easily with the number of CPU cores. (Kernel uses per-CPU buffers for |
| 254 | + the events, ensuring no contention) |
| 255 | + |
| 256 | +## `tracing` Log Appender |
| 257 | + |
| 258 | +Tracing appender is part of the |
| 259 | +[opentelemetry-appender-tracing](https://crates.io/crates/opentelemetry-appender-tracing) |
| 260 | +crate. |
| 261 | + |
| 262 | +The `tracing` appender bridges `tracing` logs to OpenTelemetry. Logs emitted via |
| 263 | +`tracing` macros (`info!`, `warn!`, etc.) are forwarded to OpenTelemetry through |
| 264 | +this integration. |
| 265 | + |
| 266 | +- `tracing` is designed for high performance, using *layers* or *subscribers* to |
| 267 | + handle emitted logs (events). |
| 268 | +- The appender implements a `Layer`, receiving logs from `tracing`. |
| 269 | +- Uses the OTel Logs API to create `LogRecord`, populate it, and emit it via |
| 270 | + `Logger.emit(LogRecord)`. |
| 271 | +- If no Logs SDK is present, the process is a no-op. |
| 272 | + |
| 273 | +Note on terminology: Within OpenTelemetry, "tracing" refers to distributed |
| 274 | +tracing (i.e., creation of Spans) and not in-process structured logging and |
| 275 | +execution traces. The "tracing" crate has a notion of creating Spans as well as |
| 276 | +Events. The events from the "tracing" crate are what get converted to OTel Logs |
| 277 | +when using this appender. Spans created using the "tracing" crate are not handled by |
| 278 | +this crate. |
| 279 | + |
| 280 | +## Performance |
| 281 | + |
| 282 | +// Call out things done specifically for performance |
| 283 | + |
| 284 | +### Perf test - benchmarks |
| 285 | + |
| 286 | +// Share ~~ numbers |
| 287 | + |
| 288 | +### Perf test - stress test |
| 289 | + |
| 290 | +// Share ~~ numbers |
| 291 | + |
| 292 | +## Summary |
| 293 | + |
| 294 | +- OpenTelemetry Logs does not provide a user-facing logging API. |
| 295 | +- Instead, it integrates with existing logging libraries (`log`, `tracing`). |
| 296 | +- The Logs API defines key traits but performs no operations unless an SDK is |
| 297 | + installed. |
| 298 | +- The Logs SDK enables log processing, transformation, and export. |
| 299 | +- The Logs SDK is performance optimized to minimize copying and heap allocation, |
| 300 | + wherever feasible. |
| 301 | +- The `tracing` appender efficiently routes logs to OpenTelemetry without |
| 302 | + modifying existing logging workflows. |