Skip to content

Add design docs #2657

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 13, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
302 changes: 302 additions & 0 deletions docs/design/logs.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,302 @@
# OpenTelemetry Rust Logs Design

Status:
[Development](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/document-status.md)

## Overview

[OpenTelemetry (OTel)
Logs](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/README.md)
support differs from Metrics and Traces as it does not introduce a new logging
API for end users. Instead, OTel recommends leveraging existing logging
libraries such as [log](https://crates.io/crates/log) and
[tracing](https://crates.io/crates/tracing), while providing bridges (appenders)
to route logs through OpenTelemetry.

OTel took this different approach due to the long history of existing logging
solutions. In Rust, these are [log](https://crates.io/crates/log) and
[tracing](https://crates.io/crates/tracing), and have been embraced in the
community for some time. OTel Rust maintains appenders for these libraries,
allowing users to seamlessly integrate with OpenTelemetry without changing their
existing logging instrumentation.

The `tracing` appender is particularly optimized for performance due to its
widespread adoption and the fact that `tracing` itself has a bridge from the
`log` crate. Notably, OpenTelemetry Rust itself is instrumented using `tracing`
for internal logs. Additionally, when OTel began supporting logging as a signal,
the `log` crate lacked structured logging support, reinforcing the decision to
prioritize `tracing`.

## Benefits of OpenTelemetry Logs

- **Unified configuration** across Traces, Metrics, and Logs.
- **Automatic correlation** with Traces.
- **Consistent Resource attributes** across signals.
- **Multiple destinations support**: Logs can continue flowing to existing
destinations like stdout etc. while also being sent to an
OpenTelemetry-capable backend, typically via an OTLP Exporter or exporters
that export to operating system native systems like `Windows ETW` or `Linux
user_events`.
- **Standalone logging support** for applications that use OpenTelemetry as
their primary logging mechanism.

## Key Design Principles

- High performance - no locks/contention in the hot path with minimal/no heap
allocation where possible.
- Capped resource (memory) usage - well-defined behavior when overloaded.
- Self-observable - exposes telemetry about itself to aid in troubleshooting
etc.
- Robust error handling, returning Result where possible instead of panicking.
- Minimal public API, exposing based on need only.

## Architecture Overview

```mermaid
graph TD
subgraph Application
A1[Application Code]
end
subgraph Logging Libraries
B1[log crate]
B2[tracing crate]
end
subgraph OpenTelemetry
C1[OpenTelemetry Appender for log]
C2[OpenTelemetry Appender for tracing]
C3[OpenTelemetry Logs API]
C4[OpenTelemetry Logs SDK]
C5[OTLP Exporter]
end
subgraph Observability Backend
D1[OTLP-Compatible Backend]
end
A1 --> |Emits Logs| B1
A1 --> |Emits Logs| B2
B1 --> |Bridged by| C1
B2 --> |Bridged by| C2
C1 --> |Sends to| C3
C2 --> |Sends to| C3
C3 --> |Processes with| C4
C4 --> |Exports via| C5
C5 --> |Sends to| D1
```

## Logs API

Logs API is part of the [opentelemetry](https://crates.io/crates/opentelemetry)
crate.

The OTel Logs API is not intended for direct end-user usage. Instead, it is
designed for appender/bridge authors to integrate existing logging libraries
with OpenTelemetry. However, there is nothing preventing it from being used by
end-users.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - However, since this is public API, there is nothing preventing it from being used by end-users.


### API Components

1. **Key-Value Structs**: Used in `LogRecord`, where `Key` struct is shared
across signals but `Value` struct differ from Metrics and Traces. This is
because values in Logs can contain more complex structures than those in
Traces and Metrics.
2. **Traits**:
- `LoggerProvider` - provides methods to obtain Logger.
- `Logger` - provides methods to create LogRecord and emit the created
LogRecord.
- `LogRecord` - provides methods to populate LogRecord.
3. **No-Op Implementations**: By default, the API performs no operations until
an SDK is attached.

### Logs Flow

1. Obtain a `LoggerProvider` implementation.
2. Use the `LoggerProvider` to create `Logger` instances, specifying a scope
name (module/component emitting logs). Optional attributes and version are
also supported.
3. Use the `Logger` to create an empty `LogRecord` instance.
4. Populate the `LogRecord` with body, timestamp, attributes, etc.
5. Call `Logger.emit(LogRecord)` to process and export the log.

If only the Logs API is used (without an SDK), all the above steps result in no
operations, following OpenTelemetry’s philosophy of separating API from SDK. The
official Logs SDK provides real implementations to process and export logs.
Users or vendors can also provide alternative SDK implementations.

## Logs SDK

Logs SDK is part of the
[opentelemetry_sdk](https://crates.io/crates/opentelemetry_sdk) crate.

The OpenTelemetry Logs SDK provides an OTel specification-compliant
implementation of the Logs API, handling log processing and export.

### Core Components

#### `SdkLoggerProvider`

This is the implementation of the `LoggerProvider` and deals with concerns such
as processing and exporting Logs.

- Implements the `LoggerProvider` trait.
- Creates and manages `SdkLogger` instances.
- Holds logging configuration, including `Resource` and processors.
- Does not retain a list of created loggers. Instead, it passes an owned clone
of itself to each logger created. This is done so that loggers get a hold of
the configuration (like which processor to invoke).
- Uses an `Arc<LoggerProviderInner>` and delegates all configuration to
`LoggerProviderInner`. This allows cheap cloning of itself and ensures all
clones point to the same underlying configuration.
- As `SdkLoggerProvider` only holds an `Arc` of its inner, it can only take
`&self` in its methods like flush and shutdown. Else it needs to rely on
interior mutability that comes with runtime performance costs. Since methods
like shutdown usually need to mutate interior state, but this component can
only take `&self`, it defers to components like exporter to use interior
mutability to handle shutdown. (More on this in the exporter section)
- An alternative design was to let `SdkLogger` hold a `Weak` reference to the
`SdkLoggerProvider`. This would be a `weak->arc` upgrade in every log
emission, significantly affecting throughput.
- `LoggerProviderInner` implements `Drop`, triggering `shutdown()` when no
references remain. However, in practice, loggers are often stored statically
inside appenders (like tracing-appender), so explicit shutdown by the user is
required.

#### `SdkLogger`

This is an implementation of the `Logger`, and contains functionality to create
and emit logs.

- Implements the `Logger` trait.
- Creates `SdkLogRecord` instances and emits them.
- Calls `OnEmit()` on all registered processors when emitting logs.
- Passes mutable references to each processor (`&mut log_record`), i.e.,
ownership is not passed to the processor. This ensures that the logger avoids
cloning costs. Since a mutable reference is passed, processors can modify the
log, and it will be visible to the next processor in the chain.
- Since the processor only gets a reference to the log, it cannot store it
beyond the `OnEmit()`. If a processor needs to buffer logs, it must explicitly
copy them to the heap.
- This design allows for stack-only log processing when exporting to operating
system native facilities like `Windows ETW` or `Linux user_events`.
- OTLP Exporting requires network calls (HTTP/gRPC) and batching of logs for
efficiency purposes. These exporters buffer log records by copying them to the
heap. (More on this in the BatchLogRecordProcessor section)

#### `LogRecord`

- Holds log data, including attributes.
- Uses an inline array for up to 5 attributes to optimize stack usage.
- Falls back to a heap-allocated `Vec` if more attributes are required.
- Inspired by Go’s `slog` library for efficiency.

#### LogRecord Processors

`SdkLoggerProvider` allows being configured with any number of LogProcessors.
They get called in the order of registration. Log records are passed to the
`OnEmit` method of LogProcessor. LogProcessors can be used to process the log
records, enrich them, filter them, and export to destinations by leveraging
LogRecord Exporters.

Following built-in Log processors are provided in the Log SDK:

##### SimpleLogProcessor

This processor is designed to be used for exporting purposes. Export is handled
by an Exporter (which is a separate component). SimpleLogProcessor is "simple"
in the sense that it does not attempt to do any processing - it just calls the
exporter and passes the log record to it. To comply with OTel specification, it
synchronizes calls to the `Export()` method, i.e., only one `Export()` call will
be done at any given time.

SimpleLogProcessor is only used for test/learning purposes and is often used
along with a `stdout` exporter.

##### BatchLogProcessor

This is another "exporting" processor. As with SimpleLogProcessor, a different
component named LogExporter handles the actual export logic. BatchLogProcessor
buffers/batches the logs it receives into an in-memory buffer. It invokes the
exporter every 1 second or when 512 items are in the batch (customizable). It
uses a background thread to do the export, and communication between the user
thread (where logs are emitted) and the background thread occurs with `mpsc`
channels.

The max amount of items the buffer holds is 2048 (customizable). Once the limit
is reached, any *new* logs are dropped. It *does not* apply back-pressure to the
user thread and instead drops logs.

As with SimpleLogProcessor, this component also ensures only one export is
active at a given time. A modified version of this is required to achieve higher
throughput in some environments.

In this design, at most 2048+512 logs can be in memory at any given point. In
other words, that many logs can be lost if the app crashes in the middle.

## LogExporters

LogExporters are responsible for exporting logs to a destination. Some of them
include:

1. **InMemoryExporter** - exports to an in-memory list, primarily for
unit-testing. This is used extensively in the repo itself, and external users
are also encouraged to use this.
2. **Stdout exporter** - prints telemetry to stdout. Only for debugging/learning
purposes. The output format is not defined and also is not performance
optimized. A production-recommended version with a standardized output format
is in the plan.
3. **OTLP Exporter** - OTel's official exporter which uses the OTLP protocol
that is designed with the OTel data model in mind. Both HTTP and gRPC-based
exporting is offered.
4. **Exporters to OS Kernel facilities** - These exporters are not maintained in
the core repo but listed for completion. They export telemetry to Windows ETW
or Linux user_events. They are designed for high-performance workloads. Due
to their nature of synchronous exporting, they do not require
buffering/batching. This allows logs to operate entirely on the stack and can
scale easily with the number of CPU cores. (Kernel uses per-CPU buffers for
the events, ensuring no contention)

## `tracing` Log Appender

Tracing appender is part of the
[opentelemetry-appender-tracing](https://crates.io/crates/opentelemetry-appender-tracing)
crate.

The `tracing` appender bridges `tracing` logs to OpenTelemetry. Logs emitted via
`tracing` macros (`info!`, `warn!`, etc.) are forwarded to OpenTelemetry through
this integration.

- `tracing` is designed for high performance, using *layers* or *subscribers* to
handle emitted logs (events).
- The appender implements a `Layer`, receiving logs from `tracing`.
- Uses the OTel Logs API to create `LogRecord`, populate it, and emit it via
`Logger.emit(LogRecord)`.
- If no Logs SDK is present, the process is a no-op.

Note on terminology: Within OpenTelemetry, "tracing" refers to distributed
tracing (i.e creation of Spans) and not in-process structured logging and
execution traces. The crate "tracing" has notion of creating Spans as well as
Events. The events from "tracing" crate is what gets converted to OTel Logs,
when using this appender. Spans created using "tracing" crate is not handled by
this crate.

## Performance

// Call out things done specifically for performance

### Perf test - benchmarks

// Share ~~ numbers

### Perf test - stress test

// Share ~~ numbers

## Summary

- OpenTelemetry Logs does not provide a user-facing logging API.
- Instead, it integrates with existing logging libraries (`log`, `tracing`).
- The Logs API defines key traits but performs no operations unless an SDK is
installed.
- The Logs SDK enables log processing, transformation, and export.
- The Logs SDK is performance optimized to minimize copying and heap allocation,
wherever feasible.
- The `tracing` appender efficiently routes logs to OpenTelemetry without
modifying existing logging workflows.
6 changes: 6 additions & 0 deletions docs/design/metrics.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# OpenTelemetry Rust Metrics Design

Status:
[Development](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/document-status.md)

TODO:
6 changes: 6 additions & 0 deletions docs/design/traces.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# OpenTelemetry Rust Traces Design

Status:
[Development](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/document-status.md)

TODO: