# Internal Telemetry Logging Pipeline
> **Reviewer:** In the near future, I hope that this internal telemetry logging pipeline will become more general and integrate metrics and other types of signals as well.
>
> **Author:** Definitely!

This document describes the choices available in the internal logging
configuration object in
`otap_df_config::pipeline::service::telemetry::logs`. See the
[internal telemetry crate's README](../crates/telemetry/README.md) for
the motivation behind this configuration as well as for a description
of the internal metrics pipeline.

## Overview

The internal telemetry SDK is designed for the engine to safely
consume its own telemetry, and we intend for the self-hosted telemetry
pipeline to be the standard configuration for all OpenTelemetry
signals.

Consuming self-generated telemetry presents a potential feedback loop:
a situation where a telemetry pipeline creates pressure on itself. We
have designed the OTAP dataflow engine to remain reliable even with
this kind of dependency on itself.

## Internal Telemetry Receiver (ITR)

The Internal Telemetry Receiver or "ITR" is an OTAP-Dataflow receiver
component that receives telemetry from internal sources and sends it
to an internal telemetry pipeline. An internal
telemetry pipeline consists of one (global) or more (NUMA-regional)
ITR components and any of the connected processor and exporter
components reachable from ITR source nodes.

> **Reviewer:** IMO, the ITR(s) are part of the Internal Telemetry System or ITS, which is the combination of registries (entity and metric set) and an internal OTAP-based telemetry pipeline.

## Logs instrumentation

The OTAP Dataflow engine has dedicated macros, and every component is
configured with an internal telemetry SDK meant for primary
instrumentation. Using the `otel_info!(effect, name, args...)` macro
requires access to the component EffectHandler (see the call-shape
sketch below). This is considered first-party internal logging, and
other uses of Tokio `tracing` are considered third-party internal
logging.

> **Reviewer:** I think we need a separate object (not the effect handler) to hold the logger. The effect handler is instantiated per component, and it is even a different data structure depending on the component type (the receiver effect handler is different from the processor effect handler). I think we need something like a "request context" that is the same instance through the whole pipeline path and different per request. That is the one that should keep the instance of the logger (and later the TelemetrySettings).
>
> **Author:** This sounds possible, and we already have a request-level Context object used for Ack/Nack handling. This is a step above the kind of protection we need here, however, so I would evaluate that objective on its own. Why should we want per-request-context telemetry settings, instead of per-component settings with isolated internal pipelines using entirely different choices without runtime dispatch?
>
> **Reviewer:** Is this request context available at the receiver level (where the message is received/produced)? It should be something that is received from the beginning by the first receiver. Technically it should be configured per "pipeline", that is, a sequence of components for the internal telemetry. This concept does not exist today. We have a graph of components where you don't really know whether they are part of the same "pipeline". As we have nodes that are linked in sequence, we need to set the logger at the ITR, and then the rest of the nodes in the same request should use the same configuration. For example, if the node graph is `ITR -> Processor1` and `OTLPReceiver -> Processor1`, it would be like two pipelines pointing to the same processor, one internal. The request coming from the ITR to Processor1 should use Noop (by default), and the other should use the default logger (sending telemetry through the channel). In general, the logger to use depends on the request: whether it is part of an internal telemetry pipeline or not.
>
> **Reviewer:** I think we need to design and implement the internal logging and explicitly exclude/ignore third-party logging.
>
> **Author:** The providers::internal setting does just that, and it defaults to Noop to prevent internal telemetry pipeline components from logging.

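A call-shape sketch follows. The stand-in macro and `EffectHandler`
type below are illustrative only: the real `otel_info!` macro and
effect handler are provided by the engine crates and emit through the
internal telemetry SDK rather than printing.

```rust
// Stand-in definitions so this sketch compiles on its own; they are NOT
// the engine's real types. Only the call shape mirrors the description above.
struct EffectHandler;

macro_rules! otel_info {
    ($effect:expr, $name:expr $(, $key:ident = $value:expr)* $(,)?) => {{
        let _ = &$effect; // the real macro emits via this per-component handler
        print!("INFO {}", $name);
        $( print!(" {}={:?}", stringify!($key), $value); )*
        println!();
    }};
}

fn main() {
    let effect = EffectHandler; // normally handed to the component by the engine
    // First-party instrumentation from inside a component:
    otel_info!(effect, "exporter.batch.sent", batch_size = 128, retry = false);
}
```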

## Pitfall avoidance

The OTAP-Dataflow engine is safeguarded against many self-induced
telemetry pitfalls, as follows:

- OTAP-Dataflow components reachable from an ITR cannot be configured
to send to an ITR node. This avoids a direct feedback cycle for
internal telemetry because the components cannot reach
themselves. For example, ITR and downstream components may be
configured for raw logging.
- Use of dedicated thread(s) for internal telemetry output.
- Thread-local state to avoid third-party instrumentation in
dedicated internal telemetry threads.
- Components under observation (non-ITR components) use a
  per-engine-core internal logs buffer, allowing overflow.
- Non-blocking interfaces. We prefer to drop and count dropped
  internal log events rather than block the pipeline (see the sketch
  after this list).
- Options to configure internal telemetry in multiple ways, including the
  no-op implementation, global or regional logs consumers, and buffered
  or unbuffered modes.

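The non-blocking, drop-and-count behavior can be illustrated with a
small sketch. This is not the engine's code; it assumes the `flume`
crate (the channel shown in the diagrams below) and an illustrative
counter name.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Counts internal log events dropped because the channel was full.
static DROPPED_LOGS: AtomicU64 = AtomicU64::new(0);

/// Non-blocking emit: never waits on the internal logs channel.
fn emit_non_blocking<T>(tx: &flume::Sender<T>, event: T) {
    if tx.try_send(event).is_err() {
        // Channel full or disconnected: drop and count instead of blocking.
        DROPPED_LOGS.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let (tx, rx) = flume::bounded::<String>(2);
    for i in 0..5 {
        emit_non_blocking(&tx, format!("event {i}"));
    }
    // Two events fit in the channel; the other three were counted as dropped.
    assert_eq!(rx.len(), 2);
    assert_eq!(DROPPED_LOGS.load(Ordering::Relaxed), 3);
}
```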
## OTLP-bytes first

> **Reviewer:** In our recent telemetry guidelines, for events we distinguish between entity-based attributes (stable context) and other event-specific attributes (dynamic context). Using this terminology, attributes belonging to the stable context do not need to be emitted with every event instance; instead, they are identified by a unique numeric ID that references the attributes in an attribute registry.
>
> The "dynamic" attributes are the ones that should travel as OTLP bytes from the hot path to the cold path. If I recall our recent conversation correctly, you were arguing that building this dynamic map would take roughly the same amount of time in a classic representation as in OTLP bytes form. That seems plausible to me, provided we are careful with this micro-serialization and keep attribute values simple.
>
> In any case, I think we should run a few benchmarks to validate all of this.
>
> **Author:** I'd like to take this one step at a time. Presently, the internal logging design is concerned with the mechanics for handling and encoding events without considering entity-based attributes. We will be able to add dynamic context using the appropriate numeric ID to represent "scope" in a future PR.

As a key design decision, the OTAP-Dataflow internal telemetry data
path produces a partially encoded OTLP-bytes representation first.
This is an intermediate format,
`otap_df_telemetry::self_tracing::LogRecord` (sketched below), which
includes the timestamp, callsite metadata, and the OTLP bytes encoding
of the body and attributes.

> **Reviewer:** It should be OtapPayload first, no? It will be generated initially from OTLP bytes, but the "transport" object should be OtapPayload in my opinion.
>
> **Author:** For internal pipelines, OtapPayload::OtlpBytes is the fully formed OTLP encoding, yes. The full encoding includes resource and scope, whereas the previously merged self_tracing package (in #1735) supports only directly encoding single LogRecords at this time.
>
> There is still an intermediate representation, called self_tracing::LogRecord, which defers encoding the whole OTLP payload immediately. Mainly, for the bytes that must be copied because of lifetime, requiring ownership, we encode OTLP bytes immediately. In the case of raw logging, we skip the whole payload representation.

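A simplified sketch of what such an intermediate record might hold
(field names are illustrative; the actual `self_tracing::LogRecord`
may differ):

```rust
/// Illustrative only: an owned record captured on the hot path.
pub struct LogRecord {
    /// Event timestamp, nanoseconds since the Unix epoch.
    pub time_unix_nano: u64,
    /// Callsite metadata from the Tokio `tracing` event.
    pub level: tracing::Level,
    pub target: &'static str,
    /// Body and attributes, already encoded as OTLP protobuf bytes.
    pub body_and_attributes: Vec<u8>,
}
```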
Because OTLP bytes is one of the builtin `OtapPayload` formats, it is
simple to get from a slice of `LogRecord` to the `OtapPayload` we need
to consume internal telemetry. To obtain the partial bytes encoding
needed, we have a custom [Tokio `tracing` Event][TOKIOEVENT] handler
based on `otap_df_pdata::otlp::common::ProtoBuffer` (see the sketch
below).

[TOKIOEVENT]: https://docs.rs/tracing/latest/tracing/struct.Event.html

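To show the general shape of such a handler, here is a self-contained
sketch of a custom `tracing_subscriber` layer. It is not the engine's
implementation: it appends `key=value` text instead of protobuf bytes
so that it runs with only the `tracing` and `tracing-subscriber`
crates.

```rust
use std::fmt::Debug;
use tracing::field::{Field, Visit};
use tracing::{Event, Subscriber};
use tracing_subscriber::layer::{Context, Layer};
use tracing_subscriber::prelude::*;

/// Visitor that renders each event field into an owned buffer. The real
/// handler would write OTLP protobuf via `ProtoBuffer` here instead.
#[derive(Default)]
struct FieldBuffer(Vec<u8>);

impl Visit for FieldBuffer {
    fn record_debug(&mut self, field: &Field, value: &dyn Debug) {
        let rendered = format!("{}={:?} ", field.name(), value);
        self.0.extend_from_slice(rendered.as_bytes());
    }
}

/// Layer that captures each event immediately, producing owned bytes that
/// can be handed off to the logs channel without borrowing from the event.
struct CaptureLayer;

impl<S: Subscriber> Layer<S> for CaptureLayer {
    fn on_event(&self, event: &Event<'_>, _ctx: Context<'_, S>) {
        let mut fields = FieldBuffer::default();
        event.record(&mut fields); // visit the body and attributes once
        let meta = event.metadata(); // callsite metadata: level, target, ...
        // A real implementation would build a `LogRecord` and send it to the
        // internal logs channel; printing keeps the sketch self-contained.
        println!(
            "{} {}: {}",
            meta.level(),
            meta.target(),
            String::from_utf8_lossy(&fields.0)
        );
    }
}

fn main() {
    tracing_subscriber::registry().with(CaptureLayer).init();
    tracing::info!(component = "demo", "pipeline started");
}
```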
## Raw logging

We support formatting events for direct printing to the console from
OTLP bytes. For the dynamic encoding, these bytes are consumed using
`otap_df_pdata::views::logs::LogsDataView`, making the operation
zero-copy. We refer to this most basic form of printing to the console
as raw logging because it is a safe configuration that avoids feedback
for internal logging.

Note: Raw logging is likely to introduce contention over the console.

In cases where internal logging code is forced to handle its own
errors, the `otap_df_telemetry::raw_error!` macro is meant for
emergency use, to report failures to log.

## Logging provider modes

> **Reviewer:** Where is this configured? Per component?
>
> **Author:** The logs.rs draft included in this PR includes a providers section to allow logging providers to be configured based on the context.

The logging configuration supports multiple distinct provider mode
settings:

- Global: The default Tokio subscriber; this applies in threads that do
  not belong to an OTAP dataflow engine core.
- Engine: This is the default configuration for engine core threads.
- Internal: This is the default configuration for internal telemetry
  pipeline components.

> **Reviewer (on the Global provider):** I insist that we should not consider this case (at least not initially).
>
> **Author:** Please explain what you are aiming for? Use of Tokio tracing is a widespread standard for diagnostics that is well understood, already useful, and recommended by OpenTelemetry. We have to make a choice about the Tokio subscriber that we install, and we are going to have a setting that controls the global behavior. Since you seem to want this disabled, you can set providers::global: noop for that effect.
>
> **Reviewer:** We are basically intercepting it / not using it for the telemetry of the engine. We should not even configure it or depend on it at this level. If an external library uses it, it will not be captured, as it is out of the engine's control. External libraries could also use other tracing libraries.

Provider mode values are:

- Noop: Ignore these producers.
- Unbuffered: Use a non-blocking write to the internal logs channel.
  Unbuffered is the default for the global provider.
- Buffered: Use a thread-local buffer; requires managed flushing.
  The global provider is not supported for buffered logging.
- OpenTelemetry: Use the OpenTelemetry SDK. This option has the most
  comprehensive observability, including OpenTelemetry traces
  integration.
- Raw: Use the raw logger, meaning to synchronously write to the
  console. For asynchronous console logging instead, use the Buffered
  or Unbuffered provider mode and set the Raw output mode.

> **Reviewer (on Unbuffered):** By definition the channel has some buffer. How to avoid this at the producer level?
>
> **Author:** The Unbuffered and Buffered modes are associated with a channel, yes. The Noop and Raw modes both avoid the use of a channel, and for this we have distinct providers we can configure. The present proposal includes providers::internal: noop as the default to avoid telemetry from internal components entirely. You could also choose raw logging to avoid the channel, for example, or configure OpenTelemetry as the provider.
>
> **Reviewer (on OpenTelemetry):** Are we not mixing the concepts here? I see the OpenTelemetry SDK more on the "consumer/destination" side of the internal telemetry.
>
> **Author:** There is a point that I haven't made clear, connected with yours. OpenTelemetry is a provider because it expects to be invoked in context; it's a provider of an API, not exactly a destination for telemetry.
>
> For my proposed configuration here, the OutputMode type captures the behavior of the collector, and what you're proposing is that we could make OpenTelemetry be an output, the destination for telemetry after going through the channel? I think the answer is that that is not how OpenTelemetry APIs work; they're designed to be providers. It's not unreasonable to offer OpenTelemetry as a destination after the channel, but it would require translating from OTLP bytes into the OTel structured object since we would have bypassed the OpenTelemetry/Tokio bridge.

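A minimal sketch of how these provider modes might be modeled in the
configuration object (the `logs.rs` draft in this PR may use different
names or variants):

```rust
/// Illustrative provider-mode enum mirroring the list above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProviderMode {
    /// Ignore these producers.
    Noop,
    /// Non-blocking write to the internal logs channel.
    Unbuffered,
    /// Thread-local buffer with managed flushing (not valid for `global`).
    Buffered,
    /// Route through the OpenTelemetry SDK, including traces integration.
    OpenTelemetry,
    /// Synchronous raw console logging.
    Raw,
}
```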
## Output modes

When any of the providers use Buffered or Unbuffered modes, an
engine-managed thread is responsible for consuming internal logs. This
`LogsCollector` thread is currently global, but could be NUMA-regional
as described in the [README](./README.md), and it can currently be
configured for raw or internal logging.

The output modes are:

- Noop: Not a real output mode; this setting will cause an error when
  any provider uses Buffered or Unbuffered.
- Raw: Use raw console logging from the global or regional logging
thread.
- Internal: Use an Internal Telemetry Receiver as the destination.

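A matching sketch for the output side, again with illustrative names
(the actual `OutputMode` type mentioned above may differ):

```rust
/// Illustrative output-mode enum mirroring the list above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OutputMode {
    /// Not a real output mode: rejected when any provider is
    /// Buffered or Unbuffered.
    Noop,
    /// Raw console logging from the LogsCollector thread.
    Raw,
    /// Send to an Internal Telemetry Receiver.
    Internal,
}
```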
## Default configuration

In this configuration, a dedicated `LogsCollector` thread consumes
from the channel and prints to console.

```yaml
service:
  telemetry:
    logs:
      level: info
      providers:
        global: unbuffered
        engine: buffered
        internal: noop
      output: raw
```

> **Reviewer:** I'm thinking of something a little simpler: #1769
>
> **Author:** Thank you @andborja. For the record, I think these visions are compatible!
>
> The part you have as "Event manager (configurable)" in the diagram, that's the Internal Telemetry Receiver component. There is one instance in the configuration. As the engine starts, we will have one (global) or one per NUMA region of these, and they can have any configuration we all agree on. I have no configuration so far, and (as my next PR will show) in my current draft the ITR uses send_message to the default out-port.
>
> The part where you have multiple internal pipelines: that's already part of the OTAP-Dataflow graph structure, the idea that any component including ITR can have multiple output destinations and policies allowing various behavior in the pipeline-level configuration. So, if the ITR configuration for "event manager" as you describe it can name the out-ports that it wants to send to, I think we'll have essentially the same diagram.
>
> @lquerel please see if this aligns with your thinking: the idea of a single ITR in the OTAP-dataflow pipeline configuration with some sort of policy that it uses to send to one or more out-ports in the config. The ITR configuration is a mapping from something to the out-port name, that is.
>
> **Reviewer:** What I think is different is the hard/specific separation and decoupling of the production of the internal telemetry from its routing and consumption. The ITR is not the same as the event manager. The event manager is part of the reception and routing of the messages produced by any component in the engine. The ITR is yet another receiver, so it is part of the "consumption" block. A view with this separation simplifies several downstream definitions, including the configuration of the different components, and even the need (or not) of an ITR id on the "production" side. The channels used here are not the same as the graph queues, so the out-ports and other concepts don't have the same meaning. The ITR is not a processor, it is a receiver, so it does not read the incoming messages from a "port". They have their own flume channel.

```mermaid
flowchart LR
subgraph "Thread Contexts"
G[Global Threads<br/>HTTP admin, etc.]
E[Engine Threads<br/>Pipeline cores]
I[Internal Threads<br/>ITR components]
end

subgraph "Provider Layers"
UL[UnbufferedLayer<br/>immediate send]
BL[ThreadBufferedLayer<br/>thread-local buffer]
NL[Noop<br/>dropped]
end

subgraph "Channel"
CH[(flume channel)]
end

subgraph "Output"
LC[LogsCollector Thread]
CON[Console<br/>stdout/stderr]
end

G -->|tracing event| UL
E -->|tracing event| BL
I -->|tracing event| NL

UL -->|LogPayload::Singleton| CH
BL -->|periodic flush<br/>LogPayload::Batch| CH
NL -.->|discarded| X[null]

CH --> LC
LC -->|raw format| CON
```

> **Reviewer:** There is one component missing here. There might be some "subscription manager" or some component in charge of pulling the objects from the channel and sending them to the configured destination (console or an internal telemetry receiver channel).
>
> **Author:** This "LC" is the logs collector. The design documented in crates/telemetry/README.md calls for this to be NUMA-regional eventually; for now it's a global/single thread. This is one of the dedicated threads described for internal telemetry above, and it pulls from the channel and does what the OutputMode prescribes, meaning it prints to the console or sends to the internal pipeline.

## Internal Telemetry Receiver configuration

In this configuration, the `InternalTelemetryReceiver` node consumes
from the channel and emits `OtapPayload::ExportLogsRequest` into the
pipeline.

```yaml
service:
  telemetry:
    logs:
      level: info
      providers:
        global: unbuffered
        engine: buffered
        internal: noop
      output: internal

nodes:
  internal_telemetry:
    kind: receiver
    plugin_urn: "urn:otel:otlp:telemetry:receiver"
    out_ports:
      out_port:
        destinations:
          - otlp_exporter

  otlp_exporter:
    kind: exporter
    ...
```

> **Reviewer (on `internal: noop`):** This configuration should not be global to the engine, but local to a pipeline (a node in our case?).
>
> **Author:** In the accompanying document, I tried to capture the concept that the Internal Telemetry Receiver and all of its downstream nodes are considered an internal telemetry pipeline. All nodes in the internal telemetry pipeline would have the Noop configuration in this example. This setting is meant to apply to all of those internal telemetry components.
>
> I'm aware of a desire to control the telemetry provider in more fine-grained ways. For example, we might imagine overriding LogLevel at the component level. We can imagine varying the ProviderMode at that level too; it's just more-detailed configuration.

```mermaid
flowchart LR
subgraph "Thread Contexts"
G[Global Threads<br/>HTTP admin, etc.]
E[Engine Threads<br/>Pipeline cores]
I[Internal Threads<br/>ITR components]
end

subgraph "Provider Layers"
UL[UnbufferedLayer<br/>immediate send]
BL[ThreadBufferedLayer<br/>thread-local buffer]
NL[Noop<br/>dropped]
end

subgraph "Channel"
CH[(flume channel)]
end

subgraph "Internal Telemetry Pipeline"
ITR[InternalTelemetryReceiver<br/>encodes to OTLP bytes]
PROC[Processors<br/>batch, filter, etc.]
EXP[Exporter<br/>OTLP, file, etc.]
end

G -->|tracing event| UL
E -->|tracing event| BL
I -->|tracing event| NL

UL -->|LogPayload::Singleton| CH
BL -->|periodic flush<br/>LogPayload::Batch| CH
NL -.->|discarded| X[null]

CH --> ITR
ITR -->|OtapPayload<br/>ExportLogsRequest| PROC
PROC --> EXP
```