Internal logs architecture document #1741
# Internal Telemetry Logging Pipeline

This document describes the choices available in the internal logging
configuration object in
`otap_df_config::pipeline::service::telemetry::logs`. See the
[internal telemetry crate's README](../crates/telemetry/README.md) for
the motivation behind this configuration as well as for a description
of the internal metrics pipeline.
## Overview

The internal telemetry SDK is designed for the engine to safely
consume its own telemetry, and we intend for the self-hosted telemetry
pipeline to be the standard configuration for all OpenTelemetry
signals.

Consuming self-generated telemetry presents a potential feedback
loop: a situation in which a telemetry pipeline creates pressure on
itself. We have designed the OTAP dataflow engine to remain reliable
even with this kind of dependency on itself.
## Internal Telemetry Receiver (ITR)

The Internal Telemetry Receiver or "ITR" is an OTAP-Dataflow receiver
component that receives telemetry from internal sources and sends it
to an internal telemetry pipeline. An internal telemetry pipeline
consists of one (global) or more (NUMA-regional) ITR components and
any of the connected processor and exporter components reachable from
ITR source nodes.

> **Contributor:** IMO, the ITR(s) are part of the Internal Telemetry
> System or ITS, which is the combination of registries (entity and
> metric set) and an internal otap-based telemetry pipeline.
## Logs instrumentation

The OTAP Dataflow engine has dedicated macros, and every component is
configured with an internal telemetry SDK meant for primary
instrumentation. Using the `otel_info!(effect, name, args...)` macro
requires access to the component's EffectHandler. This is considered
first-party internal logging, and other uses of Tokio `tracing` are
considered third-party internal logging. (A usage sketch follows the
review comments below.)

> **Contributor:** I think we need a separate object (not the
> EffectHandler) to hold the logger. The effect handler is
> instantiated per component, and it is even a different data
> structure depending on the component type (the receiver effect
> handler is different from the processor effect handler).
>
> **Author:** This sounds possible, and we already have a
> request-level Context object used for Ack/Nack handling. This is a
> step above the kind of protection we need here, however, so I would
> evaluate that objective on its own. Why should we want
> per-request-context telemetry settings, instead of per-component
> settings with isolated internal pipelines using entirely different
> choices w/o runtime dispatch?
>
> **Contributor:** Is this request context available at the receiver
> level (where the message is received/produced)? It should be
> something that is received from the beginning by the first receiver.

> **Contributor:** I think we need to design and implement the
> internal logging and explicitly exclude/ignore third-party logging.
>
> **Author:** The …
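The sketch below illustrates roughly how a first-party call like
`otel_info!(effect, name, args...)` could bottom out in a Tokio
`tracing` event that the custom handler (see "OTLP-bytes first") can
encode. The expansion to `tracing::info!` and the field names shown
are assumptions for illustration only, not the engine's actual macro
implementation.

```rust
// Sketch only: how a first-party instrumentation call might surface
// as a Tokio `tracing` event. The attribute keys and the idea of
// expanding to `tracing::info!` are illustrative assumptions.
use tracing::info;

fn emit_first_party(component_id: &str, batch_len: usize) {
    // A custom tracing Layer (the Event handler described in the
    // "OTLP-bytes first" section) receives this event and encodes the
    // body and attributes as OTLP bytes.
    info!(
        component.id = component_id, // assumed attribute key
        batch.len = batch_len,       // assumed attribute key
        "processor received a batch" // event body
    );
}

fn main() {
    emit_first_party("my_processor/0", 128);
}
```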
## Pitfall avoidance

The OTAP-Dataflow engine is safeguarded against many self-induced
telemetry pitfalls, as follows:

- OTAP-Dataflow components reachable from an ITR cannot be configured
  to send to an ITR node. This avoids a direct feedback cycle for
  internal telemetry because the components cannot reach themselves.
  For example, ITR and downstream components may be configured for
  raw logging.
- Use of dedicated thread(s) for internal telemetry output.
- Thread-local state to avoid third-party instrumentation in
  dedicated internal telemetry threads.
- Components under observation (non-ITR components) use a
  per-engine-core internal logs buffer, allowing overflow.
- Non-blocking interfaces. We prefer to drop and count internal log
  events rather than block the pipeline (see the sketch after this
  list).
- Options to configure internal telemetry multiple ways, including a
  no-op implementation, global or regional logs consumers, and
  buffered or unbuffered channels.
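A minimal sketch of the drop-and-count behavior, assuming a bounded
flume channel like the one shown in the diagrams below; the
`DROPPED_LOGS` counter and the `LogRecord` placeholder are
hypothetical names used only for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical counter for dropped internal log events.
static DROPPED_LOGS: AtomicU64 = AtomicU64::new(0);

// Placeholder for the real internal record type.
struct LogRecord;

fn send_non_blocking(tx: &flume::Sender<LogRecord>, record: LogRecord) {
    // Never block the pipeline: if the channel is full (or closed),
    // drop the event and count the drop instead of waiting.
    if tx.try_send(record).is_err() {
        DROPPED_LOGS.fetch_add(1, Ordering::Relaxed);
    }
}
```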
## OTLP-bytes first

> **Contributor:** In our recent telemetry guidelines, for events we
> distinguish between entity-based attributes (stable context) and
> other event-specific attributes (dynamic context). Using this
> terminology, attributes belonging to the stable context do not need
> to be emitted with every event instance; instead, they are
> identified by a unique numeric ID that references the attributes in
> an attribute registry. The "dynamic" attributes are the ones that
> should travel as OTLP bytes from the hot path to the cold path. If I
> recall our recent conversation correctly, you were arguing that
> building this dynamic map would take roughly the same amount of time
> in a classic representation as in OTLP bytes form. That seems
> plausible to me, provided we are careful with this
> micro-serialization and keep attribute values simple. In any case, I
> think we should run a few benchmarks to validate all of this.
>
> **Author:** I'd like to take this one step at a time. Presently, the
> internal logging design is concerned with the mechanics for handling
> and encoding events without considering entity-based attributes. We
> will be able to add dynamic context using the appropriate numeric ID
> to represent "scope" in a future PR.

As a key design decision, the OTAP-Dataflow internal telemetry data
path produces a partially encoded OTLP-bytes representation first.

> **Contributor:** It should be OtapPayload first, no? It will be
> generated initially from OTLP bytes, but the "transport" object
> should be OtapPayload in my opinion.
>
> **Author:** For internal pipelines, `OtapPayload::OtlpBytes` is the
> fully formed OTLP encoding, yes. The full encoding includes resource
> and scope, whereas the previously merged … There is still an
> intermediate representation, called `self_tracing::LogRecord`, which
> avoids encoding the whole OTLP payload immediately. Mainly, for the
> bytes that must be copied because of lifetime, requiring ownership,
> we encode OTLP bytes immediately. In the case of raw logging, we
> skip the whole payload representation.

This is an intermediate format,
`otap_df_telemetry::self_tracing::LogRecord`, which includes the
timestamp, callsite metadata, and the OTLP-bytes encoding of the body
and attributes.

Because OTLP bytes is one of the built-in `OtapPayload` formats, it is
simple to get from a slice of `LogRecord` to the `OtapPayload` we need
to consume internal telemetry. To obtain the partial bytes encoding
needed, we have a custom [Tokio `tracing` Event][TOKIOEVENT] handler
based on `otap_df_pdata::otlp::common::ProtoBuffer`.

[TOKIOEVENT]: https://docs.rs/tracing/latest/tracing/struct.Event.html
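The following is a hypothetical sketch of the intermediate record's
shape; it only restates the pieces named above (timestamp, callsite
metadata, and pre-encoded OTLP bytes) and does not reproduce the
actual fields of `otap_df_telemetry::self_tracing::LogRecord`.

```rust
// Hypothetical shape of the intermediate record described above.
pub struct LogRecordSketch {
    /// Capture time, e.g. nanoseconds since the Unix epoch.
    pub time_unix_nano: u64,
    /// Static callsite metadata from the Tokio `tracing` event
    /// (level, target, module path, file/line).
    pub callsite: &'static tracing::Metadata<'static>,
    /// Body and attributes, already encoded as OTLP bytes by the
    /// custom Event handler, so the owned copy is made exactly once.
    pub otlp_body_and_attrs: Vec<u8>,
}
```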
## Raw logging

We support formatting events for direct printing to the console from
OTLP bytes. For the dynamic encoding, these are consumed using
`otap_df_pdata::views::logs::LogsDataView`, making the operation
zero-copy. We refer to this most basic form of printing to the console
as raw logging because it is a safe configuration that avoids feedback
for internal logging.

Note: Raw logging is likely to introduce contention over the console.

In cases where internal logging code is forced to handle its own
errors, the `otap_df_telemetry::raw_error!` macro is meant for
emergency use, to report failures to log.
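A sketch of this emergency path, assuming the macro accepts
format-style arguments; only the macro's name and purpose come from
the text above, and the flush helper shown is hypothetical.

```rust
use otap_df_telemetry::raw_error; // macro named above; signature assumed

// Hypothetical flush helper: if the logging path itself fails, report
// synchronously to the console rather than recursing into the
// internal pipeline.
fn flush_buffer(tx: &flume::Sender<Vec<u8>>, buffered: Vec<u8>) {
    if let Err(err) = tx.try_send(buffered) {
        raw_error!("failed to flush internal logs: {err}");
    }
}
```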
## Logging provider modes

> **Contributor:** Where is this configured? Per component?
>
> **Author:** The …

The logging configuration supports multiple distinct provider mode
settings:

- Global: The default Tokio subscriber; this applies in threads that
  do not belong to an OTAP dataflow engine core.
- Engine: This is the default configuration for engine core threads.
- Internal: This is the default configuration for internal telemetry
  pipeline components.

> **Contributor:** I insist that we should not consider this case (at
> least not initially).
>
> **Author:** Please explain what you are aiming for? Use of Tokio
> tracing is a widespread standard for diagnostics that is well
> understood, already useful, and recommended by OpenTelemetry. We
> have to make a choice about the Tokio subscriber that we install,
> and we are going to have a setting that controls the global
> behavior. Since you seem to want this disabled, you can set …
>
> **Contributor:** We are basically intercepting it/not using it for
> the telemetry of the engine. We should not even configure it or
> depend on it at this level. If an external library uses it, it will
> not be captured, as it is out of the control of the engine. External
> libraries could also use other tracing libraries.

Provider mode values are:

- Noop: Ignore these producers.
- Unbuffered: Use a non-blocking write to the internal logs channel.
  Unbuffered is the default for the global provider.
- Buffered: Use a thread-local buffer; requires managed flushing.
  The global provider is not supported for buffered logging.
- OpenTelemetry: Use the OpenTelemetry SDK. This option has the most
  comprehensive observability, including OpenTelemetry traces
  integration.
- Raw: Use the raw logger, meaning to synchronously write to the
  console. For asynchronous console logging instead, use the Buffered
  or Unbuffered provider mode and set the Raw output mode. (A sketch
  of these modes as a Rust enum follows the review comments below.)

> **Contributor:** By definition the channel has some buffer. How do
> we avoid this at the producer level?
>
> **Author:** The Unbuffered and Buffered modes are associated with a
> channel, yes. The Noop and Raw modes both avoid the use of a
> channel, and for this we have distinct providers we can configure.
> The present proposal includes …

> **Contributor:** Are we not mixing the concepts here? I see the
> OpenTelemetry SDK more on the "consumer/destination" side of the
> internal telemetry.
>
> **Author:** There is a point that I haven't made clear, connected
> with yours. OpenTelemetry is a provider because it expects to be
> invoked in context; it's a provider of an API, not exactly a
> destination for telemetry. For my proposed configuration here, the …
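The sketch referenced above restates the documented provider-mode
values as a Rust enum; the real type in `otap_df_config` may differ,
and the serde representation is an assumption chosen to mirror the
lowercase YAML values used later in this document.

```rust
use serde::Deserialize;

// Hypothetical restatement of the provider-mode values listed above.
#[derive(Debug, Clone, Copy, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum ProviderMode {
    /// Ignore events from these producers.
    Noop,
    /// Non-blocking write to the internal logs channel
    /// (default for the global provider).
    Unbuffered,
    /// Thread-local buffer with managed flushing
    /// (not supported for the global provider).
    Buffered,
    /// Route through the OpenTelemetry SDK, including traces integration.
    OpenTelemetry,
    /// Synchronous console writes (raw logging).
    Raw,
}
```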
## Output modes

When any of the providers use the Buffered or Unbuffered modes, an
engine-managed thread is responsible for consuming internal logs. This
`LogsCollector` thread is currently global, but could be NUMA-regional
as described in the [README](./README.md), and it can currently be
configured for raw or internal logging.

The output modes are:

- Noop: Not a real output mode; this setting will cause an error when
  any provider uses Buffered or Unbuffered.
- Raw: Use raw console logging from the global or regional logging
  thread.
- Internal: Use an Internal Telemetry Receiver as the destination.
  (A sketch of the combined configuration shape follows this list.)
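The sketch below combines the output modes with the per-context
providers into a configuration shape that mirrors the YAML examples in
the following sections, reusing the `ProviderMode` enum from the
earlier sketch. The struct and field names are assumptions; the real
object lives in `otap_df_config::pipeline::service::telemetry::logs`.

```rust
use serde::Deserialize;

// Hypothetical output-mode values, mirroring the list above.
#[derive(Debug, Clone, Copy, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum OutputMode {
    /// Invalid when any provider is Buffered or Unbuffered.
    Noop,
    /// Raw console logging from the global/regional LogsCollector thread.
    Raw,
    /// Send to an Internal Telemetry Receiver.
    Internal,
}

// Hypothetical shape of the `service.telemetry.logs` object.
#[derive(Debug, Clone, Deserialize)]
pub struct LogsConfigSketch {
    pub level: String,
    pub providers: ProvidersSketch,
    pub output: OutputMode,
}

// One provider mode per thread context, as in the YAML examples.
#[derive(Debug, Clone, Deserialize)]
pub struct ProvidersSketch {
    pub global: ProviderMode,
    pub engine: ProviderMode,
    pub internal: ProviderMode,
}
```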
## Default configuration

In this configuration, a dedicated `LogsCollector` thread consumes
from the channel and prints to the console.

```yaml
service:
  telemetry:
    logs:
      level: info
      providers:
        global: unbuffered
        engine: buffered
        internal: noop
      output: raw
```

> **Contributor:** I'm thinking of something a little simpler: …
>
> **Author:** Thank you @andborja. For the record, I think these
> visions are compatible! The part you have as "Event manager
> (configurable)" in the diagram, that's the Internal Telemetry
> Receiver component. There is one instance in the configuration. As
> the engine starts, we will have one of these (global) or one per
> NUMA region, and they can have any configuration we all agree on. I
> have no configuration so far, and (as my next PR will show) in my
> current draft the ITR uses send_message to the default out-port. The
> part where you have multiple internal pipelines: that's already part
> of the OTAP-Dataflow graph structure, the idea that any component
> including the ITR can have multiple output destinations and policies
> allowing various behavior in the pipeline-level configuration. So,
> if the ITR configuration for "event manager" as you describe it can
> name the out-ports that it wants to send to, I think we'll have
> essentially the same diagram. @lquerel please see if this aligns
> with your thinking: the idea of a single ITR in the OTAP-dataflow
> pipeline configuration with some sort of policy that it uses to send
> to one or more out-ports in the config. The ITR configuration is a
> mapping from something to the out-port name, that is.
>
> **Contributor:** What I think is different is the hard/specific
> separation and decoupling of the production of the internal
> telemetry and its routing and consumption.
```mermaid
flowchart LR
    subgraph "Thread Contexts"
        G[Global Threads<br/>HTTP admin, etc.]
        E[Engine Threads<br/>Pipeline cores]
        I[Internal Threads<br/>ITR components]
    end

    subgraph "Provider Layers"
        UL[UnbufferedLayer<br/>immediate send]
        BL[ThreadBufferedLayer<br/>thread-local buffer]
        NL[Noop<br/>dropped]
    end

    subgraph "Channel"
        CH[(flume channel)]
    end

    subgraph "Output"
        LC[LogsCollector Thread]
        CON[Console<br/>stdout/stderr]
    end

    G -->|tracing event| UL
    E -->|tracing event| BL
    I -->|tracing event| NL

    UL -->|LogPayload::Singleton| CH
    BL -->|periodic flush<br/>LogPayload::Batch| CH
    NL -.->|discarded| X[null]

    CH --> LC
    LC -->|raw format| CON
```

> **Contributor:** There is one component missing here. There might be
> some "subscription manager" or some component in charge of pulling
> the objects from the channel and sending them to the configured
> destination (console or an internal telemetry receiver channel).
>
> **Author:** This "LC" is the logs collector. The design documented
> in crates/telemetry/README.md calls for this to be NUMA-regional
> eventually; for now it's a global/single thread. This is one of the
> dedicated threads described for internal telemetry above, and it
> pulls from the channel and does what the OutputMode prescribes,
> meaning it prints to the console or sends to the internal pipeline.
## Internal Telemetry Receiver configuration

In this configuration, the `InternalTelemetryReceiver` node consumes
from the channel and emits `OtapPayload::ExportLogsRequest` into the
pipeline.

```yaml
service:
  telemetry:
    logs:
      level: info
      providers:
        global: unbuffered
        engine: buffered
        internal: noop
      output: internal

nodes:
  internal_telemetry:
    kind: receiver
    plugin_urn: "urn:otel:otlp:telemetry:receiver"
    out_ports:
      out_port:
        destinations:
          - otlp_exporter

  otlp_exporter:
    kind: exporter
    ...
```

> **Contributor:** This configuration should not be global to the
> engine, but local to a pipeline (a node in our case?).
>
> **Author:** In the accompanying document, I tried to capture the
> concept that the Internal Telemetry Receiver and all of its
> downstream nodes are considered an internal telemetry pipeline. All
> nodes in the internal telemetry pipeline would have the Noop
> configuration in this example. This setting is meant to apply to all
> of those internal telemetry components. I'm aware of a desire to
> control the telemetry provider in more fine-grained ways. For
> example, we might imagine overriding LogLevel at the component
> level. We can imagine varying the ProviderMode at that level too;
> it's just more-detailed configuration.
```mermaid
flowchart LR
    subgraph "Thread Contexts"
        G[Global Threads<br/>HTTP admin, etc.]
        E[Engine Threads<br/>Pipeline cores]
        I[Internal Threads<br/>ITR components]
    end

    subgraph "Provider Layers"
        UL[UnbufferedLayer<br/>immediate send]
        BL[ThreadBufferedLayer<br/>thread-local buffer]
        NL[Noop<br/>dropped]
    end

    subgraph "Channel"
        CH[(flume channel)]
    end

    subgraph "Internal Telemetry Pipeline"
        ITR[InternalTelemetryReceiver<br/>encodes to OTLP bytes]
        PROC[Processors<br/>batch, filter, etc.]
        EXP[Exporter<br/>OTLP, file, etc.]
    end

    G -->|tracing event| UL
    E -->|tracing event| BL
    I -->|tracing event| NL

    UL -->|LogPayload::Singleton| CH
    BL -->|periodic flush<br/>LogPayload::Batch| CH
    NL -.->|discarded| X[null]

    CH --> ITR
    ITR -->|OtapPayload<br/>ExportLogsRequest| PROC
    PROC --> EXP
```
> **Comment:** In the near future, I hope that this internal telemetry
> logging pipeline will become more general and integrate metrics and
> other types of signals as well.
>
> **Reply:** Definitely!