Add benchmark overview doc #1528
Open
cijothomas wants to merge 25 commits into open-telemetry:main from cijothomas:cijothomas/benchoverviewdoc1
Changes from 3 commits
Commits
a459ee4 Add benchmark overview doc (cijothomas)
38ca568 use throughput per core (cijothomas)
557c524 nits (cijothomas)
3357270 Merge branch 'main' into cijothomas/benchoverviewdoc1 (cijothomas)
bad5cb6 Merge branch 'main' into cijothomas/benchoverviewdoc1 (cijothomas)
6f33600 few feedback addressed (cijothomas)
2a68ded consistent wordings (cijothomas)
0c461ad Merge branch 'main' into cijothomas/benchoverviewdoc1 (cijothomas)
e95ab5c Merge branch 'main' into cijothomas/benchoverviewdoc1 (cijothomas)
132fd9b fill in available data (cijothomas)
18259e0 few tweaks to clarfy (cijothomas)
8c8bfcc Merge branch 'main' into cijothomas/benchoverviewdoc1 (cijothomas)
8961a15 Merge branch 'main' into cijothomas/benchoverviewdoc1 (cijothomas)
4efa9a1 snaity (cijothomas)
218676b md (cijothomas)
b501207 add passthrough (cijothomas)
2dbd3f1 Merge branch 'main' into cijothomas/benchoverviewdoc1 (cijothomas)
c8cc9b8 fill some data for standard load (cijothomas)
6cc3316 fill more values (cijothomas)
a1abd95 include passthrough (cijothomas)
fa63a4f Merge branch 'main' into cijothomas/benchoverviewdoc1 (cijothomas)
e28d208 lengthh (cijothomas)
c407c88 Merge branch 'main' into cijothomas/benchoverviewdoc1 (cijothomas)
3c67b1d clarify syslog (cijothomas)
029b0b3 Merge branch 'main' into cijothomas/benchoverviewdoc1 (cijothomas)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,250 @@ | ||
| # OpenTelemetry Arrow Performance Summary | ||
|
|
||
| ## Overview | ||
|
|
||
| The OpenTelemetry Arrow (OTel Arrow) project is currently in **Phase 2**, | ||
| building an end-to-end Arrow-based telemetry pipeline in Rust. Phase 1 focused | ||
| on collector-to-collector traffic compression using the OTAP protocol, achieving | ||
| significant network bandwidth savings. Phase 2 expands this foundation by | ||
| implementing the entire in-process pipeline using Apache Arrow's columnar | ||
| format, targeting substantial improvements in data processing efficiency while | ||
| maintaining the network efficiency gains from Phase 1. | ||
|
|
||
| The dataflow engine (df-engine), implemented in Rust, provides predictable | ||
| performance characteristics and efficient resource utilization across varying | ||
| load conditions. This columnar approach is expected to offer substantial | ||
| advantages over traditional row-oriented telemetry pipelines in terms of CPU | ||
| efficiency, memory usage, and throughput. | ||
|
|
||
| This document presents key performance metrics across different load scenarios | ||
| and test configurations. | ||
|
|
||
| ### Test Environment | ||
|
|
||
| All performance tests are executed on a bare-metal compute instance with the | ||
| following specifications: | ||
|
|
||
| - **CPU**: 64 cores (x86-64 architecture) | ||
| - **Memory**: 512 GB RAM | ||
| - **Platform**: Oracle Bare Metal Instance | ||
| - **OS**: Oracle Linux 8 | ||
|
|
||
| This consistent, high-performance environment ensures reproducible results and | ||
| allows for comprehensive testing across various CPU core configurations (such | ||
| as 1, 4, and 8 cores) by constraining the df-engine to specific core allocations. | ||
|
|
||
| ### Performance Metrics | ||
|
|
||
| #### Idle State Performance | ||
|
|
||
| Baseline resource consumption with no active telemetry traffic. | ||
|
|
||
| | Metric | Value | | ||
| |--------|-------| | ||
| | CPU Usage | TBD | | ||
| | Memory Usage | TBD | | ||
|
|
||
| These baseline metrics validate that the engine maintains a minimal resource | ||
| footprint when idle, ensuring efficient operation in environments with variable | ||
| telemetry loads. | ||
|
|
||
| #### Standard Load Performance | ||
|
|
||
| Resource utilization at 100,000 log records per second (100K logs/sec). Tests | ||
| are conducted with four different batch sizes to demonstrate the impact of | ||
| batching on performance. | ||
|
|
||
| **Test Parameters:** | ||
|
|
||
| - Total input load: 100,000 log records/second | ||
| - Average log record size: 1 KB | ||
| - Batch sizes tested: 10, 100, 1000, and 10000 records per request | ||
|
|
||
| This wide range of batch sizes evaluates performance across diverse deployment | ||
| scenarios. Small batches (10-100) represent edge collectors or real-time | ||
| streaming requirements, while large batches (1000-10000) represent gateway | ||
| collectors and high-throughput aggregation points. This approach ensures a fair | ||
| assessment, highlighting both the overhead for small batches and the significant | ||
| efficiency gains inherent to Arrow's columnar format at larger batch sizes. | ||
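As a rough illustration of what the batch sizes above mean for request rate, the following sketch derives requests per second and payload bandwidth from the stated test parameters. It assumes "1 KB" means 1,000 bytes and ignores protocol framing and compression; the function names are hypothetical and not part of the benchmark tooling.

```python
# Request rate and raw payload bandwidth implied by the test parameters.
# Assumption: "1 KB" is taken as 1,000 bytes; framing and compression are ignored.

TOTAL_RECORDS_PER_SEC = 100_000          # total input load from the test parameters
AVG_RECORD_BYTES = 1_000                 # average log record size (assumed 1 KB = 1000 B)
BATCH_SIZES = [10, 100, 1_000, 10_000]   # records per request


def requests_per_second(total_records_per_sec: int, batch_size: int) -> float:
    """Number of requests per second needed to carry the load at this batch size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return total_records_per_sec / batch_size


def payload_megabytes_per_second(total_records_per_sec: int, avg_record_bytes: int) -> float:
    """Uncompressed payload volume in MB/s (1 MB = 1,000,000 bytes)."""
    return total_records_per_sec * avg_record_bytes / 1_000_000


for size in BATCH_SIZES:
    rate = requests_per_second(TOTAL_RECORDS_PER_SEC, size)
    print(f"{size:>6} records/request -> {rate:>8.0f} requests/sec")

print(f"payload: {payload_megabytes_per_second(TOTAL_RECORDS_PER_SEC, AVG_RECORD_BYTES):.0f} MB/s")
```

The payload volume is the same for every batch size; only the number of requests changes, which is why per-request overhead matters most at the small end of the range.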
|
|
||
| ##### Standard Load - OTAP -> OTAP (Native Protocol) | ||
|
|
||
| | CPU Cores | Batch Size | CPU Usage | Memory Usage | | ||
| |-----------|------------|-----------|---------------| | ||
| | 1 Core | 10/batch | TBD | TBD | | ||
| | 1 Core | 100/batch | TBD | TBD | | ||
| | 1 Core | 1000/batch | TBD | TBD | | ||
| | 1 Core | 10000/batch | TBD | TBD | | ||
| | 4 Cores | 10/batch | TBD | TBD | | ||
| | 4 Cores | 100/batch | TBD | TBD | | ||
| | 4 Cores | 1000/batch | TBD | TBD | | ||
| | 4 Cores | 10000/batch | TBD | TBD | | ||
| | 8 Cores | 10/batch | TBD | TBD | | ||
| | 8 Cores | 100/batch | TBD | TBD | | ||
| | 8 Cores | 1000/batch | TBD | TBD | | ||
| | 8 Cores | 10000/batch | TBD | TBD | | ||
|
|
||
| This represents the optimal scenario where the df-engine operates with its | ||
| native protocol end-to-end, eliminating protocol conversion overhead. The | ||
| thread-per-core architecture demonstrates linear scaling across CPU cores | ||
| without contention, allowing the engine to be configured for specific deployment | ||
| requirements. | ||
|
|
||
| ##### Standard Load - OTLP -> OTAP (Protocol Conversion) | ||
|
|
||
| | CPU Cores | Batch Size | CPU Usage | Memory Usage | | ||
| |-----------|------------|-----------|---------------| | ||
| | 1 Core | 10/batch | TBD | TBD | | ||
| | 1 Core | 100/batch | TBD | TBD | | ||
| | 1 Core | 1000/batch | TBD | TBD | | ||
| | 1 Core | 10000/batch | TBD | TBD | | ||
| | 4 Cores | 10/batch | TBD | TBD | | ||
| | 4 Cores | 100/batch | TBD | TBD | | ||
| | 4 Cores | 1000/batch | TBD | TBD | | ||
| | 4 Cores | 10000/batch | TBD | TBD | | ||
| | 8 Cores | 10/batch | TBD | TBD | | ||
| | 8 Cores | 100/batch | TBD | TBD | | ||
| | 8 Cores | 1000/batch | TBD | TBD | | ||
| | 8 Cores | 10000/batch | TBD | TBD | | ||
|
|
||
| This scenario represents the common case where OpenTelemetry SDK clients emit | ||
| OTLP (they cannot yet emit OTAP), and the df-engine converts it to OTAP for egress. | ||
| This demonstrates backward compatibility and protocol conversion efficiency | ||
| while maintaining linear scaling characteristics across CPU cores. | ||
|
|
||
| #### Saturation Performance | ||
|
|
||
| Behavior at maximum capacity when physical resource limits are reached. | ||
|
|
||
| ##### Saturation Load - OTAP -> OTAP (Native Protocol) | ||
|
|
||
| | CPU Cores | Maximum Sustained Throughput | Throughput / Core | Memory Usage | | ||
| |-----------|------------------------------|-------------------|--------------| | ||
| | 1 Core | TBD | TBD | TBD | | ||
| | 4 Cores | TBD | TBD | TBD | | ||
| | 8 Cores | TBD | TBD | TBD | | ||
|
|
||
| ##### Saturation Load - OTLP -> OTAP (Protocol Conversion) | ||
|
|
||
| | CPU Cores | Maximum Sustained Throughput | Throughput / Core | Memory Usage | | ||
| |-----------|------------------------------|-------------------|--------------| | ||
| | 1 Core | TBD | TBD | TBD | | ||
| | 4 Cores | TBD | TBD | TBD | | ||
| | 8 Cores | TBD | TBD | TBD | | ||
|
|
||
| Saturation testing validates the engine's stability under extreme load. The | ||
| df-engine exhibits well-defined behavior when operating at capacity, maintaining | ||
| predictable performance without degradation or instability. These results | ||
| demonstrate the maximum throughput achievable with different CPU core | ||
| allocations. The **Throughput / Core** metric provides a key efficiency | ||
| indicator for capacity planning. | ||
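A minimal sketch of how the **Throughput / Core** column could be derived and compared against a single-core baseline. The numbers used here are placeholders for illustration only, not measured results, and `scaling_efficiency` is a hypothetical helper that this document does not define.

```python
# Throughput per core and scaling efficiency relative to a 1-core baseline.
# All numeric inputs below are illustrative placeholders, NOT benchmark results.


def throughput_per_core(max_sustained_throughput: float, cores: int) -> float:
    """Maximum sustained throughput divided by the number of allocated cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return max_sustained_throughput / cores


def scaling_efficiency(per_core: float, single_core_baseline: float) -> float:
    """Ratio of per-core throughput to the 1-core result; 1.0 means perfectly linear scaling."""
    if single_core_baseline <= 0:
        raise ValueError("single_core_baseline must be positive")
    return per_core / single_core_baseline


# Placeholder inputs (hypothetical): records/sec at 1 core and at 4 cores.
one_core = throughput_per_core(500_000, 1)
four_core = throughput_per_core(1_900_000, 4)

print(f"1 core : {one_core:.0f} records/sec/core")
print(f"4 cores: {four_core:.0f} records/sec/core")
print(f"scaling efficiency at 4 cores: {scaling_efficiency(four_core, one_core):.2f}")
```

For capacity planning, an efficiency close to 1.0 would suggest that adding cores buys proportional throughput; values well below 1.0 would point to contention or a shared bottleneck.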
|
|
||
| <!--TODO: Document what is the behavior - is it applying backpressure | ||
| (`wait_for_result` feature)? or dropping items and keeping internal metric | ||
| about it.--> | ||
|
|
||
| ### Architecture | ||
|
|
||
| The OTel Arrow dataflow engine is built in Rust to achieve high throughput and | ||
| low latency. The columnar data representation and zero-copy processing | ||
| capabilities enable efficient handling of telemetry data at scale. | ||
|
|
||
| #### Thread-Per-Core Design | ||
|
|
||
| The df-engine supports a configurable runtime execution model, using a | ||
| **thread-per-core architecture** that avoids most cross-thread synchronization | ||
| overhead. This design allows: | ||
|
|
||
| - **CPU Affinity Control**: Pipelines can be pinned to specific CPU cores or | ||
| groups through configuration | ||
| - **NUMA Optimization**: Memory and CPU assignments can be coordinated for | ||
| Non-Uniform Memory Access (NUMA) architectures | ||
| - **Workload Isolation**: Different telemetry signals or tenants can be assigned | ||
| to dedicated CPU resources, preventing resource contention | ||
| - **Reduced Synchronization**: Thread-per-core design minimizes lock contention | ||
| and context switching overhead | ||
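The shape of a thread-per-core design can be sketched as follows. This is a Python illustration of the structure only (one worker per core, each with its own inbox and its own state, optional CPU pinning); it is not the df-engine's implementation, and Python's global interpreter lock means it does not show the performance benefit. `run_sharded` and `worker` are hypothetical names.

```python
# Structural sketch of thread-per-core: each worker owns its inbox and its state,
# so the hot path needs no shared locks. Not the df-engine implementation.
import os
import queue
import threading


def worker(core_id: int, inbox: queue.Queue, results: list) -> None:
    # Pin this thread to one core where the platform supports it (Linux).
    # On Linux, pid 0 targets the calling thread.
    if hasattr(os, "sched_setaffinity"):
        try:
            os.sched_setaffinity(0, {core_id})
        except OSError:
            pass  # the core may be outside the allowed CPU set; run unpinned
    processed = 0
    while True:
        item = inbox.get()
        if item is None:       # sentinel: no more work for this worker
            break
        processed += 1         # stand-in for real per-record processing
    results[core_id] = processed   # each worker writes only its own slot


def run_sharded(records, num_cores: int) -> list:
    """Route each record to exactly one worker and return per-worker counts."""
    inboxes = [queue.Queue() for _ in range(num_cores)]
    results = [0] * num_cores
    threads = [
        threading.Thread(target=worker, args=(i, inboxes[i], results))
        for i in range(num_cores)
    ]
    for t in threads:
        t.start()
    for record in records:
        inboxes[record % num_cores].put(record)   # simple modulo sharding by key
    for inbox in inboxes:
        inbox.put(None)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_sharded(range(100), 4))
```

Because every record is routed to exactly one worker and workers never touch each other's state, the only cross-thread handoff is the inbox itself.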
|
|
||
| For detailed technical documentation, see the [OTAP Dataflow Engine | ||
| Documentation](../rust/otap-dataflow/README.md) and [Phase 2 | ||
| Design](phase2-design.md). | ||
|
|
||
| --- | ||
|
|
||
| ## Comparative Analysis: OTel Arrow vs OpenTelemetry Collector | ||
|
|
||
| ### Methodology | ||
|
|
||
| To provide a fair and meaningful comparison between the OTel Arrow dataflow | ||
| engine and the OpenTelemetry Collector, we use **Syslog (UDP/TCP)** as the | ||
| ingress protocol for both systems. | ||
|
|
||
| #### Rationale for Syslog-Based Comparison | ||
|
|
||
| Syslog was specifically chosen as the input protocol because: | ||
|
|
||
| 1. Neutral Ground: Syslog is neither OTLP (OpenTelemetry Protocol) nor OTAP | ||
| (OpenTelemetry Arrow Protocol), ensuring neither system has a native protocol | ||
| advantage | ||
| 2. Real-World Relevance: Syslog is widely deployed in production environments, | ||
| particularly for log aggregation from network devices, legacy systems, and | ||
| infrastructure components | ||
| 3. Conversion Overhead: Both systems must perform meaningful work to convert | ||
| incoming Syslog messages into their internal representations: | ||
| - **OTel Collector**: Converts to Go-based `pdata` (protocol data) structures | ||
| - **OTel Arrow**: Converts to Arrow-based columnar memory format | ||
| 4. Complete Pipeline Test: This approach validates the full pipeline efficiency, | ||
| including parsing, transformation, and serialization stages | ||
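To make the conversion step in point 3 concrete, the sketch below parses the syslog priority prefix and shows the same records held row by row versus column by column. It is an illustration only: plain Python lists stand in for Arrow arrays, only the `<PRI>` prefix is parsed, and neither system's actual receiver works this way as far as this document states.

```python
# Row-oriented vs column-oriented representation of parsed syslog messages.
# Plain Python lists stand in for Arrow arrays; only the <PRI> prefix is parsed.
import re

PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_pri(line: str) -> dict:
    """Split a syslog line into facility, severity, and the remaining body."""
    match = PRI_RE.match(line)
    if match is None:
        raise ValueError("line does not start with a <PRI> prefix")
    pri = int(match.group(1))
    # The priority value encodes facility * 8 + severity.
    return {"facility": pri // 8, "severity": pri % 8, "body": match.group(2)}


def to_columns(rows: list) -> dict:
    """Turn a list of row dicts into one list per field (a columnar layout)."""
    columns = {"facility": [], "severity": [], "body": []}
    for row in rows:
        for name in columns:
            columns[name].append(row[name])
    return columns


lines = [
    "<34>Oct 11 22:14:15 mymachine su: 'su root' failed",
    "<13>Oct 11 22:14:16 mymachine app: started",
]
rows = [parse_pri(line) for line in lines]   # row-oriented: one dict per message
columns = to_columns(rows)                   # column-oriented: one list per field
print(columns["severity"])
```

In the columnar form, all values of one field sit together, which is the property that columnar formats such as Arrow exploit for batch processing and compression.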
|
|
||
| The output protocols are set to each system's native format: OTLP for the | ||
| OpenTelemetry Collector and OTAP for the OTel Arrow engine, ensuring optimal | ||
| egress performance for each. | ||
|
|
||
| ### Performance Comparison | ||
|
|
||
| #### Baseline (Idle State) | ||
|
|
||
| | Metric | OTel Collector | OTel Arrow | Improvement | | ||
| |--------|---------------|------------|-------------| | ||
| | CPU Usage | TBD | TBD | TBD | | ||
| | Memory Usage | TBD | TBD | TBD | | ||
|
|
||
| #### Standard Load (100K Syslog Messages/sec) | ||
|
|
||
| | Metric | OTel Collector | OTel Arrow | Improvement | | ||
| |--------|---------------|------------|-------------| | ||
| | CPU Usage | TBD | TBD | TBD | | ||
| | Memory Usage | TBD | TBD | TBD | | ||
| | Network Egress | TBD | TBD | TBD | | ||
| | Latency (p50) | TBD | TBD | TBD | | ||
| | Latency (p99) | TBD | TBD | TBD | | ||
| | Throughput (messages/sec) | TBD | TBD | TBD | | ||
|
|
||
| #### Saturation | ||
|
|
||
| | Metric | OTel Collector | OTel Arrow | Improvement | | ||
| |--------|---------------|------------|-------------| | ||
| | Maximum Sustained Throughput | TBD | TBD | TBD | | ||
| | Throughput / Core | TBD | TBD | TBD | | ||
| | CPU at Saturation | TBD | TBD | TBD | | ||
| | Memory at Saturation | TBD | TBD | TBD | | ||
| | Behavior Under Overload | TBD | TBD | TBD | | ||
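The tables above do not define how the **Improvement** column is calculated. One possible definition, shown here purely as an assumption, is a percentage relative to the OTel Collector value, with the sign chosen so that a positive number always favors OTel Arrow. The inputs in the example are placeholders, not measurements.

```python
# One assumed definition of the "Improvement" column; the document does not specify it.
# Positive results favor OTel Arrow for both kinds of metric.


def improvement_percent(collector: float, arrow: float, higher_is_better: bool = False) -> float:
    """Percentage improvement of OTel Arrow relative to the OTel Collector value.

    For lower-is-better metrics (CPU, memory, latency) a reduction is positive.
    For higher-is-better metrics (throughput) an increase is positive.
    """
    if collector == 0:
        raise ValueError("collector value must be non-zero")
    if higher_is_better:
        return (arrow - collector) / collector * 100.0
    return (collector - arrow) / collector * 100.0


# Placeholder values for illustration only.
print(improvement_percent(200.0, 150.0))                         # lower-is-better metric
print(improvement_percent(100.0, 150.0, higher_is_better=True))  # higher-is-better metric
```

Whichever definition is used, stating it next to the tables avoids ambiguity between "percent reduction" and "ratio" readings once real numbers replace the TBD entries.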
|
|
||
| ### Key Findings | ||
|
|
||
| To be populated with analysis once benchmark data is available. | ||
|
|
||
| The comparative analysis will demonstrate: | ||
|
|
||
| - Relative efficiency of Arrow-based columnar processing vs traditional | ||
| row-oriented data structures | ||
| - Memory allocation patterns and garbage collection impact (Rust vs Go) | ||
| - Throughput and latency characteristics under varying load conditions | ||
|
|
||
| --- | ||
|
|
||
| ## Additional Resources | ||
|
|
||
| - [Detailed Benchmark Results from Phase 2](benchmarks.md) | ||
| - [Phase 1 Benchmark Results](benchmarks-phase1.md) | ||
| - [OTAP Dataflow Engine Documentation](../rust/otap-dataflow/README.md) | ||
| - [Project Phases Overview](project-phases.md) | ||