## Problem

Currently, `EncodedBatch.upload_batch()` must clone the entire compressed data vector on every upload attempt so the data is preserved for retries. This creates unnecessary memory allocations and copies, especially for large compressed batches.
```rust
// Current implementation requires an expensive clone for retry capability
pub async fn upload_batch(&self, batch: &EncodedBatch) -> Result<(), String> {
    self.uploader
        .upload(batch.data.clone(), &batch.event_name, &batch.metadata) // ← full vector copy
        .await
}
```

## Impact
- Memory overhead: each retry attempt copies the entire compressed batch
- Performance: O(n) memory allocation per attempt, where n is the compressed batch size
- Scalability: higher memory pressure under retries or high throughput
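
For intuition, here is a toy micro-benchmark (not from the issue; the 1 MiB payload size is arbitrary) contrasting the two clone costs:

```rust
use std::sync::Arc;
use std::time::Instant;

fn main() {
    // Stand-in for a large compressed batch (1 MiB of zeros).
    let vec_data: Vec<u8> = vec![0u8; 1024 * 1024];
    let arc_data: Arc<Vec<u8>> = Arc::new(vec_data.clone());

    // Vec::clone allocates and copies all n bytes.
    let t = Instant::now();
    let deep_copy = vec_data.clone();
    println!("Vec clone: {:?} ({} bytes copied)", t.elapsed(), deep_copy.len());

    // Arc::clone only increments a reference count; no bytes are copied.
    let t = Instant::now();
    let shared = Arc::clone(&arc_data);
    println!("Arc clone: {:?} ({} references)", t.elapsed(), Arc::strong_count(&shared));
}
```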
## Proposed Solution

Replace `Vec<u8>` with `Arc<Vec<u8>>` in `EncodedBatch` to enable cheap reference sharing:
```rust
#[derive(Debug, Clone)]
pub struct EncodedBatch {
    pub event_name: String,
    pub data: Arc<Vec<u8>>, // ← shared ownership
    pub metadata: BatchMetadata,
}
```

## Benefits
- Cheap clones: `Arc::clone` is an O(1) reference-count increment rather than an O(n) memory copy
- Retry-friendly: multiple references to the same data without duplication
- Memory efficient: a single allocation shared across all retry attempts
- API compatibility: maintains the existing retry semantics
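
With shared ownership in place, retries become cheap. The sketch below (the retry loop, `MAX_RETRIES` constant, and error handling are illustrative assumptions, not code from this crate) shows `upload_batch` cloning only the `Arc` pointer on each attempt:

```rust
use std::sync::Arc;

const MAX_RETRIES: usize = 3; // illustrative retry policy

// Same impl block as the current upload_batch shown above.
pub async fn upload_batch(&self, batch: &EncodedBatch) -> Result<(), String> {
    let mut last_err = String::from("no upload attempts made");
    for _ in 0..MAX_RETRIES {
        // Arc::clone bumps a reference count; the compressed bytes are
        // shared across all attempts instead of being copied each time.
        match self
            .uploader
            .upload(Arc::clone(&batch.data), &batch.event_name, &batch.metadata)
            .await
        {
            Ok(()) => return Ok(()),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```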
## Implementation Tasks

- Update the `EncodedBatch` struct to use `Arc<Vec<u8>>`
- Modify `OtlpEncoder::encode_log_batch()` to wrap its result in an `Arc`
- Update `uploader.upload()` to accept `Arc<Vec<u8>>` input, with an `Arc::try_unwrap` optimization (see the sketch after this list)
- Update tests and benchmarks
- Consider backward compatibility if this is a breaking change
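
A minimal sketch of the `Arc::try_unwrap` optimization named in the third task, assuming the uploader's transport wants an owned buffer (the `transport.send` call is a hypothetical placeholder, not this crate's API):

```rust
use std::sync::Arc;

// Same impl block as the uploader; signature assumed per the task above.
pub async fn upload(
    &self,
    data: Arc<Vec<u8>>,
    event_name: &str,
    metadata: &BatchMetadata,
) -> Result<(), String> {
    // If this is the last live reference, reclaim the Vec<u8> without
    // copying; otherwise a retry still holds the data, so deep-clone.
    let owned: Vec<u8> = Arc::try_unwrap(data).unwrap_or_else(|shared| (*shared).clone());
    self.transport.send(owned, event_name, metadata).await // hypothetical transport
}
```

Note that `Arc::try_unwrap` only succeeds when no other reference is alive, so a caller that keeps its `&EncodedBatch` around for retries will hit the clone fallback; the zero-copy path applies when the caller can hand over its last reference.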
## Alternatives Considered

- Consume ownership: change the API to `upload_batch(batch: EncodedBatch)`, but this breaks retry capability
- Keep the current clone: simple but inefficient for large batches
## Priority

Medium: a performance optimization that becomes more important with:
- Large compressed batch sizes (>100KB)
- High retry rates
- High-throughput scenarios