Skip to content

Conversation

@paulojmdias
Copy link
Member

Description

WIP

Link to tracking issue

Fixes #44077

Testing

WIP

Benchmark results

goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/fileexporter
cpu: Apple M3 Pro
                               │  main.txt   │               fg.txt                │
                               │   sec/op    │   sec/op     vs base                │
ZstdExportTraces/proto-12        5.607µ ± 4%   1.621µ ± 3%  -71.09% (p=0.000 n=10)
ZstdExportTraces/json-12         7.793µ ± 2%   4.746µ ± 8%  -39.10% (p=0.000 n=10)
ZstdExportTracesLarge/proto-12   7.668µ ± 2%   4.009µ ± 2%  -47.72% (p=0.000 n=10)
ZstdExportTracesLarge/json-12    25.62µ ± 3%   23.70µ ± 3%   -7.50% (p=0.000 n=10)
ZstdExportLogs/proto-12          8.903µ ± 1%   3.697µ ± 6%  -58.48% (p=0.000 n=10)
ZstdExportLogs/json-12           21.62µ ± 3%   19.43µ ± 2%  -10.11% (p=0.000 n=10)
ZstdExportLevels/level_0-12      7.833µ ± 9%   3.998µ ± 6%  -48.95% (p=0.000 n=10)
ZstdExportLevels/level_3-12      7.630µ ± 3%   4.043µ ± 3%  -47.02% (p=0.000 n=10)
ZstdExportLevels/level_7-12      7.630µ ± 3%   4.046µ ± 3%  -46.97% (p=0.000 n=10)
ZstdExportLevels/level_12-12     7.727µ ± 2%   4.107µ ± 2%  -46.85% (p=0.000 n=10)
ZstdExportLevels/level_22-12     7.616µ ± 1%   8.629µ ± 4%  +13.30% (p=0.000 n=10)
geomean                          9.288µ        5.433µ       -41.50%

                               │    main.txt     │                 fg.txt                  │
                               │ output-bytes/op │ output-bytes/op  vs base                │
ZstdExportTraces/proto-12            231.00 ± 0%        12.00 ± 0%  -94.81% (p=0.000 n=10)
ZstdExportTraces/json-12             335.00 ± 0%        16.00 ± 0%  -95.22% (p=0.000 n=10)
ZstdExportTracesLarge/proto-12       234.00 ± 0%        12.00 ± 0%  -94.87% (p=0.000 n=10)
ZstdExportTracesLarge/json-12        339.00 ± 0%        17.01 ± 0%  -94.98% (p=0.000 n=10)
ZstdExportLogs/proto-12              220.00 ± 0%        12.00 ± 0%  -94.55% (p=0.000 n=10)
ZstdExportLogs/json-12               333.00 ± 0%        17.01 ± 0%  -94.89% (p=0.000 n=10)
ZstdExportLevels/level_0-12          234.00 ± 0%        12.00 ± 0%  -94.87% (p=0.000 n=10)
ZstdExportLevels/level_3-12          234.00 ± 0%        12.00 ± 0%  -94.87% (p=0.000 n=10)
ZstdExportLevels/level_7-12          234.00 ± 0%        12.00 ± 0%  -94.87% (p=0.000 n=10)
ZstdExportLevels/level_12-12         234.00 ± 0%        12.00 ± 0%  -94.87% (p=0.000 n=10)
ZstdExportLevels/level_22-12         234.00 ± 0%        12.00 ± 0%  -94.87% (p=0.000 n=10)
geomean                               256.4             13.12       -94.88%

                               │   main.txt    │                fg.txt                │
                               │     B/op      │     B/op      vs base                │
ZstdExportTraces/proto-12           880.0 ± 0%     663.0 ± 0%  -24.66% (p=0.000 n=10)
ZstdExportTraces/json-12          2.578Ki ± 0%   1.308Ki ± 0%  -49.26% (p=0.000 n=10)
ZstdExportTracesLarge/proto-12    6.234Ki ± 0%   6.056Ki ± 0%   -2.87% (p=0.000 n=10)
ZstdExportTracesLarge/json-12     20.36Ki ± 0%   11.09Ki ± 0%  -45.52% (p=0.000 n=10)
ZstdExportLogs/proto-12           5.469Ki ± 0%   5.301Ki ± 0%   -3.07% (p=0.000 n=10)
ZstdExportLogs/json-12           16.923Ki ± 0%   8.849Ki ± 0%  -47.71% (p=0.000 n=10)
ZstdExportLevels/level_0-12       6.234Ki ± 0%   6.056Ki ± 0%   -2.86% (p=0.000 n=10)
ZstdExportLevels/level_3-12       6.234Ki ± 0%   6.027Ki ± 0%   -3.32% (p=0.000 n=10)
ZstdExportLevels/level_7-12       6.234Ki ± 0%   6.056Ki ± 0%   -2.87% (p=0.000 n=10)
ZstdExportLevels/level_12-12      6.234Ki ± 0%   6.057Ki ± 0%   -2.84% (p=0.000 n=10)
ZstdExportLevels/level_22-12      6.234Ki ± 0%   6.121Ki ± 0%   -1.82% (p=0.000 n=10)
geomean                           5.790Ki        4.648Ki       -19.73%

                               │  main.txt  │               fg.txt               │
                               │ allocs/op  │ allocs/op   vs base                │
ZstdExportTraces/proto-12        3.000 ± 0%   2.000 ± 0%  -33.33% (p=0.000 n=10)
ZstdExportTraces/json-12         11.00 ± 0%   10.00 ± 0%   -9.09% (p=0.000 n=10)
ZstdExportTracesLarge/proto-12   3.000 ± 0%   2.000 ± 0%  -33.33% (p=0.000 n=10)
ZstdExportTracesLarge/json-12    65.00 ± 0%   64.00 ± 0%   -1.54% (p=0.000 n=10)
ZstdExportLogs/proto-12          3.000 ± 0%   2.000 ± 0%  -33.33% (p=0.000 n=10)
ZstdExportLogs/json-12           25.00 ± 0%   24.00 ± 0%   -4.00% (p=0.000 n=10)
ZstdExportLevels/level_0-12      3.000 ± 0%   2.000 ± 0%  -33.33% (p=0.000 n=10)
ZstdExportLevels/level_3-12      3.000 ± 0%   2.000 ± 0%  -33.33% (p=0.000 n=10)
ZstdExportLevels/level_7-12      3.000 ± 0%   2.000 ± 0%  -33.33% (p=0.000 n=10)
ZstdExportLevels/level_12-12     3.000 ± 0%   2.000 ± 0%  -33.33% (p=0.000 n=10)
ZstdExportLevels/level_22-12     3.000 ± 0%   2.000 ± 0%  -33.33% (p=0.000 n=10)
geomean                          5.415        3.977       -26.56%

Documentation

WIP

@paulojmdias paulojmdias changed the title [exporter/file] add feature gate to use native file-level compression instead of message-level compression [exporter/file] add feature gate to use native file-level instead of message-level compression Feb 11, 2026
@paulojmdias paulojmdias requested a review from Copilot February 11, 2026 22:55
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an alpha feature gate to the file exporter to switch from legacy per-message zstd compression to native file-level zstd framing, producing standard .zst files compatible with zstd tooling (addressing #44077).

Changes:

  • Introduces feature gate exporter.file.nativeCompression and generated metadata/docs entries.
  • Adds native streaming zstd writer (with optional compression_level) and wires it into file writer creation + marshaling/export framing.
  • Adds tests/benchmarks to validate native .zst interoperability and measure performance.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
exporter/fileexporter/config.go Adds compression_level config + validation.
exporter/fileexporter/config.schema.yaml (Not updated in PR) schema currently lacks compression_level.
exporter/fileexporter/metadata.yaml Declares new feature gate metadata.
exporter/fileexporter/internal/metadata/generated_feature_gates.go Registers the feature gate (generated).
exporter/fileexporter/documentation.md Generated feature gate documentation table.
exporter/fileexporter/compression_writer.go Implements streaming zstd “native” compression writer with frame flushing.
exporter/fileexporter/compression_writer_test.go Unit tests for the streaming compression writer + level mapping.
exporter/fileexporter/factory.go Extends newFileWriter to optionally wrap output in streaming compression.
exporter/fileexporter/file_writer.go Adjusts export framing under the feature gate (NDJSON for JSON).
exporter/fileexporter/marshaller.go Disables message-level compression when native compression gate is enabled.
exporter/fileexporter/file_exporter.go Passes compression + level into newFileWriter.
exporter/fileexporter/grouping_file_exporter.go Passes compression + level into grouped writers.
exporter/fileexporter/file_exporter_test.go Updates test calls to new newFileWriter signature.
exporter/fileexporter/factory_test.go Updates test calls to new newFileWriter signature.
exporter/fileexporter/native_compression_test.go New integration tests for native .zst output (Go decoder + zstd CLI).
exporter/fileexporter/native_compression_bench_test.go New benchmarks comparing legacy vs native compression.
exporter/fileexporter/generated_package_test.go Generated TestMain import order changed (currently not gofmt-compliant).
exporter/fileexporter/testdata/config.yaml Minor YAML quoting change.
.chloggen/feat_44077_native.yaml Changelog entry for the enhancement.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +166 to +168
cmd.Stderr = &stderr
err = cmd.Run()
require.NoError(t, err, "zstd integrity test should pass: %s", stderr.String())
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stderr is reused for both zstd -d and zstd -t without being reset, so failure output from the second command may include stale text from the first. Reset or use a fresh buffer per command to keep failure messages accurate.

Suggested change
cmd.Stderr = &stderr
err = cmd.Run()
require.NoError(t, err, "zstd integrity test should pass: %s", stderr.String())
var stderrTest bytes.Buffer
cmd.Stderr = &stderrTest
err = cmd.Run()
require.NoError(t, err, "zstd integrity test should pass: %s", stderrTest.String())

Copilot uses AI. Check for mistakes.
Comment on lines +104 to +107
require.NoError(b, fe.Shutdown(ctx))
info, _ := os.Stat(fe.conf.Path)
b.ReportMetric(float64(info.Size())/float64(b.N), "output-bytes/op")
}
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.Stat errors are ignored here; if the file wasn't created (or was removed), this will panic on info.Size(). Use require.NoError (or b.Fatal) on the stat call before reporting the metric.

Copilot uses AI. Check for mistakes.
Comment on lines +129 to +133

require.NoError(b, fe.Shutdown(ctx))
info, _ := os.Stat(fe.conf.Path)
b.ReportMetric(float64(info.Size())/float64(b.N), "output-bytes/op")
}
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: os.Stat errors are ignored; handle the error before dereferencing info to avoid panics and misleading benchmark results.

Copilot uses AI. Check for mistakes.
Comment on lines 6 to +7
"go.uber.org/goleak"
"testing"
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import block is not gofmt-compliant: standard library imports should be grouped before third-party imports (e.g., "testing" should come before go.uber.org/goleak). This may cause generated-code or lint/gofmt checks to fail.

Suggested change
"go.uber.org/goleak"
"testing"
"testing"
"go.uber.org/goleak"

Copilot uses AI. Check for mistakes.
// Supported compression algorithms:`zstd`
Compression string `mapstructure:"compression"`

// CompressionLevel sets the compression level.
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compression_level was added to the config, but the generated JSON schema (config.schema.yaml) does not include this property. Please update/regenerate the schema so config validation and docs reflect the new option; also consider clarifying in the comment/docs that this level only affects native (feature-gated) compression (legacy message-level compression currently ignores it).

Suggested change
// CompressionLevel sets the compression level.
// CompressionLevel sets the compression level for native (feature-gated) compression.
// Legacy message-level compression currently ignores this setting.

Copilot uses AI. Check for mistakes.
Comment on lines 110 to +118
func buildExportFunc(cfg *Config) func(w *fileWriter, buf []byte) error {
if metadata.ExporterFileNativeCompressionFeatureGate.IsEnabled() && cfg.Compression != "" {
// Native compression: the compression stream handles framing, so
// JSON can use newline-delimited output (human-readable after decompression).
// Proto still needs length-prefix for message boundary detection.
if cfg.FormatType == formatTypeJSON {
return exportMessageAsLine
}
return exportMessageAsBuffer
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With native compression enabled, this branch switches JSON output to newline-delimited framing based on cfg.FormatType. If cfg.Encoding is set, FormatType is documented as being overridden by the encoding extension, so the output may not actually be JSON; writing newline-delimited bytes can corrupt binary encodings. Consider forcing length-prefixed framing (or otherwise basing framing on the actual marshaler/encoding) when cfg.Encoding != nil.

Copilot uses AI. Check for mistakes.
Comment on lines +112 to +118
// Native compression: the compression stream handles framing, so
// JSON can use newline-delimited output (human-readable after decompression).
// Proto still needs length-prefix for message boundary detection.
if cfg.FormatType == formatTypeJSON {
return exportMessageAsLine
}
return exportMessageAsBuffer
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Native-compressed JSON uses exportMessageAsLine, which currently performs two separate writes (payload + "\n"). Since compressingWriter flushes a zstd frame after each Write, this will produce an extra zstd frame per record (and extra overhead). Consider writing the line + newline in a single Write (or adjusting compressingWriter to flush per exported record rather than per low-level write).

Copilot uses AI. Check for mistakes.
Comment on lines +113 to +119
// With native compression + JSON, data should be newline-delimited JSON
br := bufio.NewReader(bytes.NewReader(decompressed))
unmarshaler := &ptrace.JSONUnmarshaler{}
count := 0
for {
buf, isEnd, err := readJSONMessage(br)
require.NoError(t, err)
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test uses readJSONMessage(br) for line splitting, but that helper (in file_exporter_test.go) ignores the isPrefix return and does not propagate non-EOF errors from ReadLine(). That can make the test pass while truncating long lines or silently ignoring read errors. Consider reading with ReadBytes('\n')/ReadString('\n') (or updating the helper) and asserting on the returned error/isPrefix.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

filexporter/invalid zstd file

1 participant