
Conversation

@SoumyaRaikwar
Contributor

Description

This PR implements automatic request splitting and retry logic for the elasticsearchexporter. When the Elasticsearch server responds with an HTTP 413 (Payload Too Large) error, the exporter now intercepts the response, splits the large bulk request body (NDJSON) into two smaller chunks, and retries them sequentially. This prevents data loss for batches that exceed http.max_content_length.
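
For illustration, here is a minimal sketch of the split-on-413 flow described above. It is not the PR's code: the package name, the performWithSplit helper, and the send callback are assumptions, and it deliberately ignores gzip re-compression and the need to keep _bulk action/document line pairs together.

package sketch

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// performWithSplit sends the NDJSON body; on a 413 it splits the body roughly
// in half at a line boundary and retries each half sequentially. The base
// case is a body with fewer than two lines, which cannot be split further.
func performWithSplit(send func(body []byte) (*http.Response, error), body []byte) error {
	resp, err := send(body)
	if err != nil {
		return err
	}
	_, _ = io.Copy(io.Discard, resp.Body)
	_ = resp.Body.Close()
	if resp.StatusCode != http.StatusRequestEntityTooLarge {
		return nil // success and non-413 failures are left to the caller
	}

	lines := bytes.Split(body, []byte("\n"))
	if len(lines) < 2 {
		return fmt.Errorf("single document exceeds the server's size limit")
	}
	mid := len(lines) / 2
	first := append(bytes.Join(lines[:mid], []byte("\n")), '\n') // _bulk bodies must end with a newline
	second := bytes.Join(lines[mid:], []byte("\n"))
	if err := performWithSplit(send, first); err != nil {
		return err
	}
	return performWithSplit(send, second)
}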

Link to tracking issue

Fixes #45834

Testing

  • Added a new unit test TestEsClient_Perform_413_Splitting in exporter/elasticsearchexporter/esclient_test.go.
  • The test mocks an Elasticsearch backend that returns 413 Request Entity Too Large for the initial request (a minimal sketch of such a mock follows this list).
  • Verified that the client correctly splits the payload, retries the chunks, and ultimately succeeds with 200 OK.
  • Ran make lint and make test locally to ensure no regressions.
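
For illustration, a minimal sketch of the shape of that mock backend, under assumed names (the real test exercises the exporter's client rather than plain http.Post calls):

package sketch

import (
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
	"testing"
)

// TestMockBulkBackend413 shows the mock's behavior: the first _bulk request is
// rejected with 413, and subsequent, smaller requests succeed with 200.
func TestMockBulkBackend413(t *testing.T) {
	var calls atomic.Int64
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if calls.Add(1) == 1 {
			http.Error(w, "Request Entity Too Large", http.StatusRequestEntityTooLarge)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write([]byte(`{"errors":false,"items":[]}`))
	}))
	defer srv.Close()

	post := func(body string) int {
		resp, err := http.Post(srv.URL+"/_bulk", "application/x-ndjson", strings.NewReader(body))
		if err != nil {
			t.Fatal(err)
		}
		defer resp.Body.Close()
		return resp.StatusCode
	}

	if got := post("{}\n{}\n"); got != http.StatusRequestEntityTooLarge {
		t.Fatalf("expected 413 on the first request, got %d", got)
	}
	if got := post("{}\n"); got != http.StatusOK {
		t.Fatalf("expected 200 on the retried chunk, got %d", got)
	}
}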

Documentation

  • Added a changelog entry in .chloggen/fix-elasticsearch-413.yaml.

@mauri870
Member

mauri870 commented Feb 11, 2026

Thank you for working on this! I have a couple questions:

  • What happens with the metrics? Do they work out of the box with this approach, or would we need to aggregate the metrics from each split batch?
  • What if one half of the split request fails while the other succeeds? We don’t seem to have retries here beyond the transport-level ones. How would we handle document-level errors from one of the _bulk requests?
  • If we split the batch and it’s still too large, should we split it again? If so, how many times should we retry splitting before giving up?

@SoumyaRaikwar
Contributor Author

@mauri870

  • The metrics should work out of the box. We aggregate the JSON bodies from the split bulkResponses into a single bulkResponse structure before returning. The upstream caller parses this aggregated response, so it correctly counts the total number of successful and failed items (a sketch of this aggregation follows this list).
  • If one chunk succeeds and the other fails with a transport error, we return an error for the whole operation. This triggers the Collector's standard retry mechanism for the entire batch. This ensures no data is lost ("at-least-once" delivery).
  • The recursion is naturally limited. We explicitly stop splitting when a chunk contains fewer than 2 lines. Since the batch size is halved at each step, we reach this base case very quickly, preventing infinite recursion.
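
For illustration, a minimal sketch of the response aggregation mentioned in the first bullet; bulkResult is a hypothetical, trimmed-down stand-in for the exporter's bulkResponse structure:

package sketch

import "encoding/json"

// bulkResult is a trimmed-down view of an Elasticsearch _bulk response body,
// used only to illustrate the aggregation; the real type has more fields.
type bulkResult struct {
	Errors bool              `json:"errors"`
	Items  []json.RawMessage `json:"items"`
}

// mergeBulkResults folds the responses of the two split requests into one, so
// the upstream caller still sees a single response covering every document.
func mergeBulkResults(a, b bulkResult) bulkResult {
	return bulkResult{
		Errors: a.Errors || b.Errors,
		Items:  append(append([]json.RawMessage{}, a.Items...), b.Items...),
	}
}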

Member

@mauri870 mauri870 left a comment

Code-wise, LGTM. I also tested it locally with some custom test cases I had for this, and it seems to work as intended. Unfortunately, I don’t have a deep understanding of the internals, so I'll wait for feedback from the code owners.

Contributor

@carsonip carsonip left a comment

thanks, a few questions


import (
	"bytes"
	"compress/gzip"

q: is the dep swap intentional?

if gzipErr != nil {
	return nil, gzipErr
}
defer gr.Close()

q: can we close gr sooner rather than leaving it open while recursive calls are made?
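
One possible way to address this, sketched with assumed names rather than taken from the PR: read the decompressed body fully, then close the reader immediately instead of deferring the Close across the recursive calls.

package sketch

import (
	"bytes"
	"compress/gzip"
	"io"
)

// decompressBody reads the gzipped request body fully and closes the reader
// right away, so it is not held open across any later recursive retry calls.
func decompressBody(bodyBytes []byte) ([]byte, error) {
	gr, err := gzip.NewReader(bytes.NewReader(bodyBytes))
	if err != nil {
		return nil, err
	}
	content, readErr := io.ReadAll(gr)
	closeErr := gr.Close() // closed before the caller recurses with content
	if readErr != nil {
		return nil, readErr
	}
	if closeErr != nil {
		return nil, closeErr
	}
	return content, nil
}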

	content = bodyBytes
}

lines := bytes.Split(content, []byte("\n"))

The memory footprint of bytes.Split and bytes.Join can be high. Can we walk the bytes and find the index of the middle \n, maybe using a slow-and-fast-pointer algorithm, then split the payload by slicing the byte slice?
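
For illustration, a sketch of that suggestion under assumed names; a real implementation would also need to keep _bulk action/document line pairs together rather than splitting at an arbitrary newline:

package sketch

import "bytes"

// splitAtMiddleNewline avoids bytes.Split and bytes.Join by walking the
// payload once with a slow and a fast cursor over newline positions, then
// splitting by re-slicing the original buffer. The fast cursor consumes two
// newlines for every one the slow cursor consumes, so the slow cursor ends
// on the middle newline.
func splitAtMiddleNewline(content []byte) (first, second []byte, ok bool) {
	slowIdx := -1 // index of the newline the slow cursor points at
	slowOff, fastOff := 0, 0
	for {
		f := bytes.IndexByte(content[fastOff:], '\n')
		if f < 0 {
			break
		}
		fastOff += f + 1

		s := bytes.IndexByte(content[slowOff:], '\n') // never -1: slow trails fast
		slowIdx = slowOff + s
		slowOff = slowIdx + 1

		if f = bytes.IndexByte(content[fastOff:], '\n'); f < 0 {
			break
		}
		fastOff += f + 1
	}
	if slowIdx < 0 || slowIdx == len(content)-1 {
		// No newline, or only a trailing one: nothing left to split.
		return nil, nil, false
	}
	return content[:slowIdx], content[slowIdx+1:], true
}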

@lahsivjar
Member

Just gave a brief look at the PR, and I don't think this is the way we should fix it. There are already conflicts between how the ES exporter does things and how exporterhelper does things (take retries, for example). A better way to fix this is to implement the exporterhelper.Request interface for the bulk indexer documents and rely on batch settings to configure the batch size so that 413 doesn't happen. I believe we don't need anything to handle 413s explicitly, since the exporter can be configured for the ES it is targeting (by configuring the batch sizes).
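
For reference, a hypothetical configuration sketch of the batch-size approach; the batcher key names and the 5000-item value are assumptions that vary across collector versions, so the exporter's README is authoritative for the exact settings:

exporters:
  elasticsearch:
    endpoint: https://elasticsearch.example.com:9200
    # Keep each bulk request well below the target cluster's
    # http.max_content_length (100mb by default) so 413 never occurs.
    batcher:
      enabled: true
      max_size_items: 5000  # illustrative value; tune to typical document sizes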

Contributor

@carsonip carsonip left a comment

I was talking to @lahsivjar about the HTTP 413 issue and it looks more like a misconfiguration (a user issue) in the batching config. Let's discuss the problem further in #45834 before jumping to a fix like this, which may be a footgun if we're splitting requests after merging them in a batcher.

@SoumyaRaikwar
Contributor Author

SoumyaRaikwar commented Feb 11, 2026

@lahsivjar @carsonip
Sorry about that. I agree that the recursive splitting approach in this PR does not follow the standard OpenTelemetry Collector architectural patterns.
I am going to study the exporterhelper.Request interface and the batch-size configuration approach as suggested, and I will discuss it in issue #45834 before proceeding with any further implementation. Thanks.
