Add retry/backoff support to Prometheus Remote Write exporter #3986
Can we not just use the builtin I find this approach preferable to rolling our own backoff-retry loop. |
Thanks for the suggestion. I’m good to move to a requests.Session + HTTPAdapter using urllib3.Retry (POST allowed), mapping our existing knobs, and I’ll add a tiny Retry subclass only to keep jitter/backoff cap. I’ll drop the manual loop and update tests. For context, I initially considered a custom loop to keep full control over jitter/backoff cap and explicit logging and avoid relying on adapter/session setup, but I agree urllib3.Retry is battle-tested and clearer. |
Sounds good, we can get the opinion of other members as I might be in the minority here. On a related note, I'm not sure if sub-classing |
Makes sense. I’ll switch to urllib3.Retry and avoid subclassing if possible: use a requests.Session + HTTPAdapter with Retry(total=..., backoff_factor=..., backoff_max=..., status_forcelist=..., allowed_methods={"POST"}). The requests-bundled urllib3 we have doesn’t expose backoff_jitter, so if we want jitter I’ll add the smallest possible override; otherwise I’ll stick to base Retry with a sensible backoff_max. |
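For illustration, the session setup described above could look roughly like the sketch below. It is not the code from this PR: the helper name `build_retrying_session`, its defaults, and the exact 5xx codes in `status_forcelist` are assumptions. `backoff_max` and `backoff_jitter` are deliberately left out, since they are only accepted as constructor arguments by newer urllib3 releases.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_retrying_session(max_retries=3, backoff_factor=0.5):
    """Return a requests.Session that retries POSTs on transient failures.

    Hypothetical helper: the name, defaults, and status list are
    illustrative assumptions, not the exporter's actual configuration.
    """
    retry = Retry(
        total=max_retries,                 # overall retry budget
        backoff_factor=backoff_factor,     # scales the exponential sleep between attempts
        status_forcelist=(408, 429, 500, 502, 503, 504),
        allowed_methods=frozenset({"POST"}),  # POST is not retried by default
        respect_retry_after_header=True,   # honor Retry-After on 429/503 responses
        raise_on_status=False,             # return the last response once retries run out
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

The exporter would then send through `session.post(...)` instead of `requests.post(...)`, so the retry policy lives in one place rather than in a hand-written loop.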
The OTLP exports implement retries manually but AFAICS don't expose any tunable (e.g. |
@xrmx |
...orter-prometheus-remote-write/src/opentelemetry/exporter/prometheus_remote_write/__init__.py
…entelemetry/exporter/prometheus_remote_write/__init__.py Co-authored-by: Lukas Hering <[email protected]>
Description
Add configurable retries with exponential backoff and jitter to the Prometheus Remote Write exporter, so that transient failures (HTTP 429, 408, and 5xx responses, connection errors, and timeouts) no longer drop metrics silently. The README is updated to document the new retry knobs.
Fixes #3985
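As a rough sketch of what "exponential backoff with jitter" means here, the delay before each retry can be drawn uniformly between zero and a capped exponential ceiling (the "full jitter" variant). The function names, the `base` and `cap` defaults, and the choice of full jitter are assumptions for illustration, not the values or strategy used by this PR.

```python
import random


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based).

    Hypothetical helper: the ceiling doubles with each attempt, is capped
    at `cap`, and the actual delay is picked uniformly below it.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)


def is_retryable(status_code):
    """True for the transient statuses named in the description: 408, 429, 5xx."""
    return status_code in (408, 429) or 500 <= status_code < 600
```

Randomizing the delay keeps many exporters that failed at the same moment from retrying in lockstep, and the cap keeps the wait bounded after repeated failures.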
Type of change
How Has This Been Tested?
Does This PR Require a Core Repo Change?
Checklist: