Allow custom bucket boundaries per Histogram #300
robertschonfeld wants to merge 18 commits into alcionai:main
Conversation
ryanfkeepers
left a comment
Please see comments. In particular:
- unit test fixups.
- automatic usage of the default boundaries.
- removal of the histogramConfig struct.
```go
// DefaultLatencyBoundariesMs are logarithmically-spaced bucket boundaries from
// 1 to 60_000, suitable for measuring operation latency in milliseconds up to 60s.
// Use with WithBoundaries to avoid the OTel SDK default ceiling of 10,000.
var DefaultLatencyBoundariesMs = ExponentialBoundaries(1, 60_000, 20)
```
This value is only getting used in tests, and is not actually applied as the default. See other comments for usage suggestions.
It's not clear how the default boundaries in ctats should be optimized, so I decided to keep the OTel defaults. Making PresetLatencyBoundariesMs the default could, in theory, worsen precision for someone measuring smaller values. We could still make that change in the future, especially once we have more experience with modifying boundaries.
For best results, users should choose boundaries for each metric based on the expected distribution of its values. After this change, I will update all of our call sites to do so. ctats provides PresetLatencyBoundariesMs and ExponentialBoundaries(min, max float64, count int) as utilities for some reasonable starting boundaries.
Perhaps the naming was confusing, so I renamed DefaultLatencyBoundariesMs to PresetLatencyBoundariesMs.
Mmm, no, this is not an issue of wording. Any choice of name will produce the same problem. That is, I think you're overfitting to a known problem case in your environment. If we want clues to provide a set of presets, then we're saying that clues knows, and is authoritative about, the best possible histogram layouts for one or more standard scenarios. We could probably come up with a sufficient solution, sure. But at this time I don't see the benefit in taking on that authority.
For now I recommend that we drop this value. If you feel strongly about pursuing the idea further (which is also fine!) then we can do that in a follow-up PR.
Ok, removed. We will have this as a constant in our clients then.
```go
// ExponentialBoundaries returns count boundaries spaced logarithmically between
// min and max (both inclusive), mirroring Prometheus's ExponentialBucketsRange:
// https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#ExponentialBucketsRange
```
We're not working with Prometheus here. Are there any OTel docs with the same info?
Linked to the OTel doc about the Explicit Bucket Histogram Aggregation. That doc also mentions being inspired by Prometheus, which is why we are using logarithmic spacing in the first place.
```go
//
//	ExponentialBoundaries(1, 60_000, 15)
//	// → [1 2 5 11 23 51 112 245 537 1179 2588 5679 12461 27344 60000]
func ExponentialBoundaries(min, max float64, count int) []float64 {
```
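For orientation, here is a minimal sketch of how boundaries spaced this way can be computed. This is an illustration only, not the PR's actual implementation; in particular, rounding to whole values is an assumption inferred from the example output in the doc comment.

```go
package main

import (
	"fmt"
	"math"
)

// exponentialBoundaries is a sketch: boundary i is
// min * (max/min)^(i/(count-1)), i.e. evenly spaced in log space,
// rounded to the nearest whole value (an assumption, see above).
func exponentialBoundaries(min, max float64, count int) []float64 {
	bounds := make([]float64, count)
	for i := 0; i < count; i++ {
		exp := float64(i) / float64(count-1)
		bounds[i] = math.Round(min * math.Pow(max/min, exp))
	}
	return bounds
}

func main() {
	fmt.Println(exponentialBoundaries(1, 60_000, 15))
	// [1 2 5 11 23 51 112 245 537 1179 2588 5679 12461 27344 60000]
}
```

Each boundary is a constant multiple (about 2.19x here) of the previous one, which keeps relative error roughly uniform across the range.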
Two thoughts on this func, now that I've slept on it: first, let's verbify the name. Second, while I don't mind the default scaling, I think it would be appropriate to allow the caller to define their own scaling factor for further control. Any value <= 1 should use the current default.
```diff
-func ExponentialBoundaries(min, max float64, count int) []float64 {
+func MakeExponentialHistogramBoundaries(min, max, factor float64, count int) []float64 {
```
Added a scaling factor with the effect of skewing the boundaries towards the low end of the range. Is this what you had in mind?
Are you comfortable with this function, or do we want to go deeper into the maths of what is optimal? I am satisfied with roughly following the logarithmic distribution of the OTel default, "inspired by Prometheus".
It will be good to test this against app-log-based calculations, which are exact.
I don't think we need to be too scientific. After all, this is just a quick way for someone to get an "approximately useful" set of buckets. They can always define their own if it needs to be exact according to some range.
Co-authored-by: Keepers <ryan.keepers@veeam.com>
ryanfkeepers
left a comment
A couple remaining tiny nits. Thanks for all the effort!
```go
//
//	MakeExponentialHistogramBoundaries(10, 1000, 5, 2)
//	// → [10 13 32 133 1000] (denser at low end, same range)
func MakeExponentialHistogramBoundaries(min, max float64, count int, scalingFactor float64) []float64 {
```
style nit, since this is getting to be a long line
```diff
-func MakeExponentialHistogramBoundaries(min, max float64, count int, scalingFactor float64) []float64 {
+func MakeExponentialHistogramBoundaries(
+	min, max float64,
+	count int,
+	scalingFactor float64,
+) []float64 {
```
```go
//	// → [10 13 32 133 1000] (denser at low end, same range)
func MakeExponentialHistogramBoundaries(min, max float64, count int, scalingFactor float64) []float64 {
	if scalingFactor <= 1 {
		scalingFactor = 1
```
Is 1 correct? Should this use the old scaling-factor evaluation?
(If so, we need div-by-0 protection, too.)
```diff
-		scalingFactor = 1
+		scalingFactor = math.Pow(max/min, 1/float64(count-1))
```
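Putting this thread together, one plausible way the skew works is to raise the normalized bucket position to the scalingFactor power before exponentiating. The sketch below is a reconstruction under that assumption (it reproduces the [10 13 32 133 1000] example from earlier in the thread, but it is not necessarily the PR's exact code):

```go
package main

import (
	"fmt"
	"math"
)

// makeExponentialHistogramBoundaries sketches skewed logarithmic spacing.
// scalingFactor > 1 packs boundaries toward the low end of the range;
// anything <= 1 falls back to 1, i.e. plain logarithmic spacing.
func makeExponentialHistogramBoundaries(
	min, max float64,
	count int,
	scalingFactor float64,
) []float64 {
	if scalingFactor <= 1 {
		scalingFactor = 1
	}
	bounds := make([]float64, count)
	for i := 0; i < count; i++ {
		// Position in [0, 1]; the power skews it toward 0 for factors > 1.
		pos := math.Pow(float64(i)/float64(count-1), scalingFactor)
		bounds[i] = math.Round(min * math.Pow(max/min, pos))
	}
	return bounds
}

func main() {
	fmt.Println(makeExponentialHistogramBoundaries(10, 1000, 5, 2))
	// [10 13 32 133 1000]
}
```

With scalingFactor = 1 the exponent is linear in i and the function degenerates to the plain exponential spacing discussed above.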
```go
	boundaries []float64
}

func (c histogramCfg) appendOpts(opts []metric.Float64HistogramOption) []metric.Float64HistogramOption {
```
nit: because variadics are just a little nicer for things like these.
```diff
-func (c histogramCfg) appendOpts(opts []metric.Float64HistogramOption) []metric.Float64HistogramOption {
+func (c histogramCfg) appendOpts(opts ...metric.Float64HistogramOption) []metric.Float64HistogramOption {
```
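A tiny self-contained illustration of why the variadic shape reads nicer at call sites (the type names here are stand-ins for the metric package's types, purely for the sketch):

```go
package main

import "fmt"

// histogramOption stands in for metric.Float64HistogramOption in this sketch.
type histogramOption string

type histogramCfg struct {
	boundaries []float64
}

// With a variadic parameter, callers can write c.appendOpts() or
// c.appendOpts(a, b) directly, without constructing a slice first.
func (c histogramCfg) appendOpts(opts ...histogramOption) []histogramOption {
	if len(c.boundaries) > 0 {
		opts = append(opts, histogramOption(fmt.Sprint("boundaries=", c.boundaries)))
	}
	return opts
}

func main() {
	c := histogramCfg{boundaries: []float64{1, 10, 100}}
	fmt.Println(c.appendOpts("unit=ms"))
}
```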
The OTel Go SDK uses explicit bucket boundaries that top out at 10,000. Any observation above that lands in the `+Inf` overflow bucket, and since Kibana's `percentile()` uses linear interpolation within buckets, it silently maxes out at 10,000. Customizing the bucket boundaries is needed to measure latencies above 10,000.

The OTel mechanism is explicit bucket boundaries via `metric.WithExplicitBucketBoundaries` at instrument creation time: per-instrument, not a global MeterProvider view.

Changes:
- `ExponentialBoundaries(min, max, count)`: logarithmically-spaced buckets mirroring Prometheus's `ExponentialBucketsRange`
- `DefaultLatencyBoundariesMs`: 20 buckets from 1 to 60,000
- `WithBoundaries(...) HistogramOption` on `Histogram[N]` and `RegisterHistogram`
- Record bucket placement via a `ManualReader`-backed OTel context
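To make the motivation concrete, here is a toy linear-interpolation percentile over explicit buckets. It illustrates the capping behavior described above; it is not Kibana's actual implementation.

```go
package main

import "fmt"

// percentileFromBuckets estimates percentile p (0..100) by linear
// interpolation within explicit buckets. counts[i] holds observations in
// (bounds[i-1], bounds[i]]; counts[len(bounds)] is the +Inf overflow bucket.
func percentileFromBuckets(bounds []float64, counts []int, p float64) float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	rank := p / 100 * float64(total)
	cum := 0
	for i, c := range counts {
		if float64(cum+c) >= rank {
			if i == len(bounds) {
				// +Inf bucket: no finite upper edge to interpolate toward,
				// so the estimate is pinned at the top finite boundary.
				return bounds[len(bounds)-1]
			}
			lo := 0.0
			if i > 0 {
				lo = bounds[i-1]
			}
			frac := (rank - float64(cum)) / float64(c)
			return lo + frac*(bounds[i]-lo)
		}
		cum += c
	}
	return bounds[len(bounds)-1]
}

func main() {
	bounds := []float64{1000, 5000, 10000}
	// 90 observations under 10s, plus 10 much slower ones (e.g. 45s timeouts).
	counts := []int{50, 30, 10, 10}
	fmt.Println(percentileFromBuckets(bounds, counts, 99))
	// prints 10000: the true p99 (~45s) is invisible past the top boundary
}
```

Raising the top boundary (e.g. with the 1 to 60,000 preset) moves that ceiling, which is the whole point of making boundaries configurable.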