Allow custom bucket boundaries per Histogram #300

Open
robertschonfeld wants to merge 18 commits into alcionai:main from robertschonfeld:main

Conversation

@robertschonfeld

The OTel Go SDK's default explicit bucket boundaries top out at 10,000. Any observation above that lands in the +Inf overflow bucket, and because Kibana's percentile() interpolates linearly within buckets, the reported percentile silently caps at 10,000. Measuring latencies above 10,000 therefore requires custom bucket boundaries.
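To make the overflow behavior concrete, here is a minimal standalone sketch of explicit-bucket placement. It does not use the OTel SDK: `bucketIndex` is a hypothetical helper, and the boundary list is copied from the OTel default explicit bucket boundaries, with each boundary treated as an inclusive upper bound.

```go
package main

import (
	"fmt"
	"sort"
)

// otelDefaultBoundaries mirrors the OTel default explicit bucket boundaries.
var otelDefaultBoundaries = []float64{
	0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000,
}

// bucketIndex returns the index of the bucket that v falls into, treating each
// boundary as an inclusive upper bound. An index equal to len(boundaries)
// denotes the +Inf overflow bucket. Hypothetical helper for illustration only.
func bucketIndex(boundaries []float64, v float64) int {
	// SearchFloat64s returns the smallest i such that boundaries[i] >= v.
	return sort.SearchFloat64s(boundaries, v)
}

func main() {
	for _, v := range []float64{42, 10_000, 10_001, 45_000} {
		idx := bucketIndex(otelDefaultBoundaries, v)
		overflow := idx == len(otelDefaultBoundaries)
		fmt.Printf("%v -> bucket %d (overflow=%t)\n", v, idx, overflow)
	}
}
```

Under these assumptions, 10,001 and 45,000 share the same overflow bucket, so nothing downstream can tell them apart.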

The OTel mechanism for this is metric.WithExplicitBucketBoundaries, passed at instrument creation time. It applies per instrument, not through a global MeterProvider view.

Changes:

  • ExponentialBoundaries(min, max, count) — logarithmically-spaced buckets mirroring Prometheus's ExponentialBucketsRange
  • DefaultLatencyBoundariesMs — 20 buckets from 1–60,000
  • WithBoundaries(...) HistogramOption on Histogram[N] and RegisterHistogram
  • Tests covering the math, option propagation, and end-to-end Record bucket placement via a ManualReader-backed OTel context

Contributor

@ryanfkeepers left a comment

Please see comments. In particular:

  • unit test fixups.
  • automatic usage of the default boundaries.
  • removal of the histogramConfig struct.

Comment thread ctats/histogram.go Outdated
// DefaultLatencyBoundariesMs are logarithmically-spaced bucket boundaries from
// 1 to 60_000, suitable for measuring operation latency in milliseconds up to 60s.
// Use with WithBoundaries to avoid the OTel SDK default ceiling of 10,000.
var DefaultLatencyBoundariesMs = ExponentialBoundaries(1, 60_000, 20)
Contributor

This value is only used in tests; it is not actually applied as the default. See other comments for usage suggestions.

Author

It's not clear how the default boundaries in ctats should be optimized, so I decided to keep the OTel defaults. Making PresetLatencyBoundariesMs the default could in theory worsen precision for someone measuring smaller values. We could still make that change in the future, especially once we have more experience with modifying the boundaries.

For best results, users must choose boundaries for each metric based on the expected distribution of its values. After this change, I will update all of our call sites to do so. ctats provides PresetLatencyBoundariesMs and ExponentialBoundaries(min, max float64, count int) as utilities for generating reasonable boundaries.

Perhaps the naming was confusing, so I renamed DefaultLatencyBoundariesMs to PresetLatencyBoundariesMs.

Contributor

Mmm, no, this is not an issue of wording. Any choice of name will produce the same problem: I think you're overfitting to a known problem case in your environment. If we want clues to provide a set of presets, then we're saying that clues knows, and is authoritative about, the best possible histogram layouts for one or more standard scenarios. We could probably come up with a sufficient solution, sure. But at this time I don't see the benefit in taking on that authority.

For now I recommend that we drop this value. If you feel strongly about pursuing the idea further (which is also fine!) then we can do that in a follow-up PR.

Author

Ok, removed. We will have this as a constant in our clients then.

Comment thread ctats/histogram.go Outdated

// ExponentialBoundaries returns count boundaries spaced logarithmically between
// min and max (both inclusive), mirroring Prometheus's ExponentialBucketsRange:
// https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#ExponentialBucketsRange
Contributor

We're not working with Prometheus here. Are there any OTel docs for the same info?

Author

Linked to the OTel doc on Explicit Bucket Histogram Aggregation. That doc also mentions being inspired by Prometheus, which is why we are using logarithmic spacing in the first place.
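For readers unfamiliar with logarithmic spacing, the idea can be sketched in a few lines: each boundary is the previous one multiplied by a constant growth factor, chosen so the last boundary lands on the upper limit. `logSpaced` below is a hypothetical stand-in and not the function from this PR; it assumes `0 < lo < hi` and `count >= 2`, and does no rounding.

```go
package main

import (
	"fmt"
	"math"
)

// logSpaced returns count boundaries from lo to hi, where each boundary is
// the previous one multiplied by a constant growth factor.
// Hypothetical helper; assumes 0 < lo < hi and count >= 2.
func logSpaced(lo, hi float64, count int) []float64 {
	// The factor that, applied count-1 times to lo, reaches hi.
	growth := math.Pow(hi/lo, 1/float64(count-1))

	out := make([]float64, count)
	for i := range out {
		out[i] = lo * math.Pow(growth, float64(i))
	}
	return out
}

func main() {
	// Print the unrounded boundaries for a 1..60,000 range in 15 steps.
	for _, b := range logSpaced(1, 60_000, 15) {
		fmt.Printf("%.1f\n", b)
	}
}
```

Because the ratio between neighbors is constant, the relative width of every bucket is the same, which is what keeps percentile error roughly proportional across the whole range.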

Comment thread ctats/histogram.go Outdated
//
// ExponentialBoundaries(1, 60_000, 15)
// // → [1 2 5 11 23 51 112 245 537 1179 2588 5679 12461 27344 60000]
func ExponentialBoundaries(min, max float64, count int) []float64 {
Contributor

Two thoughts on this func, now that I've slept on it: first, let's verbify the name. Second, while I don't mind the default scaling, I think it would be appropriate to allow the caller to define their own scaling factor for further control. Any value <= 1 should use the current default.

Suggested change
- func ExponentialBoundaries(min, max float64, count int) []float64 {
+ func MakeExponentialHistogramBoundaries(min, max, factor float64, count int) []float64 {

Author

Added a scaling factor that skews the boundaries toward the low end of the range. Is this what you had in mind?

Are you comfortable with this function or do we want to go deeper into the maths of what is optimal? I am satisfied with roughly following the logarithmic distribution of the otel default "inspired by prometheus".
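One reading that reproduces the doc-comment example quoted later in this thread (`MakeExponentialHistogramBoundaries(10, 1000, 5, 2)` → `[10 13 32 133 1000]`) is to treat the scaling factor as an exponent applied to each boundary's normalized position before the logarithmic interpolation. The sketch below illustrates that reading only. `skewedBoundaries`, its clamping of factors below 1, and its rounding to whole numbers are all assumptions, not the code from this PR.

```go
package main

import (
	"fmt"
	"math"
)

// skewedBoundaries returns count log-interpolated boundaries between lo and hi.
// skew > 1 pushes boundaries toward the low end; skew == 1 gives plain
// logarithmic spacing. Hypothetical helper; assumes 0 < lo < hi and count >= 2.
func skewedBoundaries(lo, hi float64, count int, skew float64) []float64 {
	if skew < 1 {
		skew = 1 // assumed fallback: no skew
	}

	out := make([]float64, count)
	for i := range out {
		// t is the boundary's position in [0, 1]; raising it to skew
		// compresses the early positions toward 0.
		t := float64(i) / float64(count-1)
		out[i] = math.Round(lo * math.Pow(hi/lo, math.Pow(t, skew)))
	}
	return out
}

func main() {
	fmt.Println(skewedBoundaries(10, 1000, 5, 2)) // [10 13 32 133 1000]
	fmt.Println(skewedBoundaries(10, 1000, 5, 1)) // [10 32 100 316 1000]
}
```

Under this reading, a factor of 1 is the identity and reproduces the unskewed logarithmic layout, while the range endpoints stay fixed for any factor.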

Author

It would be good to test this against app-log-based calculations, which are exact.

Contributor

I don't think we need to be too scientific. After all, this is just a quick way for someone to get an "approximately useful" set of buckets. They can always define their own if it needs to be exact according to some range.

robertschonfeld and others added 10 commits April 28, 2026 17:49
Co-authored-by: Keepers <ryan.keepers@veeam.com>
Contributor

@ryanfkeepers left a comment

A couple remaining tiny nits. Thanks for all the effort!

Comment thread ctats/histogram.go
//
// MakeExponentialHistogramBoundaries(10, 1000, 5, 2)
// // → [10 13 32 133 1000] (denser at low end, same range)
func MakeExponentialHistogramBoundaries(min, max float64, count int, scalingFactor float64) []float64 {
Contributor

style nit, since this is getting to be a long line

Suggested change
- func MakeExponentialHistogramBoundaries(min, max float64, count int, scalingFactor float64) []float64 {
+ func MakeExponentialHistogramBoundaries(
+ 	min, max float64,
+ 	count int,
+ 	scalingFactor float64,
+ ) []float64 {

Comment thread ctats/histogram.go
// // → [10 13 32 133 1000] (denser at low end, same range)
func MakeExponentialHistogramBoundaries(min, max float64, count int, scalingFactor float64) []float64 {
	if scalingFactor <= 1 {
		scalingFactor = 1
Contributor

is 1 correct? Should this use the old scaling factor evaluation?

(if so, we need a div-by-0 protection, too)

Suggested change
- scalingFactor = 1
+ scalingFactor = math.Pow(max/min, 1/float64(count-1))
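A minimal sketch of the div-by-0 protection mentioned above, assuming the growth-factor expression from the suggestion is kept. `defaultGrowthFactor` and its error messages are hypothetical. Note that Go floating-point division by zero yields ±Inf or NaN at runtime rather than panicking, so without a guard a `count` of 1 or a `min` of 0 would silently produce unusable boundaries.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// defaultGrowthFactor returns the constant multiplier that steps from lo to hi
// in count-1 equal logarithmic steps. It rejects inputs that would divide by
// zero or yield a non-increasing sequence. Hypothetical helper.
func defaultGrowthFactor(lo, hi float64, count int) (float64, error) {
	if count < 2 {
		// count-1 would be zero (or negative), making the exponent Inf.
		return 0, errors.New("count must be at least 2")
	}
	if lo <= 0 || hi <= lo {
		// lo == 0 would make hi/lo infinite.
		return 0, errors.New("boundaries require 0 < lo < hi")
	}
	return math.Pow(hi/lo, 1/float64(count-1)), nil
}

func main() {
	fmt.Println(defaultGrowthFactor(1, 16, 5)) // valid input
	fmt.Println(defaultGrowthFactor(1, 16, 1)) // rejected: count too small
	fmt.Println(defaultGrowthFactor(0, 16, 5)) // rejected: lo is zero
}
```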

Comment thread ctats/histogram.go
boundaries []float64
}

func (c histogramCfg) appendOpts(opts []metric.Float64HistogramOption) []metric.Float64HistogramOption {
Contributor

nit: cause variadics are just a little nicer for things like these.

Suggested change
- func (c histogramCfg) appendOpts(opts []metric.Float64HistogramOption) []metric.Float64HistogramOption {
+ func (c histogramCfg) appendOpts(opts ...metric.Float64HistogramOption) []metric.Float64HistogramOption {
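To show what the variadic form buys at call sites, here is a small standalone sketch. `histogramOption` is a string stand-in for `metric.Float64HistogramOption` so the example compiles with only the standard library, and the body of `appendOpts` is an assumption since the diff hunk does not show it.

```go
package main

import "fmt"

// histogramOption stands in for metric.Float64HistogramOption so that this
// sketch compiles with only the standard library.
type histogramOption string

// histogramCfg mirrors the config struct from the diff (assumed shape).
type histogramCfg struct {
	boundaries []float64
}

// appendOpts returns opts plus one extra option when boundaries are set.
// It copies opts first so a caller's slice is never modified in place.
func (c histogramCfg) appendOpts(opts ...histogramOption) []histogramOption {
	out := append([]histogramOption{}, opts...)
	if len(c.boundaries) > 0 {
		out = append(out, histogramOption(fmt.Sprintf("boundaries=%v", c.boundaries)))
	}
	return out
}

func main() {
	cfg := histogramCfg{boundaries: []float64{1, 10, 100}}

	// Variadic call sites can pass zero, one, or many options directly.
	fmt.Println(cfg.appendOpts())
	fmt.Println(cfg.appendOpts("unit=ms", "description=latency"))

	// An existing slice still works by spreading it.
	existing := []histogramOption{"unit=ms"}
	fmt.Println(cfg.appendOpts(existing...))
}
```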

2 participants