85 changes: 70 additions & 15 deletions ctats/README.md
Expand Up @@ -12,7 +12,7 @@ _noun_

OTEL metrics, despite being an invaluable addition to service telemetry,
require an obnoxiously verbose setup and implementation. Ctats isn't
here to provide any new features. Instead in wants to make the current
here to provide any new features. Instead it wants to make the current
features more accessible and less painful.

### Step 1: Init OTEL with Clues
Expand All @@ -36,7 +36,7 @@ func main() {
```go
func main() {
// ...
ctx, err = ctats.Initialize(ctx)
ctx, err = ctats.Initialize(ctx)
// ...
}
```
Expand All @@ -46,11 +46,11 @@ func main() {
```go
func main() {
// We're not kidding, this step is purely optional.
ctx, err := ctats.RegisterHistogram(
ctx, err := ctats.RegisterSum(
ctx,
"http.server.latency", // Name
"ms", // Unit
"New user additions.", // Description
"http.server.requests", // Name
"1", // Unit
"Incoming HTTP requests by status code.", // Description
)
}
```
Expand All @@ -59,10 +59,12 @@ func main() {

```go
func handler(ctx context.Context) {
//...
ctats.Histogram[int64]("http.server.latency").Record(latency)
//...

// ...
ctats.Sum[int64]("http.server.requests").
With("status_code", statusCode).
Inc(ctx)
// ...
}
```

## How it works
Expand Down Expand Up @@ -97,20 +99,71 @@ values are `float64`s behind the scenes. Easier to avoid the problem
of potential conflicts altogether. What, would you prefer that we
panic?

## Corner Case: histogram bucket definitions
## Histograms

Histograms are a bit more work than the other types because you have to think
about your data's distribution first. This is because the OTEL histograms
store observations in pre-defined buckets. The default boundaries top out at
**10,000** — anything above that disappears into an overflow bucket.
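
To make the overflow behavior concrete, here is a minimal sketch of how a recorded value maps to a bucket. It assumes the OTEL SDK's default explicit boundaries and upper-inclusive buckets; `bucketIndex` is a hypothetical helper for illustration, not part of `ctats` or the OTEL SDK.

```go
package main

import (
	"fmt"
	"sort"
)

// defaultBoundaries mirrors the OTEL default explicit bucket boundaries
// (assumption: the SDK defaults have not been overridden by a View).
var defaultBoundaries = []float64{
	0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000,
}

// bucketIndex is a hypothetical helper. It returns the index of the bucket
// that v lands in, assuming bucket i covers (boundaries[i-1], boundaries[i]].
// An index equal to len(boundaries) is the overflow bucket.
func bucketIndex(boundaries []float64, v float64) int {
	// SearchFloat64s returns the smallest i with boundaries[i] >= v.
	return sort.SearchFloat64s(boundaries, v)
}

func main() {
	for _, ms := range []float64{42, 9_000, 12_000, 45_000} {
		fmt.Printf("%6.0f ms -> bucket %d\n", ms, bucketIndex(defaultBoundaries, ms))
	}
}
```

With these boundaries, a 12-second and a 45-second latency share the same overflow bucket, so the histogram cannot tell them apart.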

[This explainer](https://signoz.io/blog/opentelemetry-histogram/) is a good read
if you want a deeper understanding of OTEL Histograms.

You can't define your histogram buckets with Ctats. Why? Because
[OTEL doesn't let you define them at runtime either](https://github.com/open-telemetry/opentelemetry-go/issues/3826).
You'll have to take it up with the package authors, not us.
However, the `ctats` API is just as easy to use as other metric types. The
cleanest solution is to declare your histogram at startup with
`RegisterHistogram`.

## Sum vs Counter vs Gauge
```go
func main() {
ctx, err := ctats.RegisterHistogram(
ctx,
"op.latency",
"ms",
"End-to-end operation latency.",
ctats.WithBoundaries(ctats.PresetLatencyBoundariesMs...),
)
}

func handler(ctx context.Context) {
ctats.Histogram[int64]("op.latency").Record(ctx, elapsed)
}
```

Still optional though. Pass `WithBoundaries` directly to the `Histogram`
factory and the instrument is created on the first `Record` call. Just keep
in mind that the first creation wins — if the same id was already registered
or recorded against with different boundaries, the new ones are silently
ignored. Again, would you prefer that we panic?
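
The "first creation wins" rule can be pictured with a small sketch. The `registry` type and its `getOrCreate` method are hypothetical stand-ins for illustration only; they assume a simple map keyed by id and are not the actual `ctats` internals.

```go
package main

import "fmt"

// registry is a hypothetical stand-in for wherever instruments are stored.
type registry struct {
	boundaries map[string][]float64
}

// getOrCreate returns the boundaries already stored for id. The provided
// boundaries are stored only when id has not been seen before; otherwise
// they are dropped without an error.
func (r *registry) getOrCreate(id string, boundaries []float64) []float64 {
	if existing, ok := r.boundaries[id]; ok {
		return existing
	}

	r.boundaries[id] = boundaries

	return boundaries
}

func main() {
	r := &registry{boundaries: map[string][]float64{}}

	first := r.getOrCreate("op.latency", []float64{1, 10, 100})
	second := r.getOrCreate("op.latency", []float64{5, 50, 500})

	// Both calls see the boundaries from the first registration.
	fmt.Println(first, second)
}
```

If two call sites disagree about boundaries, whichever runs first decides, which is one more reason to register at startup.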

### Picking your boundaries

For latency in milliseconds, `PresetLatencyBoundariesMs` is a sensible
default: 15 logarithmically spaced boundaries from **1 ms to 60,000 ms**, with
finer resolution at the low end where most data clusters.

If your data has a different shape, use `ExponentialBoundaries` to generate
your own range. Note that `min` must be greater than zero — the boundaries
are log-spaced so zero has no meaningful place in the range:

```go
// background job duration in seconds: expected to time out at 1 hour
boundaries := ctats.ExponentialBoundaries(1, 3_600, 10)

ctats.Histogram[int64](
"job.duration",
ctats.WithBoundaries(boundaries...),
).Record(ctx, elapsed)
```
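
For the curious, the spacing rule itself is short. The following is a standalone sketch of log-spaced boundaries, assuming a constant ratio between neighbors and rounding to whole numbers; `logSpaced` is a hypothetical name and this is not presented as the `ctats` implementation.

```go
package main

import (
	"fmt"
	"math"
)

// logSpaced is a hypothetical helper that returns count boundaries between
// lo and hi (both inclusive), where each boundary is roughly a constant
// multiple of the previous one. lo must be greater than zero.
func logSpaced(lo, hi float64, count int) []float64 {
	if count < 2 {
		return []float64{lo, hi}
	}

	// The ratio that takes lo to hi in count-1 equal multiplicative steps.
	ratio := math.Pow(hi/lo, 1/float64(count-1))
	out := make([]float64, count)

	for i := range out {
		out[i] = math.Round(lo * math.Pow(ratio, float64(i)))
	}

	// Pin the last boundary so rounding cannot move the ceiling.
	out[count-1] = hi

	return out
}

func main() {
	fmt.Println(logSpaced(1, 3_600, 10))
}
```

The ratio is derived from `hi/lo`, which has no finite value when `lo` is zero. That is why a zero minimum cannot work.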

## Which metric type should I use?

Feeling overwhelmed? Not sure which type to pick? Just answer
these simple questions and you'll be a master in no time!

* Sum -> OTEL Counter
* Counter -> OTEL UpDownCounter
* Gauge -> OTEL Gauge (who knew?)
* Histogram -> OTEL Histogram

Do you need `Delta Temporality`? Use a Sum, it's your only option!

Expand All @@ -119,6 +172,8 @@ Do you need to decrement values? Use a Counter!
Do you need a single-threaded, single source of truth? Try
a Gauge!

Do you need statistics such as percentiles? Use a Histogram!

Sums are the most foolproof option all around. Plug one in,
count away. Counters are nearly as good, if it weren't for the
temporality constraint. For monotonically increasing values,
94 changes: 81 additions & 13 deletions ctats/histogram.go
Expand Up @@ -3,6 +3,7 @@ package ctats
import (
"context"
"log"
"math"

"github.com/pkg/errors"
"go.opentelemetry.io/otel/metric"
Expand All @@ -11,12 +12,57 @@ import (
"github.com/alcionai/clues/internal/node"
)

// PresetLatencyBoundariesMs are logarithmically-spaced bucket boundaries from
// 1 to 60_000, suitable for measuring operation latency in milliseconds up to 60s.
var PresetLatencyBoundariesMs = ExponentialBoundaries(1, 60_000, 15)

// ExponentialBoundaries returns count boundaries spaced logarithmically between
// min and max (both inclusive), mirroring Prometheus's ExponentialBucketsRange:
// https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#ExponentialBucketsRange
Contributor: We're not working with prometheus here. Are there any otel docs for the same info?

Author: Linked to the otel doc about the Explicit Bucket Histogram Aggregation. That doc also refers to being inspired by prometheus, which is the source for why we are using logarithmic spacing in the first place.

//
// Example:
//
// ExponentialBoundaries(1, 60_000, 15)
// // → [1 2 5 11 23 51 112 245 537 1179 2588 5679 12461 27344 60000]
func ExponentialBoundaries(min, max float64, count int) []float64 {
Contributor: Two thoughts on this func, now that I've slept on it. First, let's verbify the name. Second, while I don't mind the default scaling, I think it would be appropriate to allow the caller to define their own scaling factor for further control. Any value <= 1 should use the current default.

Suggested change:
func ExponentialBoundaries(min, max float64, count int) []float64 {
func MakeExponentialHistogramBoundaries(min, max, factor float64, count int) []float64 {

Author: Added a scaling factor with the effect of skewing the boundaries towards the low end of the range. Is this what you had in mind?

Are you comfortable with this function or do we want to go deeper into the maths of what is optimal? I am satisfied with roughly following the logarithmic distribution of the otel default "inspired by prometheus".

Author: It will be good to test this against app-log-based calculations, which are exact.

Contributor: I don't think we need to be too scientific. After all, this is just a quick way for someone to get an "approximately useful" set of buckets. They can always define their own if it needs to be exact according to some range.

if count < 2 {
return []float64{min, max}
}

factor := math.Pow(max/min, 1/float64(count-1))
b := make([]float64, count)

for i := range b {
b[i] = math.Round(min * math.Pow(factor, float64(i)))
}

b[count-1] = max // guarantee exact ceiling, no rounding drift

return b
}

type histogramCfg struct {
boundaries []float64
}

type HistogramOption func(*histogramCfg)

// WithBoundaries sets explicit bucket boundaries on the histogram.
// Boundaries are passed to the OTel SDK at instrument creation time and are
// ignored if a matching MeterProvider View is already configured.
func WithBoundaries(boundaries ...float64) HistogramOption {
return func(c *histogramCfg) {
c.boundaries = boundaries
}
}

// getOrCreateHistogram attempts to retrieve a histogram from the
// context with the given ID. If it is unable to find a histogram
// with that ID, a new histogram is generated.
func getOrCreateHistogram(
ctx context.Context,
id string,
boundaries []float64,
) (recorder, error) {
id = formatID(id)
b := fromCtx(ctx)
Expand All @@ -36,7 +82,13 @@ func getOrCreateHistogram(
return nil, cluerr.Stack(errNoNodeInCtx)
}

hist, err := nc.OTELMeter().Float64Histogram(id)
var opts []metric.Float64HistogramOption
if len(boundaries) > 0 {
opts = append(opts, metric.WithExplicitBucketBoundaries(boundaries...))
}

// register the histogram
hist, err := nc.OTELMeter().Float64Histogram(id, opts...)
if err != nil {
return nil, errors.Wrap(err, "making new histogram")
}
Expand All @@ -50,17 +102,19 @@ func getOrCreateHistogram(

// RegisterHistogram introduces a new histogram with the given unit and description.
// If RegisterHistogram is not called before updating a metric value, a histogram with
// no unit or description is created. If RegisterHistogram is called for an ID that
// no unit or description is created. If RegisterHistogram is called for an ID that
// has already been registered, it no-ops.
func RegisterHistogram(
ctx context.Context,
// all lowercase, period delimited id of the histogram. Ex: "http.response.status_code"
// all lowercase, period delimited id of the histogram. Ex: "http.response.size"
id string,
// (optional) the unit of measurement. Ex: "byte", "kB", "fnords"
unit string,
// (optional) a short description about the metric.
// Ex: "number of times we saw the fnords".
description string,
// (optional) histogram specific options
opts ...HistogramOption,
) (context.Context, error) {
id = formatID(id)

Expand All @@ -82,18 +136,26 @@ func RegisterHistogram(
return ctx, errors.New("no clues in ctx")
}

opts := []metric.Float64HistogramOption{}
var cfg histogramCfg
for _, o := range opts {
o(&cfg)
}

var metricHistogramOpts []metric.Float64HistogramOption

if len(description) > 0 {
opts = append(opts, metric.WithDescription(description))
metricHistogramOpts = append(metricHistogramOpts, metric.WithDescription(description))
}

if len(unit) > 0 {
opts = append(opts, metric.WithUnit(unit))
metricHistogramOpts = append(metricHistogramOpts, metric.WithUnit(unit))
}

// register the histogram
hist, err := nc.OTELMeter().Float64Histogram(id, opts...)
if len(cfg.boundaries) > 0 {
metricHistogramOpts = append(metricHistogramOpts, metric.WithExplicitBucketBoundaries(cfg.boundaries...))
}

hist, err := nc.OTELMeter().Float64Histogram(id, metricHistogramOpts...)
if err != nil {
return ctx, errors.Wrap(err, "creating histogram")
}
Expand All @@ -107,17 +169,23 @@ func RegisterHistogram(
// If a Histogram instance has been registered for that ID, the
// registered instance will be used. If not, a new instance
// will get generated.
func Histogram[N number](id string) histogram[N] {
return histogram[N]{base: base{id: formatID(id)}}
func Histogram[N number](id string, opts ...HistogramOption) histogram[N] {
hgm := histogram[N]{base: base{id: formatID(id)}}
for _, o := range opts {
o(&hgm.histogramCfg)
}

return hgm
}

// histogram provides access to the factory functions.
type histogram[N number] struct {
base
histogramCfg
}

func (c histogram[N]) With(kvs ...any) histogram[N] {
return histogram[N]{base: c.with(kvs...)}
return histogram[N]{base: c.with(kvs...), histogramCfg: c.histogramCfg}
}

type recorder interface {
Expand All @@ -128,9 +196,9 @@ type noopRecorder struct{}

func (n noopRecorder) Record(context.Context, float64, ...metric.RecordOption) {}

// Add increments the histogram by n. n can be negative.
// Record records the measurement of n in the histogram.
func (c histogram[number]) Record(ctx context.Context, n number) {
hist, err := getOrCreateHistogram(ctx, c.getID())
hist, err := getOrCreateHistogram(ctx, c.getID(), c.boundaries)
if err != nil {
log.Printf("err getting histogram: %+v\n", err)
return