-
Notifications
You must be signed in to change notification settings - Fork 469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Record wallclock lag histogram #32010
Record wallclock lag histogram #32010
Conversation
7d2926a
to
c15122b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good, let me know when it's ready. I left some comments inline.
I'm wondering if we need both period_start
and period_end
. Ideally, the end of the previous period corresponds to the start of the current period.
for (collection_id, collection) in &mut self.collections { | ||
if let Some(stash) = &mut collection.wallclock_lag_histogram_stash { | ||
let lag = collection.shared.lock_write_frontier(|f| frontier_lag(f)); | ||
let bucket = lag.as_secs().next_power_of_two(); | ||
|
||
let key = (histogram_period, bucket); | ||
let key = (histogram_period, bucket, histogram_labels.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This clone isn't nice, but I can't think of an easy way to avoid it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole handling of labels in wallclock_lag_histogram_stash
is not great. Label values are fixed to strings right now, which will stop working once we add other types (like bools). Ideally we'd store a Datum
but it's impossible to store an owned String
as a Datum
. I've been considering storing a DatumMap
instead of the BTreeMap
, but I don't know how to construct one from its contents. Should we store a Row
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even a row would need to be cloned, so I think it's fine to leave as-is.
Where it can be useful is when we change the period length/interval. There will be some time where the measurements are glitchy because intervals overlap. Having both the start and end allows us to remove ambiguity in these cases. Though there are no specific plans to actually use the |
This commit adds a new built-in source, `mz_wallclock_global_lag_histogram_raw` that keeps 30 days worth of data by default. No data is filled in yet. The `_raw` suffix suggests that the histogram counts are stored as diffs for efficiency. This makes the histogram harder to query, so this commit also adds a view `mz_wallclock_lag_histogram` that lifts the count into a column.
This commit refactors the existing controller code to refresh wallclock lag introspection in preparation of the addition of a wallclock lag histogram. Mostly it renames things from "wallclock lag" to "wallclock lag history", to avoid confusion when we also add histogram updating logic there. It also changes the structure of the `refresh_wallclock_lag` methods a bit, to make them more similar between the two controllers.
6f2b59c
to
308563a
Compare
This commit extends the two controllers' `refresh_wallclock_lag` methods to also record `WallclockLagHistogram` introspection data. Both the histogram refresh time (i.e. cadence of persistent writes) and the period length can be configured through dyncfg flags. Additionally, the commit adds a dyncfg flag to entirely disable histogram collection, as a failsafe.
This commit adds a "workload_class" label to measurements in `mz_wallclock_global_lag_histogram`. The value of the label is the workload class of the cluster maintaining the respective collection.
308563a
to
64bdcdc
Compare
Should be ready to review now! Things I changed since @antiguru's last look:
|
64bdcdc
to
b977003
Compare
b977003
to
f2dac5e
Compare
TFTR! |
This PR adds a new append-only storage-managed collection,
mz_wallclock_global_lag_histogram
, and makes the controllers start populating it.The schema is:
period_start
timestampperiod_end
timestampobject_id
textlag_seconds
uint8labels
jsonbcount
uint8Histograms are per object and per time period, as specified by
object_id
andperiod_*
, respectively. The time period is configurable through a config flag and defaults to one day.The lag values are bucketed as powers-of-2 seconds. I decided to store seconds instead of milliseconds to (a) avoid the illusion of millisecond accuracy and (b) significantly reduce the number of buckets. I also omitted adding additional precision controls for now to reduce the complexity in this first version and because I suspect that powers of 2 might be sufficient. We can always iterate if it turns out we need more precision. The bucket values are upper bounds, mirroring what we also use for compute introspection histograms.
labels
currently includes a single optionalworkload_class
label, reflecting the workload class of the cluster maintaining the respective object, according tomz_cluster_workload_classes
. I included this label specifically because it provides a simple way to filter out most of the noise when building a metric focused on production experience.Measurements are collected every second but only written once per a configurable refresh interval (one minute by default). This is to reduce the load on persist.
There is also a config flag to disable wallclock lag histogram recording entirely. This exists only to derisk the feature rollout and the expectation is that we'll remove it soon.
Motivation
See proposal in Notion.
Tips for reviewer
Individual commits are meaningful.
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.