Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record wallclock lag histogram #32010

Merged
merged 5 commits into from
Mar 27, 2025

Conversation

teskje
Copy link
Contributor

@teskje teskje commented Mar 25, 2025

This PR adds a new append-only storage-managed collection, mz_wallclock_global_lag_histogram, and makes the controllers start populating it.

The schema is:

  • period_start timestamp
  • period_end timestamp
  • object_id text
  • lag_seconds uint8
  • labels jsonb
  • count uint8

Histograms are per object and per time period, as specified by object_id and period_*, respectively. The time period is configurable through a config flag and defaults to one day.

The lag values are bucketed as powers-of-2 seconds. I decided to store seconds instead of milliseconds to (a) avoid the illusion of millisecond accuracy and (b) significantly reduce the number of buckets. I also omitted adding additional precision controls for now to reduce the complexity in this first version and because I suspect that powers of 2 might be sufficient. We can always iterate if it turns out we need more precision. The bucket values are upper bounds, mirroring what we also use for compute introspection histograms.

labels currently includes a single optional workload_class label, reflecting the workload class of the cluster maintaining the respective object, according to mz_cluster_workload_classes. I included this label specifically because it provides a simple way to filter out most of the noise when building a metric focused on production experience.

Measurements are collected every second but only written once per a configurable refresh interval (one minute by default). This is to reduce the load on persist.

There is also a config flag to disable wallclock lag histogram recording entirely. This exists only to derisk the feature rollout and the expectation is that we'll remove it soon.

Motivation

  • This PR adds a feature that has not yet been specified.

See proposal in Notion.

Tips for reviewer

Individual commits are meaningful.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@teskje teskje force-pushed the mz_wallclock_lag_histogram branch 2 times, most recently from 7d2926a to c15122b Compare March 26, 2025 13:18
@teskje teskje changed the title [wip] Record wallclock lag histogram Record wallclock lag histogram Mar 27, 2025
Copy link
Member

@antiguru antiguru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, let me know when it's ready. I left some comments inline.

I'm wondering if we need both period_start and period_end. Ideally, the end of the previous period corresponds to the start of the current period.

for (collection_id, collection) in &mut self.collections {
if let Some(stash) = &mut collection.wallclock_lag_histogram_stash {
let lag = collection.shared.lock_write_frontier(|f| frontier_lag(f));
let bucket = lag.as_secs().next_power_of_two();

let key = (histogram_period, bucket);
let key = (histogram_period, bucket, histogram_labels.clone());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clone isn't nice, but I can't think of an easy way to avoid it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole handling of labels in wallclock_lag_histogram_stash is not great. Label values are fixed to strings right now, which will stop working once we add other types (like bools). Ideally we'd store a Datum but it's impossible to store an owned String as a Datum. I've been considering storing a DatumMap instead of the BTreeMap, but I don't know how to construct one from its contents. Should we store a Row?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even a row would need to be cloned, so I think it's fine to leave as-is.

@teskje
Copy link
Contributor Author

teskje commented Mar 27, 2025

I'm wondering if we need both period_start and period_end. Ideally, the end of the previous period corresponds to the start of the current period.

Where it can be useful is when we change the period length/interval. There will be some time where the measurements are glitchy because intervals overlap. Having both the start and end allows us to remove ambiguity in these cases. Though there are no specific plans to actually use the period_end value right now it seems like a good idea to include it to future-proof things. I think once we have the period length locked in we can consider removing the period_end column.

teskje added 2 commits March 27, 2025 11:10
This commit adds a new built-in source,
`mz_wallclock_global_lag_histogram_raw` that keeps 30 days worth of data
by default. No data is filled in yet.

The `_raw` suffix suggests that the histogram counts are stored as diffs
for efficiency. This makes the histogram harder to query, so this commit
also adds a view `mz_wallclock_lag_histogram` that lifts the count into
a column.
This commit refactors the existing controller code to refresh wallclock
lag introspection in preparation of the addition of a wallclock lag
histogram. Mostly it renames things from "wallclock lag" to "wallclock
lag history", to avoid confusion when we also add histogram updating
logic there. It also changes the structure of the
`refresh_wallclock_lag` methods a bit, to make them more similar between
the two controllers.
@teskje teskje force-pushed the mz_wallclock_lag_histogram branch 2 times, most recently from 6f2b59c to 308563a Compare March 27, 2025 10:33
teskje added 2 commits March 27, 2025 11:46
This commit extends the two controllers' `refresh_wallclock_lag` methods
to also record `WallclockLagHistogram` introspection data. Both the
histogram refresh time (i.e. cadence of persistent writes) and the
period length can be configured through dyncfg flags. Additionally, the
commit adds a dyncfg flag to entirely disable histogram collection, as a
failsafe.
This commit adds a "workload_class" label to measurements in
`mz_wallclock_global_lag_histogram`. The value of the label is the
workload class of the cluster maintaining the respective collection.
@teskje teskje force-pushed the mz_wallclock_lag_histogram branch from 308563a to 64bdcdc Compare March 27, 2025 11:26
@teskje
Copy link
Contributor Author

teskje commented Mar 27, 2025

Should be ready to review now! Things I changed since @antiguru's last look:

  • Made the period_* columns the first two columns, after @bkirwi's suggestion that this would make persist order them by time.
  • Renamed the source to mz_wallclock_global_lag_histogram_raw and added a mz_wallclock_global_lag_histogram view that lifts the count into a column. We need this for compatibility with the catalog extractor, which can't run transformations.
  • Renamed the wallclock_lag_histogram_period_length dyncfg to wallclock_lag_histogram_period_interval.
  • Added an enable_wallclock_lag_histogram_collection dyncfg to make it possible to switch of histogram collection in a pinch.
  • Added tests.

@teskje teskje marked this pull request as ready for review March 27, 2025 11:28
@teskje teskje requested review from a team as code owners March 27, 2025 11:28
@teskje teskje requested review from ParkMyCar and antiguru March 27, 2025 11:28
@teskje teskje force-pushed the mz_wallclock_lag_histogram branch from 64bdcdc to b977003 Compare March 27, 2025 11:47
@teskje teskje force-pushed the mz_wallclock_lag_histogram branch from b977003 to f2dac5e Compare March 27, 2025 13:24
@teskje
Copy link
Contributor Author

teskje commented Mar 27, 2025

TFTR!

@teskje teskje merged commit 771f1d8 into MaterializeInc:main Mar 27, 2025
84 checks passed
@teskje teskje deleted the mz_wallclock_lag_histogram branch March 27, 2025 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants