
Improved replica and dataflow expiration #30162

Merged: antiguru merged 27 commits into MaterializeInc:main from replica_expiration on Oct 31, 2024

Conversation

@antiguru antiguru commented Oct 23, 2024

Support expiration of dataflows depending on wall-clock time and with refresh schedules.

This is a partial re-implementation of #29587 to enable more dataflows to participate in expiration. Specifically, it introduces the abstraction of time dependence to describe how a dataflow follows wall-clock time. Using this information, we can then determine how a replica's expiration time relates to a specific dataflow. This allows us to support dataflows that have custom refresh policies.
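
For orientation, here is a minimal sketch of the shape of this abstraction. The names, the enum layout, and the "refresh every `period` milliseconds" schedule are illustrative stand-ins, not the types this PR actually adds:

// Illustrative stand-ins only; the real types live in the compute crates.
type Timestamp = u64; // milliseconds, for the sake of the example

/// Hypothetical "refresh every `period` milliseconds" schedule.
#[derive(Clone, Debug)]
struct RefreshEvery {
    period: Timestamp,
}

impl RefreshEvery {
    /// Round `t` up to the next refresh boundary.
    fn round_up(&self, t: Timestamp) -> Timestamp {
        t.div_ceil(self.period) * self.period
    }
}

/// How a dataflow follows wall-clock time.
#[derive(Clone, Debug)]
enum TimeDependence {
    /// Follows wall-clock time directly.
    WallClock,
    /// Follows its inputs, optionally delayed to the next scheduled refresh.
    RefreshSchedule(Option<RefreshEvery>, Vec<TimeDependence>),
    /// No known relation to wall-clock time, e.g., a constant collection.
    Indeterminate,
}

impl TimeDependence {
    /// The time at which this dataflow observes `wall_clock`, if known; the
    /// replica expiration maps through this to a per-dataflow expiration.
    fn apply(&self, wall_clock: Timestamp) -> Option<Timestamp> {
        match self {
            TimeDependence::WallClock => Some(wall_clock),
            TimeDependence::RefreshSchedule(schedule, inner) => {
                // The dataflow's frontier follows the slowest of its inputs,
                // hence the minimum; in this sketch, an indeterminate input
                // (`None`) makes the whole result indeterminate.
                let result = inner.iter().map(|d| d.apply(wall_clock)).min()??;
                Some(schedule.as_ref().map_or(result, |s| s.round_up(result)))
            }
            TimeDependence::Indeterminate => None,
        }
    }
}

Roughly, a materialized view that refreshes every hour observes a replica expiration time only at the next refresh at or after it, so its dataflow expiration is the replica expiration rounded up to that refresh.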

I'm not sold on the names introduced by this PR, but they're the best I came up with. Open to suggestions!

The implementation deviates from the existing one in some important ways:

  • We do not panic in the dataflow operator that checks for frontier advancements, but rather retain a capability until the dataflow is shut down. This avoids a race condition where dataflow shutdown happens in parallel with dropping the shutdown token, and it avoids needing to reason about which dataflows produce error streams; some have an error output that immediately advances to the empty frontier.
  • We do not handle the empty frontier in a special way. Previously, we considered advancing to the empty frontier acceptable. However, this makes it difficult to distinguish a shutdown from a source reaching the expiration time. In the first case, the operator should drop its capability; in the second, it must not, for correctness reasons.
  • We check in the worker thread whether the replica has expired and panic if needed (a sketch follows this list).
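
A hedged sketch of that worker-side check; the function name and the millisecond representation are illustrative, not this PR's code:

fn check_replica_expiration(now_ms: u64, replica_expiration_ms: Option<u64>) {
    if let Some(expiration) = replica_expiration_ms {
        // Once wall-clock time reaches the expiration, panicking (and thereby
        // restarting the replica) is preferable to serving results past the
        // point the expiration optimization assumes is never reached.
        assert!(
            now_ms < expiration,
            "replica expired at {expiration} ms, wall clock is {now_ms} ms"
        );
    }
}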

There are some problems this PR does not address:

  • Caching the time dependence information in the physical plans seems like a hack. I think a better place would be the controller. Happy to try this in a follow-up PR.
  • We need a separate kill-switch to disable the feature because, as implemented, we capture the expiration time in the controller once per replica. A second kill-switch would enable us to override the expiration to stabilize the system.

Fixes MaterializeInc/database-issues#8688.
Fixes MaterializeInc/database-issues#8683.

Tips for the reviewer

Don't look at individual commits; they're a work log and do not carry semantic meaning.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@antiguru antiguru requested review from a team as code owners October 23, 2024 13:32
@antiguru antiguru requested a review from ParkMyCar October 23, 2024 13:33
@antiguru antiguru marked this pull request as draft October 23, 2024 13:33

shepherdlybot bot commented Oct 23, 2024

Risk Score: 82/100 · Bug Hotspots: 8 · Resilience Coverage: 50%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test 🔍 Detected
  • (Required) Observability
  • (Required) QA Review 🔍 Detected
  • (Required) Run Nightly Tests
  • Unit Test 🔍 Detected
Risk Summary:

The pull request carries a high risk score of 82, driven by predictors such as the average age of files, cognitive complexity within files, and the delta of executable lines. Historically, PRs with these predictors are 119% more likely to cause a bug than the repository baseline. Additionally, the repository has an increasing trend in observed bugs.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:

File Percentile
../src/render.rs 97
../controller/instance.rs 100
../src/as_of_selection.rs 98
../inner/create_continual_task.rs 98
../src/coord.rs 99
../src/catalog.rs 93
../src/dataflows.rs 90
../inner/create_materialized_view.rs 91

@antiguru antiguru force-pushed the replica_expiration branch 6 times, most recently from 701e64c to 1d206d2 Compare October 28, 2024 12:01

@antiguru antiguru changed the title from "DNM WIP replica expiration" to "Improved replica and dataflow expiration" Oct 28, 2024
@antiguru antiguru marked this pull request as ready for review October 28, 2024 15:36
@antiguru antiguru requested a review from a team as a code owner October 28, 2024 15:36
@antiguru antiguru requested review from teskje and sdht0 October 28, 2024 15:36

@antiguru antiguru (Member, Author) commented Oct 28, 2024

Asking specific people for feedback on the following parts:

@antiguru antiguru requested review from petrosagg and ggevay October 28, 2024 15:38

@ParkMyCar ParkMyCar (Member) left a comment:

Adapter bits LGTM

@sdht0 sdht0 (Contributor) left a comment:

Very nice! I commented with a bunch of observations.

I'm still grokking the refresh schedule logic.

match self {
    TimeDependence::Indeterminate => None,
    TimeDependence::RefreshSchedule(schedule, inner) => {
        let result = inner.iter().map(|inner| inner.apply(wall_clock)).min()??;

A Contributor commented:

A comment explicitly laying out the logic would be nice here.
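
For illustration, a hedged sketch of what such a comment could spell out, using made-up helpers rather than the PR's types (`inputs` stands for the `apply` results of the nested dependences, and the "refresh every N" period is a stand-in for a real refresh schedule):

fn earliest_observation(
    inputs: &[Option<u64>],      // apply(wall_clock) of each nested dependence
    refresh_period: Option<u64>, // hypothetical "refresh every N" schedule
) -> Option<u64> {
    // `Option` orders `None` before `Some`, so a single input with no known
    // wall-clock relation makes the overall minimum `None`; the `??` in the
    // snippet above then propagates that outward.
    let result = inputs.iter().copied().min()??;
    // A refresh schedule defers visibility to the next refresh boundary.
    Some(match refresh_period {
        Some(period) => result.div_ceil(period) * period,
        None => result,
    })
}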

pub fn determine_dataflow_expiration<P, S>(
    &self,
    plan: &DataflowDescription<P, S, mz_repr::Timestamp>,
) -> Antichain<mz_repr::Timestamp> {

@sdht0 sdht0 (Contributor) commented Oct 28, 2024:

Optional, but I think it'd be better to return a more explicit Option<mz_repr::Timestamp> here, as later we almost always have to do self.dataflow_expiration.as_option(). Same for compute_state.replica_expiration.

Antichain helps with less_than() and meet(), but otherwise it is slightly less transparent about what it represents, which is exactly an optional Timestamp.

A Contributor commented:

I agree with this take! Seems like some surrounding code would become less awkward if it treated expiry times as optional timestamps instead of frontiers.
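
To illustrate the ergonomic difference being discussed, a small hedged sketch using timely's Antichain (the expiration value is made up, and the example assumes a timely dependency):

use timely::progress::Antichain;

fn main() {
    let expiration_ms: u64 = 1_700_000_000_000;

    // Frontier-shaped: call sites unwrap it back into an Option.
    let as_frontier = Antichain::from_elem(expiration_ms);
    if let Some(expiration) = as_frontier.as_option() {
        println!("expires at {expiration}");
    }

    // Option-shaped: "no expiration" is an explicit None at the call site.
    let as_option: Option<u64> = Some(expiration_ms);
    if let Some(expiration) = as_option {
        println!("expires at {expiration}");
    }
}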

@sdht0 sdht0 (Contributor) left a comment:

I went over the refresh schedule logic and it looks pretty good!

After adding tests for refresh schedules, this would be good to go!

@@ -7,7 +7,7 @@
# the Business Source License, use of this software will be governed
# by the Apache License, Version 2.0.

# Test indexes.
# Test replica expiration.

A Contributor commented:

Thanks.

How about refresh schedule related tests?

/// Potentially valid for all times.
Indeterminate,
/// Valid up to some nested time, rounded according to the refresh schedule.
RefreshSchedule(Option<RefreshSchedule>, Vec<Self>),

A Contributor commented:

RefreshSchedule does not really describe what this variant does, which is to capture a tree of nested times with an optional refresh schedule. Rename to NestedTimes or equivalent?

use TimeDependence::*;

if let Some(dependence) = self.seen.get(&id).cloned() {
    return dependence;

@sdht0 sdht0 (Contributor) commented Oct 28, 2024:

(Just an observation, on the apply() side, these common subtrees will be traversed multiple times. If this ever becomes a problem, we could preserve the ids and apply this seen logic even there. Although the reduction/normalization logic will interfere.)
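
A hedged sketch of what memoizing on the apply() side could look like, with made-up stand-ins for the id and dependence types (and ignoring the normalization concern mentioned above):

use std::collections::BTreeMap;

type GlobalId = u64;
type Timestamp = u64;

/// A dependence that remembers which collection it describes, so `apply`
/// results can be cached per collection id.
enum Dependence {
    WallClock,
    Nested(GlobalId, Vec<Dependence>),
}

fn apply_memo(
    dep: &Dependence,
    wall_clock: Timestamp,
    cache: &mut BTreeMap<GlobalId, Option<Timestamp>>,
) -> Option<Timestamp> {
    match dep {
        Dependence::WallClock => Some(wall_clock),
        Dependence::Nested(id, inner) => {
            // Reuse the result for a common subtree instead of re-traversing it.
            if let Some(cached) = cache.get(id) {
                return *cached;
            }
            let result = inner
                .iter()
                .map(|d| apply_memo(d, wall_clock, cache))
                .min()
                .flatten();
            cache.insert(*id, result);
            result
        }
    }
}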

@antiguru antiguru force-pushed the replica_expiration branch 2 times, most recently from 4e5e1ed to 09dc8b2 Compare October 29, 2024 12:46

@teskje teskje (Contributor) left a comment:

Not done yet, but posting the first part of my review already.

///
/// The default value indicates the dataflow follows wall-clock without modifications.
#[derive(Debug, Clone, Default, Serialize, Deserialize, Eq, PartialEq, Ord, PartialOrd)]
pub struct TimeDependence {

A Contributor commented:

I liked the previous implementation more, as it was more explicit. In the new code:

  • There is no distinction between an "unset" time_dependence (defaults to None) and a time_dependence that was computed to be "indeterminate" (now None). Perhaps there is another way to distinguish between the two cases when debugging? (See the sketch after this list.)
  • The wall-clock time represented using the default value is an additional thing to keep in mind while paging in the code.
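
A minimal hedged sketch of the distinction raised in the first bullet; WallClockRelation is a placeholder, not the PR's type:

#[derive(Debug, Clone)]
struct WallClockRelation; // placeholder for "how the dataflow follows wall-clock time"

/// With an explicit state, "never computed" and "computed but unknown" stay
/// distinguishable when debugging.
#[derive(Debug, Clone)]
enum DependenceState {
    Unset,
    Indeterminate,
    Known(WallClockRelation),
}

fn main() {
    // With Option<WallClockRelation>, both of these would print as `None`.
    println!("{:?}", DependenceState::Unset);
    println!("{:?}", DependenceState::Indeterminate);
}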

I don't know how subscribes shut down, deferring to later.

Commits pushed to the branch (message excerpts):

  • Remove `Indeterminate` variant and replace it by `None`. Restructure some code.
  • … for continual task dataflows (not storage collections). Update docs.
  • We don't know the exact time semantics continual tasks will have, so disable expiration for now.

@antiguru antiguru (Member, Author) commented:

Another nightly run: https://buildkite.com/materialize/nightly/builds/10226

@ggevay ggevay (Contributor) left a comment:

Still thinking about some things, but sending the comments that I have so far. It's looking good to me.

TimeDependence::new(None, vec![TimeDependence::default()]).apply(100.into())
);

// Default refresh schedules refresh never, no wall-clock dependence.

@ggevay ggevay (Contributor) commented Oct 31, 2024:

(Btw. it never actually happens that an MV's refresh schedule is Default, because if the user doesn't specify any refresh options, then the MV ends up being a conventional, non-refresh MV.)

//! * A meet of anything but wall-clock time and a refresh schedule results in a refresh schedule
//! that depends on the deduplicated collection of dependencies.
//! * Otherwise, a dataflow is indeterminate, which expresses that we either don't know how it
//! follows wall-clock time, or is a constant collection.

A Contributor commented:

You could add a note here that if there are both determinate and indeterminate TimeDependences, then it's ok whatever the actual time dependence of the indeterminate ones is:

  • It's ok if the indeterminate ones have a later frontier than the determinate ones, because then the frontier will be determined by the determinate ones.
  • It's also ok if the indeterminate ones have an earlier frontier than the determinate ones. This will mean that the dataflow will expire later than ideal, but it won't cause a panic or wrong results.

//! The time dependence needs to be computed on the actual dependencies, and not on catalog
//! uses. An optimized dataflow depends on concrete indexes, and has unnecessary dependencies
//! pruned. Additionally, transitive dependencies can depend on indexes that do not exist anymore,
//! which makes combining run-time information with catalog-based information inconclusive.

A Contributor commented:

(Note that in theory, it could happen that an MIR plan involves a GlobalId that actually happens to be "dead code" in the MIR plan. In this case, DataflowDescription::import_ids would still list this dependency, but the dataflow's actual frontier wouldn't reflect this dependency. In practice, this shouldn't happen currently, because one of the last things that we do with MIR plans is call NormalizeLets, which removes CTEs that are not referenced.)

use super::*;

#[mz_ore::test]
fn test_time_dependence_normalize() {

A Contributor commented:

In both of the tests in test_time_dependence_normalize, normalize doesn't have any actual work to do, i.e., the struct is unchanged. (But normalize is pretty simple, so it's not a big problem.)

let (mut df_desc, df_meta) = global_lir_plan.unapply();

df_desc.time_dependence =
time_dependence(coord.catalog(), df_desc.import_ids(), refresh_schedule);

A Contributor commented:

Could these calls to time_dependence be moved into the final optimize call, which returns this df_desc? Then this call wouldn't need to be duplicated between bootstrapping and sequencing.

@antiguru antiguru (Member, Author) replied:

If I interpret this correctly, you're suggesting moving the time_dependence call into the optimizer? I looked into this, but the optimizer doesn't have access to the catalog (it only has an OptimizerCatalog, which doesn't expose access to the plans). On top of that, we use it for continual tasks, where we don't want the feature to be enabled.

That said, I'll file a follow-up issue to move the time dependence determination into the controller, which would avoid the issue altogether.

A Contributor replied:

Ok!

@teskje teskje (Contributor) left a comment:

I didn't review again in detail, but my one blocking concern about CTs has been addressed, so I'm good with merging this now and addressing the style comments later.

@ggevay ggevay (Contributor) left a comment:

I'm also ok with merging as it is, and addressing the minor comment later.

@def- def- (Contributor) left a comment:

All nightly failures are also present in main (some are already fixed, but there's no need to rebase).

@antiguru antiguru merged commit 24d2fc2 into MaterializeInc:main Oct 31, 2024
213 of 222 checks passed
@antiguru antiguru deleted the replica_expiration branch October 31, 2024 16:04

@antiguru antiguru (Member, Author) commented:

Thanks for the reviews!

@antiguru antiguru restored the replica_expiration branch November 19, 2024 09:02