Commit 67e8790

Merge pull request #28138 from ggevay/rehydration-to-hydration
Change `REHYDRATION TIME ESTIMATE` to `HYDRATION TIME ESTIMATE`
2 parents bc55c20 + 9895bdb commit 67e8790

24 files changed, +140 -132 lines changed

doc/developer/design/20231027_refresh_mvs.md

Lines changed: 6 additions & 6 deletions
@@ -13,7 +13,7 @@
 - Slack:
 - [channel #wg-tuning-freshness](https://materializeinc.slack.com/archives/C06535JL58R/p1699395646085619)
 - [big design thread in #epd-sql-council](https://materializeinc.slack.com/archives/C063H5S7NKE/p1699543250405409)
-- [`REHYDRATION TIME ESTIMATE` thread in #epd-sql-council](https://materializeinc.slack.com/archives/C063H5S7NKE/p1712165305916299)
+- [`HYDRATION TIME ESTIMATE` thread in #epd-sql-council](https://materializeinc.slack.com/archives/C063H5S7NKE/p1712165305916299)
 - Notion:
 - [Tuning REFRESH on MVs: UX](https://www.notion.so/materialize/Tuning-REFRESH-on-MVs-UX-1abbf85683364a1d997d77d7022ccd4f)
 - [Compute meeting on automatic cluster scheduling](https://www.notion.so/materialize/Compute-meeting-on-automatic-cluster-scheduling-ce353b8af52e449d8784241c4a1c0585)
@@ -183,7 +183,7 @@ There is a workaround for this until we properly fix it: If there is a specific
 
 A proper fix would be to start up the replica a bit before the exact moment of the refresh, so that it can rehydrate already. For example, let's say we have an MV that is to be updated at every midnight. If we know that a refresh will take approximately 1 hour, then we can start up the replica at, say, 10:50 PM, so that it will be rehydrated by about 11:50 PM. At this point, most of the Compute processing that is needed for the refresh has already happened. Now the replica just needs to process the last 10 minutes of input data until midnight at a normal pace. We let the replica run until the MV's upper passes midnight (and jumps to the next midnight), which should happen within a few seconds after midnight. Note that before midnight, queries against the MV will still read the old state (as they should), because the new data is written at timestamps rounded up to midnight.
 
-How do we know how much earlier than the refresh time should we turn on the replica, that is, how much time the refresh will take? In the first version of this feature, we can let the user set this explicitly by something like `REFRESH EVERY <interval> EARLY <interval>`. Later, we should record the times the refreshes take, and infer the time requirement of the next refresh based on earlier ones. Note that this will be complicated by the fact that we have different instance types [that have wildly differing CPU performance](https://materializeinc.slack.com/archives/CM7ATT65S/p1697816482502819). Update: This is now set on the auto-scheduled cluster, with the `REHYDRATION TIME ESTIMATE` syntax.
+How do we know how much earlier than the refresh time should we turn on the replica, that is, how much time the refresh will take? In the first version of this feature, we can let the user set this explicitly by something like `REFRESH EVERY <interval> EARLY <interval>`. Later, we should record the times the refreshes take, and infer the time requirement of the next refresh based on earlier ones. Note that this will be complicated by the fact that we have different instance types [that have wildly differing CPU performance](https://materializeinc.slack.com/archives/CM7ATT65S/p1697816482502819). Update: This is now set on the auto-scheduled cluster, with the `HYDRATION TIME ESTIMATE` syntax.
 
 ### Logical Times vs. Wall Clock Times
 
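
Reviewer note: the early replica start described in the `20231027_refresh_mvs.md` paragraph above ("A proper fix would be...") comes down to date arithmetic. The following is a minimal sketch of that arithmetic only, not Materialize's scheduler. The function name `replica_turn_on_time` and the `safety_margin` parameter are hypothetical, introduced here just to reproduce the 10:50 PM example from the design text.

```python
from datetime import datetime, timedelta


def replica_turn_on_time(
    next_refresh: datetime,
    hydration_time_estimate: timedelta,
    safety_margin: timedelta = timedelta(0),
) -> datetime:
    """Return the wall-clock time at which to start the replica.

    Hypothetical helper: the replica is started early enough that hydration
    (estimated by `hydration_time_estimate`) finishes `safety_margin` before
    the refresh time.
    """
    return next_refresh - hydration_time_estimate - safety_margin


# The design doc's example: refresh at midnight, about 1 hour of hydration,
# and 10 minutes of slack before midnight.
midnight = datetime(2024, 4, 10, 0, 0)
start = replica_turn_on_time(midnight, timedelta(hours=1), timedelta(minutes=10))
print(start)  # 2024-04-09 22:50:00
```

With a one-hour estimate and a ten-minute margin, a midnight refresh gives a 10:50 PM start on the previous day, which matches the example in the paragraph.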
@@ -253,14 +253,14 @@ As mentioned in the [scoping section](#out-of-scope), an alternative implementat
 
 E.g.:
 ```
-ALTER CLUSTER c1 SET (SCHEDULE = ON REFRESH (REHYDRATION TIME ESTIMATE = '1 hour'));
+ALTER CLUSTER c1 SET (SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = '1 hour'));
 ```
 
 Discussions:
 - [Original discussion](https://github.com/MaterializeInc/materialize/issues/25712)
 - [Overview](https://www.notion.so/materialize/REFRESH-user-docs-draft-4a8f30b737a94619ac9f645abc9f84ce?pvs=4#025fd5733fcd4f38b48ee967bc8fb763)
 - [Syntax discussion](https://materializeinc.slack.com/archives/C063H5S7NKE/p1710355545343079)
-- [REHYDRATION TIME ESTIMATE discussion](https://materializeinc.slack.com/archives/C063H5S7NKE/p1712165305916299)
+- [HYDRATION TIME ESTIMATE discussion](https://materializeinc.slack.com/archives/C063H5S7NKE/p1712165305916299)
 
 ## Introspection / Observability
 
@@ -276,13 +276,13 @@ The values in `mz_materialized_view_refreshes` will be calculated as follows:
 
 For seeing whether the last refresh is being late, the user can run `EXPLAIN TIMESTAMP` in `STRICT SERIALIZABLE` mode, and look at can respond immediately. If it's false, then the last refresh's completion is overdue. Another way to get the same information would be to check if `mz_materialized_view_refreshes.next_refresh < now()`.
 
-For showing cluster schedules, I'll create a table in `mz_internal` called `mz_cluster_schedules`. This will be similar to `mz_materialized_view_refresh_strategies` in that it will allow for multiple `SCHEDULE =` options on a cluster by having one row for each schedule option of each cluster. This would currently be only either `SCHEDULE = ON REFRESH` or `SCHEDULE = MANUAL`. Columns would be `(cluster_id text, type text, refresh_rehydration_time_estimate interval)`. In `type`, we would currently have either "manual" (the default), or "on-refresh". (Eventually, we'll probably also want a `next_scheduled_turn_on`, but this doesn't seem so urgent. It will get more important when we'll be choosing the warmup time automatically.)
+For showing cluster schedules, I'll create a table in `mz_internal` called `mz_cluster_schedules`. This will be similar to `mz_materialized_view_refresh_strategies` in that it will allow for multiple `SCHEDULE =` options on a cluster by having one row for each schedule option of each cluster. This would currently be only either `SCHEDULE = ON REFRESH` or `SCHEDULE = MANUAL`. Columns would be `(cluster_id text, type text, refresh_hydration_time_estimate interval)`. In `type`, we would currently have either "manual" (the default), or "on-refresh". (Eventually, we'll probably also want a `next_scheduled_turn_on`, but this doesn't seem so urgent. It will get more important when we'll be choosing the warmup time automatically.)
 
 For seeing whether a cluster is currently turned on, the user can simply look at `mz_cluster_replicas`, because we currently turn clusters On/Off by just creating/dropping replicas. We might also add a builtin view for showing this information in a more focused way.
 
 For the automatic cluster scheduling history, the user can look at `mz_audit_events`. This has a `details` column, which is a JSON blob, where I'm planning to add the `reason` for turning on a cluster, i.e., which materialized views were in need of a refresh. (See Nikhil's comment [here](https://github.com/MaterializeInc/materialize/pull/26401#pullrequestreview-1981986544).) There is also the `mz_cluster_replica_history` view, which takes its info from `mz_audit_events`, and presents the info in a nicer form. I could add a new reason column to this view. Also note that the `reason` could also be prepared to show reasons from other policies: it could itself be a collection of key-value pairs, where the keys are policy names (e.g., refresh), and the values have policy-specific structures. For refresh, it could be a list of the materialized view IDs that made us turn the cluster on.
 
-We'll also want to show rehydration times from the last several refreshes, to help users set the `REHYDRATION TIME ESTIMATE` of clusters. I'm thinking to create a new table `mz_internal.mz_compute_hydration_history (replica_id text, rehydration_time interval)`, which would have one row for each replica creation, and it would show the time it took to rehydrate the replica when it was created. (The user can join this with `mz_cluster_replica_history` to know which cluster the replica belonged to, replica size, etc.) Btw. this doesn't need to be constrained to clusters involving `REFRESH` MVs; this info seems generally useful for any compute cluster. If we want to make it even more useful generally, we might want to add one row not just for each replica creation, but also each replica restart, so that we'll show the rehydrations that happen at system upgrade restarts. In this case, we'll probably need also a `time` column, and then `(replica_id, time)` would be a composite key. For this general version, we might have to truncate the relation to keep it from growing too big.
+We'll also want to show rehydration times from the last several refreshes, to help users set the `HYDRATION TIME ESTIMATE` of clusters. I'm thinking to create a new table `mz_internal.mz_compute_hydration_history (replica_id text, rehydration_time interval)`, which would have one row for each replica creation, and it would show the time it took to rehydrate the replica when it was created. (The user can join this with `mz_cluster_replica_history` to know which cluster the replica belonged to, replica size, etc.) Btw. this doesn't need to be constrained to clusters involving `REFRESH` MVs; this info seems generally useful for any compute cluster. If we want to make it even more useful generally, we might want to add one row not just for each replica creation, but also each replica restart, so that we'll show the rehydrations that happen at system upgrade restarts. In this case, we'll probably need also a `time` column, and then `(replica_id, time)` would be a composite key. For this general version, we might have to truncate the relation to keep it from growing too big.
 
 We might want to also track the time it takes to actually perform a refresh, assuming that the replica is already hydrated. This will often take <1 sec, but if the MV's storage is big and/or there are many changes, then it might take more.
 
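
Reviewer note: the design text above proposes exposing recent hydration times so that users can pick a `HYDRATION TIME ESTIMATE`. One way a user might turn such a history into a value is sketched below. This is an illustration under stated assumptions: the durations are assumed to be already fetched as `timedelta` values, the name `suggest_hydration_time_estimate` is hypothetical, and the "slowest recent hydration plus a fixed margin" rule is a choice made for this sketch, not something the design doc prescribes.

```python
from datetime import timedelta
from typing import Sequence


def suggest_hydration_time_estimate(
    recent_hydration_times: Sequence[timedelta],
    margin: timedelta = timedelta(minutes=5),
) -> timedelta:
    """Suggest an estimate from observed hydration durations.

    Assumption of this sketch: take the slowest recent hydration and add a
    fixed margin, so that the replica is very likely hydrated in time.
    """
    if not recent_hydration_times:
        raise ValueError("need at least one observed hydration time")
    return max(recent_hydration_times) + margin


# Hypothetical history of the last three hydrations of a replica.
history = [timedelta(minutes=48), timedelta(minutes=55), timedelta(minutes=51)]
print(suggest_hydration_time_estimate(history))  # 1:00:00
```

Taking the maximum rather than the mean errs on the side of starting the replica too early, which costs some compute but avoids an overdue refresh.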

doc/user/content/sql/alter-cluster.md

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ ALTER CLUSTER c1 SET (SIZE '100cc');
 {{< private-preview />}}
 
 ```sql
-ALTER CLUSTER c1 SET (SCHEDULE = ON REFRESH (REHYDRATION TIME ESTIMATE = '1 hour'));
+ALTER CLUSTER c1 SET (SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = '1 hour'));
 ```
 
 See the reference documentation for [`CREATE CLUSTER`](../create-cluster/#scheduling)

doc/user/content/sql/create-cluster.md

Lines changed: 6 additions & 6 deletions
@@ -244,7 +244,7 @@ you can configure a cluster to automatically turn on and off using the
 ```mzsql
 CREATE CLUSTER my_scheduled_cluster (
 SIZE = '3200cc',
-SCHEDULE = ON REFRESH (REHYDRATION TIME ESTIMATE = '1 hour')
+SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = '1 hour')
 );
 ```
 
@@ -267,17 +267,17 @@ To re-enable scheduling:
 
 ```mzsql
 ALTER CLUSTER my_scheduled_cluster
-SET (SCHEDULE = ON REFRESH (REHYDRATION TIME ESTIMATE = '1 hour'));
+SET (SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = '1 hour'));
 ```
 
-#### Rehydration time estimate
+#### Hydration time estimate
 
-<p style="font-size:14px"><b>Syntax:</b> <code>REHYDRATION TIME ESTIMATE</code> <i>interval</i></p>
+<p style="font-size:14px"><b>Syntax:</b> <code>HYDRATION TIME ESTIMATE</code> <i>interval</i></p>
 
 By default, scheduled clusters will turn on at the scheduled refresh time. To
 avoid [unavailability of the objects scheduled for refresh](/sql/create-materialized-view/#querying-materialized-views-with-refresh-strategies) during the refresh
 operation, we recommend turning the cluster on ahead of the scheduled time to
-allow rehydration to complete. This can be controlled using the `REHYDRATION
+allow rehydration to complete. This can be controlled using the `HYDRATION
 TIME ESTIMATE` clause.
 
 #### Introspection
@@ -290,7 +290,7 @@ system catalog table:
 SELECT c.id AS cluster_id,
 c.name AS cluster_name,
 cs.type AS schedule_type,
-cs.refresh_rehydration_time_estimate
+cs.refresh_hydration_time_estimate
 FROM mz_internal.mz_cluster_schedules cs
 JOIN mz_clusters c ON cs.cluster_id = c.id
 WHERE c.name = 'my_refresh_cluster';

doc/user/content/sql/create-materialized-view.md

Lines changed: 5 additions & 5 deletions
@@ -244,7 +244,7 @@ have a period of unavailability around the scheduled refresh times — during
 this period, the view **will not return any results**. To avoid unavailability
 during the refresh operation, we recommend hosting these views in
 [**scheduled clusters**](/sql/create-cluster/#scheduling) configured to
-automatically [turn on ahead of the scheduled refresh time](/sql/create-cluster/#rehydration-time-estimate).
+automatically [turn on ahead of the scheduled refresh time](/sql/create-cluster/#hydration-time-estimate).
 
 **Example**
 
@@ -254,7 +254,7 @@ refresh times:
 ```mzsql
 CREATE CLUSTER my_scheduled_cluster (
 SIZE = '3200cc',
-SCHEDULE = ON REFRESH (REHYDRATION TIME ESTIMATE = '1 hour')
+SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = '1 hour')
 );
 ```
 
@@ -283,18 +283,18 @@ backfill the view with pre-existing data — a process known as [_hydration_](/t
 of the view** to just the duration of the refresh.
 
 If the cluster is **not** configured to turn on ahead of scheduled refreshes
-(i.e., using the `REHYDRATION TIME ESTIMATE` option), the total unavailability
+(i.e., using the `HYDRATION TIME ESTIMATE` option), the total unavailability
 window of the view will be a combination of the hydration time for all objects
 in the cluster (typically long) and the duration of the refresh for the
 materialized view (typically short).
 
 Depending on the actual time it takes to hydrate the view or set of views in the
-cluster, you can later adjust the rehydration time estimate value for the
+cluster, you can later adjust the hydration time estimate value for the
 cluster using [`ALTER CLUSTER`](../alter-cluster/#schedule):
 
 ```mzsql
 ALTER CLUSTER my_scheduled_cluster
-SET (SCHEDULE = ON REFRESH (REHYDRATION TIME ESTIMATE = '30 minutes'));
+SET (SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = '30 minutes'));
 ```
 
 #### Introspection

doc/user/content/sql/system-catalog/mz_internal.md

Lines changed: 5 additions & 5 deletions
@@ -135,11 +135,11 @@ the most recent status for each AWS PrivateLink connection in the system.
 The `mz_cluster_schedules` table shows the `SCHEDULE` option specified for each cluster.
 
 <!-- RELATION_SPEC mz_internal.mz_cluster_schedules -->
-| Field | Type | Meaning |
-|-------------------------------------|--------------|---------------------------------------------------------------|
-| `cluster_id` | [`text`] | The ID of the cluster. Corresponds to [`mz_clusters.id`](../mz_catalog/#mz_clusters).|
-| `type` | [`text`] | `on-refresh`, or `manual`. Default: `manual` |
-| `refresh_rehydration_time_estimate` | [`interval`] | The interval given in the `REHYDRATION TIME ESTIMATE` option. |
+| Field | Type | Meaning |
+|-------------------------------------|--------------|----------------------------------------------------------------|
+| `cluster_id` | [`text`] | The ID of the cluster. Corresponds to [`mz_clusters.id`](../mz_catalog/#mz_clusters). |
+| `type` | [`text`] | `on-refresh`, or `manual`. Default: `manual` |
+| `refresh_hydration_time_estimate` | [`interval`] | The interval given in the `HYDRATION TIME ESTIMATE` option. |
 
 ## `mz_cluster_replica_frontiers`
 

misc/dbt-materialize/dbt/include/materialize/macros/ci/create_cluster.sql

Lines changed: 5 additions & 5 deletions
@@ -21,20 +21,20 @@ This macro creates a cluster with the specified properties.
 - size (str): The size of the cluster. This parameter is required.
 - replication_factor (int, optional): The replication factor for the cluster. Only applicable when schedule_type is 'manual'.
 - schedule_type (str, optional): The type of schedule for the cluster. Accepts 'manual' or 'on-refresh'.
-- refresh_rehydration_time_estimate (str, optional): The estimated rehydration time for the cluster. Only applicable when schedule_type is 'on-refresh'.
+- refresh_hydration_time_estimate (str, optional): The estimated hydration time for the cluster. Only applicable when schedule_type is 'on-refresh'.
 - ignore_existing_objects (bool, optional): Whether to ignore existing objects in the cluster. Defaults to false.
 - force_deploy_suffix (bool, optional): Whether to forcefully add a deploy suffix to the cluster name. Defaults to false.
 
 Incompatibilities:
 - replication_factor is only applicable when schedule_type is 'manual'.
-- refresh_rehydration_time_estimate is only applicable when schedule_type is 'on-refresh'.
+- refresh_hydration_time_estimate is only applicable when schedule_type is 'on-refresh'.
 #}
 {% macro create_cluster(
 cluster_name,
 size,
 replication_factor=none,
 schedule_type=none,
-refresh_rehydration_time_estimate=none,
+refresh_hydration_time_estimate=none,
 ignore_existing_objects=false,
 force_deploy_suffix=false
 ) %}
@@ -116,8 +116,8 @@ This macro creates a cluster with the specified properties.
 {% if replication_factor is not none and ( schedule_type == 'manual' or schedule_type is none ) %}
 , REPLICATION FACTOR = {{ replication_factor }}
 {% elif schedule_type == 'on-refresh' %}
-{% if refresh_rehydration_time_estimate is not none %}
-, SCHEDULE = ON REFRESH (REHYDRATION TIME ESTIMATE = {{ dbt.string_literal(refresh_rehydration_time_estimate) }})
+{% if refresh_hydration_time_estimate is not none %}
+, SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = {{ dbt.string_literal(refresh_hydration_time_estimate) }})
 {% else %}
 , SCHEDULE = ON REFRESH
 {% endif %}
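
Reviewer note: the branching in the `create_cluster` macro hunk above can be restated in plain Python to check which option clause each parameter combination produces. This is a sketch for reading the diff, not part of the dbt adapter. The names `cluster_option_clause` and `sql_string_literal` are hypothetical; `sql_string_literal` is a simplified stand-in for `dbt.string_literal`, and the empty-string fallback is an assumption, since the hunk is cut off before the macro's outer `endif`.

```python
from typing import Optional


def sql_string_literal(value: str) -> str:
    """Hypothetical stand-in for dbt.string_literal: quote and escape a string."""
    return "'" + value.replace("'", "''") + "'"


def cluster_option_clause(
    replication_factor: Optional[int] = None,
    schedule_type: Optional[str] = None,
    refresh_hydration_time_estimate: Optional[str] = None,
) -> str:
    """Mirror the if/elif structure shown in the macro hunk."""
    # A replication factor only applies to manually scheduled clusters.
    if replication_factor is not None and schedule_type in (None, "manual"):
        return f", REPLICATION FACTOR = {replication_factor}"
    if schedule_type == "on-refresh":
        if refresh_hydration_time_estimate is not None:
            literal = sql_string_literal(refresh_hydration_time_estimate)
            return f", SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = {literal})"
        return ", SCHEDULE = ON REFRESH"
    # Assumption: nothing is emitted in any other case (not visible in the hunk).
    return ""


print(cluster_option_clause(schedule_type="on-refresh",
                            refresh_hydration_time_estimate="10m"))
# , SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = '10m')
```

The `10m` argument matches the value passed by `test_create_cluster_with_on_refresh_schedule` in `test_ci.py`.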

misc/dbt-materialize/dbt/include/materialize/macros/deploy/deploy_init.sql

Lines changed: 3 additions & 3 deletions
@@ -115,7 +115,7 @@
 c.id AS cluster_id,
 c.name AS cluster_name,
 cs.type AS schedule_type,
-cs.refresh_rehydration_time_estimate
+cs.refresh_hydration_time_estimate
 FROM mz_clusters c
 LEFT JOIN mz_internal.mz_cluster_schedules cs ON cs.cluster_id = c.id
 WHERE c.name = {{ dbt.string_literal(cluster) }}
@@ -130,7 +130,7 @@
 {% set size = results[1] %}
 {% set replication_factor = results[2] %}
 {% set schedule_type = results[5] %}
-{% set refresh_rehydration_time_estimate = results[6] %}
+{% set refresh_hydration_time_estimate = results[6] %}
 
 {% if not managed %}
 {{ exceptions.raise_compiler_error("Production cluster " ~ cluster ~ " is not managed") }}
@@ -141,7 +141,7 @@
 size=size,
 replication_factor=replication_factor,
 schedule_type=schedule_type,
-refresh_rehydration_time_estimate=refresh_rehydration_time_estimate,
+refresh_hydration_time_estimate=refresh_hydration_time_estimate,
 ignore_existing_objects=ignore_existing_objects,
 force_deploy_suffix=True
 ) %}

misc/dbt-materialize/tests/adapter/test_ci.py

Lines changed: 2 additions & 2 deletions
@@ -212,7 +212,7 @@ def test_create_cluster_with_on_refresh_schedule(self, project):
 "run-operation",
 "create_cluster",
 "--args",
-'{"cluster_name": "test_on_refresh_schedule", "size": "1", "schedule_type": "on-refresh", "refresh_rehydration_time_estimate": "10m", "ignore_existing_objects": true, "force_deploy_suffix": true}',
+'{"cluster_name": "test_on_refresh_schedule", "size": "1", "schedule_type": "on-refresh", "refresh_hydration_time_estimate": "10m", "ignore_existing_objects": true, "force_deploy_suffix": true}',
 ]
 )
 
@@ -324,7 +324,7 @@ def get_cluster_properties(project, cluster_name):
 c.id AS cluster_id,
 c.name AS cluster_name,
 cs.type AS schedule_type,
-cs.refresh_rehydration_time_estimate
+cs.refresh_hydration_time_estimate
 FROM mz_clusters c
 LEFT JOIN mz_internal.mz_cluster_schedules cs ON cs.cluster_id = c.id
 WHERE c.name = '{cluster_name}'

misc/dbt-materialize/tests/adapter/test_deploy.py

Lines changed: 2 additions & 2 deletions
@@ -507,9 +507,9 @@ def test_fails_on_unmanaged_cluster(self, project):
 
 run_dbt(["run-operation", "deploy_init"], expect_pass=False)
 
-def test_dbt_deploy_init_with_refresh_rehydration_time(self, project):
+def test_dbt_deploy_init_with_refresh_hydration_time(self, project):
 project.run_sql(
-"CREATE CLUSTER prod (SIZE = '1', SCHEDULE = ON REFRESH (REHYDRATION TIME ESTIMATE = '1 hour'))"
+"CREATE CLUSTER prod (SIZE = '1', SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = '1 hour'))"
 )
 project.run_sql("CREATE SCHEMA prod")
 
