Commit 5af1a3b

catalog: Catalog backend migration design (#23652)
This commit adds a design doc for how to migrate user environments from using the stash to using persist as the backing store for the catalog. It uses a single tombstone value. Works towards resolving #22392
1 parent 5e672cc commit 5af1a3b

3 files changed

+164
-0
lines changed

@@ -0,0 +1,164 @@
# Catalog Backend Migration

- Associated: https://github.com/MaterializeInc/materialize/issues/20953
- Associated: https://github.com/MaterializeInc/materialize/issues/22392

## The Problem

Currently, we persist all user catalog data in the stash. We would like to move all of that data
to Persist. This will have the following benefits:

- A step towards architecting a shareable catalog that will contribute to use case isolation.
- Reduced operational overhead related to maintaining the stash. Note, the overhead is not
  completely removed because the storage controller still uses the stash.

## Success Criteria

All user catalog data is stored in Persist.

## Out of Scope

- Fully shareable differential catalog.
- Use case isolation.
- Zero downtime upgrades.

## Solution Proposal

### Config Collection

The config collection is a subset of catalog data that is never subject to data format migrations.
As a consequence, this data can be read before the catalog is fully opened and can be used to
bootstrap an environment. The config collection has keys of type `String` and values of type `u64`.
Booleans can be modeled using this collection by storing a `0` for `false` and `1` for `true`. This
proposal will add a key called `tombstone`, with a boolean value that indicates if a specific
catalog backend is retired. A value of `false` means that the backend is not retired (i.e. is still
in use) and a value of `true` means the backend is retired (i.e. is no longer in use).

37+
### Epoch Fencing
38+
39+
The catalog durably stores a fencing token called the epoch. All catalog writers and readers also
40+
store an in-memory epoch. When a new writer or reader connects to the catalog they store the
41+
previous epoch plus one in memory and persist this new value durably. Every time a reader or write
42+
performs a read or write, they compare their in-memory token with the persisted token, and if
43+
there's a difference, the operation fails. As a result, whenever a reader/write increments the epoch
44+
durably, they are effectively fencing out all previous readers/writers.
45+
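A minimal sketch of the fencing behavior described above, assuming a single-process, in-memory stand-in for the durable epoch. The names `DurableEpoch`, `Handle`, and `Fenced` are hypothetical. A real backend would also need the increment and the comparison to be atomic with respect to the durable store, which this sketch does not model.

```rust
/// Hypothetical stand-in for the durably stored epoch.
#[derive(Default)]
struct DurableEpoch {
    epoch: u64,
}

/// Error returned when a handle's in-memory epoch is stale.
#[derive(Debug, PartialEq)]
struct Fenced {
    in_memory: u64,
    durable: u64,
}

/// A reader or writer that remembers the epoch it established on connect.
struct Handle {
    epoch: u64,
}

impl Handle {
    /// Connecting stores the previous epoch plus one, both durably and in memory.
    fn connect(durable: &mut DurableEpoch) -> Handle {
        durable.epoch += 1;
        Handle { epoch: durable.epoch }
    }

    /// Every read or write first compares the in-memory and durable epochs.
    fn check_fence(&self, durable: &DurableEpoch) -> Result<(), Fenced> {
        if self.epoch == durable.epoch {
            Ok(())
        } else {
            Err(Fenced { in_memory: self.epoch, durable: durable.epoch })
        }
    }
}
```

In this sketch, connecting a second `Handle` makes every later `check_fence` call on the first handle return `Err(Fenced { .. })`, which is the fencing effect the section describes.
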
### Catalog Open

Opening a catalog involves the following two steps:

1. Increment the epoch.
2. Either initialize the persisted catalog state if it's uninitialized, or migrate the data format
   if there are any migrations, otherwise do nothing.

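The two steps above can be sketched as a single function. This is only an illustration: the design doc does not say how the data format is tracked, so the `format` version number, the `CURRENT_FORMAT` constant, and the `DurableState` and `OpenAction` types are all assumptions introduced here.

```rust
/// Hypothetical current data format version; the design doc does not name one.
const CURRENT_FORMAT: u64 = 2;

/// Hypothetical stand-in for the durable catalog state.
struct DurableState {
    epoch: u64,
    /// `None` models an uninitialized catalog.
    format: Option<u64>,
}

/// What step 2 of opening ended up doing.
#[derive(Debug, PartialEq)]
enum OpenAction {
    Initialized,
    Migrated { from: u64 },
    Nothing,
}

fn open(state: &mut DurableState) -> OpenAction {
    // Step 1: increment the epoch, fencing out previous readers and writers.
    state.epoch += 1;

    // Step 2: initialize, migrate the data format, or do nothing.
    match state.format {
        None => {
            state.format = Some(CURRENT_FORMAT);
            OpenAction::Initialized
        }
        Some(version) if version < CURRENT_FORMAT => {
            state.format = Some(CURRENT_FORMAT);
            OpenAction::Migrated { from: version }
        }
        Some(_) => OpenAction::Nothing,
    }
}
```
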
### Migrate From Stash to Persist

Below are the steps to migrate from the stash to persist.

1. Open the stash catalog.
2. Open the persist catalog.
3. If the stash tombstone is `true`, then we're done.
4. Read stash catalog snapshot.
5. Replace the contents of the persist catalog with the stash catalog snapshot.
6. Set the stash tombstone to `true`.

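The six migration steps above can be sketched against in-memory stand-ins. The `Backend` type, its fields, and the `Outcome` enum are hypothetical, catalog contents are modeled as a plain `Vec<String>`, and opening is reduced to the epoch increment. It is a sketch of the control flow, not of the real stash or persist APIs.

```rust
/// Hypothetical in-memory stand-in for one catalog backend.
#[derive(Default)]
struct Backend {
    epoch: u64,
    /// Only the stash tombstone is consulted in the single-tombstone design.
    tombstone: Option<bool>,
    contents: Vec<String>,
}

impl Backend {
    /// Opening is reduced to the epoch increment; initialization and
    /// data format migrations are elided in this sketch.
    fn open(&mut self) {
        self.epoch += 1;
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    AlreadyMigrated,
    Migrated,
}

fn migrate_stash_to_persist(stash: &mut Backend, persist: &mut Backend) -> Outcome {
    stash.open(); // Step 1
    persist.open(); // Step 2
    if stash.tombstone == Some(true) {
        return Outcome::AlreadyMigrated; // Step 3
    }
    let snapshot = stash.contents.clone(); // Step 4
    persist.contents = snapshot; // Step 5: replace, do not append
    stash.tombstone = Some(true); // Step 6
    Outcome::Migrated
}
```

Because step 5 replaces the persist contents rather than appending to them, a crash between steps 5 and 6 leaves the tombstone unset and a retry simply overwrites persist again.
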
### Rollback From Persist to Stash

Below are the steps to roll back from persist to the stash.

1. Open the stash catalog.
2. Open the persist catalog.
3. If the stash tombstone is `false` or doesn't exist, then we're done.
4. Read persist catalog snapshot.
5. Replace the contents of the stash catalog with the persist catalog snapshot.
6. Set the stash tombstone to `false`.

NOTE: Steps (5) and (6) can be done as a single write as a performance optimization.

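A sketch of the rollback steps above, including the single-write variant from the note. All type and function names here are hypothetical, and a struct assignment stands in for "a single write" to the stash.

```rust
/// Hypothetical: everything the stash stores, so that it can be replaced in one write.
#[derive(Default)]
struct StashState {
    tombstone: Option<bool>,
    contents: Vec<String>,
}

#[derive(Default)]
struct Stash {
    epoch: u64,
    state: StashState,
}

#[derive(Default)]
struct Persist {
    epoch: u64,
    contents: Vec<String>,
}

/// Returns `true` if data was copied back to the stash,
/// `false` if the stash was already the source of truth.
fn rollback_persist_to_stash(stash: &mut Stash, persist: &mut Persist) -> bool {
    stash.epoch += 1; // Step 1: open the stash catalog (reduced to the epoch bump)
    persist.epoch += 1; // Step 2: open the persist catalog
    if stash.state.tombstone != Some(true) {
        return false; // Step 3: tombstone is `false` or doesn't exist
    }
    let snapshot = persist.contents.clone(); // Step 4
    // Steps 5 and 6 as one write: contents and tombstone both live in the stash.
    stash.state = StashState { tombstone: Some(false), contents: snapshot };
    true
}
```

Steps 5 and 6 can be combined here because both target the stash. The corresponding migration steps touch two different stores (persist, then the stash), so the same trick does not apply there.
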
### States

Below is a table describing all possible states of the stash tombstone and what they
indicate about where the source of truth catalog data is located.

| Stash Tombstone | Source of Truth | Explanation                                                                 |
|-----------------|-----------------|-----------------------------------------------------------------------------|
| None            | Stash           | Migration has never fully completed.                                        |
| Some(true)      | Persist         | Migration has completed successfully or rollback crashed before completing. |
| Some(false)     | Stash           | Rollback has completed successfully or migration crashed before completing. |

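The table above is small enough to express as one exhaustive `match`. The `SourceOfTruth` enum and the function name are hypothetical; the mapping itself is taken directly from the table.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum SourceOfTruth {
    Stash,
    Persist,
}

/// Maps the stash tombstone to the backend holding the authoritative catalog data.
fn source_of_truth(stash_tombstone: Option<bool>) -> SourceOfTruth {
    match stash_tombstone {
        // Never migrated, rolled back, or a migration that crashed before completing.
        None | Some(false) => SourceOfTruth::Stash,
        // Migrated, or a rollback that crashed before completing.
        Some(true) => SourceOfTruth::Persist,
    }
}
```
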
### State Transitions

Below is a state transition diagram that shows all possible states of the stash tombstone and how
the states transition. Each node has the value of the stash tombstone. Each edge indicates which
steps from the algorithm above trigger that specific state transition. Next to each node is either
`Stash` if the stash is the source of truth, or `Persist` if persist is the source of truth.

![state-transitions](./static/catalog_migration_to_persist/catalog_migration_single_tombstone.png)

## Alternatives

Below is a similar algorithm that uses two tombstone values, one in persist and one in the stash.
Some may find that the individual states are slightly easier to reason about, but the state
transitions are harder to reason about.

### Migrate From Stash to Persist

Below are the steps to migrate from the stash to persist.

1. Open the stash catalog.
2. Open the persist catalog.
3. Set the stash tombstone to `true`.
4. If the persist tombstone is `false`, then we're done.
5. Read stash catalog snapshot.
6. Write catalog snapshot to persist.
7. Set the persist tombstone to `false`.

NOTE: Steps (6) and (7) can be done as a single compare and append operation as a performance
optimization.

### Rollback From Persist to Stash

Below are the steps to roll back from persist to the stash.

1. Open the stash catalog.
2. Open the persist catalog.
3. Set the stash tombstone to `false`.
4. If the persist tombstone is `true` or doesn't exist, then we're done.
5. Read persist catalog snapshot.
6. Write catalog snapshot to stash.
7. Set the persist tombstone to `true`.

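The two-tombstone migration and rollback steps above can be sketched side by side. As before, `Backend`, `migrate`, and `rollback` are hypothetical names over in-memory stand-ins, opening the catalogs (steps 1 and 2) is elided, and "write catalog snapshot" is modeled as replacing the contents.

```rust
/// Hypothetical in-memory stand-in for one catalog backend with its own tombstone.
#[derive(Default)]
struct Backend {
    tombstone: Option<bool>,
    contents: Vec<String>,
}

/// Two-tombstone migration from the stash to persist (steps 3 through 7).
fn migrate(stash: &mut Backend, persist: &mut Backend) {
    stash.tombstone = Some(true); // Step 3
    if persist.tombstone == Some(false) {
        return; // Step 4: persist is already in use
    }
    let snapshot = stash.contents.clone(); // Step 5
    persist.contents = snapshot; // Step 6
    persist.tombstone = Some(false); // Step 7
}

/// Two-tombstone rollback from persist to the stash (steps 3 through 7).
fn rollback(stash: &mut Backend, persist: &mut Backend) {
    stash.tombstone = Some(false); // Step 3
    if persist.tombstone != Some(false) {
        return; // Step 4: persist tombstone is `true` or doesn't exist
    }
    let snapshot = persist.contents.clone(); // Step 5
    stash.contents = snapshot; // Step 6
    persist.tombstone = Some(true); // Step 7
}
```

In both functions the persist tombstone is the last value written, so it is the persist tombstone that decides whether the data copy has finished.
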
### States

Below is a table describing all possible states of (stash_tombstone, persist_tombstone) and what
they indicate about where the source of truth catalog data is located.

| Stash Tombstone | Persist Tombstone | Source of Truth | Explanation                           |
|-----------------|-------------------|-----------------|---------------------------------------|
| None            | _                 | Stash           | Migration has never been initiated.   |
| _               | None              | Stash           | Migration has never fully completed.  |
| Some(true)      | Some(false)       | Persist         | Migration has completed successfully. |
| Some(false)     | Some(true)        | Stash           | Rollback has completed successfully.  |
| Some(true)      | Some(true)        | Stash           | Migration crashed before completing.  |
| Some(false)     | Some(false)       | Persist         | Rollback crashed before completing.   |

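The two-tombstone table above can also be written as an exhaustive `match`, one arm per row. The `SourceOfTruth` enum and the function name are hypothetical; the arms follow the table rows in order.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum SourceOfTruth {
    Stash,
    Persist,
}

/// Maps (stash_tombstone, persist_tombstone) to the authoritative backend.
fn source_of_truth(stash: Option<bool>, persist: Option<bool>) -> SourceOfTruth {
    match (stash, persist) {
        // Migration has never been initiated.
        (None, _) => SourceOfTruth::Stash,
        // Migration has never fully completed.
        (_, None) => SourceOfTruth::Stash,
        // Migration has completed successfully.
        (Some(true), Some(false)) => SourceOfTruth::Persist,
        // Rollback has completed successfully.
        (Some(false), Some(true)) => SourceOfTruth::Stash,
        // Migration crashed before completing.
        (Some(true), Some(true)) => SourceOfTruth::Stash,
        // Rollback crashed before completing.
        (Some(false), Some(false)) => SourceOfTruth::Persist,
    }
}
```

Read this way, the table reduces to one rule: persist is the source of truth exactly when both tombstones exist and the persist tombstone is `false`.
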
### State Transitions

Below is a state transition diagram that shows all possible states of
(stash_tombstone, persist_tombstone) and how the states transition. Each node has the format of
(stash_tombstone, persist_tombstone). Each edge indicates which steps from the algorithm above
trigger that specific state transition. Next to each node is either `Stash` if the stash is the
source of truth, or `Persist` if persist is the source of truth.

![state-transitions](./static/catalog_migration_to_persist/catalog_migration_multi_tombstone.png)

## Open questions

- There are certain classes of optimizations we can make to the startup process depending on the
  initial state. For example, if we are migrating from the stash to persist and the stash has never
  been initialized, then don't initialize the stash. Another example is that if we're rolling back
  from persist to the stash and there are data format migrations, then we should skip migrations in
  persist. Should we make these optimizations? My opinion is no because they will complicate the
  code for something that will hopefully only be run once during a single release and will probably
  not save us that much latency.
