catalog: Catalog backend migration design #23652
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,164 @@ | ||
# Catalog Backend Migration | ||
|
||
- Associated: https://github.com/MaterializeInc/materialize/issues/20953 | ||
- Associated: https://github.com/MaterializeInc/materialize/issues/22392 | ||
|
||
## The Problem | ||
|
||
Currently, we persist all user catalog data in the stash. We would like to move all of that data | ||
to Persist. This will have the following benefits: | ||
|
||
- A step towards architecting a shareable catalog that will contribute to use case isolation. | ||
- Reduced operational overhead related to maintaining the stash. Note, the overhead is not | ||
completely removed because the storage controller still uses the stash. | ||
|
||
## Success Criteria | ||
|
||
All user catalog data is stored in Persist. | ||
|
||
## Out of Scope | ||
|
||
- Fully shareable differential catalog. | ||
- Use case isolation. | ||
- Zero downtime upgrades. | ||
|
||
## Solution Proposal | ||
|
||
### Config Collection | ||
|
||
The config collection is a subset of catalog data that can never be migrated. As a consequence, this | ||
data can be read before the catalog is fully opened and can be used to bootstrap an environment. The | ||
config collection has keys of type `String` and values of type `u64`. Booleans can be modeled using | ||
this collection by storing a `0` for `false` and `1` for `true`. This proposal will add a key called | ||
`tombstone`, with a boolean value that indicates if a specific catalog backend is retired. A value | ||
of `false` means that the backend is not retired (i.e. is still in use) and a value of `true` means | ||
the backend is retired (i.e. is no longer in use). | ||
|
||
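
The encoding described above can be sketched in a few lines. This is only an illustration: `ConfigCollection`, `TOMBSTONE_KEY`, and the method names are hypothetical stand-ins and not the actual catalog API, and the in-memory map stands in for durable storage. Treating any non-zero value as `true` is also an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the config collection:
/// keys are `String`, values are `u64`.
#[derive(Debug, Default)]
pub struct ConfigCollection {
    entries: BTreeMap<String, u64>,
}

/// The key proposed in this design for marking a backend as retired.
pub const TOMBSTONE_KEY: &str = "tombstone";

impl ConfigCollection {
    /// Stores a boolean as `0` (false) or `1` (true).
    pub fn set_bool(&mut self, key: &str, value: bool) {
        self.entries.insert(key.to_string(), u64::from(value));
    }

    /// Reads a boolean back. `None` means the key was never written.
    /// Assumption: any non-zero value is read as `true`.
    pub fn get_bool(&self, key: &str) -> Option<bool> {
        self.entries.get(key).map(|value| *value != 0)
    }

    /// `Some(true)` means retired, `Some(false)` means in use.
    pub fn tombstone(&self) -> Option<bool> {
        self.get_bool(TOMBSTONE_KEY)
    }

    pub fn set_tombstone(&mut self, retired: bool) {
        self.set_bool(TOMBSTONE_KEY, retired);
    }
}
```

Returning `Option<bool>` keeps the "never written" case distinct from an explicit `false`, which matters for the state tables later in this document.
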
### Epoch Fencing | ||
|
||
The catalog durably stores a fencing token called the epoch. All catalog writers and readers also | ||
store an in-memory epoch. When a new writer or reader connects to the catalog they store the | ||
previous epoch plus one in memory and persist this new value durably. Every time a reader or writer | ||
performs a read or write, they compare their in-memory token with the persisted token, and if | ||
there's a difference, the operation fails. As a result, whenever a reader/writer increments the epoch | ||
durably, they are effectively fencing out all previous readers/writers. | ||
|
||
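
A minimal sketch of the fencing behavior described above. `DurableEpoch`, `Handle`, and `Fenced` are hypothetical names, and a `Cell<u64>` stands in for the durable store. A real backend would need the read-increment-write in `connect` to be atomic (for example via a compare-and-set), which this single-threaded sketch does not model.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the durably stored epoch.
#[derive(Debug, Default)]
pub struct DurableEpoch(Cell<u64>);

/// Error returned when a newer reader or writer has bumped the epoch.
#[derive(Debug, PartialEq, Eq)]
pub struct Fenced {
    pub ours: u64,
    pub durable: u64,
}

/// A reader or writer holding its own in-memory epoch.
pub struct Handle<'a> {
    durable: &'a DurableEpoch,
    epoch: u64,
}

impl<'a> Handle<'a> {
    /// Connecting stores the previous epoch plus one, both in memory and durably.
    pub fn connect(durable: &'a DurableEpoch) -> Self {
        let epoch = durable.0.get() + 1;
        durable.0.set(epoch);
        Handle { durable, epoch }
    }

    /// Every read or write first compares the in-memory and durable epochs.
    pub fn check_fenced(&self) -> Result<(), Fenced> {
        let durable = self.durable.0.get();
        if durable == self.epoch {
            Ok(())
        } else {
            Err(Fenced { ours: self.epoch, durable })
        }
    }
}
```

Once a second handle connects, every later `check_fenced` call on the first handle fails, which is the fencing effect described in this section.
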
### Catalog Open | ||
|
||
Opening a catalog involves the following two steps: | ||
|
||
1. Increment the epoch. | ||
2. Either initialize the persisted catalog state if it's uninitialized, or migrate the data format | ||
if there are any migrations, otherwise do nothing. | ||
|
||
### Migrate From Stash to Persist | ||
|
||
Below are the steps to migrate from the stash to persist. | ||
|
||
1. Open the stash catalog. | ||
2. Open the persist catalog. | ||
3. If the stash tombstone is `true`, then we're done. | ||
4. Read stash catalog snapshot. | ||
5. Replace the contents of the persist catalog with the stash catalog snapshot. | ||
6. Set the stash tombstone to `true`. | ||
|
||
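
The migration steps above can be sketched as follows. `Backend` is a hypothetical, heavily simplified stand-in for an already opened catalog (steps 1 and 2), with a list of strings in place of real catalog contents. It is not the actual stash or persist API.

```rust
/// Hypothetical stand-in for one opened catalog backend.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Backend {
    /// Simplified catalog contents.
    pub items: Vec<String>,
    /// Tombstone from the config collection; only the stash's value is used here.
    pub tombstone: Option<bool>,
}

/// Steps 3 to 6 of the migration. Returns `true` if data was copied.
pub fn migrate_stash_to_persist(stash: &mut Backend, persist: &mut Backend) -> bool {
    // Step 3: the stash is already retired, so persist is the source of truth.
    if stash.tombstone == Some(true) {
        return false;
    }
    // Step 4: read the stash catalog snapshot.
    let snapshot = stash.items.clone();
    // Step 5: replace (not merge into) whatever persist currently holds.
    persist.items = snapshot;
    // Step 6: flip the tombstone; only now does persist become the source of truth.
    stash.tombstone = Some(true);
    true
}
```

Running the function a second time is a no-op because step 3 returns early once the tombstone is `true`.
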
### Rollback From Persist to Stash | ||
|
||
Below are the steps to rollback from persist to the stash. | ||
|
||
1. Open the stash catalog. | ||
2. Open the persist catalog. | ||
3. If the stash tombstone is `false` or doesn't exist, then we're done. | ||
4. Read persist catalog snapshot. | ||
5. Replace the contents of the stash catalog with the persist catalog snapshot. | ||
6. Set the stash tombstone to `false`. | ||
|
||
NOTE: Steps (5) and (6) can be done as a single write as a performance optimization. | ||
|
||
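
A sketch of the rollback steps, including the single-write variant from the note. `Backend` is a hypothetical in-memory stand-in for an opened catalog, and assigning the whole struct at once is only an analogy for a single durable write.

```rust
/// Hypothetical stand-in for one opened catalog backend.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Backend {
    /// Simplified catalog contents.
    pub items: Vec<String>,
    /// Tombstone from the config collection; only the stash's value is used here.
    pub tombstone: Option<bool>,
}

/// Steps 3 to 6 of the rollback. Returns `true` if data was copied.
pub fn rollback_persist_to_stash(stash: &mut Backend, persist: &Backend) -> bool {
    // Step 3: `false` or missing means the stash is already the source of truth.
    if stash.tombstone != Some(true) {
        return false;
    }
    // Step 4: read the persist catalog snapshot.
    let snapshot = persist.items.clone();
    // Steps 5 and 6 together: new contents and the cleared tombstone in one write.
    *stash = Backend { items: snapshot, tombstone: Some(false) };
    true
}
```
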
### States | ||
|
||
Below is a table describing all possible states of the stash tombstone and what they | ||
indicate about where the source of truth catalog data is located. | ||
|
||
| Stash Tombstone | Source of Truth | Explanation | | ||
|-----------------|-----------------|-----------------------------------------------------------------------------| | ||
| None | Stash | Migration has never fully completed. | | ||
| Some(true) | Persist | Migration has completed successfully or rollback crashed before completing. | | ||
| Some(false) | Stash | Rollback has completed successfully or migration crashed before completing. | | ||
|
||
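
The table above amounts to a single lookup on the stash tombstone. A minimal sketch, where `SourceOfTruth` and `source_of_truth` are hypothetical names introduced for illustration:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceOfTruth {
    Stash,
    Persist,
}

/// Maps the stash tombstone to the backend holding the authoritative data.
pub fn source_of_truth(stash_tombstone: Option<bool>) -> SourceOfTruth {
    match stash_tombstone {
        // Never migrated, rolled back, or a migration that crashed before completing.
        None | Some(false) => SourceOfTruth::Stash,
        // Migrated, or a rollback that crashed before completing.
        Some(true) => SourceOfTruth::Persist,
    }
}
```
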
### State Transitions | ||
|
||
Below is a state transition diagram that shows all possible states of the stash tombstone and how | ||
the states transition. Each node has the value of the stash tombstone. Each edge indicates which | ||
steps from the algorithm above trigger that specific state transition. Next to each node is either | ||
`Stash` if the stash is the source of truth, or `Persist` if persist is the source of truth. | ||
|
||
 | ||
|
||
## Alternatives | ||
|
||
Below is a similar algorithm that uses two tombstone values, one in persist and one in the stash. | ||
Review comment: I also prefer the proposed design over this alternative because I like having a single place to check for which one to use. |
||
Some may find that the individual states are slightly easier to reason about, but the state | ||
transitions are harder to reason about. | ||
|
||
### Migrate From Stash to Persist | ||
|
||
Below are the steps to migrate from the stash to persist. | ||
|
||
1. Open the stash catalog. | ||
2. Open the persist catalog. | ||
3. Set the stash tombstone to `true`. | ||
4. If the persist tombstone is `false`, then we're done. | ||
5. Read stash catalog snapshot. | ||
6. Write catalog snapshot to persist. | ||
7. Set the persist tombstone to `false`. | ||
|
||
NOTE: Steps (6) and (7) can be done as a single compare and append operation as a performance | ||
optimization. | ||
|
||
### Rollback From Persist to Stash | ||
|
||
Below are the steps to rollback from persist to the stash. | ||
|
||
1. Open the stash catalog. | ||
2. Open the persist catalog. | ||
3. Set the stash tombstone to `false`. | ||
4. If the persist tombstone is `true` or doesn't exist, then we're done. | ||
5. Read persist catalog snapshot. | ||
6. Write catalog snapshot to stash. | ||
7. Set the persist tombstone to `true`. | ||
|
||
### States | ||
|
||
Below is a table describing all possible states of (stash_tombstone, persist_tombstone) and what | ||
they indicate about where the source of truth catalog data is located. | ||
|
||
| Stash Tombstone | Persist Tombstone | Source of Truth | Explanation | | ||
|-----------------|-------------------|-----------------|---------------------------------------| | ||
| None | _ | Stash | Migration has never been initiated. | | ||
| _ | None | Stash | Migration has never fully completed. | | ||
| Some(true) | Some(false) | Persist | Migration has completed successfully. | | ||
| Some(false) | Some(true) | Stash | Rollback has completed successfully. | | ||
| Some(true) | Some(true) | Stash | Migration crashed before completing. | | ||
| Some(false) | Some(false) | Persist | Rollback crashed before completing. | | ||
|
||
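
For comparison, the two-tombstone table above can be written as one exhaustive `match`. `SourceOfTruth` and `source_of_truth` are hypothetical names introduced for illustration, and each arm mirrors one row of the table.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceOfTruth {
    Stash,
    Persist,
}

/// Maps (stash_tombstone, persist_tombstone) to the authoritative backend.
pub fn source_of_truth(stash: Option<bool>, persist: Option<bool>) -> SourceOfTruth {
    match (stash, persist) {
        // Migration has never been initiated, or has never fully completed.
        (None, _) | (_, None) => SourceOfTruth::Stash,
        // Migration has completed successfully.
        (Some(true), Some(false)) => SourceOfTruth::Persist,
        // Rollback has completed successfully.
        (Some(false), Some(true)) => SourceOfTruth::Stash,
        // Migration crashed before completing.
        (Some(true), Some(true)) => SourceOfTruth::Stash,
        // Rollback crashed before completing.
        (Some(false), Some(false)) => SourceOfTruth::Persist,
    }
}
```

Written this way, the outcome depends on the persist tombstone whenever both values are present, while the stash tombstone only records which operation was last started.
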
### State Transitions | ||
|
||
Below is a state transition diagram that shows all possible states of | ||
(stash_tombstone, persist_tombstone) and how the states transition. Each node has the format of | ||
(stash_tombstone, persist_tombstone). Each edge indicates which steps from the algorithm above | ||
trigger that specific state transition. Next to each node is either `Stash` if the stash is the | ||
source of truth, or `Persist` if persist is the source of truth. | ||
|
||
 | ||
|
||
## Open questions | ||
|
||
- There are certain classes of optimizations we can make to the startup process depending on the | ||
initial state. For example, if we are migrating from the stash to persist and the stash has never | ||
been initialized, then don't initialize the stash. Another example is that if we're rolling back | ||
from persist to the stash and there are data format migrations, then we should skip migrations in | ||
persist. Should we make these optimizations? My opinion is no because they will complicate the | ||
code for something that will hopefully only be run once during a single release and will probably | ||
not save us that much latency. |
Review comment: I also think no. Reducing complexity and the potential for bugs here is more important than minor startup time optimizations. |
Review comment: What happens if there is a crash between steps 5 and 6? If this data was from step 5 above, then the persist tombstone will be empty or false (since the stash to persist migration is done at step 3 if it is true). That seems ok? Was trying to think through if steps 5 and 6 here and maybe above MUST be done as a single transaction, but seems like that does not need to be the case.
Reply: Yeah, if we crash between steps 5 and 6 we should be fine. When writing this I actually tried to add a specific section on that scenario, but couldn't figure out the right words.
The idea is that at all times one of the backing stores is the "source of truth", i.e. contains correct data, and the other backing store can be treated as having complete garbage. It might have stale data, it might have no data, it might even have the correct up to date data, but we just treat it as complete garbage. So after step 5, both the stash and persist have the exact same correct data. However, since we haven't flipped the tombstone flag, we treat the stash data as the source of truth and the persist data as garbage. If we crash here, we have to start the whole process over, as if we had never copied the data into persist, even if we end up making no real changes. The same idea applies if another writer fences us out between steps 5 and 6: they have to start from the beginning and treat the persist data as garbage and the stash as the source of truth.
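
The crash scenario discussed in this thread can be simulated with a small sketch. `Backend`, `migrate`, and the `crash_after_copy` flag are hypothetical; the early return only imitates a process dying between steps 5 and 6 of the proposed stash-to-persist migration.

```rust
/// Hypothetical stand-in for one opened catalog backend.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Backend {
    pub items: Vec<String>,
    pub tombstone: Option<bool>,
}

/// Stash-to-persist migration that can "crash" right after copying the data.
pub fn migrate(stash: &mut Backend, persist: &mut Backend, crash_after_copy: bool) {
    // Step 3: nothing to do if the stash is already retired.
    if stash.tombstone == Some(true) {
        return;
    }
    // Steps 4 and 5: copy the snapshot, overwriting whatever persist held.
    persist.items = stash.items.clone();
    if crash_after_copy {
        // Simulated crash: the tombstone was never flipped.
        return;
    }
    // Step 6: flip the tombstone.
    stash.tombstone = Some(true);
}

/// Persist is authoritative only once the stash tombstone is `true`.
pub fn persist_is_source_of_truth(stash: &Backend) -> bool {
    stash.tombstone == Some(true)
}
```

After a simulated crash the stash is still the source of truth, so the stash can keep changing and the next attempt simply overwrites the leftover copy in persist.
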