catalog: Catalog backend migration design #23652

Merged
164 changes: 164 additions & 0 deletions doc/developer/design/20231201_catalog_migration_to_persist.md
@@ -0,0 +1,164 @@
# Catalog Backend Migration

- Associated: https://github.com/MaterializeInc/materialize/issues/20953
- Associated: https://github.com/MaterializeInc/materialize/issues/22392

## The Problem

Currently, we persist all user catalog data in the stash. We would like to move all of that data
to Persist. This will have the following benefits:

- A step towards architecting a shareable catalog that will contribute to use case isolation.
- Reduced operational overhead related to maintaining the stash. Note, the overhead is not
completely removed because the storage controller still uses the stash.

## Success Criteria

All user catalog data is stored in Persist.

## Out of Scope

- Fully shareable differential catalog.
- Use case isolation.
- Zero downtime upgrades.

## Solution Proposal

### Config Collection

The config collection is a subset of catalog data whose format can never be migrated. As a
consequence, this data can be read before the catalog is fully opened and can be used to bootstrap
an environment. The config collection has keys of type `String` and values of type `u64`. Booleans
can be modeled using this collection by storing `0` for `false` and `1` for `true`. This proposal
will add a key called `tombstone`, with a boolean value that indicates whether a specific catalog
backend is retired. A value of `false` means that the backend is not retired (i.e. is still in use)
and a value of `true` means the backend is retired (i.e. is no longer in use).
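
As a minimal sketch of the encoding described above, the following models the config collection as
an in-memory map. The `ConfigCollection` type, its methods, and the `TOMBSTONE_KEY` constant are
hypothetical names used for illustration only; they are not taken from the actual catalog code.

```rust
use std::collections::BTreeMap;

/// Hypothetical constant for the key this proposal adds.
const TOMBSTONE_KEY: &str = "tombstone";

/// In-memory stand-in for the config collection: `String` keys, `u64` values.
#[derive(Debug, Default)]
struct ConfigCollection {
    entries: BTreeMap<String, u64>,
}

impl ConfigCollection {
    /// Stores a boolean as `0` (false) or `1` (true).
    fn set_bool(&mut self, key: &str, value: bool) {
        self.entries.insert(key.to_string(), u64::from(value));
    }

    /// Reads a boolean back; `None` means the key was never written.
    fn get_bool(&self, key: &str) -> Option<bool> {
        self.entries.get(key).map(|v| *v != 0)
    }

    /// A backend counts as retired only if its tombstone is explicitly `true`.
    fn is_retired(&self) -> bool {
        self.get_bool(TOMBSTONE_KEY) == Some(true)
    }
}
```

Note that a missing key and a stored `0` both read as "not retired" here, which matches how the
migration steps later in this document treat an absent tombstone.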

### Epoch Fencing

The catalog durably stores a fencing token called the epoch. All catalog writers and readers also
store an in-memory epoch. When a new writer or reader connects to the catalog, they store the
previous epoch plus one in memory and persist this new value durably. Every time a reader or writer
performs a read or write, they compare their in-memory token with the persisted token, and if
there's a difference, the operation fails. As a result, whenever a reader/writer increments the
epoch durably, they are effectively fencing out all previous readers/writers.
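
The fencing rule can be illustrated with a small single-threaded sketch. `DurableEpoch`, `Handle`,
and `Fenced` are hypothetical names, and the "durable" epoch is just an in-memory cell here. A real
backend would need the increment to be an atomic compare-and-set against durable storage, which
this sketch does not attempt to model.

```rust
use std::cell::Cell;

/// Stand-in for the durably stored epoch (in memory here, for illustration only).
#[derive(Debug, Default)]
struct DurableEpoch(Cell<u64>);

/// Error returned when a handle has been fenced out by a newer one.
#[derive(Debug, PartialEq, Eq)]
struct Fenced {
    ours: u64,
    durable: u64,
}

/// A catalog reader or writer holding its own in-memory copy of the epoch.
struct Handle<'a> {
    durable: &'a DurableEpoch,
    epoch: u64,
}

impl<'a> Handle<'a> {
    /// Connecting stores the previous epoch plus one, both in memory and durably.
    fn connect(durable: &'a DurableEpoch) -> Self {
        let epoch = durable.0.get() + 1;
        durable.0.set(epoch);
        Handle { durable, epoch }
    }

    /// Every read or write first compares the in-memory epoch with the durable one.
    fn check_fence(&self) -> Result<(), Fenced> {
        let durable = self.durable.0.get();
        if durable == self.epoch {
            Ok(())
        } else {
            Err(Fenced { ours: self.epoch, durable })
        }
    }
}
```

In this model, connecting a second handle makes every later `check_fence` call on the first handle
fail, while the second handle keeps working until a third one connects.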

### Catalog Open

Opening a catalog involves the following two steps:

1. Increment the epoch.
2. If the persisted catalog state is uninitialized, initialize it. Otherwise, if there are any
pending data format migrations, run them. Otherwise, do nothing.
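
The two steps can be sketched as follows. This assumes, purely for illustration, that the data
format is tracked by a single version number; `DurableCatalog`, `OpenOutcome`, and
`CURRENT_VERSION` are hypothetical and do not describe how the real catalog tracks its format.

```rust
/// What `open` did in its second step.
#[derive(Debug, PartialEq, Eq)]
enum OpenOutcome {
    Initialized,
    Migrated { from: u64, to: u64 },
    Unchanged,
}

/// Hypothetical durable state: an epoch plus an optional data format version.
/// `version == None` models an uninitialized catalog.
#[derive(Debug, Default)]
struct DurableCatalog {
    epoch: u64,
    version: Option<u64>,
}

/// The data format version this (hypothetical) binary expects.
const CURRENT_VERSION: u64 = 3;

fn open(catalog: &mut DurableCatalog) -> (u64, OpenOutcome) {
    // Step 1: increment the epoch, fencing out earlier readers and writers.
    catalog.epoch += 1;

    // Step 2: initialize, migrate, or do nothing.
    let outcome = match catalog.version {
        None => {
            catalog.version = Some(CURRENT_VERSION);
            OpenOutcome::Initialized
        }
        Some(v) if v < CURRENT_VERSION => {
            catalog.version = Some(CURRENT_VERSION);
            OpenOutcome::Migrated { from: v, to: CURRENT_VERSION }
        }
        Some(_) => OpenOutcome::Unchanged,
    };
    (catalog.epoch, outcome)
}
```

The point of the sketch is that opening is never read-only: even when step 2 does nothing, step 1
still bumps the epoch.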

### Migrate From Stash to Persist

Below are the steps to migrate from the stash to persist.

1. Open the stash catalog.
2. Open the persist catalog.
3. If the stash tombstone is `true`, then we're done.
4. Read stash catalog snapshot.
5. Replace the contents of the persist catalog with the stash catalog snapshot.
6. Set the stash tombstone to `true`.
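
The steps above can be sketched against a toy in-memory model. `Backend`, `Snapshot`, and
`migrate_stash_to_persist` are hypothetical names, the snapshot is reduced to a string map, and
opening is reduced to an epoch bump. The sketch only aims to show the order of operations.

```rust
use std::collections::BTreeMap;

/// Illustrative snapshot type; real catalog data is far richer.
type Snapshot = BTreeMap<String, String>;

/// Hypothetical in-memory model of one catalog backend.
#[derive(Debug, Default)]
struct Backend {
    epoch: u64,
    data: Snapshot,
    /// Only consulted on the stash in this design.
    tombstone: Option<bool>,
}

impl Backend {
    /// Opening increments the epoch (initialization and format migrations omitted).
    fn open(&mut self) {
        self.epoch += 1;
    }
}

/// Returns `true` if data was copied, `false` if the migration had already completed.
fn migrate_stash_to_persist(stash: &mut Backend, persist: &mut Backend) -> bool {
    stash.open(); // Step 1
    persist.open(); // Step 2
    if stash.tombstone == Some(true) {
        return false; // Step 3: persist is already the source of truth.
    }
    let snapshot = stash.data.clone(); // Step 4
    persist.data = snapshot; // Step 5: overwrite whatever persist held before.
    stash.tombstone = Some(true); // Step 6: flip the source of truth.
    true
}
```

Running the function a second time is a no-op apart from the epoch bumps, so writes made to persist
after a completed migration are not overwritten by stale stash data.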

### Rollback From Persist to Stash

Below are the steps to roll back from persist to the stash.

1. Open the stash catalog.
2. Open the persist catalog.
3. If the stash tombstone is `false` or doesn't exist, then we're done.
4. Read persist catalog snapshot.
5. Replace the contents of the stash catalog with the persist catalog snapshot.
6. Set the stash tombstone to `false`.
Contributor

What happens if there is a crash between steps 5 and 6? If this data was from step 5 above, then the persist tombstone will be empty or false (since the stash to persist migration is done at step 3 if it is true). That seems ok? Was trying to think through if steps 5 and 6 here and maybe above MUST be done as a single transaction, but seems like that does not need to be the case.

Contributor Author

Yeah, if we crash between steps 5 and 6 we should be fine. When writing this I actually tried to add a specific section on that scenario, but couldn't figure out the right words.

The idea is that at all times one of the backing stores is the "source of truth", i.e. contains correct data, and the other backing store can be treated as having complete garbage. It might have stale data, it might have no data, it might even have the correct up to date data, but we just treat it as complete garbage. So after step 5, both the stash and persist have the exact same correct data. However, since we haven't flipped the tombstone flag, we treat the stash data as the source of truth and the persist data as garbage. If we crash here, we have to start the whole process over, as if we had never copied the data into persist, even if we end up making no real changes. The same idea applies if another writer fences us out between steps 5 and 6: they have to start from the beginning and treat the persist data as garbage and the stash as the source of truth.


NOTE: Steps (5) and (6) can be done as a single write as a performance optimization.
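
To make the crash discussion concrete, here is a sketch of the rollback steps with a crash
injected between steps 5 and 6. The `Catalogs` struct, the `Rollback` enum, and the
`crash_after_copy` flag are hypothetical test scaffolding, and opening the catalogs (steps 1 and 2)
is left out of the model.

```rust
use std::collections::BTreeMap;

type Snapshot = BTreeMap<String, String>;

/// Hypothetical model of both backends plus the stash tombstone.
#[derive(Debug, Default)]
struct Catalogs {
    stash: Snapshot,
    persist: Snapshot,
    stash_tombstone: Option<bool>,
}

#[derive(Debug, PartialEq, Eq)]
enum Rollback {
    AlreadyOnStash,
    Crashed,
    Completed,
}

fn rollback_persist_to_stash(c: &mut Catalogs, crash_after_copy: bool) -> Rollback {
    // Step 3: a `false` or missing tombstone means the stash is already in use.
    if c.stash_tombstone != Some(true) {
        return Rollback::AlreadyOnStash;
    }
    // Step 4: read the persist snapshot.
    let snapshot = c.persist.clone();
    // Step 5: replace the stash contents.
    c.stash = snapshot;
    if crash_after_copy {
        // Simulated crash: the tombstone was never flipped.
        return Rollback::Crashed;
    }
    // Step 6: flip the source of truth back to the stash.
    c.stash_tombstone = Some(false);
    Rollback::Completed
}
```

After a simulated crash the tombstone is still `true`, so persist remains the source of truth and
the copied stash data is treated as garbage. A later attempt starts from step 3 again and copies
whatever persist holds at that point.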

### States

Below is a table describing all possible states of the stash tombstone and what they
indicate about where the source of truth catalog data is located.

| Stash Tombstone | Source of Truth | Explanation |
|-----------------|-----------------|-----------------------------------------------------------------------------|
| None | Stash | Migration has never fully completed. |
| Some(true) | Persist | Migration has completed successfully or rollback crashed before completing. |
| Some(false) | Stash | Rollback has completed successfully or migration crashed before completing. |
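
The table above reduces to a single match on the stash tombstone. The `SourceOfTruth` enum and the
`source_of_truth` function are hypothetical names for this sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SourceOfTruth {
    Stash,
    Persist,
}

/// Maps the stash tombstone to the backend holding the authoritative data.
fn source_of_truth(stash_tombstone: Option<bool>) -> SourceOfTruth {
    match stash_tombstone {
        // Migration completed, or a rollback crashed before flipping the flag.
        Some(true) => SourceOfTruth::Persist,
        // Never migrated, rollback completed, or a migration crashed before flipping the flag.
        None | Some(false) => SourceOfTruth::Stash,
    }
}
```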

### State Transitions

Below is a state transition diagram that shows all possible states of the stash tombstone and how
the states transition. Each node has the value of the stash tombstone. Each edge indicates which
steps from the algorithms above trigger that specific state transition. Next to each node is either
`Stash` if the stash is the source of truth, or `Persist` if persist is the source of truth.

![state-transitions](./static/catalog_migration_to_persist/catalog_migration_single_tombstone.png)

## Alternatives

Below is a similar algorithm that uses two tombstone values, one in persist and one in the stash.
Some may find that the individual states are slightly easier to reason about, but the state
transitions are harder to reason about.

Contributor

I also prefer the proposed design over this alternative because I like having a single place to check for which one to use.

### Migrate From Stash to Persist

Below are the steps to migrate from the stash to persist.

1. Open the stash catalog.
2. Open the persist catalog.
3. Set the stash tombstone to `true`.
4. If the persist tombstone is `false`, then we're done.
5. Read stash catalog snapshot.
6. Write catalog snapshot to persist.
7. Set the persist tombstone to `false`.

NOTE: Steps (6) and (7) can be done as a single compare and append operation as a performance
optimization.

### Rollback From Persist to Stash

Below are the steps to roll back from persist to the stash.

1. Open the stash catalog.
2. Open the persist catalog.
3. Set the stash tombstone to `false`.
4. If the persist tombstone is `true` or doesn't exist, then we're done.
5. Read persist catalog snapshot.
6. Write catalog snapshot to stash.
7. Set the persist tombstone to `true`.

### States

Below is a table describing all possible states of (stash_tombstone, persist_tombstone) and what
they indicate about where the source of truth catalog data is located.

| Stash Tombstone | Persist Tombstone | Source of Truth | Explanation |
|-----------------|-------------------|-----------------|---------------------------------------|
| None | _ | Stash | Migration has never been initiated. |
| _ | None | Stash | Migration has never fully completed. |
| Some(true) | Some(false) | Persist | Migration has completed successfully. |
| Some(false) | Some(true) | Stash | Rollback has completed successfully. |
| Some(true) | Some(true) | Stash | Migration crashed before completing. |
| Some(false) | Some(false) | Persist | Rollback crashed before completing. |
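
For comparison with the single-tombstone design, the two-tombstone table can be written as a match
with one arm per row. As before, `SourceOfTruth` and `source_of_truth` are hypothetical names used
only for this sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SourceOfTruth {
    Stash,
    Persist,
}

/// One match arm per row of the table, in the same order.
fn source_of_truth(stash: Option<bool>, persist: Option<bool>) -> SourceOfTruth {
    match (stash, persist) {
        // Migration has never been initiated.
        (None, _) => SourceOfTruth::Stash,
        // Migration has never fully completed.
        (_, None) => SourceOfTruth::Stash,
        // Migration has completed successfully.
        (Some(true), Some(false)) => SourceOfTruth::Persist,
        // Rollback has completed successfully.
        (Some(false), Some(true)) => SourceOfTruth::Stash,
        // Migration crashed before completing.
        (Some(true), Some(true)) => SourceOfTruth::Stash,
        // Rollback crashed before completing.
        (Some(false), Some(false)) => SourceOfTruth::Persist,
    }
}
```

Six arms over two values, compared with two arms over one value in the proposed design, gives a
rough sense of why the state transitions here are harder to reason about.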

### State Transitions

Below is a state transition diagram that shows all possible states of
(stash_tombstone, persist_tombstone) and how the states transition. Each node has the format of
(stash_tombstone, persist_tombstone). Each edge indicates which steps from the algorithms above
trigger that specific state transition. Next to each node is either `Stash` if the stash is the
source of truth, or `Persist` if persist is the source of truth.

![state-transitions](./static/catalog_migration_to_persist/catalog_migration_multi_tombstone.png)

## Open questions

- There are certain classes of optimizations we can make to the startup process depending on the
initial state. For example, if we are migrating from the stash to persist and the stash has never
been initialized, then we don't need to initialize the stash. Another example is that if we're
rolling back from persist to the stash and there are data format migrations, then we should skip
migrations in persist. Should we make these optimizations? My opinion is no because they will
complicate the code for something that will hopefully only be run once during a single release and
will probably not save us that much latency.

Contributor

I also think no. Reducing complexity and the potential for bugs here is more important than minor startup time optimizations.