|
| 1 | +# Catalog Backend Migration |
| 2 | + |
| 3 | +- Associated: https://github.com/MaterializeInc/materialize/issues/20953 |
| 4 | +- Associated: https://github.com/MaterializeInc/materialize/issues/22392 |
| 5 | + |
| 6 | +## The Problem |
| 7 | + |
| 8 | +Currently, we persist all user catalog data in the stash. We would like to move all of that data |
| 9 | +to Persist. This will have the following benefits: |
| 10 | + |
| 11 | +- A step towards architecting a shareable catalog that will contribute to use case isolation. |
| 12 | +- Reduced operational overhead related to maintaining the stash. Note, the overhead is not |
| 13 | + completely removed because the storage controller still uses the stash. |
| 14 | + |
| 15 | +## Success Criteria |
| 16 | + |
| 17 | +All user catalog data is stored in Persist. |
| 18 | + |
| 19 | +## Out of Scope |
| 20 | + |
| 21 | +- Fully shareable differential catalog. |
| 22 | +- Use case isolation. |
| 23 | +- Zero downtime upgrades. |
| 24 | + |
| 25 | +## Solution Proposal |
| 26 | + |
| 27 | +### Config Collection |
| 28 | + |
| 29 | +The config collection is a subset of catalog data that can never be migrated. As a consequence, this |
| 30 | +data can be read before the catalog is fully opened and can be used to bootstrap an environment. The |
| 31 | +config collection has keys of type `String` and values of type `u64`. Booleans can be modeled using |
| 32 | +this collection by storing a `0` for `false` and `1` for `true`. This proposal will add a key called |
| 33 | +`tombstone`, with a boolean value that indicates if a specific catalog backend is retired. A value |
| 34 | +of `false` means that the backend is not retired (i.e. is still in use) and a value of `true` means |
| 35 | +the backend is retired (i.e. is no longer in use). |
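
The boolean encoding described above can be sketched as follows. This is a minimal in-memory illustration, not the catalog's actual API: `ConfigCollection`, `set_bool`, `get_bool`, and `TOMBSTONE_KEY` are hypothetical names, and a `BTreeMap` stands in for the durable collection.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the config collection (`String` -> `u64`).
#[derive(Default)]
struct ConfigCollection {
    entries: BTreeMap<String, u64>,
}

/// Key proposed by this document for marking a backend as retired.
const TOMBSTONE_KEY: &str = "tombstone";

impl ConfigCollection {
    /// Store a boolean as `0` (false) or `1` (true).
    fn set_bool(&mut self, key: &str, value: bool) {
        self.entries.insert(key.to_string(), u64::from(value));
    }

    /// Read a boolean back. `None` means the key has never been written.
    fn get_bool(&self, key: &str) -> Option<bool> {
        self.entries.get(key).map(|v| *v != 0)
    }
}
```

Returning `Option<bool>` keeps the "never written" case distinct from `false`, which matters for the `None` row of the states table later in this document.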
| 36 | + |
| 37 | +### Epoch Fencing |
| 38 | + |
| 39 | +The catalog durably stores a fencing token called the epoch. All catalog writers and readers also |
| 40 | +store an in-memory epoch. When a new writer or reader connects to the catalog they store the |
| 41 | +previous epoch plus one in memory and persist this new value durably. Every time a reader or writer |
| 42 | +performs a read or write, they compare their in-memory token with the persisted token, and if |
| 43 | +there's a difference, the operation fails. As a result, whenever a reader/writer increments the epoch |
| 44 | +durably, they are effectively fencing out all previous readers/writers. |
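
A minimal sketch of the fencing behavior described above, using an in-memory counter in place of durable storage. `DurableCatalog`, `Handle`, `Fenced`, `connect`, and `check` are hypothetical names chosen for illustration. A real backend would also need the increment and the comparison to be atomic with respect to concurrent connections, which this sketch does not model.

```rust
/// Hypothetical stand-in for the durably stored epoch.
struct DurableCatalog {
    epoch: u64,
}

/// A reader or writer holding its own in-memory copy of the epoch.
struct Handle {
    epoch: u64,
}

/// Error returned when a handle's epoch no longer matches the durable one.
#[derive(Debug, PartialEq)]
struct Fenced {
    held: u64,
    durable: u64,
}

impl DurableCatalog {
    fn new() -> Self {
        DurableCatalog { epoch: 0 }
    }

    /// Connect: store the previous epoch plus one, durably and in memory.
    fn connect(&mut self) -> Handle {
        self.epoch += 1;
        Handle { epoch: self.epoch }
    }

    /// Run before every read or write: compare the in-memory and durable tokens.
    fn check(&self, handle: &Handle) -> Result<(), Fenced> {
        if handle.epoch == self.epoch {
            Ok(())
        } else {
            Err(Fenced { held: handle.epoch, durable: self.epoch })
        }
    }
}
```

Once a second handle connects, every operation attempted through the first handle fails the check, which is the fencing effect the paragraph describes.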
| 45 | + |
| 46 | +### Catalog Open |
| 47 | + |
| 48 | +Opening a catalog involves the following two steps: |
| 49 | + |
| 50 | +1. Increment the epoch. |
| 51 | +2. If the persisted catalog state is uninitialized, initialize it. Otherwise, run any pending data |
| 52 | + format migrations; if there are none, do nothing. |
| 53 | + |
| 54 | +### Migrate From Stash to Persist |
| 55 | + |
| 56 | +Below are the steps to migrate from the stash to persist. |
| 57 | + |
| 58 | +1. Open the stash catalog. |
| 59 | +2. Open the persist catalog. |
| 60 | +3. If the stash tombstone is `true`, then we're done. |
| 61 | +4. Read stash catalog snapshot. |
| 62 | +5. Replace the contents of the persist catalog with the stash catalog snapshot. |
| 63 | +6. Set the stash tombstone to `true`. |
| 64 | + |
| 65 | +### Rollback From Persist to Stash |
| 66 | + |
| 67 | +Below are the steps to roll back from persist to the stash. |
| 68 | + |
| 69 | +1. Open the stash catalog. |
| 70 | +2. Open the persist catalog. |
| 71 | +3. If the stash tombstone is `false` or doesn't exist, then we're done. |
| 72 | +4. Read persist catalog snapshot. |
| 73 | +5. Replace the contents of the stash catalog with the persist catalog snapshot. |
| 74 | +6. Set the stash tombstone to `false`. |
| 75 | + |
| 76 | +NOTE: Steps (5) and (6) can be done as a single write as a performance optimization. |
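
The migration and rollback steps above can be sketched against two in-memory backends. This is an illustration under stated assumptions, not the real implementation: `Backend`, `Snapshot`, `open`, `migrate`, and `rollback` are hypothetical names, snapshots are modeled as plain key/value maps, and `open` only increments the epoch (initialization and data format migrations are elided).

```rust
use std::collections::BTreeMap;

/// Hypothetical snapshot type: the full catalog contents as key/value pairs.
type Snapshot = BTreeMap<String, String>;

/// Hypothetical in-memory stand-in for one catalog backend.
#[derive(Default)]
struct Backend {
    epoch: u64,
    data: Snapshot,
    /// Only the stash's tombstone is consulted in the proposed design.
    tombstone: Option<bool>,
}

impl Backend {
    /// Opening increments the epoch; other open-time work is elided here.
    fn open(&mut self) {
        self.epoch += 1;
    }
}

/// Steps 1-6 of "Migrate From Stash to Persist".
fn migrate(stash: &mut Backend, persist: &mut Backend) {
    stash.open();
    persist.open();
    if stash.tombstone == Some(true) {
        return; // Already migrated: persist is the source of truth.
    }
    let snapshot = stash.data.clone();
    persist.data = snapshot;
    stash.tombstone = Some(true);
}

/// Steps 1-6 of "Rollback From Persist to Stash".
fn rollback(stash: &mut Backend, persist: &mut Backend) {
    stash.open();
    persist.open();
    if stash.tombstone != Some(true) {
        return; // `false` or missing: the stash is already the source of truth.
    }
    // Both of these writes target the stash, so they could be a single write.
    stash.data = persist.data.clone();
    stash.tombstone = Some(false);
}
```

Because the tombstone is checked before any data is copied, re-running either function after it has completed leaves the current source of truth untouched.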
| 77 | + |
| 78 | +### States |
| 79 | + |
| 80 | +Below is a table describing all possible states of the stash tombstone and what they |
| 81 | +indicate about where the source of truth catalog data is located. |
| 82 | + |
| 83 | +| Stash Tombstone | Source of Truth | Explanation | |
| 84 | +|-----------------|-----------------|-----------------------------------------------------------------------------| |
| 85 | +| None | Stash | Migration has never fully completed. | |
| 86 | +| Some(true) | Persist | Migration has completed successfully or rollback crashed before completing. | |
| 87 | +| Some(false) | Stash | Rollback has completed successfully or migration crashed before completing. | |
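
The table above reduces to a single match on the stash tombstone. A minimal sketch, where `SourceOfTruth` and `source_of_truth` are hypothetical names used only for illustration:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum SourceOfTruth {
    Stash,
    Persist,
}

/// Map the stash tombstone to the backend that holds the authoritative data.
fn source_of_truth(stash_tombstone: Option<bool>) -> SourceOfTruth {
    match stash_tombstone {
        // Migration completed, or a rollback crashed before completing.
        Some(true) => SourceOfTruth::Persist,
        // Never migrated, rollback completed, or a migration crashed.
        None | Some(false) => SourceOfTruth::Stash,
    }
}
```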
| 88 | + |
| 89 | +### State Transitions |
| 90 | + |
| 91 | +Below is a state transition diagram that shows all possible states of the stash tombstone and how |
| 92 | +the states transition. Each node has the value of the stash tombstone. Each edge indicates which |
| 93 | +steps from the algorithms above trigger that specific state transition. Next to each node is either |
| 94 | +`Stash` if the stash is the source of truth, or `Persist` if persist is the source of truth. |
| 95 | + |
| 96 | + |
| 97 | + |
| 98 | +## Alternatives |
| 99 | + |
| 100 | +Below is a similar algorithm that uses two tombstone values, one in persist and one in the stash. |
| 101 | +Some may find that the individual states are slightly easier to reason about, but the state |
| 102 | +transitions are harder to reason about. |
| 103 | + |
| 104 | +### Migrate From Stash to Persist |
| 105 | + |
| 106 | +Below are the steps to migrate from the stash to persist. |
| 107 | + |
| 108 | +1. Open the stash catalog. |
| 109 | +2. Open the persist catalog. |
| 110 | +3. Set the stash tombstone to `true`. |
| 111 | +4. If the persist tombstone is `false`, then we're done. |
| 112 | +5. Read stash catalog snapshot. |
| 113 | +6. Write catalog snapshot to persist. |
| 114 | +7. Set the persist tombstone to `false`. |
| 115 | + |
| 116 | +NOTE: Steps (6) and (7) can be done as a single compare and append operation as a performance |
| 117 | +optimization. |
| 118 | + |
| 119 | +### Rollback From Persist to Stash |
| 120 | + |
| 121 | +Below are the steps to roll back from persist to the stash. |
| 122 | + |
| 123 | +1. Open the stash catalog. |
| 124 | +2. Open the persist catalog. |
| 125 | +3. Set the stash tombstone to `false`. |
| 126 | +4. If the persist tombstone is `true` or doesn't exist, then we're done. |
| 127 | +5. Read persist catalog snapshot. |
| 128 | +6. Write catalog snapshot to stash. |
| 129 | +7. Set the persist tombstone to `true`. |
| 130 | + |
| 131 | +### States |
| 132 | + |
| 133 | +Below is a table describing all possible states of (stash_tombstone, persist_tombstone) and what |
| 134 | +they |
| 135 | +indicate about where the source of truth catalog data is located. |
| 136 | + |
| 137 | +| Stash Tombstone | Persist Tombstone | Source of Truth | Explanation | |
| 138 | +|-----------------|-------------------|-----------------|---------------------------------------| |
| 139 | +| None | _ | Stash | Migration has never been initiated. | |
| 140 | +| _ | None | Stash | Migration has never fully completed. | |
| 141 | +| Some(true) | Some(false) | Persist | Migration has completed successfully. | |
| 142 | +| Some(false) | Some(true) | Stash | Rollback has completed successfully. | |
| 143 | +| Some(true) | Some(true) | Stash | Migration crashed before completing. | |
| 144 | +| Some(false) | Some(false) | Persist | Rollback crashed before completing. | |
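
For comparison with the single-tombstone design, the two-tombstone table above can be written as a match on the pair. This is a sketch with hypothetical names (`SourceOfTruth`, `source_of_truth_two_tombstones`). It follows the table row by row, so a missing tombstone on either side resolves to the stash.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum SourceOfTruth {
    Stash,
    Persist,
}

/// Map (stash_tombstone, persist_tombstone) to the authoritative backend.
fn source_of_truth_two_tombstones(
    stash_tombstone: Option<bool>,
    persist_tombstone: Option<bool>,
) -> SourceOfTruth {
    match (stash_tombstone, persist_tombstone) {
        // Migration was never initiated or never fully completed.
        (None, _) | (_, None) => SourceOfTruth::Stash,
        // Migration completed, or a rollback crashed before completing.
        (Some(_), Some(false)) => SourceOfTruth::Persist,
        // Rollback completed, or a migration crashed before completing.
        (Some(_), Some(true)) => SourceOfTruth::Stash,
    }
}
```

Written this way, only the persist tombstone decides the outcome once both values exist; the stash tombstone distinguishes a completed operation from a crashed one.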
| 145 | + |
| 146 | +### State Transitions |
| 147 | + |
| 148 | +Below is a state transition diagram that shows all possible states of |
| 149 | +(stash_tombstone, persist_tombstone) and how the states transition. Each node has the format of |
| 150 | +(stash_tombstone, persist_tombstone). Each edge indicates which steps from the algorithms above |
| 151 | +trigger that specific state transition. Next to each node is either `Stash` if the stash is the |
| 152 | +source of truth, or `Persist` if persist is the source of truth. |
| 153 | + |
| 154 | + |
| 155 | + |
| 156 | +## Open questions |
| 157 | + |
| 158 | +- There are certain classes of optimizations we can make to the startup process depending on the |
| 159 | + initial state. For example, if we are migrating from the stash to persist and the stash has never |
| 160 | + been initialized, then we shouldn't initialize the stash. Another example is that if we're rolling back |
| 161 | + from persist to the stash and there are data format migrations, then we should skip migrations in |
| 162 | + persist. Should we make these optimizations? My opinion is no because they will complicate the |
| 163 | + code for something that will hopefully only be run once during a single release and will probably |
| 164 | + not save us that much latency. |