
Commit eba9e09

committed
another round of feedback
1 parent d68c22c commit eba9e09

File tree

1 file changed (+17, -12 lines)


doc/developer/design/20320328_persist_columnar.md

Lines changed: 17 additions & 12 deletions
@@ -51,10 +51,11 @@ column named `b`.
 
 * Batches within Persist are self-describing.
 * Unblock work for the following projects:
-  * Evolving the schema of a Persist shard (e.g. adding columns to tables)
-  * User defined sort order of data, i.e. `PARTITION BY`
+  * Evolving the schema of a Persist shard (e.g. adding columns to tables).
+  * User defined sort order of data, i.e. `PARTITION BY`.
   * Only fetch the columns from a shard that are needed, i.e. projection
-    pushdown
+    pushdown.
+  * Make `persist_source` faster [#25901](https://github.com/MaterializeInc/materialize/issues/25901).
 
 ## Out of Scope
 
@@ -202,17 +203,18 @@ struct NaiveTime {
 </td>
 <td>
 
-`PrimitiveArray<u64>`
+`FixedSizeBinary[8]`
 
 </td>
 <td>
 
-Represented as the number of nanoseconds since midnight.
+Represented as the `secs` field and `frac` field, encoded in that order as
+big-endian.
 
-> **Alternative:** Instead of representing this as the number of nanoseconds
-since midnight we could stitch together the two `u32`s from `NaiveTime`. This
-would maybe save some CPU cycles during encoding and decoding, but probably not
-enough to matter, and it locks us into a somewhat less flexible format.
+> **Alternative:** We could represent this as the number of nanoseconds since
+midnight, which is a bit more general but more costly at runtime for encoding.
+Ideally Persist encoding is as fast as possible, so I'm leaning towards the
+more direct-from-Rust approach.
 
 > Note: We only need 47 bits to represent this total range, leaving 19 bits
 unused. In the future if we support the `TIMETZ` type we could probably also
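The `FixedSizeBinary[8]` layout in the hunk above can be sketched in plain Rust. This is a hand-rolled illustration, not Materialize's actual encoder; `encode_time` and `decode_time` are hypothetical names. It packs `secs` then `frac` big-endian, which has the side benefit that byte-wise comparison of the encoded values agrees with `(secs, frac)` ordering, and it double-checks the 47-bit arithmetic from the note:

```rust
// Hypothetical sketch (not Materialize's actual encoder): pack a
// NaiveTime-style (secs, frac) pair into the FixedSizeBinary[8] layout
// described above: the `secs` field then the `frac` field, both big-endian.
fn encode_time(secs: u32, frac: u32) -> [u8; 8] {
    let mut buf = [0u8; 8];
    buf[..4].copy_from_slice(&secs.to_be_bytes());
    buf[4..].copy_from_slice(&frac.to_be_bytes());
    buf
}

fn decode_time(buf: [u8; 8]) -> (u32, u32) {
    let secs = u32::from_be_bytes(buf[..4].try_into().unwrap());
    let frac = u32::from_be_bytes(buf[4..].try_into().unwrap());
    (secs, frac)
}

fn main() {
    // 12:34:56.000000789 as seconds since midnight plus a nanosecond fraction.
    let (secs, frac) = (12 * 3600 + 34 * 60 + 56, 789u32);
    let encoded = encode_time(secs, frac);
    assert_eq!(decode_time(encoded), (secs, frac));

    // Big-endian byte order means lexicographic comparison of the encoded
    // bytes agrees with numeric ordering of (secs, frac) -- handy for sorted
    // formats, and something little-endian encoding would not give us.
    assert!(encoded < encode_time(secs, frac + 1));
    assert!(encoded < encode_time(secs + 1, 0));

    // The note's arithmetic: a day spans 86_400 * 10^9 nanoseconds, which
    // needs exactly 47 bits (it is >= 2^46 and < 2^47).
    let nanos_per_day: u64 = 86_400 * 1_000_000_000;
    assert!(nanos_per_day >= 1 << 46 && nanos_per_day < 1 << 47);
}
```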
@@ -796,7 +798,9 @@ time.
 Parts.
 
 Neither option is great, but I am leaning towards [1] since it puts the
-pressure/complexity on S3 and the network more than Persist internals.
+pressure/complexity on S3 and the network more than Persist internals. To make
+sure this solution is stable we can add Prometheus metrics tracking the size of
+blobs we read from S3 and monitor them during rollout to staging and canaries.
 
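The blob-size tracking described above amounts to a size histogram. As a hand-rolled illustration (in practice this would be a real Prometheus histogram metric; the struct, method names, and bucket bounds here are made-up assumptions), the bookkeeping looks like:

```rust
// Hypothetical stand-in for a Prometheus histogram of S3 blob read sizes.
// Real code would use an actual metrics library; this only illustrates the
// bucketing that such a metric performs.
struct BlobSizeHistogram {
    /// Upper bounds (bytes) of each bucket, ascending.
    bounds: Vec<u64>,
    /// One count per bound, plus a trailing overflow (+Inf) bucket.
    counts: Vec<u64>,
}

impl BlobSizeHistogram {
    fn new(bounds: Vec<u64>) -> Self {
        let n = bounds.len() + 1; // +1 for the overflow bucket
        Self { bounds, counts: vec![0; n] }
    }

    /// Record one blob read of `size_bytes` into the first bucket that fits.
    fn observe(&mut self, size_bytes: u64) {
        let idx = self
            .bounds
            .iter()
            .position(|&bound| size_bytes <= bound)
            .unwrap_or(self.bounds.len());
        self.counts[idx] += 1;
    }
}

fn main() {
    // Illustrative buckets: 1 KiB, 1 MiB, 16 MiB, then overflow.
    let mut hist = BlobSizeHistogram::new(vec![1 << 10, 1 << 20, 16 << 20]);
    for size in [512, 4096, 2 << 20, 64 << 20] {
        hist.observe(size);
    }
    assert_eq!(hist.counts, vec![1, 1, 1, 1]);
}
```

During a staging or canary rollout, a drift of mass toward the overflow bucket would flag that option [1] is producing oversized blobs.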
 ### Consolidation and Compaction
 
@@ -861,8 +865,9 @@ An approximate ordering of work would be:
    `ProtoDatum` representation and collects no stats. This unblocks writing
    structured columnar data, and asynchronously we can nail down the right
    Arrow encoding formats.
-2. Begin writing structured columnar data to S3, in the migration format.
-   Allows us to begin a "shadow migration" and validate the new format.
+2. Begin writing structured columnar data to S3 for staging and prod canaries,
+   in the migration format. Allows us to begin a "shadow migration" and
+   validate the new format.
 3. Serve reads from the new structured columnar data. For `Datum`s that have
    had non-V0 Arrow encodings implemented, this allows us to avoid decoding
    `ProtoDatum` in the read path, which currently is relatively expensive.
