postgres replication: Eager string decoding by antiguru · Pull Request #32020 · MaterializeInc/materialize

antiguru · 2025-03-26T13:19:26Z

The postgres replication implementation represented rows of data as `Vec<Option<Vec<u8>>>`, which is accurate but potentially inefficient.

Instead, we switch to `Row` containing `Datum::String` or `Datum::Null`. This causes us to do less work in total, with a slight caveat that it is _different_ work on the replication worker. I'm not sure if cloning `Bytes` into the nested vector is more work than constructing a `Row`, but there is a possibility for performance changes. A `Row` certainly is more compact in memory.
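To illustrate the idea of eagerly decoding column bytes into a single packed row, here is a minimal sketch. It does not use Materialize's actual `Row` or `Datum` types: `PackedRow`, the tag constants, the length-prefixed encoding, and `pack_columns` are all hypothetical stand-ins chosen for this example, and the real `Row` encoding may differ.

```rust
// Hypothetical stand-in for a datum; NOT Materialize's actual `Datum` type.
#[derive(Debug, PartialEq)]
enum Datum<'a> {
    Null,
    String(&'a str),
}

// Hypothetical packed row: every column lives in one contiguous buffer,
// instead of one separately owned buffer per column.
#[derive(Debug, Default)]
struct PackedRow {
    data: Vec<u8>,
}

// Assumed encoding for this sketch: one tag byte per column, and for strings
// a little-endian u32 length followed by the UTF-8 payload.
const TAG_NULL: u8 = 0;
const TAG_STRING: u8 = 1;

impl PackedRow {
    // Append one datum to the end of the buffer.
    fn push(&mut self, datum: Datum<'_>) {
        match datum {
            Datum::Null => self.data.push(TAG_NULL),
            Datum::String(s) => {
                self.data.push(TAG_STRING);
                self.data.extend_from_slice(&(s.len() as u32).to_le_bytes());
                self.data.extend_from_slice(s.as_bytes());
            }
        }
    }

    // Walk the buffer and yield borrowed datums without copying payloads.
    fn iter(&self) -> impl Iterator<Item = Datum<'_>> + '_ {
        let mut rest: &[u8] = &self.data;
        std::iter::from_fn(move || {
            let (&tag, tail) = rest.split_first()?;
            if tag == TAG_NULL {
                rest = tail;
                return Some(Datum::Null);
            }
            let (len_bytes, tail) = tail.split_at(4);
            let len = u32::from_le_bytes([
                len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3],
            ]) as usize;
            let (payload, tail) = tail.split_at(len);
            rest = tail;
            // Payloads were validated as UTF-8 when they were pushed.
            Some(Datum::String(std::str::from_utf8(payload).unwrap()))
        })
    }
}

// Eager decoding: validate each column as UTF-8 once, up front, and pack it.
// `None` models a NULL column in the replication stream.
fn pack_columns(columns: &[Option<&[u8]>]) -> Result<PackedRow, std::str::Utf8Error> {
    let mut row = PackedRow::default();
    for col in columns {
        match col {
            None => row.push(Datum::Null),
            Some(bytes) => row.push(Datum::String(std::str::from_utf8(bytes)?)),
        }
    }
    Ok(row)
}
```

In this sketch, UTF-8 validation happens once when the row is built, so an invalid column surfaces as an error at packing time rather than later, and the whole row occupies a single buffer instead of a separate one per column.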

Related: MaterializeInc/database-issues#9125

Fixes: MaterializeInc/database-issues#9123

Signed-off-by: Moritz Hoffmann <mh@materialize.com>

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

martykulma

lgtm!

petrosagg

lgtm

antiguru requested a review from a team as a code owner March 26, 2025 13:19

antiguru requested review from martykulma and petrosagg March 26, 2025 13:32

martykulma approved these changes Mar 27, 2025

petrosagg approved these changes Mar 27, 2025

antiguru merged commit 93cebb4 into MaterializeInc:main Mar 27, 2025

antiguru deleted the postgres_replication_row branch March 27, 2025 15:20

antiguru merged 1 commit into MaterializeInc:main from antiguru:postgres_replication_row