postgres replication: Eager string decoding #32020

Merged 1 commit into MaterializeInc:main on Mar 27, 2025

antiguru

The postgres replication implementation represented rows of data as `Vec<Option<Vec<u8>>>`, which is accurate but potentially inefficient.

Instead, we switch to `Row` containing `Datum::String` or `Datum::Null`. This causes us to do less work in total, with the slight caveat that it is _different_ work on the replication worker. I'm not sure whether cloning `Bytes` into the nested vector is more work than constructing a `Row`, so performance may change in either direction. A `Row` is certainly more compact in memory.
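To make the trade-off concrete, here is a minimal sketch of the two representations. It does not use the real `mz_repr` types: `PackedRow`, this `Datum` enum, the tag constants, `copy_raw`, and `decode_eagerly` are all hypothetical stand-ins, and the byte layout is invented for illustration. It also assumes column values arrive as UTF-8 text.

```rust
/// Old shape: the outer Vec plus one heap allocation per non-null column.
type RawRow = Vec<Option<Vec<u8>>>;

/// Hypothetical stand-in for a datum; only the two variants named above.
#[derive(Debug, PartialEq)]
enum Datum<'a> {
    Null,
    String(&'a str),
}

/// Hypothetical stand-in for a packed row: all columns in one buffer.
#[derive(Debug, Default)]
struct PackedRow {
    data: Vec<u8>,
}

// Invented encoding: a one-byte tag, then for strings a little-endian
// u32 length followed by the UTF-8 bytes.
const TAG_NULL: u8 = 0;
const TAG_STRING: u8 = 1;

impl PackedRow {
    fn push(&mut self, datum: Datum<'_>) {
        match datum {
            Datum::Null => self.data.push(TAG_NULL),
            Datum::String(s) => {
                self.data.push(TAG_STRING);
                self.data.extend_from_slice(&(s.len() as u32).to_le_bytes());
                self.data.extend_from_slice(s.as_bytes());
            }
        }
    }

    /// Walks the buffer and yields borrowed datums without copying.
    /// Assumes the buffer was built only through `push`.
    fn iter(&self) -> impl Iterator<Item = Datum<'_>> + '_ {
        let mut rest = self.data.as_slice();
        std::iter::from_fn(move || {
            let (&tag, tail) = rest.split_first()?;
            if tag == TAG_NULL {
                rest = tail;
                return Some(Datum::Null);
            }
            let (len, tail) = tail.split_at(4);
            let len = u32::from_le_bytes([len[0], len[1], len[2], len[3]]) as usize;
            let (bytes, tail) = tail.split_at(len);
            rest = tail;
            Some(Datum::String(std::str::from_utf8(bytes).unwrap()))
        })
    }
}

/// Old approach: copy every column into its own vector, decode later.
fn copy_raw(columns: &[Option<&[u8]>]) -> RawRow {
    columns.iter().map(|col| col.map(|b| b.to_vec())).collect()
}

/// New approach: validate UTF-8 up front and pack into a single buffer.
fn decode_eagerly(columns: &[Option<&[u8]>]) -> Result<PackedRow, std::str::Utf8Error> {
    let mut row = PackedRow::default();
    for col in columns.iter().copied() {
        match col {
            Some(bytes) => row.push(Datum::String(std::str::from_utf8(bytes)?)),
            None => row.push(Datum::Null),
        }
    }
    Ok(row)
}
```

In this sketch the eager path pays for UTF-8 validation on the worker that receives the data, but it produces one allocation per row instead of one per column, which is the kind of shift in work described above.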

Related: MaterializeInc/database-issues#9125

Fixes: MaterializeInc/database-issues#9123

Signed-off-by: Moritz Hoffmann [email protected]

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@antiguru antiguru requested a review from a team as a code owner March 26, 2025 13:19

@martykulma martykulma left a comment

lgtm!


@petrosagg petrosagg left a comment

lgtm

@antiguru antiguru merged commit 93cebb4 into MaterializeInc:main Mar 27, 2025
82 checks passed
@antiguru antiguru deleted the postgres_replication_row branch March 27, 2025 15:20