Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[persist] PARTITION BY for Persist-backed collections #30355

Merged
merged 6 commits into from
Nov 11, 2024

Conversation

bkirwi
Copy link
Contributor

@bkirwi bkirwi commented Nov 5, 2024

Add a new PARTITION BY clause on materialized views. (Behind a flag!)

This has no effect besides asserting that the new "structured" ordering we're using for Persist data will match the SQL-level ordering on those columns. See the design doc for more details.

Motivation

Issue: https://github.com/MaterializeInc/database-issues/issues/7188
Draft design: https://www.notion.so/materialize/PARTITION-BY-13413f48d37b8099aaaac45902ffdb75

Tips for reviewer

This is prototype syntax, behind a flag! Please direct thoughts on the supported syntax to the design doc - we're collecting opinions there.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@bkirwi bkirwi force-pushed the partition-by branch 4 times, most recently from 3e38745 to cc87aae Compare November 6, 2024 17:57
Copy link
Member

@ParkMyCar ParkMyCar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this LGTM!

src/repr/src/row/encode.rs Show resolved Hide resolved
src/sql-parser/src/ast/defs/statement.rs Outdated Show resolved Hide resolved
src/sql/src/session/vars/definitions.rs Outdated Show resolved Hide resolved
src/sql/src/plan/statement/ddl.rs Show resolved Hide resolved
@bkirwi bkirwi changed the title [persist] PARTITION BY [persist] PARTITION BY for Persist-backed collections Nov 7, 2024
@bkirwi bkirwi marked this pull request as ready for review November 8, 2024 20:06
@bkirwi bkirwi requested review from a team as code owners November 8, 2024 20:06
@bkirwi bkirwi requested a review from jkosh44 November 8, 2024 20:06
Copy link
Contributor

@jkosh44 jkosh44 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +87 to +90
/// Returns true iff the ordering of the "raw" and Persist-encoded versions of this columm would match:
/// ie. `sort(encode(column)) == encode(sort(column))`. This encoding has been designed so that this
/// is true for many types.
pub fn preserves_order(scalar_type: &ScalarType) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anyway to add tests for this to make sure it's correct? Also when adding a new type, how do we know whether to return true or false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout! I've added some lines to our property test that checks the sorted column is also sorted according to the schema when preserves_order is true.

Also when adding a new type, how do we know whether to return true or false?

I think the idea is that whoever is designing a new encoding for a type should make a reasonable effort to ensure this property is true... so they should be in a good position to know whether they managed it or not. (And hopefully now the test will keep them honest!)

@bkirwi bkirwi merged commit f4194e7 into MaterializeInc:main Nov 11, 2024
80 checks passed
@bkirwi
Copy link
Contributor Author

bkirwi commented Nov 11, 2024

Merging to avoid rot - if anyone has late-breaking feedback I'll make sure to fold it in for the next round.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants