Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

persist: Plumb relevant arrow::DataTypes to compaction #30627

Conversation

ParkMyCar
Copy link
Member

@ParkMyCar ParkMyCar commented Nov 25, 2024

This PR plumbs arrow::datatypes::DataType from Persist's schema registry down into compaction.

Previously we changed compaction to use the most recent schema of any since run (#30382). This would pass the relevant K::Schema and V::Schemas down into compaction, but then map to an arrow::DataType using the current code.

The problem we're running into when trying to stabilize schema evolution in Persist (#30205) is that during 0dt upgrades the new version of Materialize ends up getting some compaction work assigned to it, and writes a batch of data using more nullable types. The old version of Materialize sees the new schema, but maps the arrow::DataType to less nullable types because we haven't yet changed how RelationDesc maps to an arrow::DataType.

After this PR the old version of Materialize should use the more up-to-date version of the arrow::DataType mapping and succeed.

Note: If this change is undesirable, we can remove it after a single version upgrade.

Motivation

Fixes an issue discovered in: #30205

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ParkMyCar ParkMyCar requested a review from bkirwi November 25, 2024 22:33
@ParkMyCar ParkMyCar changed the title [WIP] persist: Plumb relevant arrow::DataTypes to compaction persist: Plumb relevant arrow::DataTypes to compaction Nov 25, 2024
@ParkMyCar ParkMyCar marked this pull request as ready for review November 25, 2024 22:34
@ParkMyCar ParkMyCar requested review from aljoscha, danhhz, jkosh44 and a team as code owners November 25, 2024 22:34
Copy link
Contributor

@bkirwi bkirwi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to think through what happens when a recent version ends up calling get_or_make_structured on an old batch without structured data, using the old schema with the old datatype... IIUC it will use the current encoder logic to generate a nullable version of that data, then try and datatype-migrate it to the old datatype and fail?

@jkosh44 jkosh44 removed their request for review November 26, 2024 17:33
@ParkMyCar
Copy link
Member Author

No longer needed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants