Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

catalog: Skip item optimization for renames #31071

Merged
merged 1 commit into from
Jan 16, 2025

Conversation

jkosh44
Copy link
Contributor

@jkosh44 jkosh44 commented Jan 16, 2025

Internally, the catalog models an update by a retraction followed by an
insert. Items are represented internally by their CREATE SQL string. So
when processing an item update, we first remove the existing item,
then parse, plan, and optimize the new SQL string, and then finally
insert the new item.

Parsing, planning, and optimization can be expensive, so we first check
if the CREATE SQL of the addition of an item matches the CREATE SQL of
the retraction of an item. If so, we can skip parsing, planning, and
optimization and just copy the old item. This can happen, for example,
when updating the privileges of an item, because the privileges are not
stored in the item AST.

Sometimes, when objects are renamed, the CREATE SQL string of an item
may change, but it ends up producing the same HIR and MIR. When this
happens we will spend unnecessary time re-optimizing the HIR, only to
produce an equivalent MIR when compared to the previous item. This
scenario is known to happen for ALTER SCHEMA ... SWAP ....

This commit removes the unnecessary optimization work for equivalent
HIRs. After parsing and planning the CREATE SQL string of an item
addition, we check if the resulting HIR matches the HIR of the previous
item. If so, we skip optimization and re-use the previous item's MIR.
Importantly, the skipped optimization step is a local optimization, so
equal HIRs are guaranteed to produce equal MIRs.

Motivation

This PR adds a known-desirable feature.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@jkosh44 jkosh44 requested a review from ggevay January 16, 2025 17:06
@jkosh44 jkosh44 force-pushed the rename-no-opts branch 2 times, most recently from a79aa4d to b062e20 Compare January 16, 2025 17:28
@jkosh44 jkosh44 changed the title WIP: Optimize item renames catalog: Skip item optimization for renames Jan 16, 2025
Internally, the catalog models an update by a retraction followed by an
insert. Items are represented internally by their CREATE SQL string. So
when processing an item update, we first remove the existing item,
then parse, plan, and optimize the new SQL string, and then finally
insert the new item.

Parsing, planning, and optimization can be expensive, so we first check
if the CREATE SQL of the addition of an item matches the CREATE SQL of
the retraction of an item. If so, we can skip parsing, planning, and
optimization and just copy the old item. This can happen, for example,
when updating the privileges of an item, because the privileges are not
stored in the item AST.

Sometimes, when objects are renamed, the CREATE SQL string of an item
may change, but it ends up producing the same HIR and MIR. When this
happens we will spend unnecessary time re-optimizing the HIR, only to
produce an equivalent MIR when compared to the previous item. This
scenario is known to happen for `ALTER SCHEMA ... SWAP ...`.

This commit removes the unnecessary optimization work for equivalent
HIRs. After parsing and planning the CREATE SQL string of an item
addition, we check if the resulting HIR matches the HIR of the previous
item. If so, we skip optimization and re-use the previous item's MIR.
Importantly, the skipped optimization step is a local optimization, so
equal HIRs are guaranteed to produce equal MIRs.
@jkosh44 jkosh44 marked this pull request as ready for review January 16, 2025 18:06
@jkosh44 jkosh44 requested a review from a team as a code owner January 16, 2025 18:06
@jkosh44 jkosh44 requested review from aljoscha and removed request for aljoscha January 16, 2025 18:06
Copy link
Contributor

@ggevay ggevay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

I checked with my repro for the original issue, and the optimizer is now indeed not called on either alter cluster swap or alter schema swap.

Let's wait for Nightly, and then merge it.

@jkosh44 jkosh44 merged commit 619fdb9 into MaterializeInc:main Jan 16, 2025
229 of 231 checks passed
@jkosh44 jkosh44 deleted the rename-no-opts branch January 16, 2025 23:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants