catalog: Skip item optimization for renames #31071

jkosh44 · 2025-01-16T17:06:00Z

Internally, the catalog models an update by a retraction followed by an
insert. Items are represented internally by their CREATE SQL string. So
when processing an item update, we first remove the existing item,
then parse, plan, and optimize the new SQL string, and then finally
insert the new item.

Parsing, planning, and optimization can be expensive, so we first check
if the CREATE SQL of the addition of an item matches the CREATE SQL of
the retraction of an item. If so, we can skip parsing, planning, and
optimization and just copy the old item. This can happen, for example,
when updating the privileges of an item, because the privileges are not
stored in the item AST.

Sometimes, when objects are renamed, the CREATE SQL string of an item
may change, but it ends up producing the same HIR and MIR. When this
happens we will spend unnecessary time re-optimizing the HIR, only to
produce an equivalent MIR when compared to the previous item. This
scenario is known to happen for ALTER SCHEMA ... SWAP ....

This commit removes the unnecessary optimization work for equivalent
HIRs. After parsing and planning the CREATE SQL string of an item
addition, we check if the resulting HIR matches the HIR of the previous
item. If so, we skip optimization and re-use the previous item's MIR.
Importantly, the skipped optimization step is a local optimization, so
equal HIRs are guaranteed to produce equal MIRs.

Motivation

This PR adds a known-desirable feature.

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

Internally, the catalog models an update by a retraction followed by an insert. Items are represented internally by their CREATE SQL string. So when processing an item update, we first remove the existing item, then parse, plan, and optimize the new SQL string, and then finally insert the new item. Parsing, planning, and optimization can be expensive, so we first check if the CREATE SQL of the addition of an item matches the CREATE SQL of the retraction of an item. If so, we can skip parsing, planning, and optimization and just copy the old item. This can happen, for example, when updating the privileges of an item, because the privileges are not stored in the item AST. Sometimes, when objects are renamed, the CREATE SQL string of an item may change, but it ends up producing the same HIR and MIR. When this happens we will spend unnecessary time re-optimizing the HIR, only to produce an equivalent MIR when compared to the previous item. This scenario is known to happen for `ALTER SCHEMA ... SWAP ...`. This commit removes the unnecessary optimization work for equivalent HIRs. After parsing and planning the CREATE SQL string of an item addition, we check if the resulting HIR matches the HIR of the previous item. If so, we skip optimization and re-use the previous item's MIR. Importantly, the skipped optimization step is a local optimization, so equal HIRs are guaranteed to produce equal MIRs.

ggevay

Thank you!

I checked with my repro for the original issue, and the optimizer is now indeed not called on either alter cluster swap or alter schema swap.

Let's wait for Nightly, and then merge it.

jkosh44 requested a review from ggevay January 16, 2025 17:06

ParkMyCar approved these changes Jan 16, 2025

View reviewed changes

jkosh44 force-pushed the rename-no-opts branch 2 times, most recently from a79aa4d to b062e20 Compare January 16, 2025 17:28

jkosh44 changed the title ~~WIP: Optimize item renames~~ catalog: Skip item optimization for renames Jan 16, 2025

jkosh44 force-pushed the rename-no-opts branch from b062e20 to a0bb372 Compare January 16, 2025 17:42

jkosh44 force-pushed the rename-no-opts branch from a0bb372 to f008825 Compare January 16, 2025 17:56

jkosh44 marked this pull request as ready for review January 16, 2025 18:06

jkosh44 requested a review from a team as a code owner January 16, 2025 18:06

jkosh44 requested review from aljoscha and removed request for aljoscha January 16, 2025 18:06

ggevay approved these changes Jan 16, 2025

View reviewed changes

jkosh44 merged commit 619fdb9 into MaterializeInc:main Jan 16, 2025
229 of 231 checks passed

jkosh44 deleted the rename-no-opts branch January 16, 2025 23:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

catalog: Skip item optimization for renames #31071

catalog: Skip item optimization for renames #31071

jkosh44 commented Jan 16, 2025 •

edited

Loading

ggevay left a comment

catalog: Skip item optimization for renames #31071

catalog: Skip item optimization for renames #31071

Conversation

jkosh44 commented Jan 16, 2025 • edited Loading

Motivation

Checklist

ggevay left a comment

Choose a reason for hiding this comment

jkosh44 commented Jan 16, 2025 •

edited

Loading