
Create a special optimizer pipeline for constant INSERTs #30666

Merged
merged 2 commits on Dec 3, 2024

Conversation

@ggevay (Contributor) commented Dec 1, 2024

This PR fixes this regression in INSERT performance: https://github.com/MaterializeInc/database-issues/issues/8801
It adds a new, very simple optimizer pipeline whose only job is to handle constant INSERTs.

The test that, per the issue, had regressed by 10-20% compared to v0.125.3 is now 30-40% faster than v0.125.3:
https://buildkite.com/materialize/nightly/builds/10584
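The fast-path idea can be sketched with a toy model. This is a hypothetical sketch, not Materialize's actual API: names like `try_fold_to_constant`, `optimize_insert`, and `full_logical_optimizer` are made up for illustration.

```rust
// Toy model of the fast path: optimistically fold the INSERT's source
// expression to a constant; only fall back to the full logical optimizer
// when folding fails. All names here are illustrative.

#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Constant(Vec<i64>),          // a fully evaluated collection of rows
    Union(Box<Expr>, Box<Expr>), // stand-in for arbitrary relational structure
    Get(String),                 // reference to stored data: not constant
}

// Returns None when the expression depends on stored data.
fn try_fold_to_constant(e: &Expr) -> Option<Vec<i64>> {
    match e {
        Expr::Constant(rows) => Some(rows.clone()),
        Expr::Union(l, r) => {
            let mut rows = try_fold_to_constant(l)?;
            rows.extend(try_fold_to_constant(r)?);
            Some(rows)
        }
        Expr::Get(_) => None,
    }
}

fn optimize_insert(e: Expr) -> Expr {
    match try_fold_to_constant(&e) {
        Some(rows) => Expr::Constant(rows), // fast path: done, no full pipeline
        None => full_logical_optimizer(e),  // slow path, as before
    }
}

// Placeholder for the real, much more expensive pipeline.
fn full_logical_optimizer(e: Expr) -> Expr {
    e
}

fn main() {
    let e = Expr::Union(
        Box::new(Expr::Constant(vec![1, 2])),
        Box::new(Expr::Constant(vec![3])),
    );
    assert_eq!(optimize_insert(e), Expr::Constant(vec![1, 2, 3]));
    println!("fast path folded the insert to a constant");
}
```

The point is that a fully constant INSERT never pays for the full transform sequence; only non-constant sources fall through to the general pipeline.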

Motivation

Tips for reviewer

The first commit is just some trivial cleanup and can be reviewed separately.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ggevay ggevay marked this pull request as ready for review December 2, 2024 11:37
@ggevay ggevay requested review from a team as code owners December 2, 2024 11:37
@ggevay ggevay requested a review from jkosh44 December 2, 2024 11:37
@ggevay ggevay added the A-optimization (Area: query optimization and transformation), A-ADAPTER (Topics related to the ADAPTER layer), and A-CLUSTER (Topics related to the CLUSTER layer) labels Dec 2, 2024
@def- (Contributor) commented Dec 2, 2024

The test that regressed in the issue by 10-20% compared to v0.125.3 is now 30-40% faster compared to v0.125.3:

Wonderful, thank you for checking. I'll verify locally with a larger scale too.

@def- (Contributor) commented Dec 2, 2024

Full Nightly run triggered; good for me if green, since this code path will be exercised across a lot of existing tests: https://buildkite.com/materialize/nightly/builds/10588

@ggevay (Contributor, Author) commented Dec 2, 2024

Well, I've also run the test locally, and it showed a 15-16% regression. What could be the reason for it behaving differently locally? Also, I'm curious to see what your local runs show.

I did

bin/mzcompose --find feature-benchmark down && bin/mzcompose --find feature-benchmark run default --scenario InsertMultiRow --other-tag v0.125.3 --scale=5

as recommended in the issue.

@def- (Contributor) commented Dec 2, 2024

The main difference is that in Nightly this scenario runs with scale 4, while this is running it with scale 5 (10 times larger).

Similar for me:

+++ Benchmark Report for run 1:
NAME                                | TYPE            |      THIS       |      OTHER      |  UNIT  | THRESHOLD  |  Regression?  | 'THIS' is
--------------------------------------------------------------------------------------------------------------------------------------------------------
InsertMultiRow                      | wallclock       |           1.158 |           0.976 |   s    |    10%     |    !!YES!!    | worse:  18.6% slower
InsertMultiRow                      | memory_mz       |         620.461 |         857.162 |   MB   |    20%     |      no       | better: 27.6% less
InsertMultiRow                      | memory_clusterd |         132.656 |         213.432 |   MB   |    50%     |      no       | better: 37.8% less
+++ Benchmark Report for run 2:
NAME                                | TYPE            |      THIS       |      OTHER      |  UNIT  | THRESHOLD  |  Regression?  | 'THIS' is
--------------------------------------------------------------------------------------------------------------------------------------------------------
InsertMultiRow                      | wallclock       |           1.192 |           1.056 |   s    |    10%     |    !!YES!!    | worse:  12.9% slower
InsertMultiRow                      | memory_mz       |         655.174 |         660.229 |   MB   |    20%     |      no       | better:  0.8% less
InsertMultiRow                      | memory_clusterd |         113.010 |         111.771 |   MB   |    50%     |      no       | worse:   1.1% more

So this fixes the performance for smaller inserts (1000 rows) but not with many rows (10000 rows), interesting.

@frankmcsherry (Contributor) left a comment:

The code seems fine, but the structure, naming, and comments are strongly bound to INSERT statements, even though the logic applies beyond insert statements. It seems to treat constant expressions better in general, independent of whether they come from inserts or other statements. I think we should make sure the code reflects that, rather than treating it as a special case.

Comment on lines 10 to 11
//! Optimizer implementation for `CREATE VIEW` statements and other misc statements, such as
//! `INSERT`.
@frankmcsherry (Contributor):

I'm not sure the additional context helps much. Surely we can just say "for relational expressions" or something? We're still in a file called view.rs, so there's more to do if we really want to clean things up, but I think "and other misc statements" could be tightened up or removed.

@ggevay (Contributor, Author):

Done

@frankmcsherry (Contributor):

Looks the same to me at the moment. Not the most important detail, but also perhaps something wasn't pushed / refreshed?

@ggevay (Contributor, Author):

Oops, sorry. Changed now to

//! An Optimizer that
//! 1. Optimistically calls `optimize_mir_constant`.
//! 2. Then, if we haven't arrived at a constant, it calls `optimize_mir_local`, i.e., the
//!    logical optimizer.
//!
//! This is used for `CREATE VIEW` statements and in various other situations where no physical
//! optimization is needed, such as for `INSERT` statements.

pub fn constant_insert_optimizer(_ctx: &mut TransformCtx) -> Self {
let transforms: Vec<Box<dyn Transform>> = vec![
Box::new(NormalizeLets::new(false)),
Box::new(canonicalization::ReduceScalars),
@frankmcsherry (Contributor):

I don't follow why ReduceScalars is here. It doesn't do relation constant folding, but the other two do.

@ggevay (Contributor, Author):

Just in case a user writes something like

insert into t values (1+1+1+1);
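The kind of reduction being defended here can be sketched with a toy scalar folder that collapses constant arithmetic like `1+1+1+1` to a single literal. This is hypothetical illustration code, not Materialize's actual ReduceScalars transform.

```rust
// Toy scalar reducer: fold constant arithmetic bottom-up, leaving anything
// non-constant in place. Illustrative sketch only.

#[derive(Debug, PartialEq)]
enum Scalar {
    Lit(i64),
    Add(Box<Scalar>, Box<Scalar>),
}

fn reduce(s: Scalar) -> Scalar {
    match s {
        Scalar::Add(l, r) => match (reduce(*l), reduce(*r)) {
            // Both sides are literals: fold them into one literal.
            (Scalar::Lit(a), Scalar::Lit(b)) => Scalar::Lit(a + b),
            // Otherwise, keep the (partially reduced) structure.
            (l, r) => Scalar::Add(Box::new(l), Box::new(r)),
        },
        lit => lit,
    }
}

fn main() {
    // Models the expression from: insert into t values (1+1+1+1);
    let e = Scalar::Add(
        Box::new(Scalar::Add(
            Box::new(Scalar::Lit(1)),
            Box::new(Scalar::Lit(1)),
        )),
        Box::new(Scalar::Add(
            Box::new(Scalar::Lit(1)),
            Box::new(Scalar::Lit(1)),
        )),
    );
    assert_eq!(reduce(e), Scalar::Lit(4));
    println!("1+1+1+1 reduced to 4");
}
```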

@ggevay (Contributor, Author):

(This is now included in fold_constants_fixpoint().)

@ggevay (Contributor, Author) commented Dec 2, 2024

Ok, I figured out why it was still showing a regression locally: I was running it with a larger scale (as Dennis mentioned), and the query was so large that FoldConstants gave up after hitting FOLD_CONSTANTS_LIMIT, so we still fell back to the full logical optimizer. I think there is no need to put more effort into optimizing such large queries, so I'll just leave that as is.
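The give-up behavior described here can be sketched as follows. The names and the limit value are made up for illustration; the real FOLD_CONSTANTS_LIMIT and FoldConstants are more involved.

```rust
// Sketch of a size guard: constant folding bails out when the folded
// constant would get too big, leaving the expression for the full
// optimizer. Illustrative only; the limit is artificially small.

const FOLD_LIMIT: usize = 8; // illustrative; the real limit is much larger

// Fold a union of constant row sets, or refuse if the result is too big.
fn fold_union(inputs: &[Vec<i64>]) -> Option<Vec<i64>> {
    let total: usize = inputs.iter().map(|rows| rows.len()).sum();
    if total > FOLD_LIMIT {
        return None; // give up: fall back to the full optimizer
    }
    Some(inputs.iter().flatten().copied().collect())
}

fn main() {
    let small = vec![vec![1, 2], vec![3]];
    assert_eq!(fold_union(&small), Some(vec![1, 2, 3]));

    let large = vec![vec![0; 100]]; // 100 rows > FOLD_LIMIT: folding refuses
    assert_eq!(fold_union(&large), None);
    println!("small insert folded, large insert left unfolded");
}
```

This matches the observed behavior: small constant INSERTs take the fast path, while very large ones exceed the limit and still pay for the general pipeline.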

@ggevay (Contributor, Author) commented Dec 2, 2024

Thanks for the review @frankmcsherry! I've addressed the comments.

I'm running Nightly again: https://buildkite.com/materialize/nightly/builds/10589
The result I'm expecting is that the speedup will be slightly less now that we are calling fold_constants_fixpoint() instead of a custom sequence of transforms, but I think we'll still have a speedup.

@ggevay (Contributor, Author) commented Dec 2, 2024

Locally, scale=4 is showing me 10-20% speedup.

@frankmcsherry (Contributor) commented Dec 2, 2024

So this fixes the performance for smaller inserts (1000 rows) but not with many rows (10000 rows), interesting.

@antiguru observed that with large constants, much of the time of FoldConstants is in trying to determine if there is a column that forms a unique key.
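A toy version of that key check shows where the time goes: deciding whether a column forms a unique key means hashing every value in that column, so for a large constant with r rows and c columns it is O(r·c) work. Hypothetical code; Materialize's real key analysis is more involved.

```rust
use std::collections::HashSet;

// Toy unique-key detection over a constant collection: for each column,
// test whether all of its values are distinct. Illustrative sketch only.
fn unique_key_columns(rows: &[Vec<i64>], arity: usize) -> Vec<usize> {
    (0..arity)
        .filter(|&col| {
            // HashSet::insert returns false on a duplicate, ending the scan.
            let mut seen = HashSet::with_capacity(rows.len());
            rows.iter().all(|row| seen.insert(row[col]))
        })
        .collect()
}

fn main() {
    let rows = vec![vec![1, 5], vec![2, 5], vec![3, 7]];
    // Column 0 is all-distinct; column 1 repeats the value 5.
    assert_eq!(unique_key_columns(&rows, 2), vec![0]);
    println!("unique key columns: {:?}", unique_key_columns(&rows, 2));
}
```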

@frankmcsherry (Contributor) left a comment:

Looks good; there are still some comments, but I think this is better going forward. We can further optimize / specialize if we need to, but I approve of starting from here rather than from a more deeply specialized implementation!


@@ -565,11 +567,11 @@ pub fn fold_constants_fixpoint() -> Fixpoint {
name: "fold_constants_fixpoint",
limit: 100,
transforms: vec![
Box::new(NormalizeLets::new(false)),
@frankmcsherry (Contributor):

I don't understand why this moved. We end up with a thing that may not be normalized, but I don't see why we want that.

@ggevay (Contributor, Author):

Ah, sorry, I thought the order wouldn't matter for which result we arrive at, because it's a fixpoint loop. But that may not be true if FoldConstants and NormalizeLets fight with each other for some reason. I don't see a reason why they would, but I'll change the order back anyway: we are more robust that way, since normalization is important.

(The reason why I changed the order is that in the INSERT constant scenario (and probably in other similar cases) this would settle down with one less NormalizeLets run with the new order. But this is not so important.)
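The fixpoint behavior under discussion can be sketched with a toy driver (the real Fixpoint transform in Materialize is more elaborate; this is an illustrative model only). If the transforms don't undo each other, the order within the loop shouldn't change which fixpoint is reached, but it can change how many iterations are needed to settle, which is the trade-off described above.

```rust
// Toy fixpoint driver: apply the transform sequence repeatedly until the
// expression stops changing or an iteration limit is hit.
fn fixpoint<T: Clone + PartialEq>(mut e: T, transforms: &[fn(T) -> T], limit: usize) -> T {
    for _ in 0..limit {
        let before = e.clone();
        for t in transforms {
            e = t(e);
        }
        if e == before {
            return e; // settled: no transform made progress this round
        }
    }
    e // gave up at the iteration limit
}

fn main() {
    // Two tiny "transforms" on integers, standing in for things like
    // NormalizeLets and FoldConstants: halve even numbers, clamp negatives.
    let halve: fn(i64) -> i64 = |x| if x % 2 == 0 { x / 2 } else { x };
    let clamp: fn(i64) -> i64 = |x| if x < 0 { 0 } else { x };
    // 40 -> 20 -> 10 -> 5, then nothing changes, so the loop stops.
    assert_eq!(fixpoint(40, &[halve, clamp], 100), 5);
    println!("fixpoint of 40 under halve+clamp: 5");
}
```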

@ggevay (Contributor, Author) commented Dec 2, 2024

So this fixes the performance for smaller inserts (1000 rows) but not with many rows (10000 rows), interesting.

@antiguru observed that with large constants, much of the time of FoldConstants is in trying to determine if there is a column that forms a unique key.

Actually, this had a different reason, see above. (Btw., I think it worked for 10000 and didn't work for 100000, for the reason explained above.)

@ggevay (Contributor, Author) commented Dec 2, 2024

Addressed the remaining comments.

Again, new Nightly run, hopefully the last: https://buildkite.com/materialize/nightly/builds/10591

Will merge when (enough of) Nightly completes.

@ggevay (Contributor, Author) commented Dec 3, 2024

Nightly is fine.

@ggevay ggevay merged commit 8874d3e into MaterializeInc:main Dec 3, 2024
221 of 224 checks passed