Cache the computation of default columns during three-way merges. #7065
Conversation
The branch was force-pushed from 00bf27b to d1d0c20, and then from d1d0c20 to a24aa65.
Code change for column defaults looks great. Thank you for fixing that one! 🙏
I like the larger perf test, but I'm still very skeptical about the timing issues. The CI envs have a LOT of variance over time in how quickly they run, and they are of course much slower than our dev laptops. It seems like trying to test perf regressions by a set time limit is going to be pretty fragile.
This is still a step in a good direction though! I just think we should keep pushing for a better way to test perf for specific operations. The two ideas that come to mind right now are...
- I still like the idea of having some internal merge stats that tell us how much work was done as part of the merge. It could break the work down along a lot of interesting dimensions. It's less direct than measuring time, and you're right that it means measuring things we already know will be expensive, but it still seems worth exploring.
- We could also try to have the perf testing code take a baseline on the hardware it's currently running on and use that to derive an expected range for how fast an operation should run. I've seen other projects use a similar approach, though it does get messy. (A rough sketch of both ideas follows this list.)
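A rough Go sketch of what both ideas could look like; every name below is hypothetical and nothing here exists in Dolt today:

```go
// Hypothetical sketch only: none of these types or functions exist in Dolt.
package perfideas

import (
	"crypto/sha256"
	"time"
)

// MergeStats is the kind of counter set the first idea describes: record how
// much work a merge actually did, independent of wall-clock time.
type MergeStats struct {
	RowsMerged        int // rows visited during the merge
	ConflictsDetected int // conflicts found along the way
	SchemasParsed     int // e.g. how many times column defaults were re-parsed
}

// calibrationBaseline times a fixed amount of CPU work so a perf test can
// scale its expectations to whatever hardware it happens to be running on.
func calibrationBaseline() time.Duration {
	start := time.Now()
	buf := []byte("seed")
	for i := 0; i < 200_000; i++ {
		sum := sha256.Sum256(buf)
		buf = sum[:]
	}
	return time.Since(start)
}

// withinBudget reports whether op finished within factor times the baseline,
// i.e. "this operation may take at most N units of this machine's speed".
func withinBudget(op func(), factor int) bool {
	budget := calibrationBaseline() * time.Duration(factor)
	start := time.Now()
	op()
	return time.Since(start) <= budget
}
```

The calibration approach trades absolute wall-clock limits for a per-machine budget, which should absorb most of the variance between CI runners and dev laptops.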
echo "insert into t(pk) values" > import.sql | ||
for i in {1..100000} | ||
do | ||
echo " ($i)," >> import.sql | ||
done | ||
echo " (104857);" >> import.sql | ||
|
||
dolt sql < import.sql |
(minor) I don't think you need to check in the generated import.sql script. I know I had suggested checking in a SQL file to create the table, but the create_repo function in the BATS file is an even better way to get reproducibility, so the checked-in import.sql can safely go away: we can regenerate the test database from that function at any time. You could also trim down some other files; for example, branch_control.db shouldn't be needed since we aren't using branch permissions.
I didn't mean to check in the .sql script. Removed.
```bash
load $BATS_TEST_DIRNAME/helper/common.bash

BATS_TEST_TIMEOUT=50
```
It would be nice to document some context about what the intent of the performance.bats file is. It's obvious to you and me right now, but it likely won't be to future code readers.
Added comments.
Computing the default columns is slow because it generates the CREATE TABLE string for the merged schema and runs it through the yacc-generated parser so that default column expressions resolve correctly. We should only need to do that once per merge, not once per row.
This PR also adds a rough performance regression test; without the fix, the test times out on GitHub's CI.
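As a rough illustration of the pattern (a minimal sketch only; these names are hypothetical and not Dolt's actual merge internals), the expensive parse can be memoized on the merger so the per-row loop reuses a single result:

```go
// Hypothetical sketch of caching an expensive per-schema computation so it
// runs once per merge instead of once per row.
package merge

import "sync"

type resolvedDefaults struct {
	exprs []string // stand-in for the parsed default-value expressions
}

type merger struct {
	once     sync.Once
	defaults *resolvedDefaults
}

// defaultsForSchema resolves the merged schema's column defaults exactly once
// and reuses the result for every subsequent row.
func (m *merger) defaultsForSchema(createTableStmt string) *resolvedDefaults {
	m.once.Do(func() {
		// Expensive step: stands in for building the CREATE TABLE string
		// and running it through the parser.
		m.defaults = parseDefaults(createTableStmt)
	})
	return m.defaults
}

// parseDefaults is a placeholder for the real parsing work.
func parseDefaults(stmt string) *resolvedDefaults {
	return &resolvedDefaults{exprs: []string{stmt}}
}

// mergeRows shows the shape of the per-row loop: the cached defaults are
// fetched cheaply on every iteration.
func (m *merger) mergeRows(rows []string, createTableStmt string) {
	for range rows {
		_ = m.defaultsForSchema(createTableStmt)
		// ...merge the row using the cached defaults...
	}
}
```

Using sync.Once here is just one way to express the cache; a plain nil check would do for a single-threaded row loop.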