Cache the computation of default columns during three-way merges. #7065
Conversation
The branch was force-pushed from 00bf27b to d1d0c20, and then from d1d0c20 to a24aa65.
Code change for column defaults looks great. Thank you for fixing that one! 🙏
I like the larger perf test, but I'm still very skeptical about the timing issues. The CI envs have a LOT of variance over time in how quickly they run, and they are of course much slower than our dev laptops. It seems like trying to test perf regressions by a set time limit is going to be pretty fragile.
This is still a step in a good direction though! I just think we should keep pushing for a better way to test perf for specific operations. The two ideas that come to mind right now are...
- I still like the idea of having some internal merge stats that tell us how much work was done as part of the merge. It could break the work down along a lot of interesting dimensions. It's less direct than measuring time, and you're right that it means measuring things we already know will be expensive, but it still seems worth exploring.
- We could also try to have the perf testing code take a baseline on the hardware it's currently running on and use that to derive an expected range for how fast an operation should run. I've seen other projects use a similar approach, though it does get messy. (A rough sketch of both ideas follows this list.)
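A rough Go sketch of what both ideas could look like; every name below is hypothetical and nothing here exists in Dolt today:

```go
// Hypothetical sketch only: none of these types or functions exist in Dolt.
package perfideas

import (
	"crypto/sha256"
	"time"
)

// MergeStats is the kind of counter set the first idea describes: record how
// much work a merge actually did, independent of wall-clock time.
type MergeStats struct {
	RowsMerged        int // rows visited during the merge
	ConflictsDetected int // conflicts found along the way
	SchemasParsed     int // e.g. how many times column defaults were re-parsed
}

// calibrationBaseline times a fixed amount of CPU work so a perf test can
// scale its expectations to whatever hardware it happens to be running on.
func calibrationBaseline() time.Duration {
	start := time.Now()
	buf := []byte("seed")
	for i := 0; i < 200_000; i++ {
		sum := sha256.Sum256(buf)
		buf = sum[:]
	}
	return time.Since(start)
}

// withinBudget reports whether op finished within factor times the baseline,
// i.e. "this operation may take at most N units of this machine's speed".
func withinBudget(op func(), factor int) bool {
	budget := calibrationBaseline() * time.Duration(factor)
	start := time.Now()
	op()
	return time.Since(start) <= budget
}
```

The calibration approach trades absolute wall-clock limits for a per-machine budget, which should absorb most of the variance between CI runners and dev laptops.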
echo "insert into t(pk) values" > import.sql | ||
for i in {1..100000} | ||
do | ||
echo " ($i)," >> import.sql | ||
done | ||
echo " (104857);" >> import.sql | ||
|
||
dolt sql < import.sql |
(minor) I don't think you need to check in the generated import.sql script. I know I had suggested checking in a SQL file to create the table, but the create_repo function in the BATS file is an even better way to get reproducibility, so the checked-in import.sql can safely go away: we can regenerate the test database from that function at any time. You could also trim down some other files; for example, branch_control.db shouldn't be needed since we aren't using branch permissions.
I didn't mean to check in the .sql script. Removed.
```bash
load $BATS_TEST_DIRNAME/helper/common.bash

BATS_TEST_TIMEOUT=50
```
It would be nice to document some context about what the intent of the performance.bats file is. It's obvious to you and me right now, but it likely won't be to future code readers.
Added comments.
Computing the default columns is slow because it generates the CREATE TABLE string for the merged schema and runs it through the yacc-generated parser so that default column expressions resolve correctly. We should only need to do that once per merge, not once per row.
This PR also adds a rough performance regression test; without the fix, the test times out on GitHub's CI.
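As a rough illustration of the pattern (a minimal sketch only; these names are hypothetical and not Dolt's actual merge internals), the expensive parse can be memoized on the merger so the per-row loop reuses a single result:

```go
// Hypothetical sketch of caching an expensive per-schema computation so it
// runs once per merge instead of once per row.
package merge

import "sync"

type resolvedDefaults struct {
	exprs []string // stand-in for the parsed default-value expressions
}

type merger struct {
	once     sync.Once
	defaults *resolvedDefaults
}

// defaultsForSchema resolves the merged schema's column defaults exactly once
// and reuses the result for every subsequent row.
func (m *merger) defaultsForSchema(createTableStmt string) *resolvedDefaults {
	m.once.Do(func() {
		// Expensive step: stands in for building the CREATE TABLE string
		// and running it through the parser.
		m.defaults = parseDefaults(createTableStmt)
	})
	return m.defaults
}

// parseDefaults is a placeholder for the real parsing work.
func parseDefaults(stmt string) *resolvedDefaults {
	return &resolvedDefaults{exprs: []string{stmt}}
}

// mergeRows shows the shape of the per-row loop: the cached defaults are
// fetched cheaply on every iteration.
func (m *merger) mergeRows(rows []string, createTableStmt string) {
	for range rows {
		_ = m.defaultsForSchema(createTableStmt)
		// ...merge the row using the cached defaults...
	}
}
```

Using sync.Once here is just one way to express the cache; a plain nil check would do for a single-threaded row loop.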