Releases: dolthub/dolt
1.47.1
Merged PRs
dolt
- 8770: Speed up fetch when there are many tags which haven't changed
A user remarked that `dolt pull` took 2 hours to pull changes. The time was being wasted on every tag which had not changed. This change alters the tag iteration code to defer the loading of metadata until it's actually required. Testing against the user's case now takes less than 1 minute.
- 8753: make autoincrement tracker load async
- 8752: go/store/{nbs,types}: GC: Move the reference walk from types to nbs.
Make the ChunkStore itself responsible for the reference walk, being given handles for walking references and excluding chunks as part of the GC process. This is an incremental step towards adding dependencies on read chunks during the GC process. The ChunkStore can better distinguish whether the read is part of the GC process itself or whether it came from the application layer. It also allows better management of cache impact and the potential for better memory usage.
This transformation gets rid of parallel reference walking and some manual batching which was present in the ValueStore implementation of reference walking. The parallel reference walking was necessary for reasonable performance in format LD_1, but it's actually not necessary in DOLT. For some use cases it's a slight win, but the simplification involved in getting rid of it is worth it for now.
- 8747: go/libraries/doltcore/sqle/dprocedures: dolt_gc.go: Retry canceling running queries when waiting for safepoint establishment.
This allows `call dolt_gc()` to more quickly and reliably establish a safepoint if the call to `safepointF()` races with a new query beginning and being registered for the connection in the process list.
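The deferred metadata loading described in 8770 can be illustrated with a minimal Go sketch. The `tagRef` type, its fields, and `changedTags` are hypothetical names invented for this example and are not Dolt's actual types; the only point shown is that comparing cheap hashes first means unchanged tags never trigger a metadata load.

```go
package main

import "fmt"

// tagRef is a hypothetical stand-in for a tag reference. The name and hash
// are cheap to read; the metadata needs an extra (expensive) load.
type tagRef struct {
	Name string
	Hash string
	load func() string // expensive loader, invoked at most once
	meta *string       // cached metadata, nil until first use
}

// Meta loads the metadata on first use and caches it for later calls.
func (t *tagRef) Meta() string {
	if t.meta == nil {
		m := t.load()
		t.meta = &m
	}
	return *t.meta
}

// changedTags returns the remote tags whose hash differs from the locally
// known hash. It never touches metadata, so unchanged tags cost no loads.
func changedTags(remote []*tagRef, local map[string]string) []*tagRef {
	var out []*tagRef
	for _, t := range remote {
		if local[t.Name] != t.Hash {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	loads := 0
	mk := func(name, hash string) *tagRef {
		return &tagRef{Name: name, Hash: hash, load: func() string {
			loads++
			return "metadata for " + name
		}}
	}
	remote := []*tagRef{mk("v1", "aaa"), mk("v2", "bbb"), mk("v3", "ccc")}
	local := map[string]string{"v1": "aaa", "v2": "bbb"}

	for _, t := range changedTags(remote, local) {
		fmt.Println(t.Name, t.Meta())
	}
	fmt.Println("metadata loads:", loads)
}
```

With many tags and few changes, the number of loads tracks the number of changed tags rather than the total.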
go-mysql-server
- 2820: Don't close ctx prematurely for single row results
A user reported a cancelled context error, which was caused by prematurely closing the iterator while the ctx was still needed.
- 2819: Don't force a table rewrite when appending extra values to the end of an enum.
Adding extra strings to the end of an enum type doesn't change the values for any of the existing strings. A table rewrite isn't necessary in this case.
If a specific table implementation does need to be rewritten when an enum type changes this way, it can still implement `ShouldRewriteTable` in order to force a rewrite anyway.
- 2817: Use vector index when the `SELECT` clause has a projection.
Due to some overly strict pattern matching in the vector index selection, we weren't always using the index when there was a projection involved: we were only applying the index in the presence of a `TopN` node, but we also weren't generating `TopN` nodes in the case we had a `Limit -> Project -> Sort` node structure.
I was hoping that dolthub/go-mysql-server#2813 would fix this, and I suspect there are improvements to GMS that would make this unnecessary. But for now, we should allow the pattern matching in `replaceIdxOrderByDistance` to apply a vector index lookup in this case.
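The enum reasoning in 2819 comes down to a prefix check: if every existing string keeps its position, the stored values stay valid. The sketch below is only an illustration of that check; `isAppendOnly` is a hypothetical helper, not the function go-mysql-server uses.

```go
package main

import "fmt"

// isAppendOnly reports whether newVals keeps every existing enum string at
// the same position and only adds strings after them. When it returns true,
// the values already stored for existing strings are unaffected.
func isAppendOnly(oldVals, newVals []string) bool {
	if len(newVals) < len(oldVals) {
		return false
	}
	for i, v := range oldVals {
		if newVals[i] != v {
			return false
		}
	}
	return true
}

func main() {
	old := []string{"small", "medium"}
	// New value appended at the end: existing positions are preserved.
	fmt.Println(isAppendOnly(old, []string{"small", "medium", "large"}))
	// New value inserted at the front: existing positions shift.
	fmt.Println(isAppendOnly(old, []string{"large", "small", "medium"}))
}
```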
Closed Issues
- 8769: `last_insert_id` gives 0 when 0 is explicitly specified for an `auto_increment` primary key in an insertion
1.47.0
Vector indexes and search are supported in this release.
Merged PRs
dolt
- 8749: Allow importing parquet fields containing repeated elements.
NOTE: This still needs tests. I'm looking for a good tool for generating parquet. We can't use `dolt table export` to generate the parquet because we can't generate composite types that way.
This PR adds support for importing specific composite parquet types into Dolt. Specifically, we're now able to import a composite parquet field if:
- There is exactly one leaf column in the field.
- There is at most one repeated tag in the field.
We flatten these composite values into a single primitive value (if there are no repeated tags) or an array of primitive values (if there's exactly one repeated tag).
There's more work to be done here (multidimensional arrays, objects, etc), but this allows us to import vector embeddings stored in parquet files.
Why do we flatten the type?
We want to be able to import parquet files from HuggingFace, and store embedding sequences as arrays. Embedding sequences in HuggingFace exports are an optional field containing a single repeated child field, which itself contains a single optional field containing the sequence element. Flattening this into a single array is more usable and doesn't lose any data.
- 8686: Proximity Map implementation with support for incremental edits.
Based on #8408, now with additional functionality for incremental changes to indexes.
This is a large-scale PR merging several features into main, all designed for supporting vector indexes.
Vector Index Nodes
1defec9 adds a new message/node type: the vector index node. This message stores a node in a Merkle tree index whose structure is based on some distance measure in a multi-dimensional space: at each level, keys are arranged such that a key is closer to its parent key than any other key in the parent node.
One consequence of this design is that it's not possible to put a hard limit on the number of keys contained in each node. We can control the mean node size, but there's always a non-zero chance that a node will be large enough to break our usual encoding scheme (which uses 16-bit ints to store message offsets). To address this, the vector index node uses 32-bit ints to store message offsets instead of the 16 bits used by other node types.
Proximity Map
A ProximityMap is a new implementation of Dolt's Map, a data structure built on Merkle trees that maps key bytestrings to value bytestrings. The ProximityMap is backed by a tree of vector index nodes, allowing it to perform an approximate nearest neighbor search.
Proximity Maps resemble other Prolly Maps, but have the following invariants:
- Each key must be convertible to a vector. Typically, the key is a val.Tuple, and the vector is the first value in that tuple.
- The keys are arranged in the tree such that, for each of a key's parent keys (the keys that appear on the path from the root to the key), the key is closer to that parent key than to any of the parent key's siblings.
- The keys in a node are sorted lexicographically (note that this is not necessarily the same ordering as the tuple that the key represents), except for the first key which matches its direct parent.
Notably, while the keys of an individual node are sorted, walking all of a vector index's keys in standard iteration order will not visit them in sorted order.
28b7065 and 6b91635 contain the bulk of the ProximityMap implementation.
The bulk of the changes are in these three commits. Each of the other commits is a smaller self-contained change necessary to support vector indexes.
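The placement invariant for Proximity Maps (a key sits under the candidate parent it is closest to) can be sketched in a few lines of Go. This is a simplified illustration and not the ProximityMap code: `nearestParent` and `sqDist` are hypothetical helpers, squared Euclidean distance is an assumed distance measure, and keys are plain `[]float64` vectors of equal length rather than tuples.

```go
package main

import "fmt"

// sqDist returns the squared Euclidean distance between two vectors of the
// same length. The choice of distance measure is an assumption of this sketch.
func sqDist(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// nearestParent returns the index of the candidate parent closest to key.
// parents must be non-empty. Ties go to the earliest candidate.
func nearestParent(key []float64, parents [][]float64) int {
	best := 0
	for i := 1; i < len(parents); i++ {
		if sqDist(key, parents[i]) < sqDist(key, parents[best]) {
			best = i
		}
	}
	return best
}

func main() {
	parents := [][]float64{{0, 0}, {10, 10}}
	keys := [][]float64{{1, 2}, {9, 8}, {4, 4}}
	for _, k := range keys {
		fmt.Println(k, "->", parents[nearestParent(k, parents)])
	}
}
```

Because group membership depends on distances rather than on a size threshold, nothing here bounds how many keys can land under one parent, which is the property that motivates the 32-bit offsets.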
go-mysql-server
- 2817: Use vector index when the `SELECT` clause has a projection.
Due to some overly strict pattern matching in the vector index selection, we weren't always using the index when there was a projection involved: we were only applying the index in the presence of a `TopN` node, but we also weren't generating `TopN` nodes in the case we had a `Limit -> Project -> Sort` node structure.
I was hoping that dolthub/go-mysql-server#2813 would fix this, and I suspect there are improvements to GMS that would make this unnecessary. But for now, we should allow the pattern matching in `replaceIdxOrderByDistance` to apply a vector index lookup in this case.
- 2816: Allow using vector index when the queried vector is provided in a user variable.
Right now, vector indexes are very narrowly applied. One of the inputs to the DISTANCE function needs to be a constant. Previously we required it to be a Literal expression, but UserVar expressions should also work.
- 2797: Persist and load superusers
Previously, superusers were persisted to disk, but never loaded back again when the database was restarted. This essentially made all superusers ephemeral, since they only lasted for the duration of a SQL server process.
This change loads persisted superusers from disk, and also adds a new function to create ephemeral superusers that do not get persisted to disk.
This also includes a fix for the event scheduler to use a privileged account so that it can load events from all databases.
Closed Issues
- 8734: Can't delete remote branch refs that no longer exist in origin
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 1.89 | 0.68 | 0.36 |
groupby_scan | 12.98 | 17.32 | 1.33 |
index_join | 1.44 | 2.48 | 1.72 |
index_join_scan | 1.42 | 1.44 | 1.01 |
index_scan | 34.33 | 30.81 | 0.9 |
oltp_point_select | 0.18 | 0.27 | 1.5 |
oltp_read_only | 3.49 | 5.37 | 1.54 |
select_random_points | 0.34 | 0.6 | 1.76 |
select_random_ranges | 0.37 | 0.63 | 1.7 |
table_scan | 34.95 | 33.12 | 0.95 |
types_table_scan | 75.82 | 118.92 | 1.57 |
reads_mean_multiplier | 1.3 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 9.06 | 6.32 | 0.7 |
oltp_insert | 4.1 | 3.13 | 0.76 |
oltp_read_write | 9.06 | 11.65 | 1.29 |
oltp_update_index | 4.18 | 3.19 | 0.76 |
oltp_update_non_index | 4.18 | 3.07 | 0.73 |
oltp_write_only | 5.77 | 6.32 | 1.1 |
types_delete_insert | 8.43 | 6.67 | 0.79 |
writes_mean_multiplier | 0.88 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 96.45 | 39.71 | 2.43 |
tpcc_tps_multiplier | 2.43 |
Overall Mean Multiple | 1.54 |
---|
1.46.0
Backwards incompatible changes in this release:
- The default root superuser is now persisted to the privileges database and is scoped to localhost, instead of %. Previously, the root superuser only existed when no other accounts had been created. Creating accounts, then restarting the sql-server would cause the root superuser to disappear. The `root@localhost` superuser is now created the first time a sql-server is started, as the privileges database is initialized.
- For Docker customers: note that the default root superuser is now scoped to localhost, instead of any host. This change is made for security and to better match MySQL's default security posture. To connect to a Dolt sql-server from outside the container the sql-server is running in, you need to supply the `-e DOLT_ROOT_HOST=<host>` argument. For more details and examples, see the dolthub/sql-server Docker readme, our Docker documentation, or our blog post covering this change.
Per Dolt’s versioning policy, this is a minor version bump because these changes may impact existing applications. Please reach out to us on GitHub or Discord if you have questions or need help with any of these changes.
Merged PRs
dolt
- 8746: Allow `dolt sql` to always log in as the `root` superuser
From the command line, when a sql-server is not running, `dolt sql` implicitly uses the `root` account to log in, but if the `root` account exists with a password, `dolt sql` will fail to log in. Since the user has access to the host and the database data directory, we should allow `dolt sql` to log into the SQL shell, even if the `root` user has a password set. This change also makes this behavior match the behavior when a sql-server is running, where we allow superuser login through the `__dolt_local_user__` account (which only exists while a sql-server is running).
- 8745: Add --prune option to dolt_pull procedure
Expose in CLI and test too.
See: #8734
- 8742: Don't panic when attempting to update workspace table
Previously a panic was very likely if any update to dolt_workspace_* involved a schema change. This change restricts updates to the workspace tables to cases where the schemas have not changed.
- 8740: /go/libraries/doltcore/sql/dsess: parallelize sql.NewDatabase work
- 8690: Initialize persisted `root` superuser on SQL server startup
Previously, Dolt would only create a `root` superuser on sql-server startup when no other user accounts had been created. This resulted in a behavior where users would run `dolt sql-server`, create user accounts, then the next time they restarted the sql-server, the `root` account would no longer be present. This behavior has surprised several customers (see #5759) and is different from MySQL's behavior, which creates a persistent `root` superuser as part of initialization.
This change modifies this behavior so that a `root` superuser is created, and persisted, the first time a SQL server is started for a database, unless the `--skip-root-user-initialization` flag is specified, or an ephemeral superuser is requested with the `--user` option. Subsequent runs of `dolt sql-server` do not automatically create the `root` superuser. Only the first start of `dolt sql-server`, when there is no privileges database yet, triggers creation of the `root` user and initialization of the privileges database.
Internally, this is implemented by detecting the presence of any user account and privilege data stored to disk (by default, in the `.doltcfg/privileges.db` file). When no user account and privilege data exists, the `root` superuser initialization logic will run. This means the `privileges.db` data is now always created on the first run of `dolt sql-server`, even if the data is empty.
As part of this change, the `root` superuser is now scoped to `localhost`, instead of `%` (i.e. any host). This improves the default security posture of a Dolt sql-server and better aligns with MySQL's behavior. Customers who rely on using the `root` account to connect from non-localhost hosts will need to either log in and alter the `root` account to allow connections from the hosts they need, or specify the `DOLT_ROOT_HOST` and/or `DOLT_ROOT_PASSWORD` environment variables to override the default host (localhost) and password ("") for the `root` account when it is initialized the first time a sql-server is launched.
One side effect of this change is that `dolt sql -u <user>` may work differently for some uses. Previously, if there was no user account and privilege data persisted to disk yet (i.e. the `.doltcfg/privileges.db` file), then users could specify any username and password to `dolt sql` (e.g. `dolt sql -u doesnotexist`) and they would still be logged in, because user authentication was ignored when no user account and privilege data existed. Now that the user account and privilege data is always initialized when running `dolt sql-server`, customers may no longer use `dolt sql --user <user>` to log in with unknown user accounts. The workaround for this is to simply run `dolt sql` without the `--user` option, and Dolt will use the default local account.
Fixes: #5759
Depends on: dolthub/go-mysql-server#2797
Related to: dolthub/doltgresql#1113
Documentation updates: dolthub/docs#2460
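The override rule for the initial `root` account in 8690 (default host `localhost`, default password `""`, each replaceable through `DOLT_ROOT_HOST` and `DOLT_ROOT_PASSWORD`) can be sketched as follows. This is an illustration of the rule only: the `rootAccount` type and `rootAccountFromEnv` function are hypothetical, and treating an empty `DOLT_ROOT_HOST` as unset is an assumption of this sketch, not documented Dolt behavior.

```go
package main

import (
	"fmt"
	"os"
)

// rootAccount is a hypothetical description of the initial superuser.
type rootAccount struct {
	User, Host, Password string
}

// rootAccountFromEnv starts from the defaults described in the release notes
// (root@localhost with an empty password) and applies the two environment
// overrides. lookup has the same shape as os.LookupEnv so it can be faked.
func rootAccountFromEnv(lookup func(string) (string, bool)) rootAccount {
	acct := rootAccount{User: "root", Host: "localhost", Password: ""}
	// Assumption: an empty host value is treated the same as an unset one.
	if h, ok := lookup("DOLT_ROOT_HOST"); ok && h != "" {
		acct.Host = h
	}
	if p, ok := lookup("DOLT_ROOT_PASSWORD"); ok {
		acct.Password = p
	}
	return acct
}

func main() {
	acct := rootAccountFromEnv(os.LookupEnv)
	fmt.Printf("%s@%s\n", acct.User, acct.Host)
}
```

Per the notes above, these values only matter on the first launch, when the privileges database is initialized.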
go-mysql-server
- 2814: [rowexec] full outer join rightIter exhaust
Full join should exhaust right side, not return as soon as we EOF the left iterator.
fixes: #8735
- 2813: [binder] hoist projections in certain cases where we can combine with top-level projection
This is a bit unintuitive, but hoisting projections above sorts in the binder seems to uniformly improve projection pruning because we will always combine it with the top-level return projection.
fixes: #8736
- 2812: Fix cte naming conflict
fixes: #8724
Distinct CTE references need unique column and table ids.
- 2811: Reset BytesBuffer after each rowBatch
Once we spool a batch of rows to client, there's no reason to keep them in memory.
Fixes #8718
- 2797: Persist and load superusers
Previously, superusers were persisted to disk, but never loaded back again when the database was restarted. This essentially made all superusers ephemeral, since they only lasted for the duration of a SQL server process.
This change loads persisted superusers from disk, and also adds a new function to create ephemeral superusers that do not get persisted to disk.
This also includes a fix for the event scheduler to use a privileged account so that it can load events from all databases.
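The full outer join fix in 2814 is easier to see in a toy version. The sketch below joins two slices of ints on equality and is not the go-mysql-server row iterator; `pair` and `fullOuterJoin` are hypothetical names. The second loop is the step the fix is about: after the left side is exhausted, unmatched right rows still have to be emitted.

```go
package main

import "fmt"

// pair is one output row; a nil side represents the NULL-extended side.
type pair struct {
	Left, Right *int
}

// fullOuterJoin joins left and right on equality of values.
func fullOuterJoin(left, right []int) []pair {
	var out []pair
	matchedRight := make([]bool, len(right))
	for i := range left {
		matched := false
		for j := range right {
			if left[i] == right[j] {
				out = append(out, pair{&left[i], &right[j]})
				matchedRight[j] = true
				matched = true
			}
		}
		if !matched {
			out = append(out, pair{&left[i], nil})
		}
	}
	// The left side is exhausted here, but the join is not finished:
	// every right row that never matched must still be returned.
	for j := range right {
		if !matchedRight[j] {
			out = append(out, pair{nil, &right[j]})
		}
	}
	return out
}

// show renders one side of a pair, printing NULL for a missing value.
func show(p *int) string {
	if p == nil {
		return "NULL"
	}
	return fmt.Sprint(*p)
}

func main() {
	for _, p := range fullOuterJoin([]int{1, 2}, []int{2, 3}) {
		fmt.Println(show(p.Left), show(p.Right))
	}
}
```

Returning as soon as the left loop ends would drop right-only rows, including every row when the left input is empty.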
vitess
- 394: parse more partition options in `ALTER TABLE` statements
parses more partition options as no-ops
fixes: #8744
- 393: fix `starting by` and `terminated by` order
The `starting by` and `terminated by` clauses in `load data` statements can appear in any order and any number of times.
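The order-independence described in 393 can be sketched with a loop that accepts either clause at each step instead of expecting a fixed sequence. This is not how the vitess grammar is written; `lineOpts` and `parseLineOpts` are hypothetical, the input is assumed to be pre-split into tokens, and letting a repeated clause overwrite the earlier value is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// lineOpts holds the two line options of a LOAD DATA statement.
type lineOpts struct {
	StartingBy   string
	TerminatedBy string
}

// parseLineOpts consumes "<keyword> by <value>" triples in any order and any
// number of times. Assumption: a repeated clause overwrites the earlier value.
func parseLineOpts(tokens []string) (lineOpts, error) {
	var opts lineOpts
	for i := 0; i < len(tokens); i += 3 {
		if i+2 >= len(tokens) || !strings.EqualFold(tokens[i+1], "by") {
			return opts, fmt.Errorf("incomplete clause at token %d", i)
		}
		switch strings.ToLower(tokens[i]) {
		case "starting":
			opts.StartingBy = tokens[i+2]
		case "terminated":
			opts.TerminatedBy = tokens[i+2]
		default:
			return opts, fmt.Errorf("unexpected keyword %q", tokens[i])
		}
	}
	return opts, nil
}

func main() {
	inputs := [][]string{
		{"starting", "by", ">", "terminated", "by", ";"},
		{"terminated", "by", ";", "starting", "by", ">"},
	}
	for _, in := range inputs {
		opts, err := parseLineOpts(in)
		fmt.Printf("%+v %v\n", opts, err)
	}
}
```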
Closed Issues
1.45.6
Merged PRs
dolt
- 8730: Make show command more resilient when resolving references
Currently the show command can print internal objects, which requires a local environment. This goes against the sql migration expectations that there is no environment. This change only makes the situation less bad. Splitting out the admin operations into another command is the right approach.
Fixes: #8727
- 8726: Replace min/max helpers with built-in min/max
We can use the built-in `min` and `max` functions since Go 1.21.
Reference: https://go.dev/ref/spec#Min_and_max
- 8719: Replace min/max helpers with built-in min/max
We can use the built-in `min` and `max` functions since Go 1.21.
Reference: https://go.dev/ref/spec#Min_and_max
- 8644: Generate config.yaml when running sql-server
If sql-server is run without a specified config file and there is no config.yaml in the database directory, one will be generated.
The rules for how the default config.yaml file is generated are as follows:
- If a field has a value defined by the sql-server execution (such as through CLI args), then that value will be used.
- If a field has no set value but has a default value, then that default will be used.
- If a field has no set value and no default, a commented-out line setting the field to null will be included as a placeholder.
Part of #7980
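The three config.yaml generation rules in 8644 map onto a three-way switch. The sketch below is illustrative only: `field`, `renderLine`, and the field names in `main` are hypothetical, values are treated as plain strings, and the exact text Dolt writes for a placeholder line is assumed rather than taken from the source.

```go
package main

import "fmt"

// field is a hypothetical description of one config entry.
type field struct {
	Name    string
	Set     *string // value supplied by the sql-server invocation, if any
	Default *string // built-in default, if any
}

// renderLine applies the three rules in order: an explicitly set value wins,
// then the default, and otherwise a commented-out null placeholder is emitted.
func renderLine(f field) string {
	switch {
	case f.Set != nil:
		return fmt.Sprintf("%s: %s", f.Name, *f.Set)
	case f.Default != nil:
		return fmt.Sprintf("%s: %s", f.Name, *f.Default)
	default:
		return fmt.Sprintf("# %s: null", f.Name)
	}
}

func main() {
	set, def := "from-cli", "built-in"
	fields := []field{
		{Name: "example_set", Set: &set, Default: &def},
		{Name: "example_default", Default: &def},
		{Name: "example_unset"},
	}
	for _, f := range fields {
		fmt.Println(renderLine(f))
	}
}
```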
go-mysql-server
- 2811: Reset BytesBuffer after each rowBatch
Once we spool a batch of rows to client, there's no reason to keep them in memory.
Fixes #8718
Closed Issues
- 8736: Sort/Alias dependency conflict preventing prune
- 8735: [Bug] `FULL OUTER JOIN` not commutative, and not giving correct results
- 8724: [Bug] Incorrect SUM calculation in CTE with correlated subquery
- 8727: \show in SQL shell panics when arg isn't a commit
- 8718: mydumper OOMs in deterministic fashion for large database
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 1.89 | 0.65 | 0.34 |
groupby_scan | 13.22 | 17.32 | 1.31 |
index_join | 1.44 | 2.43 | 1.69 |
index_join_scan | 1.42 | 1.44 | 1.01 |
index_scan | 34.33 | 30.81 | 0.9 |
oltp_point_select | 0.18 | 0.26 | 1.44 |
oltp_read_only | 3.49 | 5.37 | 1.54 |
select_random_points | 0.37 | 0.6 | 1.62 |
select_random_ranges | 0.4 | 0.63 | 1.57 |
table_scan | 34.33 | 33.12 | 0.96 |
types_table_scan | 74.46 | 114.72 | 1.54 |
reads_mean_multiplier | 1.27 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 8.9 | 6.21 | 0.7 |
oltp_insert | 4.1 | 3.07 | 0.75 |
oltp_read_write | 9.06 | 11.45 | 1.26 |
oltp_update_index | 4.18 | 3.13 | 0.75 |
oltp_update_non_index | 4.18 | 3.07 | 0.73 |
oltp_write_only | 5.67 | 6.32 | 1.11 |
types_delete_insert | 8.43 | 6.55 | 0.78 |
writes_mean_multiplier | 0.87 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 95.72 | 39.79 | 2.41 |
tpcc_tps_multiplier | 2.41 |
Overall Mean Multiple | 1.52 |
---|
1.45.5
Merged PRs
dolt
- 8725: More information about types in int conversions
Closed Issues
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 1.93 | 0.67 | 0.35 |
groupby_scan | 13.22 | 17.32 | 1.31 |
index_join | 1.47 | 2.48 | 1.69 |
index_join_scan | 1.44 | 1.47 | 1.02 |
index_scan | 34.33 | 30.81 | 0.9 |
oltp_point_select | 0.18 | 0.27 | 1.5 |
oltp_read_only | 3.49 | 5.37 | 1.54 |
select_random_points | 0.34 | 0.6 | 1.76 |
select_random_ranges | 0.37 | 0.63 | 1.7 |
table_scan | 34.33 | 33.12 | 0.96 |
types_table_scan | 75.82 | 114.72 | 1.51 |
reads_mean_multiplier | 1.29 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 8.9 | 6.32 | 0.71 |
oltp_insert | 4.1 | 3.13 | 0.76 |
oltp_read_write | 8.9 | 11.45 | 1.29 |
oltp_update_index | 4.18 | 3.19 | 0.76 |
oltp_update_non_index | 4.18 | 3.07 | 0.73 |
oltp_write_only | 5.67 | 6.32 | 1.11 |
types_delete_insert | 8.43 | 6.67 | 0.79 |
writes_mean_multiplier | 0.88 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 96.35 | 39.77 | 2.42 |
tpcc_tps_multiplier | 2.42 |
Overall Mean Multiple | 1.53 |
---|
1.45.4
Merged PRs
dolt
- 8723: If a JSON document contains strings that can't fit in a single chunk, use the naive Blob chunker instead of the smart JSON chunker.
The JSON chunker never creates a chunk boundary inside of a string.
Originally, this PR added functionality to allow the JSON chunker to split a JSON document inside a string. This was supposed to be safe and backwards compatible, because older versions of Dolt reading documents written by newer versions of Dolt are supposed to fall back on ignoring JSON document metadata if they don't understand it and treat the document like a blob.
However, tests revealed that older clients were not checking for this in enough places and would hang when trying to read documents written with this fix. This PR also contains fixes to check the JSON metadata in more places, but this doesn't do anything for existing Dolt servers running older versions.
So instead, this PR detects when a document contains strings that exceed some limit, and in that case the writer falls back on writing the document as a plain blob without metadata. The limit is currently 32KB, but can be raised in the future.
I chose to keep the logic for splitting JSON documents inside a string, although the chunker doesn't currently use it, since we may decide to enable it in the future.
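The fallback decision in 8723 can be sketched as a walk over a decoded document that looks for any string above the limit. This is an illustration, not Dolt's chunker: `hasOversizedString` and `chooseChunker` are hypothetical, the document is decoded with `encoding/json` rather than Dolt's own JSON representation, and counting object keys against the limit is an assumption of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// maxStringLen mirrors the 32KB limit mentioned in the PR description.
const maxStringLen = 32 * 1024

// hasOversizedString reports whether any string in the decoded document is
// longer than limit bytes. Assumption: object keys count as strings too.
func hasOversizedString(v any, limit int) bool {
	switch x := v.(type) {
	case string:
		return len(x) > limit
	case []any:
		for _, e := range x {
			if hasOversizedString(e, limit) {
				return true
			}
		}
	case map[string]any:
		for k, e := range x {
			if len(k) > limit || hasOversizedString(e, limit) {
				return true
			}
		}
	}
	return false
}

// chooseChunker returns "blob" when the document needs the plain blob
// fallback and "json" when the structure-aware chunker can be used.
func chooseChunker(doc []byte) (string, error) {
	var v any
	if err := json.Unmarshal(doc, &v); err != nil {
		return "", err
	}
	if hasOversizedString(v, maxStringLen) {
		return "blob", nil
	}
	return "json", nil
}

func main() {
	small := []byte(`{"name": "dolt", "tags": ["a", "b"]}`)
	large := []byte(`{"payload": "` + strings.Repeat("x", 40000) + `"}`)
	for _, doc := range [][]byte{small, large} {
		kind, err := chooseChunker(doc)
		fmt.Println(kind, err)
	}
}
```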
Closed Issues
1.45.3
Merged PRs
dolt
- 8722: Add "dolt_optimize_json" system variable.
When set to 0, Dolt will write JSON documents to storage as simple blobs instead of path-indexed trees.
This is useful as a workaround to a current issue where the JSON chunker won't chunk in the middle of large string literals, resulting in larger-than-expected chunks.
- 8721: Updating journal to allow chunk records larger than 1MB
Also adds tests for JSON cases that triggered these larger chunks
- 8720: unhide `dolt ci` commands
- 8713: Allow storing/reading 32 bit offsets in nodes.
This change allows future node messages to use 32 bit offsets in their encoding.
This increases the size of the Node struct, but given that each Node is much smaller than the message that it backs, this isn't really a memory concern.
Currently, none of the node types use this, so this change shouldn't have any immediate effect on observable behavior.
- 8708: dolt bootstrap refactor
This change alters the way we resolve the data-dir and initialize dolt processes. It originated from discovering that the tmp dir `replace` test at startup was leaving a test file in the current working directory. Long story short: the resolution of data-dir was too late in the process startup when sql-server was starting.
External behavior which will change:
- Specifying `data_dir` in the sql-server config file will correctly test the ability to rename files on the same partition, and override the TMPDIR environment variable when necessary, using $DATADIR/tmp when the user specified directory is on the wrong partition
- Fail to start the server when the data-dir is specified multiple times. This will result in server startup failure for existing deployments which "work" (we choose the data-dir deterministically now, but not in the expected precedence)
Fixes: #8498
- 8668: Bump golang.org/x/crypto from 0.23.0 to 0.31.0 in /go
Bumps golang.org/x/crypto from 0.23.0 to 0.31.0.
Commits
- b4f1988 ssh: make the public key cache a 1-entry FIFO cache
- 7042ebc openpgp/clearsign: just use rand.Reader in tests
- 3e90321 go.mod: update golang.org/x dependencies
- 8c4e668 x509roots/fallback: update bundle
- 6018723 go.mod: update golang.org/x dependencies
- 71ed71b README: don't recommend go get
- 750a45f sha3: add MarshalBinary, AppendBinary, and UnmarshalBinary
- 36b1725 sha3: avoid trailing permutation
- 80ea76e sha3: fix padding for long cSHAKE parameters
- c17aa50 sha3: avoid buffer copy
- Additional commits viewable in compare view
Closed Issues
- 7512: Dolt binlog Provider Support
- 1992: `dolt branch -d` can't delete remote branches
- 1492: `dolt checkout` should support a commit + table argument
- 6207: `fetch` way slower than `clone`
- 8548: Dolt reset should stage working set
- 8585: Tool dolphie needs `@@admin_version` system variable to return a numeric type.
- 8635: "AS OF" doesn't work with partial commit hashes, trying to improvise this crashes Dolt
- 8712: I'm pushing to self-hosting (DoltLab error occurred)
- 8592: Make Dolt work with mydumper
- 8498: tmpDir doesn't seem to be configurable
1.45.2
Merged PRs
dolt
go-mysql-server
- 2802: exempt processlist column renaming through aliases
needed for dolphie to work; extension of dolthub/go-mysql-server#2764
- 2800: Pass metrics server listener to DefaultProtocolListenerFunc for doltgres metrics
vitess
- 393: fix `starting by` and `terminated by` order
The `starting by` and `terminated by` clauses in `load data` statements can appear in any order and any number of times.
- 392: [sqltypes] no value buffer leakage
Closed Issues
- 8706: LOAD DATA INFILE syntax error on different order of params 'starting by' and 'terminated by'
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 1.89 | 0.62 | 0.33 |
groupby_scan | 13.46 | 16.71 | 1.24 |
index_join | 1.44 | 2.3 | 1.6 |
index_join_scan | 1.42 | 1.44 | 1.01 |
index_scan | 34.33 | 30.26 | 0.88 |
oltp_point_select | 0.18 | 0.27 | 1.5 |
oltp_read_only | 3.49 | 5.28 | 1.51 |
select_random_points | 0.33 | 0.59 | 1.79 |
select_random_ranges | 0.37 | 0.62 | 1.68 |
table_scan | 34.33 | 33.12 | 0.96 |
types_table_scan | 74.46 | 108.68 | 1.46 |
reads_mean_multiplier | 1.27 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 8.9 | 6.21 | 0.7 |
oltp_insert | 4.1 | 3.07 | 0.75 |
oltp_read_write | 8.9 | 11.45 | 1.29 |
oltp_update_index | 4.18 | 3.13 | 0.75 |
oltp_update_non_index | 4.18 | 3.07 | 0.73 |
oltp_write_only | 5.67 | 6.21 | 1.1 |
types_delete_insert | 8.28 | 6.55 | 0.79 |
writes_mean_multiplier | 0.87 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 96.01 | 40.05 | 2.4 |
tpcc_tps_multiplier | 2.4 |
Overall Mean Multiple | 1.51 |
---|
1.45.1
Merged PRs
dolt
- 8699: [sqle] Fix diff table merge join bugs
We were inappropriately using kv merge join in several ways. kvexec has no diff table support, and diff table indexes aren't sorted, so the default merge also fails. The test suite was also being skipped.
GMS side here: dolthub/go-mysql-server#2803
fixes: #8700
- 8698: Replace `cespare/xxhash` with `cespare/xxhash/v2`
Currently, we are using two versions of the same package for xxHash:
https://github.com/dolthub/dolt/blob/d98baafd3e8248a9818e21442f4dfbdeffe78ac4/go/go.mod#L56-L57
`github.com/cespare/xxhash/v2` is the latest version, which includes bug fixes and improvements. This PR updates the codebase to replace `github.com/cespare/xxhash` with `github.com/cespare/xxhash/v2`.
No breaking changes, see https://go.dev/play/p/ZXuwERoBlEi.
- 8691: cache charset bump
- 8684: [stats] stats table name sensitivity tests
Fix bugs related to table casing, loading deleted tables, and making sure we're using the appropriate branch root when updating statistics.
go-mysql-server
- 2803: [memo] merge joins must use globally sorted indexes
- 2802: exempt processlist column renaming through aliases
needed for dolphie to work; extension of dolthub/go-mysql-server#2764
- 2799: pool wire write buffer
`BytesBuffer` is a class that lets us avoid most allocations for spooling values to wire. Notably, the object is responsible for doubling the backing array size when appropriate, and a `Grow(n int)` interface is necessary to track when this should happen. Letting the runtime do all of this would be preferable, but the runtime doubles based on slice size, and the refactors required to make that workable are more invasive. We pay for 2 mallocs on doubling, because the first one is never big enough. Not calling `Grow` after allocating, or growing by too small a length compared to the allocations used, will stomp previously written memory.
As long as we track bytes used with the `Grow` interface this works smoothly and shaves ~30% off of tablescans.
perf here: #8693
- 2798: cache session charset
perf: #8691
- 2796: apply table projections through `Distinct` nodes
We weren't pruning table columns when there was a distinct clause over the projections. This resulted in the deserialization of every column, even if they weren't going to make it to the result. This is bad for performance, especially if the unread columns are of `TEXT`, `LONGTEXT`, `BLOB`, or `LONGBLOB` type, as those are stored out of band and take longer to deserialize.
fixes: #8689
- 2795: allow using function as table function
vitess
- 390: Minor bug fixes for `caching_sha2_password` auth logic
For accounts without passwords, we need to account for the client sending the null byte when the server re-requests the client auth data, and then skip the `AuthMoreDataPacket` and `CachingSha2FastAuth` packets. Otherwise the `mysql` client errors with "Malformed packet".
`handleConnectionError` is used to report stats about failed connection attempts, but wasn't being called in the correct spot. The previous spot was overcounting failed connection attempts, since it was called as part of the auth renegotiation flow. It has been moved to be called whenever we return an error or exit the function without a successful connection.
Closed Issues
- 8700: Panic in join against diff table
- 8689: Prune columns from select distinct
- 8688: P
- 2790: Any plan to make a new patch release?
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 1.89 | 0.61 | 0.32 |
groupby_scan | 13.22 | 16.12 | 1.22 |
index_join | 1.47 | 2.3 | 1.56 |
index_join_scan | 1.42 | 1.39 | 0.98 |
index_scan | 34.33 | 30.26 | 0.88 |
oltp_point_select | 0.18 | 0.26 | 1.44 |
oltp_read_only | 3.43 | 5.18 | 1.51 |
select_random_points | 0.34 | 0.57 | 1.68 |
select_random_ranges | 0.37 | 0.61 | 1.65 |
table_scan | 34.33 | 32.53 | 0.95 |
types_table_scan | 74.46 | 104.84 | 1.41 |
reads_mean_multiplier | 1.24 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 9.06 | 6.21 | 0.69 |
oltp_insert | 4.1 | 3.07 | 0.75 |
oltp_read_write | 8.9 | 11.24 | 1.26 |
oltp_update_index | 4.18 | 3.13 | 0.75 |
oltp_update_non_index | 4.18 | 3.02 | 0.72 |
oltp_write_only | 5.67 | 6.21 | 1.1 |
types_delete_insert | 8.43 | 6.55 | 0.78 |
writes_mean_multiplier | 0.86 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 96.04 | 40.4 | 2.38 |
tpcc_tps_multiplier | 2.38 |
Overall Mean Multiple | 1.49 |
---|
1.45.0
Backwards incompatible change in this release:
- This release has a small behavior change to the `dolt_diff_$table` results. Previously, changes to the schema of the table, in particular primary key changes, meant that only the part of the table's history related to the most recent schema was returned. Now the `dolt_diff_$table` system table will make a best effort to include more history for the table even if we can't perfectly map schema changes.
Per Dolt’s versioning policy, this is a minor version bump because these changes may impact existing applications. Please reach out to us on GitHub or Discord if you have questions or need help with any of these changes.
Merged PRs
dolt
- 8685: update TableFunction
- 8631: Give a little more information in dolt_diff_* when there is a pk change
This change makes the dolt_diff_* system table a little more forgiving when schema changes occur that we can kind of map from one commit to the next. In the case of the issue, that meant adding a primary key to a keyless table. This doesn't work in both directions though: if you can't map the schema, we stop walking history (same as before).
Minor bump required because the behavior of the dolt_diff_* table changes as a result of this change.
Fixes: #8625
go-mysql-server
- 2795: allow using function as table function
- 2794: Bump go-icu-regex
Incorporates the fix from here:
Closed Issues
- 8625: dolt_diff_* returns empty set for tables altered to add a PK after creating using CREATE TABLE ... AS SELECT
- 8683: `dolt table import` does not understand a schema file with a primary key defined separately from the column
- 8665: Panic on dolt_diff_* with generated column
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 1.93 | 0.62 | 0.32 |
groupby_scan | 13.46 | 16.41 | 1.22 |
index_join | 1.47 | 2.26 | 1.54 |
index_join_scan | 1.42 | 1.47 | 1.04 |
index_scan | 34.33 | 46.63 | 1.36 |
oltp_point_select | 0.18 | 0.27 | 1.5 |
oltp_read_only | 3.49 | 5.37 | 1.54 |
select_random_points | 0.33 | 0.6 | 1.82 |
select_random_ranges | 0.37 | 0.62 | 1.68 |
table_scan | 34.33 | 46.63 | 1.36 |
types_table_scan | 74.46 | 123.28 | 1.66 |
reads_mean_multiplier | 1.37 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 8.9 | 6.21 | 0.7 |
oltp_insert | 4.1 | 3.07 | 0.75 |
oltp_read_write | 8.9 | 11.45 | 1.29 |
oltp_update_index | 4.18 | 3.13 | 0.75 |
oltp_update_non_index | 4.18 | 3.07 | 0.73 |
oltp_write_only | 5.77 | 6.21 | 1.08 |
types_delete_insert | 8.43 | 6.55 | 0.78 |
writes_mean_multiplier | 0.87 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 95.58 | 40.6 | 2.35 |
tpcc_tps_multiplier | 2.35 |
Overall Mean Multiple | 1.53 |
---|