18 Jan 00:14

github-actions

cfb5078

1.47.1 Latest

Merged PRs

dolt

8770: Speed up fetch when there are many tags which haven't changed
A user remarked that dolt pull took 2 hours to pull changes. The time was being wasted on every tag that had not changed. This change alters the tag iteration code to defer the loading of metadata until it's actually required. Testing against the user's data now takes less than 1 minute.
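The deferred-loading idea can be sketched roughly as follows. This is a minimal illustration, not Dolt's actual tag iteration code: the lazyTag type, its load callback, and changedTags are hypothetical names invented for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyTag is a hypothetical stand-in for a tag reference whose metadata is
// expensive to load. It is not Dolt's actual type.
type lazyTag struct {
	Name string
	Hash string

	once sync.Once
	meta string
	load func(hash string) string // expensive loader, called at most once
}

// Meta loads the metadata on first use and caches the result.
func (t *lazyTag) Meta() string {
	t.once.Do(func() { t.meta = t.load(t.Hash) })
	return t.meta
}

// changedTags returns a description of every remote tag whose hash differs
// from the local copy. Metadata is loaded only for those tags.
func changedTags(remote []*lazyTag, local map[string]string) []string {
	var out []string
	for _, t := range remote {
		if local[t.Name] == t.Hash {
			continue // unchanged: skip without touching metadata
		}
		out = append(out, t.Name+": "+t.Meta())
	}
	return out
}

func main() {
	loads := 0
	loader := func(hash string) string { loads++; return "meta for " + hash }
	remote := []*lazyTag{
		{Name: "v1", Hash: "aaa", load: loader},
		{Name: "v2", Hash: "bbb", load: loader},
	}
	local := map[string]string{"v1": "aaa", "v2": "old"}
	fmt.Println(changedTags(remote, local), "loads:", loads)
}
```

With many unchanged tags, the loader runs only for the few tags that actually differ.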
8753: make autoincrement tracker load async
8752: go/store/{nbs,types}: GC: Move the reference walk from types to nbs.
Make the ChunkStore itself responsible for the reference walk, being given handles for walking references and excluding chunks as part of the GC process. This is an incremental step towards adding dependencies on read chunks during the GC process. The ChunkStore can better distinguish whether the read is part of the GC process itself or whether it came from the application layer. It also allows better management of cache impact and the potential for better memory usage.
This transformation gets rid of parallel reference walking and some manual batching which was present in the ValueStore implementation of reference walking. The parallel reference walking was necessary for reasonable performance in format LD_1, but it's actually not necessary in DOLT. For some use cases it's a slight win, but the simplification involved in getting rid of it is worth it for now.
8747: go/libraries/doltcore/sqle/dprocedures: dolt_gc.go: Retry canceling running queries when waiting for safepoint establishment.
This allows call dolt_gc() to establish a safepoint more quickly and reliably if the call to safepointF() races with a new query beginning and being registered for the connection in the process list.

go-mysql-server

2820: Don't close ctx prematurely for single row results
A user reported a cancelled context error, which was caused by prematurely closing the iterator while the ctx was still needed.
2819: Don't force a table rewrite when appending extra values to the end of an enum.
Adding extra strings to the end of an enum type doesn't change the values for any of the existing strings. A table rewrite isn't necessary in this case.
If a specific table implementation does need to be rewritten when an enum type changes this way, it can still implement ShouldRewriteTable to force a rewrite anyway.
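The append-only condition can be sketched as a simple prefix check. This is an illustration only: isAppendOnly is a hypothetical helper, not a go-mysql-server function, and it ignores collation and case handling.

```go
package main

import "fmt"

// isAppendOnly reports whether newVals equals oldVals with zero or more
// values appended to the end. In that case every existing string keeps its
// position, so stored rows would not need to change.
// Hypothetical helper for illustration; not part of go-mysql-server.
func isAppendOnly(oldVals, newVals []string) bool {
	if len(newVals) < len(oldVals) {
		return false
	}
	for i, v := range oldVals {
		if newVals[i] != v {
			return false
		}
	}
	return true
}

func main() {
	old := []string{"small", "medium"}
	fmt.Println(isAppendOnly(old, []string{"small", "medium", "large"})) // appended at the end
	fmt.Println(isAppendOnly(old, []string{"large", "small", "medium"})) // inserted at the front
}
```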
2817: Use vector index when the SELECT clause has a projection.
Due to some overly strict pattern matching in the vector index selection, we weren't always using the index when there was a projection involved: we were only applying the index in the presence of a TopN node, but we also weren't generating TopN nodes in the case we had a Limit -> Project -> Sort node structure.
I was hoping that dolthub/go-mysql-server#2813 would fix this, and I suspect there are improvements to GMS that would make this unnecessary. But for now, we should allow the pattern matching in replaceIdxOrderByDistance to apply a vector index lookup in this case.

Closed Issues

8769: last_insert_id gives 0 when 0 is explicitly specified for an auto_increment primary key in an insertion

16 Jan 03:09

github-actions

v1.47.0

30258b9

1.47.0

Vector indexes and search are supported in this release.

Merged PRs

dolt

8749: Allow importing parquet fields containing repeated elements.
NOTE: This still needs tests. I'm looking for a good tool for generating parquet. We can't use dolt table export to generate the parquet because we can't generate composite types that way.
This PR adds support for importing specific composite parquet types into Dolt. Specifically, we're now able to import a composite parquet field if:
- There is exactly one leaf column in the field.
- There is at most one repeated tag in the field.
We flatten these composite values into a single primitive value (if there are no repeated tags) or an array of primitive values (if there's exactly one repeated tag).
There's more work to be done here (multidimensional arrays, objects, etc.), but this allows us to import vector embeddings stored in parquet files.
Why do we flatten the type?
We want to be able to import parquet files from HuggingFace, and store embedding sequences as arrays. Embedding sequences in HuggingFace exports are an optional field containing a single repeated child field, which itself contains a single optional field containing the sequence element. Flattening this into a single array is more usable and doesn't lose any data.
8686: Proximity Map implementation with support for incremental edits.
Based on #8408, now with additional functionality for incremental changes to indexes.
This is a large-scale PR merging several features into main, all designed for supporting vector indexes.
Vector Index Nodes
1defec9 adds a new message/node type: the vector index node. This message stores a node in a Merkle tree index whose structure is based on some distance measure in a multi-dimensional space: at each level, keys are arranged such that a key is closer to its parent key than any other key in the parent node.
One consequence of this design is that it's not possible to put a hard limit on the number of keys contained in each node. We can control the mean node size, but there's always a non-zero chance that a node will be large enough to break our usual encoding scheme (which uses 16-bit ints to store message offsets). To address this, the vector index node uses 32-bit ints to store message offsets instead of the 16 bits used by other node types.
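To make the size limit concrete, here is a small sketch of why 16-bit offsets cap the encodable node size. It does not reflect Dolt's actual message layout; offsetsFit16 is a hypothetical helper that only checks whether cumulative byte offsets stay within the range of a uint16.

```go
package main

import (
	"fmt"
	"math"
)

// offsetsFit16 reports whether the cumulative byte offset of every item
// fits in an unsigned 16-bit integer (at most 65535).
// Hypothetical helper; the real encoding has more structure than this.
func offsetsFit16(itemSizes []int) bool {
	total := 0
	for _, s := range itemSizes {
		total += s
		if total > math.MaxUint16 {
			return false
		}
	}
	return true
}

func main() {
	small := []int{1000, 2000, 3000}
	fmt.Println("small node fits in 16-bit offsets:", offsetsFit16(small))

	// A node that happens to attract many keys can exceed the 16-bit range.
	large := make([]int, 70)
	for i := range large {
		large[i] = 1000
	}
	fmt.Println("large node fits in 16-bit offsets:", offsetsFit16(large))
}
```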
Proximity Map
A ProximityMap is a new implementation of Dolt's Map, a data structure built on Merkle trees that maps key bytestrings to value bytestrings. The ProximityMap is backed by a tree of vector index nodes, allowing it to perform an approximate nearest neighbor search.
Proximity Maps resemble other Prolly Maps, but have the following invariants:
- Each key must be convertible to a vector. Typically, the key is a val.Tuple, and the vector is the first value in that tuple.
- The keys are arranged in the tree such that, for each of a key's parent keys (the keys that appear on the path from the root to the key), the key is closer to that parent key than any of the parent key's siblings.
- The keys in a node are sorted lexicographically (note that this is not necessarily the same ordering as the tuple that the key represents), except for the first key, which matches its direct parent.
Notably, while the keys of an individual node are sorted, walking all of a vector index's keys in standard iteration order will not yield them in sorted order.
28b7065 and 6b91635 contain the bulk of the ProximityMap implementation.
The bulk of the changes are in these three commits. Each of the other commits is a smaller self-contained change necessary to support vector indexes.
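The distance invariant can be illustrated with a short sketch. It assumes squared Euclidean distance and plain float64 slices as vectors. The actual distance measure and key types in Dolt may differ, and sqDist, nearestParent, and closerToParent are hypothetical names.

```go
package main

import "fmt"

// sqDist returns the squared Euclidean distance between two vectors of
// equal length. The choice of distance measure is an assumption.
func sqDist(a, b []float64) float64 {
	var d float64
	for i := range a {
		diff := a[i] - b[i]
		d += diff * diff
	}
	return d
}

// nearestParent returns the index of the candidate parent closest to key.
// candidates must be non-empty.
func nearestParent(key []float64, candidates [][]float64) int {
	best := 0
	for i := 1; i < len(candidates); i++ {
		if sqDist(key, candidates[i]) < sqDist(key, candidates[best]) {
			best = i
		}
	}
	return best
}

// closerToParent reports whether no sibling of parent is strictly closer
// to key than parent is. This is the per-level invariant described above.
func closerToParent(key, parent []float64, siblings [][]float64) bool {
	d := sqDist(key, parent)
	for _, s := range siblings {
		if sqDist(key, s) < d {
			return false
		}
	}
	return true
}

func main() {
	parents := [][]float64{{0, 0}, {10, 10}}
	key := []float64{1, 2}
	p := nearestParent(key, parents)
	fmt.Println("key belongs under parent", p)
	fmt.Println("invariant holds:", closerToParent(key, parents[p], [][]float64{parents[1]}))
}
```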

go-mysql-server

2817: Use vector index when the SELECT clause has a projection.
Due to some overly strict pattern matching in the vector index selection, we weren't always using the index when there was a projection involved: we were only applying the index in the presence of a TopN node, but we also weren't generating TopN nodes in the case we had a Limit -> Project -> Sort node structure.
I was hoping that dolthub/go-mysql-server#2813 would fix this, and I suspect there are improvements to GMS that would make this unnecessary. But for now, we should allow the pattern matching in replaceIdxOrderByDistance to apply a vector index lookup in this case.
2816: Allow using vector index when the queried vector is provided in a user variable.
Right now, vector indexes are very narrowly applied. One of the inputs to the DISTANCE function needs to be a constant. Before we required it to be a Literal expression, but UserVar expressions should also work.
2797: Persist and load superusers
Previously, superusers were persisted to disk, but never loaded back again when the database was restarted. This essentially made all superusers ephemeral, since they only lasted for the duration of a SQL server process.
This change loads persisted superusers from disk, and also adds a new function to create ephemeral superusers that do not get persisted to disk.
This also includes a fix for the event scheduler to use a privileged account so that it can load events from all databases.

Closed Issues

8734: Can't delete remote branch refs that no longer exist in origin

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	1.89	0.68	0.36
groupby_scan	12.98	17.32	1.33
index_join	1.44	2.48	1.72
index_join_scan	1.42	1.44	1.01
index_scan	34.33	30.81	0.9
oltp_point_select	0.18	0.27	1.5
oltp_read_only	3.49	5.37	1.54
select_random_points	0.34	0.6	1.76
select_random_ranges	0.37	0.63	1.7
table_scan	34.95	33.12	0.95
types_table_scan	75.82	118.92	1.57
reads_mean_multiplier			1.3

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	9.06	6.32	0.7
oltp_insert	4.1	3.13	0.76
oltp_read_write	9.06	11.65	1.29
oltp_update_index	4.18	3.19	0.76
oltp_update_non_index	4.18	3.07	0.73
oltp_write_only	5.77	6.32	1.1
types_delete_insert	8.43	6.67	0.79
writes_mean_multiplier			0.88

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	96.45	39.71	2.43
tpcc_tps_multiplier			2.43

Overall Mean Multiple	1.54

15 Jan 19:49

github-actions

v1.46.0

fcc8f3e

1.46.0

Backwards incompatible changes in this release:

The default root superuser is now persisted to the privileges database and is scoped to localhost, instead of %. Previously, the root superuser only existed when no other accounts had been created. Creating accounts, then restarting the sql-server would cause the root superuser to disappear. The root@localhost superuser is now created the first time a sql-server is started, as the privileges database is initialized.
For Docker customers – note that the default root superuser is now scoped to localhost, instead of any host. This change is made for security and to better match MySQL's default security posture. To connect to a Dolt sql-server from outside the container the sql-server is running on, you need to supply the -e DOLT_ROOT_HOST=<host> argument. For more details and examples, see the dolthub/sql-server Docker readme, our Docker documentation, or our blog post covering this change.

Per Dolt’s versioning policy, this is a minor version bump because these changes may impact existing applications. Please reach out to us on GitHub or Discord if you have questions or need help with any of these changes.

Merged PRs

dolt

8746: Allow dolt sql to always log in as the root superuser
From the command line, when a sql-server is not running, dolt sql implicitly uses the root account to log in, but if the root account exists with a password, dolt sql will fail to log in. Since the user has access to the host and the database data directory, we should allow dolt sql to log into the SQL shell, even if the root user has a password set. This change also makes this behavior match when a sql-server is running, and we allow superuser login through the __dolt_local_user__ account (which only exists while a sql-server is running).
8745: Add --prune option to dolt_pull procedure
Expose in CLI and test too.
See: #8734
8742: Don't panic when attempting to update workspace table
Previously a panic was very likely if any update to dolt_workspace_* involved a schema change. This change restricts the updates to the workspace tables only in cases where the schemas have not changed.
8740: /go/libraries/doltcore/sql/dsess: parallelize sql.NewDatabase work
8690: Initialize persisted root superuser on SQL server startup
Previously, Dolt would only create a root superuser on sql-server startup when no other user accounts had been created. This resulted in a behavior where users would run dolt sql-server, create user accounts, then the next time they restart the sql-server, the root account would no longer be present. This behavior has surprised several customers (see #5759) and is different from MySQL's behavior, which creates a persistent root superuser as part of initialization.
This change modifies this behavior so that a root superuser is created, and persisted, the first time a SQL server is started for a database, unless the --skip-root-user-initialization flag is specified, or if an ephemeral superuser is requested with the --user option. Subsequent runs of dolt sql-server do not automatically create the root superuser – only the first start of dolt sql-server, when there is no privileges database yet, triggers creation of the root user and initialization of the privileges database.
Internally, this is implemented by detecting the presence of any user account and privilege data stored to disk (by default, in the .doltcfg/privileges.db file). When no user account and privilege data exists, the root superuser initialization logic will run. This means the privileges.db data is now always created on the first run of dolt sql-server, even if the data is empty.
As part of this change, the root superuser is now scoped to localhost, instead of % (i.e. any host). This improves the default security posture of a Dolt sql-server and better aligns with MySQL's behavior. Customers who rely on using the root account to connect from non-localhost hosts will need to either log in and alter the root account to allow connections from the hosts they need, or specify the DOLT_ROOT_HOST and/or DOLT_ROOT_PASSWORD environment variables to override the default host (localhost) and password ("") for the root account when it is initialized the first time a sql-server is launched.
One side effect of this change is that dolt sql -u <user> may work differently for some users. Previously, if there was no user account and privilege data persisted to disk yet (i.e. the .doltcfg/privileges.db file), then users could specify any username and password to dolt sql (e.g. dolt sql -u doesnotexist) and they would still be logged in – user authentication was ignored since no user account and privilege data existed. Now that the user account and privilege data is always initialized when running dolt sql-server, customers may no longer use dolt sql --user <user> to log in with unknown user accounts. The workaround for this is to simply run dolt sql without the --user option, and Dolt will use the default local account.
Fixes: #5759
Depends on: dolthub/go-mysql-server#2797
Related to: dolthub/doltgresql#1113
Documentation updates: dolthub/docs#2460

go-mysql-server

2814: [rowexec] full outer join rightIter exhaust
Full join should exhaust right side, not return as soon as we EOF the left iterator.
fixes: #8735
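The intended behavior can be sketched with slices standing in for row iterators. This is an illustration of full outer join semantics, not the rowexec implementation: fullOuterJoin is a hypothetical function that joins integers on equality.

```go
package main

import "fmt"

// fullOuterJoin joins two lists of integers on equality and returns the
// result rows as "left|right" strings, using NULL for a missing side.
// Hypothetical sketch; slices stand in for the real row iterators.
func fullOuterJoin(left, right []int) []string {
	matched := make([]bool, len(right))
	var out []string
	for _, l := range left {
		found := false
		for i, r := range right {
			if l == r {
				out = append(out, fmt.Sprintf("%d|%d", l, r))
				matched[i] = true
				found = true
			}
		}
		if !found {
			out = append(out, fmt.Sprintf("%d|NULL", l))
		}
	}
	// The left side is exhausted here, but the join is not done: right
	// rows that never matched must still be emitted.
	for i, r := range right {
		if !matched[i] {
			out = append(out, fmt.Sprintf("NULL|%d", r))
		}
	}
	return out
}

func main() {
	fmt.Println(fullOuterJoin([]int{1, 2}, []int{2, 3}))
}
```

Returning as soon as the first loop finishes would drop the unmatched right-side rows, which is the kind of incorrect result described above.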
2813: [binder] hoist projections in certain cases where we can combine with top-level projection
This is a bit unintuitive, but hoisting projections above sorts in the binder seems to uniformly improve projection pruning because we will always combine it with the top-level return projection.
fixes: #8736
2812: Fix cte naming conflict
fixes: #8724
Distinct CTE references need unique column and table ids.
2811: Reset BytesBuffer after each rowBatch
Once we spool a batch of rows to client, there's no reason to keep them in memory.
Fixes #8718
2797: Persist and load superusers
Previously, superusers were persisted to disk, but never loaded back again when the database was restarted. This essentially made all superusers ephemeral, since they only lasted for the duration of a SQL server process.
This change loads persisted superusers from disk, and also adds a new function to create ephemeral superusers that do not get persisted to disk.
This also includes a fix for the event scheduler to use a privileged account so that it can load events from all databases.

vitess

394: parse more partition options in ALTER TABLE statements
parses more partition options as no-ops
fixes: #8744
393: fix starting by and terminated by order
the starting by and terminated by clauses in load data statements can appear in any order and any number of times.

Closed Issues

5759: Dolt's disappearing root user is confusing
8744: Parser support for adding/removing partition

14 Jan 16:59

github-actions

v1.45.6

1514be7

1.45.6

Merged PRs

dolt

8730: Make show command more resilient when resolving references
Currently the show command can print internal objects, which requires a local environment. This goes against the sql migration expectations that there is no environment. This change only makes the situation less bad. Splitting out the admin operations into another command is the right approach.
Fixes: #8727
8726: Replace min/max helpers with built-in min/max
We can use the built-in min and max functions since Go 1.21.
Reference: https://go.dev/ref/spec#Min_and_max
8719: Replace min/max helpers with built-in min/max
We can use the built-in min and max functions since Go 1.21.
Reference: https://go.dev/ref/spec#Min_and_max
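As a small illustration of the built-ins (requires Go 1.21 or later), a hand-written helper can be replaced by direct calls to min and max. The clamp function here is a hypothetical example, not code from the Dolt repository.

```go
package main

import "fmt"

// clamp limits v to the range [lo, hi] using the built-in min and max
// functions, with no package-level helper definitions needed.
func clamp(v, lo, hi int) int {
	return min(max(v, lo), hi)
}

func main() {
	fmt.Println(clamp(15, 0, 10), clamp(-3, 0, 10), clamp(7, 0, 10))
}
```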
8644: Generate config.yaml when running sql-server
If sql-server is run without a specified config file and there is no config.yaml in the database directory, one will be generated.
The rules for how the default config.yaml file is generated are as follows:
- If a field has a value defined by the sql-server execution (such as through CLI args), then that value will be used.
- If a field has no set value but has a default value, then that default will be used.
- If a field has no set value and no default, a commented-out line setting the field to null will be included as a placeholder.
Part of #7980
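The three rules above can be sketched as a per-field decision. This is a hypothetical illustration: the field type, the renderLine function, the field names, and the exact format of the commented-out placeholder are assumptions, not Dolt's generator code.

```go
package main

import "fmt"

// field describes one config entry. Set holds a value supplied by the
// sql-server execution (for example via CLI args) and Default holds the
// built-in default. Either may be nil. Hypothetical type for illustration.
type field struct {
	Name    string
	Set     *string
	Default *string
}

// renderLine applies the three rules in order: explicit value, then
// default, then a commented-out null placeholder.
func renderLine(f field) string {
	switch {
	case f.Set != nil:
		return f.Name + ": " + *f.Set
	case f.Default != nil:
		return f.Name + ": " + *f.Default
	default:
		return "# " + f.Name + ": null"
	}
}

func main() {
	explicit, def := "3307", "3306"
	fields := []field{
		{Name: "example_port", Set: &explicit, Default: &def},
		{Name: "example_level", Default: &def},
		{Name: "example_optional"},
	}
	for _, f := range fields {
		fmt.Println(renderLine(f))
	}
}
```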

go-mysql-server

2811: Reset BytesBuffer after each rowBatch
Once we spool a batch of rows to client, there's no reason to keep them in memory.
Fixes #8718

Closed Issues

8736: Sort/Alias dependency conflict preventing prune
8735: [Bug] FULL OUTER JOIN not commutative, and not giving correct results
8724: [Bug] Incorrect SUM calculation in CTE with correlated subquery
8727: \show in SQL shell panics when arg isn't a commit
8718: mydumper OOMs in deterministic fashion for large database

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	1.89	0.65	0.34
groupby_scan	13.22	17.32	1.31
index_join	1.44	2.43	1.69
index_join_scan	1.42	1.44	1.01
index_scan	34.33	30.81	0.9
oltp_point_select	0.18	0.26	1.44
oltp_read_only	3.49	5.37	1.54
select_random_points	0.37	0.6	1.62
select_random_ranges	0.4	0.63	1.57
table_scan	34.33	33.12	0.96
types_table_scan	74.46	114.72	1.54
reads_mean_multiplier			1.27

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	8.9	6.21	0.7
oltp_insert	4.1	3.07	0.75
oltp_read_write	9.06	11.45	1.26
oltp_update_index	4.18	3.13	0.75
oltp_update_non_index	4.18	3.07	0.73
oltp_write_only	5.67	6.32	1.11
types_delete_insert	8.43	6.55	0.78
writes_mean_multiplier			0.87

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	95.72	39.79	2.41
tpcc_tps_multiplier			2.41

Overall Mean Multiple	1.52

09 Jan 18:19

github-actions

v1.45.5

80ad9a2

1.45.5

Merged PRs

dolt

8725: More information about types in int conversions

Closed Issues

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	1.93	0.67	0.35
groupby_scan	13.22	17.32	1.31
index_join	1.47	2.48	1.69
index_join_scan	1.44	1.47	1.02
index_scan	34.33	30.81	0.9
oltp_point_select	0.18	0.27	1.5
oltp_read_only	3.49	5.37	1.54
select_random_points	0.34	0.6	1.76
select_random_ranges	0.37	0.63	1.7
table_scan	34.33	33.12	0.96
types_table_scan	75.82	114.72	1.51
reads_mean_multiplier			1.29

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	8.9	6.32	0.71
oltp_insert	4.1	3.13	0.76
oltp_read_write	8.9	11.45	1.29
oltp_update_index	4.18	3.19	0.76
oltp_update_non_index	4.18	3.07	0.73
oltp_write_only	5.67	6.32	1.11
types_delete_insert	8.43	6.67	0.79
writes_mean_multiplier			0.88

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	96.35	39.77	2.42
tpcc_tps_multiplier			2.42

Overall Mean Multiple	1.53

09 Jan 05:31

github-actions

v1.45.4

05ce06e

1.45.4

Merged PRs

dolt

8723: If a JSON document contains strings that can't fit in a single chunk, use the naive Blob chunker instead of the smart JSON chunker.
The JSON chunker never creates a chunk boundary inside of a string.
Originally, this PR added functionality to allow the JSON chunker to split a JSON document inside a string. This was supposed to be safe and backwards compatible, because older versions of Dolt reading documents written by newer versions of Dolt are supposed to fall back on ignoring JSON document metadata if they don't understand it and treat the document like a blob.
However, tests revealed that older clients were not checking for this in enough places and would hang when trying to read documents written with this fix. This PR also contains fixes to check the JSON metadata in more places... but this doesn't do anything for existing Dolt servers running older versions.
So instead, this PR detects when a document contains strings that exceed some limit, and instead the writer falls back on writing the document as a plain blob without metadata. The limit is currently 32KB, but can be raised in the future.
I chose to keep the logic for splitting JSON documents inside a string, although the chunker doesn't currently use it, since we may decide to enable it in the future.
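The fallback decision can be sketched as a scan over the document's string tokens. This is a rough illustration, not Dolt's writer: chooseEncoding and its return values are hypothetical, the 32KB figure comes from the description above, and measuring the decoded string length (rather than the encoded bytes) is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// maxStringLen is the 32KB limit mentioned above.
const maxStringLen = 32 * 1024

// chooseEncoding returns "blob" if any string token in doc (object key or
// string value) is longer than maxStringLen, and "json-tree" otherwise.
// Hypothetical helper; the names of the two encodings are made up.
func chooseEncoding(doc string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return "json-tree", nil
		}
		if err != nil {
			return "", err
		}
		if s, ok := tok.(string); ok && len(s) > maxStringLen {
			return "blob", nil
		}
	}
}

func main() {
	small := `{"title": "short", "tags": ["a", "b"]}`
	large := `{"body": "` + strings.Repeat("x", 40000) + `"}`
	for _, doc := range []string{small, large} {
		enc, err := chooseEncoding(doc)
		fmt.Println(enc, err)
	}
}
```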

Closed Issues

08 Jan 05:56

github-actions

v1.45.3

071cd77

1.45.3

Merged PRs

dolt

8722: Add "dolt_optimize_json" system variable.
When set to 0, Dolt will write JSON documents to storage as simple blobs instead of path-indexed trees.
This is useful as a workaround to a current issue where the JSON chunker won't chunk in the middle of large string literals, resulting in larger-than-expected chunks.
8721: Updating journal to allow chunk records larger than 1MB
Also adds tests for JSON cases that triggered these larger chunks
8720: unhide dolt ci commands
8713: Allow storing/reading 32 bit offsets in nodes.
This change allows future node messages to use 32 bit offsets in their encoding.
This increases the size of the Node struct, but given that each Node is much smaller than the message that it backs, this isn't really a memory concern.
Currently, none of the node types use this, so this change shouldn't have any immediate effect on observable behavior.
8708: dolt bootstrap refactor
This change alters the way we resolve the data-dir and initialize dolt processes. It originated from discovering that the tmp dir replace test at startup was leaving a test file in the current working directory. Long story short - the resolution of data-dir was too late in the process startup when sql-server was starting.
External behavior which will change:
1. Specifying data_dir in the sql-server config file will correctly test the ability to rename files on the same partition, and override the TMPDIR environment variable when necessary, using $DATADIR/tmp when the user-specified directory is on the wrong partition.
2. The server fails to start when the data-dir is specified multiple times. This will result in server startup failure for existing deployments which "work" (we choose the data-dir deterministically now, but not in the expected precedence).
Fixes: #8498
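The rename test in the first item can be sketched as a probe file that is created in the temp directory and renamed into the data directory. This is a hypothetical illustration of the idea, not Dolt's startup code: canRenameInto and chooseTmpDir are invented names, and falling back to a tmp subdirectory of the data directory follows the description above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// canRenameInto reports whether a file created in tmpDir can be renamed
// into dataDir. Renames typically fail across partitions, so a failure
// suggests the two directories are on different filesystems.
func canRenameInto(tmpDir, dataDir string) bool {
	f, err := os.CreateTemp(tmpDir, "rename-probe-*")
	if err != nil {
		return false
	}
	src := f.Name()
	f.Close()

	dst := filepath.Join(dataDir, filepath.Base(src))
	if err := os.Rename(src, dst); err != nil {
		os.Remove(src) // clean up the probe so no test file is left behind
		return false
	}
	os.Remove(dst)
	return true
}

// chooseTmpDir keeps tmpDir when the rename probe succeeds and otherwise
// falls back to a tmp directory inside dataDir.
func chooseTmpDir(tmpDir, dataDir string) string {
	if canRenameInto(tmpDir, dataDir) {
		return tmpDir
	}
	return filepath.Join(dataDir, "tmp")
}

func main() {
	dataDir, err := os.MkdirTemp("", "datadir-*")
	if err != nil {
		fmt.Println("could not create example data dir:", err)
		return
	}
	defer os.RemoveAll(dataDir)
	fmt.Println("temp dir to use:", chooseTmpDir(os.TempDir(), dataDir))
}
```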
8668: Bump golang.org/x/crypto from 0.23.0 to 0.31.0 in /go
Bumps golang.org/x/crypto from 0.23.0 to 0.31.0.
Commits
- b4f1988 ssh: make the public key cache a 1-entry FIFO cache
- 7042ebc openpgp/clearsign: just use rand.Reader in tests
- 3e90321 go.mod: update golang.org/x dependencies
- 8c4e668 x509roots/fallback: update bundle
- 6018723 go.mod: update golang.org/x dependencies
- 71ed71b README: don't recommend go get
- 750a45f sha3: add MarshalBinary, AppendBinary, and UnmarshalBinary
- 36b1725 sha3: avoid trailing permutation
- 80ea76e sha3: fix padding for long cSHAKE parameters
- c17aa50 sha3: avoid buffer copy
- Additional commits viewable in compare view

Closed Issues

7512: Dolt binlog Provider Support
1992: dolt branch -d can't delete remote branches
1492: dolt checkout should support a commit + table argument
6207: fetch way slower than clone
8548: Dolt reset should stage working set
8585: Tool dolphie needs @@admin_version system variable to return a numeric type.
8635: "AS OF" doesn't work with partial commit hashes, trying to improvise this crashes Dolt
8712: I'm pushing to self-hosting (DoltLab error occurred)
8592: Make Dolt work with mydumper
8498: tmpDir doesn't seem to be configurable

02 Jan 19:21

github-actions

v1.45.2

34f7a86

1.45.2

Merged PRs

dolt

go-mysql-server

2802: exempt processlist column renaming through aliases
needed for dolphie to work; extension of dolthub/go-mysql-server#2764
2800: Pass metrics server listener to DefaultProtocolListenerFunc for doltgres metrics

vitess

393: fix starting by and terminated by order
the starting by and terminated by clauses in load data statements can appear in any order and any number of times.
392: [sqltypes] no value buffer leakage

Closed Issues

8706: LOAD DATA INFILE syntax error on different order of params 'starting by' and 'terminated by'

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	1.89	0.62	0.33
groupby_scan	13.46	16.71	1.24
index_join	1.44	2.3	1.6
index_join_scan	1.42	1.44	1.01
index_scan	34.33	30.26	0.88
oltp_point_select	0.18	0.27	1.5
oltp_read_only	3.49	5.28	1.51
select_random_points	0.33	0.59	1.79
select_random_ranges	0.37	0.62	1.68
table_scan	34.33	33.12	0.96
types_table_scan	74.46	108.68	1.46
reads_mean_multiplier			1.27

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	8.9	6.21	0.7
oltp_insert	4.1	3.07	0.75
oltp_read_write	8.9	11.45	1.29
oltp_update_index	4.18	3.13	0.75
oltp_update_non_index	4.18	3.07	0.73
oltp_write_only	5.67	6.21	1.1
types_delete_insert	8.28	6.55	0.79
writes_mean_multiplier			0.87

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	96.01	40.05	2.4
tpcc_tps_multiplier			2.4

Overall Mean Multiple	1.51

26 Dec 22:40

github-actions

v1.45.1

b3417c9

1.45.1

Merged PRs

dolt

8699: [sqle] Fix diff table merge join bugs
Diff tables were inappropriately using the kv merge join in several ways. kvexec has no diff table support, and diff table indexes aren't sorted, so the default merge join also fails. The test suite was also being skipped.
GMS side here: dolthub/go-mysql-server#2803
fixes: #8700
8698: Replace cespare/xxhash with cespare/xxhash/v2
Currently, we are using two versions of the same package for xxHash:
https://github.com/dolthub/dolt/blob/d98baafd3e8248a9818e21442f4dfbdeffe78ac4/go/go.mod#L56-L57
github.com/cespare/xxhash/v2 is the latest version, which includes bug fixes and improvements. This PR updates the codebase to replace github.com/cespare/xxhash with github.com/cespare/xxhash/v2.
No breaking changes, see https://go.dev/play/p/ZXuwERoBlEi.
8691: cache charset bump
8684: [stats] stats table name sensitivity tests
Fix bugs related to table casing, loading deleted tables, and making sure we're using the appropriate branch root when updating statistics.

go-mysql-server

2803: [memo] merge joins must use globally sorted indexes
2802: exempt processlist column renaming through aliases
needed for dolphie to work; extension of dolthub/go-mysql-server#2764
2799: pool wire write buffer
BytesBuffer is a class that lets us avoid most allocations for spooling values to wire. Notably, the object is responsible for doubling the backing array size when appropriate, and a Grow(n int) interface is necessary to track when this should happen. Letting the runtime do all of this would be preferable, but the runtime doubles based on slice size, and the refactors required to make that workable are more invasive. We pay for 2 mallocs on doubling, because the first one is never big enough. Not calling Grow after allocating, or growing by too small a length compared to the allocations used, will stomp previously written memory.
As long as we track bytes used with the Grow interface this works smoothly and shaves ~30% off of tablescans.
perf here: #8693
2798: cache session charset
perf: #8691
2796: apply table projections through Distinct nodes
We weren't pruning table columns when there was a distinct clause over the projections. This resulted in the deserialization of every column, even columns that weren't going to make it to the result. This is bad for performance, especially if the unread columns are of TEXT, LONGTEXT, BLOB, or LONGBLOB type, as those are stored out of band and take longer to deserialize.
fixes: #8689
2795: allow using function as table function

vitess

390: Minor bug fixes for caching_sha2_password auth logic
For accounts without passwords, we need to account for the client sending the null byte when the server re-requests the client auth data, and then skip the AuthMoreDataPacket, and CachingSha2FastAuth packets. Otherwise the mysql client errors with "Malformed packet".
handleConnectionError is used to report stats about failed connection attempts, but wasn't being called in the correct spot. The previous spot was overcounting failed connection attempts, since it was called as part of the auth renegotiation flow. It has been moved to be called whenever we return an error or exit the function without a successful connection.

Closed Issues

8700: Panic in join against diff table
8689: Prune columns from select distinct
8688: P
2790: Any plan to make a new patch release?

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	1.89	0.61	0.32
groupby_scan	13.22	16.12	1.22
index_join	1.47	2.3	1.56
index_join_scan	1.42	1.39	0.98
index_scan	34.33	30.26	0.88
oltp_point_select	0.18	0.26	1.44
oltp_read_only	3.43	5.18	1.51
select_random_points	0.34	0.57	1.68
select_random_ranges	0.37	0.61	1.65
table_scan	34.33	32.53	0.95
types_table_scan	74.46	104.84	1.41
reads_mean_multiplier			1.24

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	9.06	6.21	0.69
oltp_insert	4.1	3.07	0.75
oltp_read_write	8.9	11.24	1.26
oltp_update_index	4.18	3.13	0.75
oltp_update_non_index	4.18	3.02	0.72
oltp_write_only	5.67	6.21	1.1
types_delete_insert	8.43	6.55	0.78
writes_mean_multiplier			0.86

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	96.04	40.4	2.38
tpcc_tps_multiplier			2.38

Overall Mean Multiple	1.49

17 Dec 22:43

github-actions

v1.45.0

cb6d5b4

1.45.0

Backwards incompatible change in this release:

This release has a small behavior change to the dolt_diff_$table results. Previously, changes to the schema of the table, in particular primary key changes, caused dolt_diff_$table to return only the history related to the most recent schema. Now the dolt_diff_$table system table will make a best effort to include more history for the table even if we can't perfectly map schema changes.

Merged PRs

dolt

8685: update TableFunction
8631: Give a little more information in dolt_diff_* when there is a pk change
This change makes the dolt_diff_* system table a little more forgiving when schema changes occur that we can approximately map from one commit to the next. The case in the issue is adding a primary key to a keyless table. This doesn't work in both directions though - if you can't map the schema, we stop walking history (same as before).
Minor bump required because the behavior of the dolt_diff_* table changes as a result of this change.
Fixes: #8625

go-mysql-server

2795: allow using function as table function
2794: Bump go-icu-regex
Incorporates the fix from here:
- dolthub/go-icu-regex#2

Closed Issues

8625: dolt_diff_* returns empty set for tables altered to add a PK after creating using CREATE TABLE ... AS SELECT
8683: dolt table import does not understand a schema file with a primary key defined separately from the column
8665: Panic on dolt_diff_* with generated column

Performance

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	1.93	0.62	0.32
groupby_scan	13.46	16.41	1.22
index_join	1.47	2.26	1.54
index_join_scan	1.42	1.47	1.04
index_scan	34.33	46.63	1.36
oltp_point_select	0.18	0.27	1.5
oltp_read_only	3.49	5.37	1.54
select_random_points	0.33	0.6	1.82
select_random_ranges	0.37	0.62	1.68
table_scan	34.33	46.63	1.36
types_table_scan	74.46	123.28	1.66
reads_mean_multiplier			1.37

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	8.9	6.21	0.7
oltp_insert	4.1	3.07	0.75
oltp_read_write	8.9	11.45	1.29
oltp_update_index	4.18	3.13	0.75
oltp_update_non_index	4.18	3.07	0.73
oltp_write_only	5.77	6.21	1.08
types_delete_insert	8.43	6.55	0.78
writes_mean_multiplier			0.87

TPC-C TPS Tests	MySQL	Dolt	Multiple
tpcc-scale-factor-1	95.58	40.6	2.35
tpcc_tps_multiplier			2.35

Overall Mean Multiple	1.53

Releases: dolthub/dolt
