backfill: handle transaction retry for vector index backfill #144328
base: master
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball)
pkg/sql/rowexec/indexbackfiller.go
line 197 at r1 (raw file):
}
// Initialize the tmpEntry. This will store the input entry that we are encoding
nit: mention here that this allows us to preserve the initial "template" indexEntry across txn retries
pkg/sql/rowexec/indexbackfiller.go
line 207 at r1 (raw file):
tmpEntry.Value.RawBytes = tmpEntry.Value.RawBytes[:0]
tmpEntry.Key = append(tmpEntry.Key, indexEntry.Key...)
tmpEntry.Value.RawBytes = append(tmpEntry.Value.RawBytes, indexEntry.Value.RawBytes...)
super nit:
tmpEntry.Key = append(tmpEntry.Key[:0], indexEntry.Key...)
tmpEntry.Value.RawBytes = append(tmpEntry.Value.RawBytes[:0], indexEntry.Value.RawBytes...)
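For anyone unfamiliar with the idiom in this nit, here is a minimal standalone sketch of why the slice expression has to be `[:0]` and not `[0:]`. The helper name `copyInto` and the sample byte strings are made up for illustration and are not part of the PR.

```go
package main

import "fmt"

// copyInto (hypothetical helper) overwrites dst with src. dst[:0] resets the
// length to zero but keeps the capacity, so append reuses the existing
// backing array whenever src fits in it.
func copyInto(dst, src []byte) []byte {
	return append(dst[:0], src...)
}

func main() {
	scratch := make([]byte, 0, 16)

	scratch = copyInto(scratch, []byte("first-key"))
	fmt.Printf("%s len=%d cap=%d\n", scratch, len(scratch), cap(scratch))

	// The second copy replaces the old contents without allocating.
	scratch = copyInto(scratch, []byte("k2"))
	fmt.Printf("%s len=%d cap=%d\n", scratch, len(scratch), cap(scratch))

	// By contrast, s[0:] keeps the current contents, so append extends them
	// instead of overwriting them.
	extended := append(scratch[0:], []byte("-extra")...)
	fmt.Printf("%s\n", extended)
}
```

The last line of `main` shows the failure mode: with `[0:]` the new bytes are appended after the old ones, which is not what a scratch-buffer copy wants.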
pkg/sql/backfill/backfill.go
line 523 at r1 (raw file):
outputEntry.Key = outputEntry.Key[:0]
outputEntry.Key = append(outputEntry.Key, vih.indexPrefix...)
super nit: similar to below:
outputEntry.Key = append(outputEntry.Key[:0], vih.indexPrefix...)
Is it possible to add tests that detect this issue and guard against regressions?
80bfbcb to 4fc6cce
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @mw5h)
Previously, when backfilling a vector index, a transaction retry would cause the backfill to fail. To avoid new allocations, the vector index backfill writer overwrote the values read by the backfill reader with what it wanted to push down to KV. This worked well so long as there were no transaction retries, but on a retry the writer could not re-encode the index entry because it had lost access to the unquantized vector. The quantized vector cannot be reused on a transaction retry because fixups may cause the target partition to change.
This patch creates a scratch rowenc.IndexEntry in the backfill helper to store the input vector entry. Before attempting to write the entry, we copy the input IndexEntry to the scratch entry and use that to re-encode the vector, which is still modified in place to limit new allocations.
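The scratch-entry approach described above can be sketched in isolation roughly as follows. This is a toy model, not the code in this patch: `entry` is a hypothetical stand-in for rowenc.IndexEntry, `encode` stands in for the real vector re-encoding, and the retry is simulated by treating every partition in the list except the last as a failed attempt.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for rowenc.IndexEntry.
type entry struct {
	Key   []byte
	Value []byte
}

// encode is a hypothetical stand-in for re-encoding: it rewrites out in place
// from the untouched template in, prefixing the key with the target partition.
func encode(out, in *entry, partition int) {
	out.Key = append(out.Key[:0], fmt.Sprintf("p%d/", partition)...)
	out.Key = append(out.Key, in.Key...)
	out.Value = append(out.Value[:0], in.Value...)
}

// backfillOne copies the input into tmp once, then re-encodes indexEntry in
// place on every attempt. Each element of partitions models one attempt; all
// but the last are treated as attempts that hit a simulated retry.
func backfillOne(indexEntry, tmp *entry, partitions []int) int {
	// Preserve the input before any attempt overwrites indexEntry.
	tmp.Key = append(tmp.Key[:0], indexEntry.Key...)
	tmp.Value = append(tmp.Value[:0], indexEntry.Value...)

	attempts := 0
	for _, p := range partitions {
		attempts++
		// Always encode from the template, never from the previous output.
		encode(indexEntry, tmp, p)
	}
	return attempts
}

func main() {
	in := &entry{Key: []byte("vec42"), Value: []byte("raw-vector")}
	tmp := &entry{}
	n := backfillOne(in, tmp, []int{3, 7})
	fmt.Printf("attempts=%d key=%s\n", n, in.Key)
}
```

If `encode` read from `indexEntry` instead of `tmp`, the second attempt would start from the first attempt's output and stack the partition prefixes, which is the toy analogue of losing the unquantized vector.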
Additionally, this patch switches the writer from using CPut() to CPutAllowingIfNotExists() so that if the backfill job restarts, we don't see the partially written index and fail due to duplicate keys.
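To illustrate why the conditional-put change matters on a job restart, here is a toy in-memory model. It does not use the real KV client: `store` and `condPut` are hypothetical, and the semantics are an assumption based on my reading of the two calls (a strict put with no expected value requires the key to be absent, while the lenient variant succeeds if the key is absent or already holds the expected value). Passing the value being written as the expected value is also an assumption made for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

var errConditionFailed = errors.New("condition failed")

// store is a hypothetical in-memory stand-in for the KV layer.
type store map[string]string

// condPut writes val under key if the condition holds:
//   - key is absent and either no expected value was given or
//     allowNotExists is set, or
//   - key is present and its value equals *exp.
func (s store) condPut(key, val string, exp *string, allowNotExists bool) error {
	cur, exists := s[key]
	switch {
	case !exists && (exp == nil || allowNotExists):
		// Nothing there yet: the write is allowed.
	case exists && exp != nil && cur == *exp:
		// Already holds the expected value: the write is allowed.
	default:
		return errConditionFailed
	}
	s[key] = val
	return nil
}

func main() {
	s := store{}
	v := "encoded-vector"

	// First run of the job writes k1, then the job restarts.
	_ = s.condPut("k1", v, nil, false)

	// Strict mode: rewriting k1 after the restart looks like a duplicate key.
	fmt.Println("strict rewrite:", s.condPut("k1", v, nil, false))

	// Lenient mode: both the already-written key and a new key succeed.
	fmt.Println("lenient rewrite:", s.condPut("k1", v, &v, true))
	fmt.Println("lenient new key:", s.condPut("k2", v, &v, true))

	// A genuinely conflicting value is still rejected.
	other := "different"
	fmt.Println("lenient conflict:", s.condPut("k1", other, &other, true))
}
```

In this model the lenient mode makes the write idempotent for entries left behind by a partial run while still rejecting a key that holds some other value.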
Informs: #143107
Release note: None