-
Notifications
You must be signed in to change notification settings - Fork 518
db: allow ingesting local SSTs with values in blob files #5620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
Previously when ingesting local ssts we provided just the string paths as a slice. Now that we want to ingest tables with blob files, we introduce a new type that will track local sst paths and extra metadata like their blob file paths.
Allow ingesting local ssts and their associated blob files. Note that we do not validate blob value handles within a table. Each blob file is assumed to be valid for and fully referenced by the SST.
jbowens
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbowens reviewed 2 of 2 files at r1, 5 of 5 files at r2, 7 of 9 files at r3, all commit messages.
Reviewable status: 10 of 12 files reviewed, 10 unresolved discussions (waiting on @xinhaoz and @xxmplus)
ingest_with_blobs.go line 38 at r3 (raw file):
// IngestLocal ingests the provided local sstables into the DB. // If a valid exciseSpan is provided, the ingested sstables will have any keys // in that span excised (removed) during ingestion.
the sentence describing the exciseSpan behavior seems off. if exciseSpan is provided, any committed keys within the DB that fall in that span will be deleted atomically with ingestion. if the sstables being ingested have keys that fall within the excise span, those keys will not be removed
ingest.go line 487 at r3 (raw file):
} // ingestBlobFilesForTable creates table's blob references from the provided blob file paths.
nit: I think we should explicitly state that this function will update tableMeta.BlobReferences appropriately and return the BlobFileMetadata for the corresponding referenced files
ingest.go line 490 at r3 (raw file):
// // The following invariants are expected to hold: // - The provided blob file paths are ordered by by their index in the table's blob references
nit: s/by by/by/
ingest.go line 494 at r3 (raw file):
// value, so the order must be maintained when mapping to the table's blob references. // - Each blob file is fully referenced by the table. func ingestBlobFilesForTable(
nit: maybe call this something like constructBlobFileMetadataForIngestedTable? the existing name was reading to me like it was performing the ingestion itself
ingest.go line 508 at r3 (raw file):
} // Read the the blob reference index block to determine the size of values
nit: s/the the/the/
ingest.go line 517 at r3 (raw file):
defer h.Release() if !h.Valid() { return nil, errors.New("pebble: table has invalid blob reference index block")
I think we should be able to omit this or at least convert it to an invariant assertion. The contract should be that err==nil implies h.Valid()
ingest.go line 580 at r3 (raw file):
Size: max(uint64(fileSize), 1), ValueSize: valueSize, CreationTime: uint64(time.Now().Unix()),
nit: uint64(opts.private.timeNow().Unix())
ingest.go line 849 at r3 (raw file):
// Link blob files first. for blobInd, bf := range localMetas[i].blobFiles { blobPaths := localMetas[i].local.BlobPaths
nit: maybe move blobPaths := outside this inner for loop, since it's per-table state
ingest.go line 1699 at r3 (raw file):
// the tables referenced in the MANIFEST, but not present in the provider. if err := d.objProvider.Sync(); err != nil { return IngestOperationStats{}, err
nit: this was an existing issue before this change, but I'm realizing that this error case and the one above don't ingestCleanup before returning. how about leaving a TODO to clean up the error handling in here?
ingest_test.go line 381 at r3 (raw file):
for _, val := range arg.Vals { // Split by comma to handle comma-separated blob paths paths := strings.Split(val, ",")
nit: I think you could do
blobPaths = append(blobPaths, strings.FieldsFunc(val, func(r rune) bool {
return r == ',' || unicode.IsSpace(r)
})
valsep: extract SSTBlobWriter test kv handling
db: add LocalSST struct containing metadata for local ingested files
Previously when ingesting local ssts we provided just the string paths as a
slice. Now that we want to ingest tables with blob files, we introduce a new
type that will track local sst paths and extra metadata like their blob file
paths.
db: allow ingesting local SSTs with values in blob files
Allow ingesting local ssts and their associated blob files. Note that we do not
validate blob value handles. Each blob file is assumed to be fully referenced by
the SST.