Skip to content

Conversation

@xinhaoz
Copy link
Member

@xinhaoz xinhaoz commented Dec 1, 2025

valsep: extract SSTBlobWriter test kv handling

db: add LocalSST struct containing metadata for local ingested files

Previously when ingesting local ssts we provided just the string paths as a
slice. Now that we want to ingest tables with blob files, we introduce a new
type that will track local sst paths and extra metadata like their blob file
paths.

db: allow ingesting local SSTs with values in blob files

Allow ingesting local ssts and their associated blob files. Note that we do not
validate blob value handles. Each blob file is assumed to be fully referenced by
the SST.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@github-actions
Copy link

github-actions bot commented Dec 1, 2025

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

@xinhaoz xinhaoz marked this pull request as ready for review December 1, 2025 22:00
@xinhaoz xinhaoz requested a review from a team as a code owner December 1, 2025 22:00
@xinhaoz xinhaoz requested a review from xxmplus December 1, 2025 22:00
@xinhaoz xinhaoz marked this pull request as draft December 1, 2025 22:00
@xinhaoz xinhaoz marked this pull request as ready for review December 1, 2025 22:19
Previously when ingesting local ssts we provided just the string paths as a
slice. Now that we want to ingest tables with blob files, we introduce a new
type that will track local sst paths and extra metadata like their blob file
paths.
Allow ingesting local ssts and their associated blob files. Note that we do not
validate blob value handles within a table. Each blob file is assumed to be
valid for and fully referenced by the SST.
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbowens reviewed 2 of 2 files at r1, 5 of 5 files at r2, 7 of 9 files at r3, all commit messages.
Reviewable status: 10 of 12 files reviewed, 10 unresolved discussions (waiting on @xinhaoz and @xxmplus)


ingest_with_blobs.go line 38 at r3 (raw file):

// IngestLocal ingests the provided local sstables into the DB.
// If a valid exciseSpan is provided, the ingested sstables will have any keys
// in that span excised (removed) during ingestion.

the sentence describing the exciseSpan behavior seems off. if exciseSpan is provided, any committed keys within the DB that fall in that span will be deleted atomically with ingestion. if the sstables being ingested have keys that fall within the excise span, those keys will not be removed


ingest.go line 487 at r3 (raw file):

}

// ingestBlobFilesForTable creates table's blob references from the provided blob file paths.

nit: I think we should explicitly state that this function will update tableMeta.BlobReferences appropriately and return the BlobFileMetadata for the corresponding referenced files


ingest.go line 490 at r3 (raw file):

//
// The following invariants are expected to hold:
//   - The provided blob file paths are ordered by by their index in the table's blob references

nit: s/by by/by/


ingest.go line 494 at r3 (raw file):

//     value, so the order must be maintained when mapping to the table's blob references.
//   - Each blob file is fully referenced by the table.
func ingestBlobFilesForTable(

nit: maybe call this something like constructBlobFileMetadataForIngestedTable? the existing name was reading to me like it was performing the ingestion itself


ingest.go line 508 at r3 (raw file):

	}

	// Read the the blob reference index block to determine the size of values

nit: s/the the/the/


ingest.go line 517 at r3 (raw file):

	defer h.Release()
	if !h.Valid() {
		return nil, errors.New("pebble: table has invalid blob reference index block")

I think we should be able to omit this or at least convert it to an invariant assertion. The contract should be that err==nil implies h.Valid()


ingest.go line 580 at r3 (raw file):

			Size:         max(uint64(fileSize), 1),
			ValueSize:    valueSize,
			CreationTime: uint64(time.Now().Unix()),

nit: uint64(opts.private.timeNow().Unix())


ingest.go line 849 at r3 (raw file):

		// Link blob files first.
		for blobInd, bf := range localMetas[i].blobFiles {
			blobPaths := localMetas[i].local.BlobPaths

nit: maybe move blobPaths := outside this inner for loop, since it's per-table state


ingest.go line 1699 at r3 (raw file):

	// the tables referenced in the MANIFEST, but not present in the provider.
	if err := d.objProvider.Sync(); err != nil {
		return IngestOperationStats{}, err

nit: this was an existing issue before this change, but I'm realizing that this error case and the one above don't ingestCleanup before returning. how about leaving a TODO to clean up the error handling in here?


ingest_test.go line 381 at r3 (raw file):

				for _, val := range arg.Vals {
					// Split by comma to handle comma-separated blob paths
					paths := strings.Split(val, ",")

nit: I think you could do

blobPaths = append(blobPaths, strings.FieldsFunc(val, func(r rune) bool {
    return r == ',' || unicode.IsSpace(r)
})

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants