Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[no-release-notes] go: nbs,chunks: Have AddTableFilesToManifest check all dependencies are present. #8824

Merged
merged 1 commit into from
Feb 11, 2025

Conversation

reltuk
Copy link
Contributor

@reltuk reltuk commented Feb 5, 2025

No description provided.

@@ -196,7 +195,7 @@ func trueUpBackingManifest(ctx context.Context, root hash.Hash, backing *journal
if err != nil {
return manifestContents{}, err
} else if !ok {
return manifestContents{}, fmt.Errorf("manifest not found when opening chunk journal")
return manifestContents{}, nil
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We changed the semantics here so that we can open a journal file without a manifest already existing.

We do this as part of Clone in some cases, where we copy the journal from a remote into our local store, and then we AddTableFilesToManifest to add it to our manifest. The new sanity checking logic wants to open the journal and read its chunks before we agree to add it to the store. This change allows us to do that.

The change below to ParseIfExists stitches it together, by continuing to go to the backing manifest as long as we have never loaded an actual manifest here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth a comment to that effect

@coffeegoddd
Copy link
Contributor

@reltuk DOLT

comparing_percentages
100.000000 to 100.000000
version result total
c07900c ok 5937457
version total_tests
c07900c 5937457
correctness_percentage
100.0

Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as far as I understand this layer. Might want somebody who has been in here more recently take a pass as well.

@@ -654,6 +654,18 @@ func (rs *RemoteChunkStore) AddTableFiles(ctx context.Context, req *remotesapi.A
return &remotesapi.AddTableFilesResponse{Success: true}, nil
}

func (rs *RemoteChunkStore) getAddrs(version string) chunks.GetAddrsCurry {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use a method doc

getAddrs := func(c chunks.Chunk) chunks.GetAddrsCb {
return func(ctx context.Context, ins hash.HashSet, filter chunks.PendingRefExists) error {
return walkAddrs(c, func(h hash.Hash, _ bool) error {
if !filter(h) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably too difficult to change but this is the opposite of the usual filter idiom

@@ -196,7 +195,7 @@ func trueUpBackingManifest(ctx context.Context, root hash.Hash, backing *journal
if err != nil {
return manifestContents{}, err
} else if !ok {
return manifestContents{}, fmt.Errorf("manifest not found when opening chunk journal")
return manifestContents{}, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth a comment to that effect

}

func (nbs *NomsBlockStore) updateManifestAddFiles(ctx context.Context, updates map[hash.Hash]uint32, appendixOption *ManifestAppendixOption) (mi ManifestInfo, err error) {
func (nbs *NomsBlockStore) updateManifestAddFiles(ctx context.Context, updates map[hash.Hash]uint32, appendixOption *ManifestAppendixOption, gcGen *hash.Hash, sources chunkSourceSet) (mi ManifestInfo, gcGenDifferent bool, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth defining a type alias for gcGen? Probably useful to distinguish from other random hashes

}
eg, ctx := errgroup.WithContext(ctx)
var mu sync.Mutex
for h, c := range files {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unfamiliar with the semantics here so it's not obvious what value c represents or how it feeds into the Open call below

@reltuk reltuk merged commit 6255f50 into main Feb 11, 2025
34 of 35 checks passed
Copy link
Contributor

@max-hoffman max-hoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I don't fully understand the different clone paths and the process for incorporating a server's journal file into another's. Like does this just prevent cloning a corrupt journal file? Also do we just dump the journal file into noms/ to access in the future or do we reformat it?

}
}

// For each chunk in sources, walk all its addresses with |getAddrs| and ensure that none are missing from a call to |refCheck|.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so basically, the chunkstores are self-consistent and do not include any intermediate nodes with missing refs?

}
// If these sources get added to the store, they will get cloned.
// Either way, we want to close these instances when we are done.
defer chunkSources.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the newest go version do this correctly? or does it just try to close the last very n-times?

// store and the store was in a consistent state when
// we added them.
//
// We currently do not take any chunk dependencies
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GC is irrelevant here because we're checking the self-consistency of whole file, which additively will be consistent w/ GC results?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants