# bmt, param: Introduce SectionHasher interface, implement in bmt #2021

## Conversation
janos left a comment:
I've reviewed only the technical aspects. I will review functional changes in another round.
zelig left a comment:
Absolutely brilliant.
```go
//t.GetSection() = make([]byte, sw.secsize)
//copy(t.GetSection(), section)
// TODO: Consider whether the section here needs to be copied, maybe we can enforce not change the original slice
copySection := make([]byte, sw.secsize)
```
why is the copying not part of SetSection then?
There's no member in tree that remembers the section size, so either we must add a member or we must pass it with the function. The latter seems clumsy.
In general I think it's a good idea to introduce as few side-effects in the lower level components as possible; the tree component could be used without any copying, after all.
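To illustrate the trade-off being discussed, a minimal sketch in Go (all names here are simplified stand-ins for the actual PR types, not the real code):

```go
// Option A: copying inside SetSection would require the tree to know the
// section size, i.e. adding a member to the tree type.
// Option B (the approach argued for above): the caller, which already
// knows secsize, copies before handing the slice over; the tree itself
// stays free of side effects.

type tree struct {
	section []byte // the tree only stores a reference
}

func (t *tree) SetSection(section []byte) {
	t.section = section
}

func writeSection(t *tree, section []byte, secsize int) {
	copySection := make([]byte, secsize)
	copy(copySection, section)
	t.SetSection(copySection)
}
```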
```diff
 	n = len(d)
 }
-binary.BigEndian.PutUint64(span, uint64(n))
+binary.LittleEndian.PutUint64(span, uint64(n))
```
why LittleEndian suddenly?
I can't remember off the top of my head, but at least it's the same as in storage/types.go?
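For context, a sketch of the span encoding in question, assuming the 8-byte little-endian prefix convention referenced from storage/types.go:

```go
import "encoding/binary"

// newSpan serializes the length of the data represented by a chunk as an
// 8-byte little-endian value, matching the convention in storage/types.go.
func newSpan(n uint64) []byte {
	span := make([]byte, 8)
	binary.LittleEndian.PutUint64(span, n)
	return span
}
```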
```diff
 rand.Read(sdata[8:])
 binary.LittleEndian.PutUint64(sdata[:8], uint64(dataSize))
-hasher.ResetWithLength(sdata[:8])
+hasher.Reset()
```
now this is called twice
I'm sorry, I don't understand what you mean? You mean that since we actually construct the hasher, the Reset is redundant?
yes
The PR is great and the plan also looks great. Make sure there is a way for the erasure-coding section writer to have access to the child chunk data in order to generate parity data chunks, or is there a better way?

Benchmarks are fine after closer inspection. Thanks to @janos for the hint on stabilizing benchmark results.
pradovic left a comment:
LGTM implementation-wise, nice work! Business-logic-wise I am not 100% sure, as I don't have much experience with this part yet. I left just one minor question.
```go
rawSha3.Write(input)
rawSha3Output := rawSha3.Sum(nil)

sha3FromMakeFunc := MakeHashFunc(SHA3Hash)()
```
🌷 I know it's not part of this PR, but why not use a constructor for Hasher instead of a func? If the func is needed, maybe the builder can be extracted instead?
@pradovic The SwarmHash needs the length of the data it represents prepended as a 64-bit value. The BMT hash has this built in, and we extend the other types with HashWithLength to allow setting the length (SetSpanBytes, see storage/swarmhasher.go).
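Roughly, the interface shape being described; a sketch based on this explanation, not the verbatim source of storage/swarmhasher.go:

```go
import "hash"

// SwarmHash as sketched from the discussion above: a regular hash.Hash
// plus a way to set the span (the length of the data the hash
// represents) as an 8-byte serialized value.
type SwarmHash interface {
	hash.Hash
	SetSpanBytes(span []byte)
}
```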
This PR is part of a series of PRs that introduces an interface that allows chaining of components that receive a data stream and generate hashes and intermediate Merkle-Tree chunks. The individual PR steps will be partitioned from #2022 (branch https://github.com/nolash/swarm/tree/filehasher-avenged) as follows:
1. Introduce `SectionWriter`, implement this interface in `bmt`, make `AsyncHasher` standalone (this PR)
2. Move `AsyncHasher` to `file/hasher`
3. `SectionWriter` sub-component for hashing intermediate Merkle Tree levels
4. `SectionWriter` component executing the FileHasher algorithm
5. A bridge between `io.Reader` and `SectionWriter`, and an implementation of a `SectionWriter` component that provides `Chunk` output
6. `SectionWriter` that provides encryption, along with a test utility
7. `SectionWriter` implementation of a data cache
8. Prune `bmt.Hasher` exports wrt `AsyncHasher`

## Introduce `SectionWriter` interface and implement this interface in `bmt`

The objectives of this PR are:
- Simplify the interface of `bmt.Hasher`
- Enable using `bmt.Hasher` by using only the `hash.Hash` interface
- Prepare moving `AsyncHasher` to a separate package
- Eliminate use of `storage.SwarmHash` outside the `storage` package

### `SectionWriter` interface

The interface is defined in the package `/file`.

#### `hash.Hash`

Essentially the FileHasher is a hashing operation. Thus it makes sense that the components can be used through the same interface as other hashing components provided in golang.

#### `SetWriter`
Chains the `SectionWriter` to a subsequent `SectionWriter`. It should be optional for the `SectionWriter` to provide chaining. The method is itself chainable.

#### `SetSpan`

Sets the "span," meaning the amount of data represented by the data written to the `SectionWriter`. E.g. the references constituting the data of an intermediate chunk "represent" more data than the actual data bytes. For `bmt.Hasher` this was previously provided by the `ResetWithLength` call, and the lack of a separate way of setting the span made it impossible to use `bmt.Hasher` with a pure `hash.Hash` interface.
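A worked example of the span arithmetic, assuming Swarm's typical parameters (32-byte sections, branch factor 128) purely for illustration:

```go
// Assumed parameters for illustration only.
const (
	sectionSize = 32                     // one hash reference
	branches    = 128                    // references per chunk
	chunkSize   = sectionSize * branches // 4096 bytes actually written
)

// An intermediate chunk full of references to data chunks holds 4096
// bytes of references, but represents (spans) the data below it:
const span = branches * chunkSize // 524288 bytes, the value set via SetSpan
```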
#### `SectionSize`

Informs the caller about the underlying `SectionSize` of the `SectionWriter`. In some cases this will be the same as for the chained `SectionWriter`; in some cases the `SectionWriter` may buffer and/or pad data, and translate the `SectionSize` accordingly.

#### `Branches`
Informs the caller about the underlying `Branches`, a.k.a. branch factor, with the same rationale as for `SectionSize` above.
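Putting the methods above together, the interface would look roughly as follows. This is a sketch inferred from the descriptions in this PR; the exact signatures are assumptions, and the authoritative definition lives in the `/file` package:

```go
import "hash"

// SectionWriter sketch. Method names come from this PR; the signatures
// shown are assumed for illustration.
type SectionWriter interface {
	hash.Hash                                // FileHasher is a hashing operation
	SetWriter(w SectionWriter) SectionWriter // optional chaining; itself chainable
	SetSpan(span int)                        // amount of data the written data represents
	SectionSize() int                        // section size of the underlying writer
	Branches() int                           // branch factor, same rationale as SectionSize
}
```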
### `bmt` implementations

Neither `bmt` implementation currently provides any chaining; both will raise errors on calls to `SetWriter`.

#### `bmt.Hasher`

Can now be used as a `hash.Hash`, where the span is merely calculated from the amount of bytes written to it. If a different span is needed, the `SetSpan` method can be used.

Since the `SetLength` call has no practical utility for `bmt.Hasher` currently, it is ignored.

Exports are added to make it possible to move `AsyncHasher` to a separate package. Excess exports will be pruned later.
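A sketch of the usage this enables, assuming the `SetSpan(int)` signature described above (the import path and exact signature are assumptions based on this PR's text):

```go
import (
	"hash"

	"github.com/ethersphere/swarm/bmt"
)

// hashWithSpan sketches the two modes: plain hash.Hash usage, where the
// span defaults to the number of bytes written, and explicit SetSpan for
// data (e.g. an intermediate chunk of references) that represents more
// data than was actually written.
func hashWithSpan(hasher *bmt.Hasher, data []byte, span int) []byte {
	var h hash.Hash = hasher // usable through the plain interface
	h.Reset()
	h.Write(data)
	ref := h.Sum(nil) // span inferred from bytes written
	_ = ref

	hasher.Reset()
	hasher.SetSpan(span) // override the default span
	hasher.Write(data)
	return hasher.Sum(nil)
}
```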
#### `bmt.AsyncHasher`

`bmt.AsyncHasher` is now ready to be moved to a separate package. It's left in place for this PR to make it easy to see the changes that were made.

`WriteIndexed` and `SumIndexed` replace the original `Write` and `Sum` calls. It can still be used as a `bmt.Hasher` (and thus `hash.Hash`) transparently, by using the usual `Write` and `Sum` calls.
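For illustration, the indexed calls might be used like this; `WriteIndexed`/`SumIndexed` are named in this PR, but the exact signatures shown are assumptions:

```go
// Sketch only: sections can be written in any order by index, and
// SumIndexed finalizes with the total data length once all sections
// have been written.
func sumTwoSections(ah *bmt.AsyncHasher, sectionA, sectionB []byte) []byte {
	ah.WriteIndexed(1, sectionB) // assumed signature: (index int, section []byte)
	ah.WriteIndexed(0, sectionA)
	return ah.SumIndexed(nil, len(sectionA)+len(sectionB)) // assumed: (b []byte, length int) []byte
}
```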
#### `storage.SwarmHash`

`ResetWithLength` in the `storage.SwarmHash` interface has been changed to `SetSpanBytes`. `bmt.Hasher` provides this method, which performs the same function as `SetSpan`, albeit with an 8-byte serialized uint instead.

By the way, a bug was unearthed through the reworking of the `bmt`, in which the hash result for zero-length data was different between `RefHasher` and `bmt.Hasher` (but not `bmt.AsyncHasher`). This has been fixed.