-
Notifications
You must be signed in to change notification settings - Fork 5
Use Piece CIDv2 #158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Note: a few procedural blockers still remain on the implementation path (not the spec itself). Links are all captured in filecoin-project/go-fil-commcid#5 (comment), context is comment above it. |
im supportive of this and think future porep services should also use v2 |
Yes please, Storacha use v2 as the primary piece CID identifying aggregated pieces and segments - they need to match up. |
Worth looking into: can we remove the size if using piece cid v2? |
Here is what we will do moving to commp v2 PDPVerifier upgradeSince we're going to be interpreting everything as commp v2 and need to do some state migrating to achieve this lets simplify everything and store digest and size only. The benefits of storing cid prefix are limited as a future with different tree shapes is far off and likely involves a adding in v2 proofsets anyway. There is limited value in mix and matching hash types within a proofset as proofset creation is cheap.
To migrate we will add a new mapping rootdigests uint256 => uint256 for proofset digests. Read the old rootcids rewrite the rootdigests with the digests only. If this turns out to not be feasible we'll just add all of the commpv2 prefix data to the root cids structures. We should have a helper function that can be called to return the commpv2 given a rootid which will write the varint into a prefix curio upgradecurio will send just the digest in add roots pdptool data prep should start using commp v2 for data preparation and across its api. retrieval should happen over commp v2. Though probably we want to enable both for existing sector commps |
Ok the subtlety here is that we measure data size in terms of leaves of commp trees on chain. That requirement comes out of the structure of proofs and its the right thing to measure. But it forces the chain to lose information about datasize by padding all inputs along 127 byte boundaries. And commpv2 is more flexible than this. You can specify the difference between encoding 0xC0FFEE00 and 0xCOFFEE by specifying how much padding is added to end. And for all sorts of data such byte strings will have different meanings We could add an onchain true size table to record pre fr32 data encoding sizes but at that point I'd rather just add the commpv2 cid directly. So we will pass in commpv2 from offchain and the contract will validate that the digest specified padding size is <= (full tree - (leafCount*32)) * (127/128) with equality in the case that the original data is 127 byte aligned. Since we'd keep a generic cid structure in this case I'm thinking that we should only pay this overhead for a commpv2 and let users specify commpv1 if they want. During add we'll check for commpv2 digest byte and if we find it then we will check padding length. |
Two other nice things about this approach
|
Adin pointed out that PDP isn't using PieceCIDv2, only v1. This is a significant issue because it makes it impossible to fully verify the data from the CID (there's no way to tell how many of the trailing zeros are "padding").
At the moment, we pull the length from the contract, but this defeats many of the benefits of content addressing. The CID should be sufficient by itself.
Luckily, PieceCIDv1 & v2 have identical digests, just different prefixes. So this is mostly an interfaces/display issue. Furthermore, the SP doesn't really care so we can focus on the client interfaces.
The text was updated successfully, but these errors were encountered: