Skip to content

Use Piece CIDv2 #158

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Stebalien opened this issue Apr 29, 2025 · 8 comments · May be fixed by #161
Open

Use Piece CIDv2 #158

Stebalien opened this issue Apr 29, 2025 · 8 comments · May be fixed by #161
Assignees

Comments

@Stebalien
Copy link
Collaborator

Adin pointed out that PDP isn't using PieceCIDv2, only v1. This is a significant issue because it makes it impossible to fully verify the data from the CID (there's no way to tell how many of the trailing zeros are "padding").

At the moment, we pull the length from the contract, but this defeats many of the benefits of content addressing. The CID should be sufficient by itself.

Luckily, PieceCIDv1 & v2 have identical digests, just different prefixes. So this is mostly an interfaces/display issue. Furthermore, the SP doesn't really care so we can focus on the client interfaces.

@ribasushi
Copy link

Note: a few procedural blockers still remain on the implementation path (not the spec itself). Links are all captured in filecoin-project/go-fil-commcid#5 (comment), context is comment above it.

@jennijuju
Copy link
Contributor

jennijuju commented Apr 29, 2025

im supportive of this and think future porep services should also use v2

@alanshaw
Copy link

Yes please, Storacha use v2 as the primary piece CID identifying aggregated pieces and segments - they need to match up.

@rjan90 rjan90 moved this to 🐱 Todo in PDP Apr 30, 2025
@jennijuju
Copy link
Contributor

Worth looking into: can we remove the size if using piece cid v2?

@alanshaw
Copy link

Yeah, you can downgrade - we do in our console UI:

Image

@ZenGround0
Copy link
Contributor

Here is what we will do moving to commp v2

PDPVerifier upgrade

Since we're going to be interpreting everything as commp v2 and need to do some state migrating to achieve this lets simplify everything and store digest and size only. The benefits of storing cid prefix are limited as a future with different tree shapes is far off and likely involves a adding in v2 proofsets anyway. There is limited value in mix and matching hash types within a proofset as proofset creation is cheap.

  • Add will now take digest and size instead of cid and size
  • Proofset state will now store digest size only. Instead of []cid.Cids (struct {bytes}) we will just use []uint256

To migrate we will add a new mapping rootdigests uint256 => uint256 for proofset digests. Read the old rootcids rewrite the rootdigests with the digests only.

If this turns out to not be feasible we'll just add all of the commpv2 prefix data to the root cids structures.

We should have a helper function that can be called to return the commpv2 given a rootid which will write the varint into a prefix

curio upgrade

curio will send just the digest in add roots

pdptool data prep should start using commp v2 for data preparation and across its api. retrieval should happen over commp v2. Though probably we want to enable both for existing sector commps

@ZenGround0 ZenGround0 linked a pull request May 5, 2025 that will close this issue
@rjan90 rjan90 moved this from 🐱 Todo to 🔎 Awaiting review in PDP May 5, 2025
@ZenGround0
Copy link
Contributor

Ok the subtlety here is that we measure data size in terms of leaves of commp trees on chain. That requirement comes out of the structure of proofs and its the right thing to measure. But it forces the chain to lose information about datasize by padding all inputs along 127 byte boundaries. And commpv2 is more flexible than this. You can specify the difference between encoding 0xC0FFEE00 and 0xCOFFEE by specifying how much padding is added to end. And for all sorts of data such byte strings will have different meanings

We could add an onchain true size table to record pre fr32 data encoding sizes but at that point I'd rather just add the commpv2 cid directly. So we will pass in commpv2 from offchain and the contract will validate that the digest specified padding size is <= (full tree - (leafCount*32)) * (127/128) with equality in the case that the original data is 127 byte aligned.

Since we'd keep a generic cid structure in this case I'm thinking that we should only pay this overhead for a commpv2 and let users specify commpv1 if they want. During add we'll check for commpv2 digest byte and if we find it then we will check padding length.

@ZenGround0
Copy link
Contributor

ZenGround0 commented May 6, 2025

Two other nice things about this approach

  1. no state migration -- we'll grandfather in any invalid commpv2s that have made it on chain. This is accpeptable to me because all significant existing activity has been on calibnet so far. (while
  2. no change to curio. curio can support commpv2s whenever we want but there's no requirement to update code or introspect on versions to succesfully add roots after the upgrade

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🔎 Awaiting review
Development

Successfully merging a pull request may close this issue.

5 participants