Use Piece CIDv2 #158

Stebalien · 2025-04-29T15:15:24Z

Adin pointed out that PDP isn't using PieceCIDv2, only v1. This is a significant issue because it makes it impossible to fully verify the data from the CID (there's no way to tell how many of the trailing zeros are "padding").

At the moment, we pull the length from the contract, but this defeats many of the benefits of content addressing. The CID should be sufficient by itself.

Luckily, PieceCIDv1 and v2 have identical digests, just different prefixes, so this is mostly an interface/display issue. Furthermore, the SP doesn't really care, so we can focus on the client interfaces.
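A minimal Go sketch of what the two encodings share. It assumes the multicodec values 0xf101 (fil-commitment-unsealed), 0x1012 (sha2-256-trunc254-padded), 0x55 (raw) and 0x1011 (fr32-sha256-trunc254-padbintree), plus the FRC-0069 v2 digest layout uvarint(padding) || height byte || 32-byte root. Strictly, the shared part is the 32-byte tree root; v2 also carries padding and height inside its digest. `encodeCID` and `pieceRoot` are illustrative names, not go-fil-commcid APIs.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Multicodec codes (values assumed from the multiformats table).
const (
	codecRaw      = 0x55   // raw: codec of PieceCIDv2
	codecUnsealed = 0xf101 // fil-commitment-unsealed: codec of PieceCIDv1
	mhV1          = 0x1012 // sha2-256-trunc254-padded
	mhV2          = 0x1011 // fr32-sha256-trunc254-padbintree
)

// encodeCID builds binary CIDv1 bytes: version || codec || mh code || mh length || digest.
func encodeCID(codec, mh uint64, digest []byte) []byte {
	b := binary.AppendUvarint(nil, 1)
	b = binary.AppendUvarint(b, codec)
	b = binary.AppendUvarint(b, mh)
	b = binary.AppendUvarint(b, uint64(len(digest)))
	return append(b, digest...)
}

// pieceRoot decodes a binary CIDv1 and returns the 32-byte commP root it carries.
func pieceRoot(c []byte) ([]byte, error) {
	var f [4]uint64 // version, codec, multihash code, digest length
	for i := range f {
		v, n := binary.Uvarint(c)
		if n <= 0 {
			return nil, errors.New("malformed varint")
		}
		f[i], c = v, c[n:]
	}
	if f[0] != 1 || f[3] != uint64(len(c)) {
		return nil, errors.New("not a well-formed CIDv1")
	}
	switch {
	case f[1] == codecUnsealed && f[2] == mhV1 && len(c) == 32:
		return c, nil // v1: the digest is the root itself
	case f[1] == codecRaw && f[2] == mhV2 && len(c) >= 34:
		return c[len(c)-32:], nil // v2: uvarint(padding) || height || root
	}
	return nil, errors.New("not a piece CID")
}

func main() {
	root := make([]byte, 32)
	root[31] = 0xab // placeholder root, not a real commP
	v1 := encodeCID(codecUnsealed, mhV1, root)
	v2 := encodeCID(codecRaw, mhV2, append([]byte{0, 2}, root...)) // padding 0, height 2
	r1, _ := pieceRoot(v1)
	r2, _ := pieceRoot(v2)
	fmt.Println("same root:", string(r1) == string(r2))
}
```

Only the prefix and the padding/height bytes differ, which is why a v1 CID plus an external length is enough to reconstruct v2.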

ribasushi · 2025-04-29T18:44:04Z

Note: a few procedural blockers still remain on the implementation path (not the spec itself). Links are all captured in filecoin-project/go-fil-commcid#5 (comment); the context is in the comment above it.

jennijuju · 2025-04-29T18:58:34Z

I'm supportive of this and think future PoRep services should also use v2.

alanshaw · 2025-04-30T08:49:51Z

Yes please. Storacha uses v2 as the primary piece CID identifying aggregated pieces and segments; they need to match up.

jennijuju · 2025-04-30T15:05:04Z

Worth looking into: can we remove the size if using PieceCID v2?

alanshaw · 2025-04-30T15:52:51Z

Yeah, you can downgrade (v2 to v1); we do this in our console UI.
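A minimal Go sketch of that downgrade, which also shows why the separate size becomes redundant: the unpadded payload size falls out of the v2 digest as tree capacity minus padding. It assumes the FRC-0069 layout uvarint(padding) || height || root with 2^height 32-byte leaves (so 127 << (height - 2) unpadded bytes of capacity) and the multicodec values listed in the constants; `downgrade` is an illustrative name, and the height cap of 50 is an arbitrary overflow guard.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	codecRaw      = 0x55   // raw (PieceCIDv2 codec, assumed)
	codecUnsealed = 0xf101 // fil-commitment-unsealed (PieceCIDv1 codec, assumed)
	mhV1          = 0x1012 // sha2-256-trunc254-padded
	mhV2          = 0x1011 // fr32-sha256-trunc254-padbintree
)

// encodeCID builds binary CIDv1 bytes: version || codec || mh code || mh length || digest.
func encodeCID(codec, mh uint64, digest []byte) []byte {
	b := binary.AppendUvarint(nil, 1)
	b = binary.AppendUvarint(b, codec)
	b = binary.AppendUvarint(b, mh)
	b = binary.AppendUvarint(b, uint64(len(digest)))
	return append(b, digest...)
}

// downgrade converts binary PieceCIDv2 bytes to PieceCIDv1 and also returns the
// unpadded payload size, which only the v2 form can express.
func downgrade(c []byte) ([]byte, uint64, error) {
	var f [4]uint64 // version, codec, multihash code, digest length
	for i := range f {
		v, n := binary.Uvarint(c)
		if n <= 0 {
			return nil, 0, errors.New("malformed varint")
		}
		f[i], c = v, c[n:]
	}
	if f[0] != 1 || f[1] != codecRaw || f[2] != mhV2 || f[3] != uint64(len(c)) {
		return nil, 0, errors.New("not a PieceCIDv2")
	}
	padding, n := binary.Uvarint(c)
	if n <= 0 || len(c) != n+1+32 {
		return nil, 0, errors.New("malformed v2 digest")
	}
	height, root := c[n], c[n+1:]
	if height < 2 || height > 50 {
		return nil, 0, errors.New("unsupported tree height")
	}
	capacity := uint64(127) << (height - 2) // unpadded bytes in 2^height 32-byte leaves
	if padding > capacity {
		return nil, 0, errors.New("padding exceeds tree capacity")
	}
	return encodeCID(codecUnsealed, mhV1, root), capacity - padding, nil
}

func main() {
	root := make([]byte, 32) // placeholder root, not a real commP
	// 100-byte payload in a height-2 tree: capacity 127, so padding 27.
	v2 := encodeCID(codecRaw, mhV2, append([]byte{27, 2}, root...))
	v1, size, err := downgrade(v2)
	fmt.Printf("v1=%x size=%d err=%v\n", v1, size, err)
}
```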

ZenGround0 · 2025-04-30T15:54:29Z

Here is what we will do when moving to CommP v2:

PDPVerifier upgrade

Since we're going to be interpreting everything as CommP v2 and need to do some state migration to achieve this, let's simplify everything and store only the digest and size. The benefits of storing the CID prefix are limited, as a future with different tree shapes is far off and would likely involve adding v2 proofsets anyway. There is also limited value in mixing and matching hash types within a proofset, since proofset creation is cheap.

Add will now take a digest and size instead of a CID and size.
Proofset state will now store only the digest and size. Instead of []cid.Cids (struct {bytes}), we will just use []uint256.

To migrate, we will add a new mapping rootdigests (uint256 => uint256) for proofset digests, then read the old rootcids and rewrite rootdigests with the digests only.
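A minimal off-chain model of that migration in Go, with `math/big` standing in for Solidity's uint256. It assumes each old rootcids entry holds binary PieceCIDv1 bytes whose last 32 bytes are the digest, read big-endian; `migrateRootDigests` is a hypothetical name, and the real change would live in the Solidity contract.

```go
package main

import (
	"fmt"
	"math/big"
)

// migrateRootDigests models the proposed one-time rewrite of rootcids into
// rootdigests: for each root ID, keep only the trailing 32-byte digest of the
// stored CID bytes, interpreted as a big-endian unsigned integer.
func migrateRootDigests(rootCids map[uint64][]byte) (map[uint64]*big.Int, error) {
	rootDigests := make(map[uint64]*big.Int, len(rootCids))
	for id, c := range rootCids {
		if len(c) < 32 {
			return nil, fmt.Errorf("root %d: stored CID shorter than a 32-byte digest", id)
		}
		rootDigests[id] = new(big.Int).SetBytes(c[len(c)-32:])
	}
	return rootDigests, nil
}

func main() {
	cid := make([]byte, 39) // placeholder: 7-byte v1 prefix + 32-byte digest
	cid[38] = 0x2a
	digests, err := migrateRootDigests(map[uint64][]byte{0: cid})
	fmt.Println(digests[0], err)
}
```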

If this turns out not to be feasible, we'll just add all of the CommP v2 prefix data to the root CID structures.

We should have a helper function that, given a root ID, returns the CommP v2 CID by writing the varint into a prefix.
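A Go sketch of such a helper under the digest-plus-size storage proposed above. `rootStore` and `commPv2ForRoot` are hypothetical names, the codec/multihash values 0x55 and 0x1011 and the FRC-0069 digest layout are assumptions, and the size is taken to be a leaf count that is a multiple of 4. Because only the leaf count is known, the padding written here is the minimum one, which is exact only for 127-byte-aligned payloads.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// rootStore is a hypothetical stand-in for per-proofset state:
// a 32-byte digest plus the leaf count for each root ID.
type rootStore struct {
	digests    map[uint64][32]byte
	leafCounts map[uint64]uint64
}

// minPaddingAndHeight finds the smallest power-of-two tree holding leafCount leaves
// and the padding implied if the payload exactly fills those leaves.
func minPaddingAndHeight(leafCount uint64) (uint64, uint8, error) {
	if leafCount < 4 || leafCount%4 != 0 || leafCount > 1<<50 {
		return 0, 0, errors.New("leaf count must be a positive multiple of 4")
	}
	height := uint8(2)
	for uint64(1)<<height < leafCount {
		height++
	}
	emptyLeaves := uint64(1)<<height - leafCount
	return emptyLeaves / 4 * 127, height, nil // 4 leaves = 128 padded = 127 unpadded bytes
}

// commPv2ForRoot rebuilds binary PieceCIDv2 bytes by writing uvarint(padding)
// and the height byte in front of the stored digest.
func commPv2ForRoot(s rootStore, rootID uint64) ([]byte, error) {
	root, ok := s.digests[rootID]
	if !ok {
		return nil, fmt.Errorf("unknown root %d", rootID)
	}
	padding, height, err := minPaddingAndHeight(s.leafCounts[rootID])
	if err != nil {
		return nil, err
	}
	digest := binary.AppendUvarint(nil, padding)
	digest = append(digest, height)
	digest = append(digest, root[:]...)

	c := binary.AppendUvarint(nil, 1)   // CID version 1
	c = binary.AppendUvarint(c, 0x55)   // raw codec (assumed)
	c = binary.AppendUvarint(c, 0x1011) // fr32-sha256-trunc254-padbintree (assumed)
	c = binary.AppendUvarint(c, uint64(len(digest)))
	return append(c, digest...), nil
}

func main() {
	var root [32]byte // placeholder root, not a real commP
	s := rootStore{
		digests:    map[uint64][32]byte{1: root},
		leafCounts: map[uint64]uint64{1: 12},
	}
	c, err := commPv2ForRoot(s, 1)
	fmt.Printf("%x %v\n", c, err)
}
```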

curio upgrade

Curio will send just the digest when adding roots.

pdptool data prep should start using CommP v2 for data preparation and across its API, and retrieval should happen over CommP v2, though we probably want to enable both for existing sector CommPs.

ZenGround0 · 2025-05-06T04:11:57Z

OK, the subtlety here is that we measure data size on chain in terms of leaves of CommP trees. That requirement comes out of the structure of proofs, and it's the right thing to measure. But it forces the chain to lose information about data size by padding all inputs to 127-byte boundaries. CommP v2 is more flexible than this: you can distinguish between encoding 0xC0FFEE00 and 0xC0FFEE by specifying how much padding is added to the end, and for all sorts of data such byte strings will have different meanings.
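The leaf arithmetic can be made concrete with a small Go sketch, assuming the usual fr32 rule of 127 unpadded bytes per 128 padded bytes (four 32-byte leaves), a zero-filled final chunk, and a power-of-two tree of at least four leaves. `leafCount` and `v2Padding` are illustrative helpers.

```go
package main

import "fmt"

// leafCount is the number of 32-byte fr32 leaves for a payload of size bytes:
// each 127-byte chunk expands to 128 bytes (4 leaves); the last chunk is zero-filled.
func leafCount(size uint64) uint64 {
	return 4 * ((size + 126) / 127)
}

// v2Padding is the padding a PieceCIDv2 would record: the unpadded capacity of the
// smallest power-of-two tree holding those leaves, minus the payload size.
func v2Padding(size uint64) uint64 {
	leaves, height := leafCount(size), uint(2)
	for uint64(1)<<height < leaves {
		height++
	}
	return uint64(127)<<(height-2) - size
}

func main() {
	for _, p := range [][]byte{{0xC0, 0xFF, 0xEE, 0x00}, {0xC0, 0xFF, 0xEE}} {
		n := uint64(len(p))
		fmt.Printf("%x: leaves=%d v2 padding=%d\n", p, leafCount(n), v2Padding(n))
	}
}
```

Both payloads occupy the same four leaves, so a leaf count alone cannot tell them apart, while a v2 CID records padding 123 for 0xC0FFEE00 and 124 for 0xC0FFEE.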

We could add an on-chain true-size table to record pre-fr32 data encoding sizes, but at that point I'd rather just add the CommP v2 CID directly. So we will pass in the CommP v2 from off-chain, and the contract will validate that the padding size specified in the digest is >= (full tree - (leafCount*32)) * (127/128), with equality in the case that the original data is 127-byte aligned.
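A Go sketch of the add-time padding check. It follows from padding = unpadded tree capacity - payload size: the padding must at least cover the empty tail of the tree, (full tree - leafCount*32) * 127/128, and can exceed it by at most 126 bytes of zero fill inside the last 127-byte chunk, hitting the lower bound exactly when the payload is 127-byte aligned. `validV2Padding` is a hypothetical name, the multiple-of-4 leaf count is an fr32 assumption, and the height cap of 50 is an arbitrary overflow guard.

```go
package main

import "fmt"

// validV2Padding checks a claimed v2 padding against the on-chain leaf count:
// it must cover the empty leaves beyond leafCount, plus fewer than 127 bytes
// of zero fill inside the final fr32 chunk.
func validV2Padding(padding uint64, height uint8, leafCount uint64) bool {
	if height < 2 || height > 50 || leafCount == 0 || leafCount%4 != 0 ||
		leafCount > uint64(1)<<height {
		return false
	}
	fullTree := uint64(32) << height                    // padded bytes in the full tree
	treeZeros := (fullTree - leafCount*32) * 127 / 128 // unpadded bytes in empty leaves
	return padding >= treeZeros && padding < treeZeros+127
}

func main() {
	fmt.Println(validV2Padding(123, 2, 4)) // 4-byte payload in a 4-leaf tree
	fmt.Println(validV2Padding(126, 4, 12)) // below the empty-tail minimum of 127
}
```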

Since we'd keep a generic CID structure in this case, I'm thinking we should only pay this overhead for a CommP v2 and let users specify CommP v1 if they want. During add, we'll check for the CommP v2 digest prefix and, if we find it, check the padding length.
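A Go sketch of the version dispatch during add: read the multicodec prefix, accept a PieceCIDv1 as-is, and pull padding and height out of a PieceCIDv2 digest so they can go on to the padding check. The codec values (0xf101/0x1012 for v1, 0x55/0x1011 for v2) and the FRC-0069 digest layout are assumptions, and `pieceVersion` is a hypothetical helper, not part of the PDP contracts.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// pieceVersion inspects a binary CIDv1 and reports 1 for PieceCIDv1 or 2 for
// PieceCIDv2; for v2 it also returns the padding and height from the digest.
func pieceVersion(c []byte) (int, uint64, uint8, error) {
	var f [4]uint64 // version, codec, multihash code, digest length
	for i := range f {
		v, n := binary.Uvarint(c)
		if n <= 0 {
			return 0, 0, 0, errors.New("malformed varint")
		}
		f[i], c = v, c[n:]
	}
	if f[0] != 1 || f[3] != uint64(len(c)) {
		return 0, 0, 0, errors.New("not a well-formed CIDv1")
	}
	switch {
	case f[1] == 0xf101 && f[2] == 0x1012 && len(c) == 32:
		return 1, 0, 0, nil // v1: nothing extra to validate
	case f[1] == 0x55 && f[2] == 0x1011:
		p, n := binary.Uvarint(c)
		if n <= 0 || len(c) != n+1+32 {
			return 0, 0, 0, errors.New("malformed v2 digest")
		}
		return 2, p, c[n], nil // v2: padding and height feed the padding check
	}
	return 0, 0, 0, errors.New("not a piece CID")
}

func main() {
	v2 := append([]byte{1, 0x55, 0x91, 0x20, 34, 123, 2}, make([]byte, 32)...)
	version, padding, height, err := pieceVersion(v2)
	fmt.Println(version, padding, height, err)
}
```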

ZenGround0 · 2025-05-06T04:17:02Z

Two other nice things about this approach:

No state migration: we'll grandfather in any invalid CommP v2s that have already made it on chain. This is acceptable to me because all significant existing activity has been on calibnet so far. (while
No change to Curio. Curio can support CommP v2s whenever we want, but there's no requirement to update code or introspect on versions to successfully add roots after the upgrade.

github-project-automation bot added this to PDP Apr 29, 2025

rjan90 moved this to 🐱 Todo in PDP Apr 30, 2025

ZenGround0 linked a pull request May 5, 2025 that will close this issue

Feat/migrate to commpv2 #161 (Draft)

rjan90 assigned ZenGround0 May 5, 2025

rjan90 moved this from 🐱 Todo to 🔎 Awaiting review in PDP May 5, 2025
