spec discussion: shard segments commitment to global history and proofs #227

Closed

adlrocha wants to merge 2 commits into main from discussion/segment-proofs

Collaborator

adlrocha commented May 5, 2025

Note: This PR is not meant to be merged

This PR includes a draft spec proposal for the submission and verification of child shard segments into the upper layers of the hierarchy.

The outcome of this discussion should be something that enables the implementation of this part of the protocol.


          spec discussion: shard segments commitment to global history and proofs

0d4418b

adlrocha mentioned this pull request

spec: sharded archiving draft #192

Closed

nazar-pc reviewed

Owner

nazar-pc left a comment

Like #220, this is a start, but it doesn't go into sufficient depth to actually compose a provable verification scheme.

For example, I'm really interested in what the actual proof looks like and what information we need. It turns out that the crucial thing, the contents of SuperSegmentProof, is exactly what is missing here right now. I have discovered multiple pieces of information that would be needed, and there are a few alternatives regarding how and where to store them.

Pseudo-code needs to spell out the actual logic, just as it does the data structures. For example, this is not very informative: the contents of verify_proof are what matters, and that is the only thing that is actually missing:

fn verify_piece_inclusion_proof(
    piece: Piece,
    proof: PieceProof,
    segment: Segment,
) -> bool {
    segment.verify_proof(piece, proof)
}

specs/discussions/2025-04-segment-proofs.md

Comment on lines +27 to +39

+              ```rust
+              struct ChildSegmentDescription {
+              	// Shard ID of the child shard the segment belongs to.
+              	shard_id: ShardId,
+              	// The root of the segment.
+              	segment_root: Hash,
+              	// Root of the previous segment created in the shard
+              	prev_segment_root: Hash,
+              	// Local index of the segment (it may be redundant if we assume
+              	// that segments are always submitted in increasing order)
+              	local_index: u64,
+              }
+              ```

Owner

nazar-pc May 5, 2025

shard_id is implicitly clear from the rest of the information sent from the child shard, so I don't think it needs to be included here. local_index can also be tracked by the parent shard if necessary (it is always trivially +1 from the previous one), though I'm not sure how it is helpful. Similarly, the purpose of prev_segment_root is not clear: nothing in the data structure makes sense of it, and the data isn't tied together in any way.

It looks like only segment_root is truly needed here, or do you have plans for the other fields too?
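
A minimal sketch of that idea, assuming the parent shard learns the child's shard ID from the surrounding message rather than from the description itself. `ChildSegmentTracker`, `record`, and the `ShardId`/`Hash` aliases are hypothetical names used only for illustration, not part of the spec:

```rust
use std::collections::HashMap;

/// Hypothetical aliases; the real types live elsewhere in the codebase.
pub type ShardId = u32;
pub type Hash = [u8; 32];

/// Parent-side bookkeeping: the child only submits `segment_root`,
/// and the local index falls out of the submission order.
#[derive(Debug, Default)]
pub struct ChildSegmentTracker {
    segment_roots: HashMap<ShardId, Vec<Hash>>,
}

impl ChildSegmentTracker {
    /// Stores the root and returns the local index assigned to it,
    /// which is always the previous index of that shard plus one.
    pub fn record(&mut self, shard_id: ShardId, segment_root: Hash) -> u64 {
        let roots = self.segment_roots.entry(shard_id).or_default();
        roots.push(segment_root);
        (roots.len() - 1) as u64
    }
}
```

With this shape, neither `shard_id` nor `local_index` has to travel inside the description.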

specs/discussions/2025-04-segment-proofs.md

Comment on lines +41 to +45

+. The status of child shard segments is tracked indexed by their `local_index` through their
+                 `IndexStatus`. `IndexStatus` gives information about if the segment has been already submitted to
+                 the parent, is pending confirmation in the beacon chain, or it has been submitted to the global
+                 history of the beacon chain and has already been assigned its global segment index (pertaining
+                 its sequence in the global history).

Owner

nazar-pc May 5, 2025

Tracked where and how? I assume this is a client-side thing?

specs/discussions/2025-04-segment-proofs.md

Comment on lines +47 to +58

+              ```rust
+              let segment_index = (local_index: u64, global_index: IndexStatus<u64>)
+              enum IndexStatus {
+              	// The segment has been committed to the global history.
+              	Committed(u64),
+              	// The segment is pending to be committed to the global history.
+              	Pending,
+              	// The segment has not been submitted yet.
+              	NotSubmitted,
+              }
+              ```

Owner

nazar-pc May 5, 2025

SegmentIndex today is simply a u64. Why does it need to be a tuple of two values? Global history assumes every segment has a unique, increasing number, not a tuple.

specs/discussions/2025-04-segment-proofs.md

Comment on lines +74 to +77

+. The parent chain pulls all the `ChildSegmentDescription` (or `SegmentHeader`) from the child
+                 segments propagated and after performing the corresponding light verification includes them on
+                 the `consensus_info` of their next block along with any new local segment created in the parent
+                 shard to propagate them all up to the beacon chain.

Owner

nazar-pc May 5, 2025

Let's maybe standardize on either using "chain" or "shard" here.

Also I have a suggestion on how to (subjectively) improve the description of such things. For me it is easier to build a mental model of the thing that is described when I can find an anchor, a reference point. Here you're talking about the parent chain (so it is not an anchor, it is parent to something we'll mention later), then child segments (so not an anchor, because it is a child of something that will be mentioned later) and then parent shard (which I initially thought was not the same parent as at the beginning of the sentence, but later realized it was). By "anchor" I mean a place where I would "stand" if I were to visualize what is happening.

So imagine there are layers of shards on top of each other:

               beacon chain
            /                \
      shard 1                 shard 2
     /       \               /       \
shard 11    shard 12    shard 21    shard 22

In the description given right now, I happen to "stand" somewhere between "shard 1" and "shard 11" for example, which is an awkward place, especially when "parent" is mentioned twice in the same sentence, meaning different things. Let's rewrite the sentence assuming "standing" on "shard 1" specifically:

A shard aggregates its own SegmentHeader along with any SegmentHeaders referenced by corresponding child shard block headers in consensus_info, so they are all propagated to the beacon chain.

I was trying to assume a reference point being "a shard" and count everything relatively to it (child shard and parent shard/beacon chain). Also we can reduce the verbosity a lot when there is a stable reference.

If we start with a SegmentHeader (for which we have already established when it is created and included in the history), we don't need to describe it again here. We don't need to describe where it is created either, because it is implicitly "a shard", which we just started the sentence with. So whenever that happens (and we know exactly when), if there happen to be block headers of child shards that also contain segment headers, we aggregate those too.

It also implies that the local segment header might be missing, but the logic described is still valid: we still aggregate the child ones as before. Mentioning a segment header implies we have verified its contents, whatever they happen to be (which is already described elsewhere), so it doesn't need to be re-explained here.

At least this is the way I build a mental model about what is happening here.
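
The rewritten sentence can be sketched roughly as follows, "standing" on a shard. All names here (`ChildBlockHeader`, `ConsensusInfo`, `aggregate_segment_headers`, and the single-field `SegmentHeader`) are hypothetical placeholders, and putting the shard's own header first is an arbitrary ordering assumption:

```rust
/// Hypothetical stand-in; the real segment header contents are defined elsewhere.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SegmentHeader {
    pub segment_root: [u8; 32],
}

/// Hypothetical child shard block header that may reference segment headers.
pub struct ChildBlockHeader {
    pub segment_headers: Vec<SegmentHeader>,
}

/// Hypothetical `consensus_info` payload of the shard's next block.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct ConsensusInfo {
    pub segment_headers: Vec<SegmentHeader>,
}

/// Aggregates the shard's own segment header (if one was produced) with any
/// segment headers referenced by child shard block headers.
pub fn aggregate_segment_headers(
    own: Option<SegmentHeader>,
    child_block_headers: &[ChildBlockHeader],
) -> ConsensusInfo {
    let mut segment_headers = Vec::new();
    // The local segment header may be missing; aggregation still proceeds.
    segment_headers.extend(own);
    for header in child_block_headers {
        segment_headers.extend(header.segment_headers.iter().cloned());
    }
    ConsensusInfo { segment_headers }
}
```

The `Option` on the shard's own header reflects the case where no local segment was produced but child headers still need to be propagated.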

specs/discussions/2025-04-segment-proofs.md

Comment on lines +135 to +137

+                 fn generate_piece_inclusion_proof(piece: Piece, segment: Segment) -> Option<PieceProof> {
+                 	 segment.generate_proof(piece)
+                 }

Owner

nazar-pc May 5, 2025

Step 1 in the previous section started with the piece that already contains the proof that it is a part of the segment.

segment.generate_proof(piece) kind of makes sense, but wrapping it in generate_piece_inclusion_proof doesn't really help with the description, it is an unnecessary (in this case) abstraction and is basically a tautology.

specs/discussions/2025-04-segment-proofs.md

+                 }
+                 ```
+. **Locate the corresponding super segment**:

Owner

nazar-pc May 5, 2025

Locating the super segment is not an important or interesting part. I think it is safe to say that we will not actually be locating anything: proof generation will be a reaction to super segment creation, implying we already have the super segment and it is certainly not missing (so Option<> isn't needed).

specs/discussions/2025-04-segment-proofs.md

Comment on lines +151 to +159

+. **Generate the super segment proof**:
+                 - Using the `SuperSegment`'s Merkle tree, generate a proof of inclusion for `segment4`.
+                 ```rust
+                 fn generate_super_segment_proof(segment: Segment, super_segment: SuperSegment) -> Option<SuperSegmentProof> {
+                 	 super_segment.generate_proof(segment)
+                 }
+                 ```

Owner

nazar-pc May 5, 2025

This is the part that is meant to be useful, but it isn't really: it is just a tautology without any meat.

Something like this would be more descriptive if we are doing pseudo-code:

let tree = MerkleTree::new(super_segment.components());
let super_segment_root = tree.root();
let segment_proof = tree.get_proof(segment_offset);

It indicates that we turn a super segment into some set of "things" that we can build a Merkle Tree over. Then we create a proof that our RawSegment was there at segment_offset.

What we'll need to include in the piece to make it verifiable is:

segment_root
segment_offset
segment_proof
super_segment.num_segments (assuming segment roots are the only thing we create a Merkle Tree over)

These are the things both necessary and sufficient to securely verify inclusion against a single root hash. See Merkle Tree API:

abundance/crates/shared/ab-merkle-tree/src/unbalanced_hashed.rs

Lines 306 to 312 in 47c9bc1

    
pub fn verify(
    root: &[u8; OUT_LEN],
    proof: &[[u8; OUT_LEN]],
    leaf_index: usize,
    leaf: [u8; OUT_LEN],
    num_leaves: usize,
) -> bool {

root -> super_segment_root
proof -> segment_proof
leaf_index -> segment_offset
leaf -> segment_root
num_leaves -> num_segments
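
To make the role of each of the four values concrete, here is a self-contained sketch of proof generation and verification over segment roots. It is not the `ab-merkle-tree` implementation: the tree shape (an unpaired last node is promoted unchanged to the next level), the 8-byte `Hash`, and the use of the non-cryptographic `DefaultHasher` are all assumptions made so the example runs with the standard library only. `root_and_proof` is a hypothetical helper.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in hash output; a real implementation would use a cryptographic hash.
pub type Hash = [u8; 8];

fn hash_pair(left: &Hash, right: &Hash) -> Hash {
    let mut hasher = DefaultHasher::new();
    hasher.write(left);
    hasher.write(right);
    hasher.finish().to_le_bytes()
}

/// Returns `(super_segment_root, segment_proof)` for the leaf at `segment_offset`.
pub fn root_and_proof(segment_roots: &[Hash], segment_offset: usize) -> Option<(Hash, Vec<Hash>)> {
    if segment_offset >= segment_roots.len() {
        return None;
    }
    let mut level = segment_roots.to_vec();
    let mut index = segment_offset;
    let mut proof = Vec::new();
    while level.len() > 1 {
        // An unpaired last node has no sibling and contributes nothing to the proof.
        if let Some(sibling) = level.get(index ^ 1) {
            proof.push(*sibling);
        }
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [left, right] => hash_pair(left, right),
                [single] => *single,
                _ => unreachable!(),
            })
            .collect();
        index /= 2;
    }
    Some((level[0], proof))
}

/// Mirrors the parameter list quoted above; `num_leaves` is what tells the
/// verifier at which levels the node was promoted without a sibling.
pub fn verify(root: &Hash, proof: &[Hash], leaf_index: usize, leaf: Hash, num_leaves: usize) -> bool {
    if leaf_index >= num_leaves {
        return false;
    }
    let (mut node, mut index, mut width) = (leaf, leaf_index, num_leaves);
    let mut proof = proof.iter();
    while width > 1 {
        if (index ^ 1) < width {
            let Some(sibling) = proof.next() else {
                return false;
            };
            node = if index % 2 == 0 {
                hash_pair(&node, sibling)
            } else {
                hash_pair(sibling, &node)
            };
        }
        index /= 2;
        width = (width + 1) / 2;
    }
    proof.next().is_none() && node == *root
}
```

Without `num_leaves` the verifier in this sketch could not tell a promoted node from a paired one, which is why `super_segment.num_segments` has to be available alongside the proof.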

Owner

nazar-pc May 5, 2025

After reading verification section below I think we also need to store mapping from segment index to super segment index somewhere somehow, so we can figure out which super segment root to use for piece verification. Previously with a single chain it was 1:1 mapping, but now piece index N can be in any super segment, it depends on how many segments were included in each.
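
One possible shape for such a mapping, sketched under the assumption that super segments are appended in order and that we record the global index of the first segment in each. `SuperSegmentIndexMap`, `push_super_segment`, and `locate` are hypothetical names:

```rust
/// Hypothetical index from global segment index to super segment index.
/// `first_segment_indices[i]` is the global index of the first segment
/// contained in super segment `i`.
#[derive(Debug, Default)]
pub struct SuperSegmentIndexMap {
    first_segment_indices: Vec<u64>,
    total_segments: u64,
}

impl SuperSegmentIndexMap {
    /// Registers the next super segment, which contains `num_segments` segments.
    pub fn push_super_segment(&mut self, num_segments: u32) {
        self.first_segment_indices.push(self.total_segments);
        self.total_segments += u64::from(num_segments);
    }

    /// Returns `(super_segment_index, segment_offset)` for a global segment
    /// index, or `None` if that segment is not part of any super segment yet.
    pub fn locate(&self, segment_index: u64) -> Option<(u64, u32)> {
        if segment_index >= self.total_segments {
            return None;
        }
        // Binary search for the last super segment starting at or before the index.
        let pos = self
            .first_segment_indices
            .partition_point(|&first| first <= segment_index);
        let super_segment_index = pos - 1;
        let offset = segment_index - self.first_segment_indices[super_segment_index];
        // The offset is smaller than that super segment's `u32` segment count.
        Some((super_segment_index as u64, offset as u32))
    }
}
```

The lookup is logarithmic in the number of super segments, and the same table also yields `segment_offset` for the Merkle proof.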

specs/discussions/2025-04-segment-proofs.md

Comment on lines +161 to +180

+. **Combine the proofs**:
+                 - Package the piece inclusion proof and the super segment proof into a single proof structure.
+                 ```rust
+                 struct GlobalHistoryProof {
+                 	 piece_proof: PieceProof,
+                 	 super_segment_proof: SuperSegmentProof,
+                 }
+                 fn generate_global_history_proof(piece: Piece, segment: Segment, beacon_chain: BeaconChain) -> Option<GlobalHistoryProof> {
+                 	 let piece_proof = generate_piece_inclusion_proof(piece, segment)?;
+                 	 let super_segment = locate_super_segment(segment, beacon_chain)?;
+                 	 let super_segment_proof = generate_super_segment_proof(segment, super_segment)?;
+                 	 Some(GlobalHistoryProof {
+                 		  piece_proof,
+                 		  super_segment_proof,
+                 	 })
+                 }
+                 ```

Owner

nazar-pc May 5, 2025

I'd say this is less of a combination of proofs and more of inserting segment proof and segment offset into the pieces we have produced earlier that were basically unverifiable to complete them.

specs/discussions/2025-04-segment-proofs.md

Comment on lines +191 to +221

+. **Verify the piece inclusion proof**:
+                 - Use the Merkle root of `segment4` to validate the inclusion proof for `piece1`.
+                 ```rust
+                 fn verify_piece_inclusion_proof(piece: Piece, proof: PieceProof, segment: Segment) -> bool {
+                 	 segment.verify_proof(piece, proof)
+                 }
+                 ```
+. **Verify the super segment proof**:
+                 - Use the Merkle root of the `SuperSegment` to validate the inclusion proof for `segment4`.
+                 ```rust
+                 fn verify_super_segment_proof(segment: Segment, proof: SuperSegmentProof, super_segment: SuperSegment) -> bool {
+                 	 super_segment.verify_proof(segment, proof)
+                 }
+                 ```
+. **Combine the verification steps**:
+                 - Ensure both the piece inclusion proof and the super segment proof are valid.
+                 ```rust
+                 fn verify_global_history_proof(proof: GlobalHistoryProof, piece: Piece, segment: Segment, super_segment: SuperSegment) -> bool {
+                 	 let piece_valid = verify_piece_inclusion_proof(piece, proof.piece_proof, segment);
+                 	 let super_segment_valid = verify_super_segment_proof(segment, proof.super_segment_proof, super_segment);
+                 	 piece_valid && super_segment_valid
+                 }
+                 ```

Owner

nazar-pc May 5, 2025

Similarly this pseudo-code really says nothing about how things are verified, the contents of those methods matter!

We have neither the segment nor the super segment here. We only have a piece and the super segment root, so the signature stays the same as here, with segment_root replaced by super_segment_root:

abundance/crates/shared/ab-core-primitives/src/pieces.rs

Lines 998 to 1013 in 47c9bc1

    
/// Validate proof embedded within a piece produced by the archiver
pub fn is_valid(&self, segment_root: &SegmentRoot, position: u32) -> bool {
    let (record, &record_root, parity_chunks_root, record_proof) = self.split();
    let source_record_merkle_tree_root = BalancedHashedMerkleTree::compute_root_only(record);
    let record_merkle_tree_root = BalancedHashedMerkleTree::compute_root_only(&[
        source_record_merkle_tree_root,
        **parity_chunks_root,
    ]);
    if record_merkle_tree_root != *record_root {
        return false;
    }
    record_root.is_valid(segment_root, record_proof, position)
}

This also likely indicates we should include the super segment index in the piece itself (the segment index, just like position, could have been inferred from piece_index before, but the super segment index cannot). That is unless we do something like "we only create a super segment when we have N segments", which I don't think we want to do due to the delays it would cause, especially early in the history of the blockchain.

Alternatively, we may store the information necessary to map a segment index to a super segment index. If the storage increase is not a concern for light clients, we may also store the number of segments in each super segment there. Together those two will take something like 11-12 bytes of extra storage per super segment.
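
As a rough illustration of that storage estimate, one layout that lands on 12 bytes is a `u64` for the first segment index plus a `u32` segment count. The layout, the `SuperSegmentRecord` name, and the little-endian encoding are assumptions for the sake of the example, not a proposal for the actual format:

```rust
/// Hypothetical per-super-segment record a light client could keep next to
/// the super segment root.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SuperSegmentRecord {
    /// Global index of the first segment in this super segment.
    pub first_segment_index: u64,
    /// Number of segments in this super segment (`num_leaves` for verification).
    pub num_segments: u32,
}

impl SuperSegmentRecord {
    /// 8 bytes for the index plus 4 bytes for the count.
    pub const ENCODED_SIZE: usize = 12;

    /// Packs the record without padding, little-endian.
    pub fn to_bytes(&self) -> [u8; Self::ENCODED_SIZE] {
        let mut bytes = [0u8; Self::ENCODED_SIZE];
        bytes[..8].copy_from_slice(&self.first_segment_index.to_le_bytes());
        bytes[8..].copy_from_slice(&self.num_segments.to_le_bytes());
        bytes
    }

    /// Inverse of `to_bytes`.
    pub fn from_bytes(bytes: &[u8; Self::ENCODED_SIZE]) -> Self {
        let mut first = [0u8; 8];
        first.copy_from_slice(&bytes[..8]);
        let mut count = [0u8; 4];
        count.copy_from_slice(&bytes[8..]);
        Self {
            first_segment_index: u64::from_le_bytes(first),
            num_segments: u32::from_le_bytes(count),
        }
    }
}
```

A narrower count field would shave a byte or more off this, at the cost of capping the number of segments per super segment.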

nazar-pc marked this pull request as draft

May 5, 2025 22:29


          Update specs/discussions/2025-04-segment-proofs.md

74f6b21

Co-authored-by: Nazar Mokrynskyi <[email protected]>

Collaborator Author

adlrocha commented May 30, 2025

Closing this PR. We have made a lot of progress on the end-to-end flow of segment commitments to global history, which makes this discussion outdated. It has been superseded by #267, which has a more up-to-date and cleaner description of this protocol mechanism.

adlrocha closed this

nazar-pc deleted the discussion/segment-proofs branch

May 31, 2025 00:22
