File: specs/discussions/2025-04-segment-proofs.md
## Shard segments submission to the beacon chain

The goal of this discussion is to surface the core data structures and mechanics of committing
segments from a shard to the global history in the beacon chain, and of verifying that a piece of a
segment belongs to that global history.

### Submitting segments to the parent

1. Child shards create new segments with 128 underlying pieces, each with their
`record || record_root || parity_record_chunks_root || record_proof`. Each child shard creates
segments independently (in the same way that is currently implemented). Segments are assigned an
increasing sequence number, `local_index`, that determines the order in which they were created in
the shard. Segments that haven't been committed to the global history in the beacon chain are
`UnverifiedSegment`s and include their own `UnverifiedSegmentHeader` (which match the current
`Segment` and `SegmentHeader` data structures, respectively).

> TODO: Point to the right parts of the code that implements this for reference.

2. As soon as a new segment has been created in a child shard, it is included in the next block and
submitted to the parent shard as part of the `consensus_info` inside the shard block. The segment
information that is propagated to the parent is the following:
> Note: If all this information is already available as part of the `SegmentHeader`, the raw
> header can be submitted instead of having to create this ad-hoc `ChildSegmentDescription` data
> structure.

```rust
struct ChildSegmentDescription {
    // Shard id of the child shard the segment belongs to.
    shard_id: ShardId,
    // The root of the segment.
    segment_root: Hash,
    // Root of the previous segment created in the shard.
    prev_segment_root: Hash,
    // Local index of the segment (it may be redundant if we assume
    // that segments are always submitted in increasing order).
    local_index: u64,
}
```
> **Comment on lines +27 to +39 (Owner):**
> `shard_id` is implicitly clear from the rest of the information sent from the child shard, I
> don't think it needs to be included here. `local_index` can also be tracked by the parent shard
> if necessary (it is always trivially +1 from the previous), though I'm not sure how it is
> helpful. Similarly, it is not clear what the purpose of `prev_segment_root` is; there is nothing
> in the data structure to make sense of it, the data isn't tied together in any way.
>
> It looks like only `segment_root` is truly needed here, or do you have plans for other fields
> too?


3. The status of child shard segments is tracked, indexed by their `local_index`, through their
`IndexStatus`. `IndexStatus` indicates whether the segment has already been submitted to the
parent, is pending confirmation in the beacon chain, or has been committed to the global history
of the beacon chain and has already been assigned its global segment index (reflecting its
sequence in the global history).
> **Comment on lines +41 to +45 (Owner):**
> Tracked where and how? I assume this is a client-side thing?


```rust
// Per-segment tracking: the local index paired with its global commitment status.
struct SegmentIndex {
    local_index: u64,
    global_index: IndexStatus,
}

enum IndexStatus {
    // The segment has been committed to the global history at this global index.
    Committed(u64),
    // The segment is pending commitment to the global history.
    Pending,
    // The segment has not been submitted yet.
    NotSubmitted,
}
```
> **Comment on lines +47 to +58 (Owner):**
> `SegmentIndex` today is simply a `u64`. Why does it need to be a tuple of two values? If we're
> talking about global history it assumes every segment has a unique increasing number, not a
> tuple.


4. When the child shard block that includes segments in its `consensus_info` field is submitted to
the parent shard, the `ChildSegmentDescription` (or `SegmentHeader`) for every segment included
in the block is lightly verified to check that it is consistent with the parent's view of the
child's history and can be propagated further up to the beacon chain. The light verification
performed consists of:

- Checking that the `prev_segment_root` is equal to the `segment_root` of the previous segment
for the shard.
- Checking that the `local_index` for each new segment is the successor of the one for the
previous segment.

As will be described in future sections, through the data availability layer, nodes in the system
periodically check the correctness of segments being propagated, to prevent forged segments from
reaching the beacon chain and requiring a system re-org to clean them up.
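The light verification above can be sketched as follows. This is a minimal sketch: `ChildView` and
the surrounding bookkeeping are assumptions made for illustration, not names from the spec.

```rust
use std::collections::HashMap;

type Hash = [u8; 32];
type ShardId = u64;

// Hypothetical state the parent keeps per child shard (illustrative, not a spec name).
struct ChildView {
    last_segment_root: Hash,
    last_local_index: u64,
}

struct ChildSegmentDescription {
    shard_id: ShardId,
    segment_root: Hash,
    prev_segment_root: Hash,
    local_index: u64,
}

// Light verification: the new segment must chain off the last known segment root
// and carry the next sequential local index.
fn lightly_verify(views: &HashMap<ShardId, ChildView>, desc: &ChildSegmentDescription) -> bool {
    match views.get(&desc.shard_id) {
        Some(view) => {
            desc.prev_segment_root == view.last_segment_root
                && desc.local_index == view.last_local_index + 1
        }
        // First segment ever seen for this child shard.
        None => desc.local_index == 0,
    }
}
```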

5. The parent shard pulls all the `ChildSegmentDescription`s (or `SegmentHeader`s) from the
propagated child segments and, after performing the corresponding light verification, includes
them in the `consensus_info` of its next block, along with any new local segment created in the
parent shard, to propagate them all up to the beacon chain.
> **Comment on lines +74 to +77 (Owner):**
> Let's maybe standardize on either using "chain" or "shard" here.
>
> Also I have a suggestion on how to (subjectively) improve the description of such things. For me
> it is easier to build a mental model of the thing that is described when I can find an anchor, a
> reference point. Here you're talking about the parent chain (so it is not an anchor, it is parent
> to something we'll mention later), then child segments (so not an anchor, because it is a child
> of something that will be mentioned later) and then parent shard (which I initially thought was
> not the same parent at the beginning of the sentence, but later realized it was). By "anchor" I
> mean a place where I would "stand" if I was to visualize what is happening.
>
> So imagine there are layers of shards on top of each other:
>
> ```
>                beacon chain
>             /                \
>       shard 1                 shard 2
>      /       \               /       \
> shard 11    shard 12    shard 21    shard 22
> ```
>
> In the description given right now, I happen to "stand" somewhere between "shard 1" and
> "shard 11" for example, which is an awkward place, especially when "parent" is mentioned twice in
> the same sentence, meaning different things. Let's rewrite the sentence assuming "standing" on
> "shard 1" specifically:
>
> "A shard aggregates its own `SegmentHeader` along with any `SegmentHeader`s referenced by
> corresponding child shard block headers in `consensus_info`, so they are all propagated to the
> beacon chain."
>
> I was trying to assume a reference point being "a shard" and count everything relative to it
> (child shard and parent shard/beacon chain). Also, we can reduce the verbosity a lot when there
> is a stable reference.
>
> If we start with a `SegmentHeader` (for which we have already established when it is created and
> included in the history), we don't need to describe it again here, and we don't need to describe
> where it is created either, because it is implicitly "a shard" we just started the sentence with.
> So whenever we do that (and we know exactly when), if there happen to be block headers of child
> shards containing the same, we aggregate them.
>
> It also implies that there might be a local segment header missing, but the logic we have
> described is still valid, we still aggregate them like before. Mentioning a segment header
> implies we have verified its contents, whatever it happens to be (which is already described
> somewhere), so it doesn't need to be re-explained here.
>
> At least this is the way I build a mental model about what is happening here.


6. The submission of a segment to the beacon chain triggers the commitment of all the segments from
child shards into the global history, and the creation of a `SuperSegment`. `SuperSegment`s also
include all segments from the beacon chain included in the block for which the `SuperSegment` is
being created. A chain of `SuperSegment`s is used to represent the global history of the system
in the beacon chain, and each of them can be used for efficient verification of the inclusion of
segments in the global history.
> **Comment on lines +79 to +84 (Owner):**
> Hm... are we not supposed to wait for segments to be "confirmed"? A super segment will reorg with
> the beacon chain itself, so it is fine to include the beacon chain's segment there right away,
> but the rest of the segments can reorg independently, and if we commit to them right away and a
> reorg happens, we'll have to invalidate the super segment, which sounds like something we'll want
> to avoid.


```rust
/// The global history of the system is represented as a map where, for each block of the beacon
/// chain that includes a super segment, the corresponding super segment with information about the
/// list of segments committed is made available on-chain.
type GlobalHistory = HashMap<BlockNumber, SuperSegment>;

struct SuperSegment {
    // The root of the super segment.
    // It is computed as the Merkle root of the tree of segments aggregated in the super segment.
    super_segment_root: Hash,
    // Number of segments aggregated in this super segment.
    num_segments: u64,
    // Index of the super segment in the global history of the system
    // (e.g. if the previous super segment had super_segment_index 0 and num_segments 4,
    // this super segment will have super_segment_index 4).
    super_segment_index: u64,
    // Beacon height at which the super segment was created. This is useful to inspect the block
    // for additional information about the transactions with segment creations.
    beacon_height: BlockNumber,
}
```

> **Comment on lines +87 to +89 (Owner):**
> Can you remind me why we need to make it a map and why the key needs to be a block number?
>
> I remember we had a discussion about this, but I don't remember why it was necessary (it is not
> clear to me from the rest of this document what its purpose is).

> **Comment on `struct SuperSegment` (Owner):**
> Just like in a block header I expect its identity (block number) to come first, I expect the
> super segment index to be the first field in its header too.

7. All child shards follow the beacon chain, and they monitor when a new super segment is created
that includes segments from their shard. When a new super segment is created, they will update
their `segment_index` map accordingly to point to the right segment index in the global history
(i.e. `IndexStatus::Committed(global_index)`). This needs to trigger an update in the original
segment created in the child shard to seal it as final (and part of the global history).
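The update of the index map on the child shard client could look like the following sketch. The
`seal_committed` helper and `SegmentIndexMap` alias are assumptions of this sketch, chosen to show
how pending entries get their global index assigned in order of `local_index`.

```rust
use std::collections::BTreeMap;

// Illustrative client-side tracking on the child shard (names are assumptions).
#[derive(Debug, PartialEq)]
enum IndexStatus {
    // Committed to the global history at this global index.
    Committed(u64),
    // Submitted, pending commitment to the global history.
    Pending,
    // Not submitted to the parent yet.
    NotSubmitted,
}

// Map from local_index to global commitment status.
type SegmentIndexMap = BTreeMap<u64, IndexStatus>;

// On observing a super segment that commits `count` of our pending segments starting
// at `first_global_index`, seal them in order of local index.
fn seal_committed(map: &mut SegmentIndexMap, first_global_index: u64, count: u64) {
    let mut next_global = first_global_index;
    for (_, status) in map.iter_mut() {
        if next_global >= first_global_index + count {
            break;
        }
        if *status == IndexStatus::Pending {
            *status = IndexStatus::Committed(next_global);
            next_global += 1;
        }
    }
}
```

Because `BTreeMap` iterates in key order, pending segments are sealed strictly in the order they
were created locally.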

8. Additionally, along with updating the global index of the child shard's `UnverifiedSegment`, the
`UnverifiedSegment` is transformed into a `SealedSegment`, which is the final form of a segment
that has been committed to the global history. The only differences between an
`UnverifiedSegment` and a `SealedSegment` are that the `SealedSegment` has been assigned a
`global_index` from the global history, and that it includes a
`global_history_proof: SuperSegmentProof` field that can be used, given the right super segment,
to verify that this child segment belongs to the global history (further details about the
generation and verification of these proofs are given in the sections below).
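The sealing step is purely additive, which a small sketch makes explicit. All field names here are
assumptions based on this discussion, not established spec types.

```rust
type Hash = [u8; 32];

// Illustrative proof shape; fields follow the discussion above, names are assumptions.
struct SuperSegmentProof {
    segment_offset: u64,
    proof: Vec<Hash>,
}

struct UnverifiedSegment {
    segment_root: Hash,
    local_index: u64,
}

struct SealedSegment {
    segment_root: Hash,
    global_index: u64,
    global_history_proof: SuperSegmentProof,
}

// Sealing only attaches the global index and the global history proof;
// the segment data itself is unchanged.
fn seal(
    unverified: UnverifiedSegment,
    global_index: u64,
    global_history_proof: SuperSegmentProof,
) -> SealedSegment {
    SealedSegment {
        segment_root: unverified.segment_root,
        global_index,
        global_history_proof,
    }
}
```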
> **Comment on lines +108 to +122 (Owner):**
> `IndexStatus` still seems like an odd thing to me.
>
> Here is how I'd probably design it:
>
> - rename `SegmentHeader` to `RawSegmentHeader` (that is also how you called it above)
> - on each shard on the client we store both `Vec<RawSegmentHeader>` (these are pending) and
>   `Vec<SegmentHeader>` (these are finished and used for shard sync from DSN like in current
>   Subspace)
> - we pull the oldest `RawSegmentHeader` every time the beacon chain includes its segment root and
>   append `Vec<SegmentHeader>` with a new entry:
>
>   ```rust
>   struct SegmentHeader {
>       segment_index: u64,
>       raw_segment: RawSegment,
>   }
>   ```
>
> I'm not even sure we need a local shard index at all, it doesn't really seem to be used for
> anything, and once a segment is usable, we already have the global index attached to it. The only
> thing not clear to me yet is whether it will be a problem at all that
> `segment_index != previous_segment_index + 1` for a specific shard. I don't see issues right now
> and all segments are chained via parent hash anyway, so I don't think we need
> `local_segment_index`.


### Generating proofs of segment inclusion

To generate a proof that a specific piece (e.g., `piece1` from `segment4` in `shard11`) is part of
the global history, follow these steps:
> **Comment on lines +126 to +127 (Owner):**
> I would build the description step by step. I don't think it is necessary to give an example of a
> `piece1`, because the segment inclusion proof is identical for all pieces in a segment.
>
> So I'd start with a segment inclusion proof in general, and then mention that it is included in
> all pieces (like a broadcast operation, clearly indicating that none of the pieces are special in
> this case).


1. **Generate the piece inclusion proof**:

- Retrieve `segment4` from `shard11`.
- Use the segment's Merkle tree to generate a proof of inclusion for `piece1`.

```rust
fn generate_piece_inclusion_proof(piece: Piece, segment: Segment) -> Option<PieceProof> {
    segment.generate_proof(piece)
}
```

> **Comment on lines +135 to +137 (Owner):**
> Step 1 in the previous section started with the piece that already contains the proof that it is
> a part of the segment.
>
> `segment.generate_proof(piece)` kind of makes sense, but wrapping it in
> `generate_piece_inclusion_proof` doesn't really help with the description; it is an unnecessary
> (in this case) abstraction and is basically a tautology.

2. **Locate the corresponding super segment**:

- Identify the beacon block that includes the `SuperSegment` containing `segment4`.
- Retrieve the `SuperSegment` and its Merkle tree.

```rust
fn locate_super_segment(segment: Segment, beacon_chain: BeaconChain) -> Option<SuperSegment> {
    beacon_chain.find_super_segment(segment)
}
```

> **Comment on "Locate the corresponding super segment" (Owner):**
> Locating the super segment is not an important or interesting part. I think it is safe to say
> that we will not actually be locating anything; the process of proof generation will be a
> reaction to the super segment creation, implying we already have it and it is certainly not
> missing in that case (so `Option<>` isn't needed).

3. **Generate the super segment proof**:

- Using the `SuperSegment`'s Merkle tree, generate a proof of inclusion for `segment4`.

```rust
fn generate_super_segment_proof(segment: Segment, super_segment: SuperSegment) -> Option<SuperSegmentProof> {
    super_segment.generate_proof(segment)
}
```
> **Comment on lines +151 to +159 (Owner):**
> This is the part that is meant to be useful, but it is kind of not really, just a tautology
> without any meat.
>
> Something like this would be more descriptive if we are doing pseudo-code:
>
> ```rust
> let tree = MerkleTree::new(super_segment.components());
> let super_segment_root = tree.root();
> let segment_proof = tree.get_proof(segment_offset);
> ```
>
> It indicates that we do turn a super segment into some set of "things" that we can build a Merkle
> Tree with. Then we create a proof that our `RawSegment` was there at `segment_offset`.
>
> What we'll need to include in the piece to make it verifiable is:
>
> - `segment_root`
> - `segment_offset`
> - `segment_proof`
> - `super_segment.num_segments` (assuming segment roots are the only thing we create a Merkle
>   Tree over)
>
> These are the things both necessary and sufficient to securely verify inclusion against a single
> root hash. See the Merkle Tree API:
>
> ```rust
> pub fn verify(
>     root: &[u8; OUT_LEN],
>     proof: &[[u8; OUT_LEN]],
>     leaf_index: usize,
>     leaf: [u8; OUT_LEN],
>     num_leaves: usize,
> ) -> bool
> ```
>
> - `root` -> `super_segment_root`
> - `proof` -> `segment_proof`
> - `leaf_index` -> `segment_offset`
> - `leaf` -> `segment_root`
> - `num_leaves` -> `num_segments`

> **Comment (Owner):**
> After reading the verification section below I think we also need to store a mapping from segment
> index to super segment index somewhere somehow, so we can figure out which super segment root to
> use for piece verification. Previously with a single chain it was a 1:1 mapping, but now piece
> index N can be in any super segment; it depends on how many segments were included in each.


4. **Combine the proofs**:

- Package the piece inclusion proof and the super segment proof into a single proof structure.

```rust
struct GlobalHistoryProof {
    piece_proof: PieceProof,
    super_segment_proof: SuperSegmentProof,
}

fn generate_global_history_proof(piece: Piece, segment: Segment, beacon_chain: BeaconChain) -> Option<GlobalHistoryProof> {
    let piece_proof = generate_piece_inclusion_proof(piece, segment)?;
    let super_segment = locate_super_segment(segment, beacon_chain)?;
    let super_segment_proof = generate_super_segment_proof(segment, super_segment)?;
    Some(GlobalHistoryProof {
        piece_proof,
        super_segment_proof,
    })
}
```
> **Comment on lines +161 to +180 (Owner):**
> I'd say this is less of a combination of proofs and more of inserting the segment proof and
> segment offset into the pieces we have produced earlier, which were basically unverifiable, to
> complete them.


This process ensures that the proof of inclusion for a piece in the global history is generated by
combining cryptographic proofs from both the segment and the super segment. The resulting proof can
be used to verify the inclusion of the piece in the global history.

### Verifying segment proofs

To verify that a specific piece (e.g., `piece1` from `segment4` in `shard11`) is part of the global
history, follow these steps:

1. **Verify the piece inclusion proof**:

- Use the Merkle root of `segment4` to validate the inclusion proof for `piece1`.

```rust
fn verify_piece_inclusion_proof(piece: Piece, proof: PieceProof, segment: Segment) -> bool {
    segment.verify_proof(piece, proof)
}
```

2. **Verify the super segment proof**:

- Use the Merkle root of the `SuperSegment` to validate the inclusion proof for `segment4`.

```rust
fn verify_super_segment_proof(segment: Segment, proof: SuperSegmentProof, super_segment: SuperSegment) -> bool {
    super_segment.verify_proof(segment, proof)
}
```

3. **Combine the verification steps**:

- Ensure both the piece inclusion proof and the super segment proof are valid.

```rust
fn verify_global_history_proof(proof: GlobalHistoryProof, piece: Piece, segment: Segment, super_segment: SuperSegment) -> bool {
    let piece_valid = verify_piece_inclusion_proof(piece, proof.piece_proof, segment);
    let super_segment_valid = verify_super_segment_proof(segment, proof.super_segment_proof, super_segment);
    piece_valid && super_segment_valid
}
```
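To make the root/proof/verify semantics concrete, here is a self-contained toy sketch of Merkle
inclusion verification over segment roots. It is an illustration only: it uses Rust's
non-cryptographic `DefaultHasher` with an 8-byte digest and assumes a power-of-two leaf count for
brevity; a real implementation would use a cryptographic hash and handle unbalanced trees.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash as _, Hasher};

// Toy 8-byte hash standing in for a real cryptographic hash (illustration only).
fn h(data: &[u8]) -> [u8; 8] {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish().to_le_bytes()
}

// Hash of the concatenation of two child hashes.
fn hash_pair(a: &[u8; 8], b: &[u8; 8]) -> [u8; 8] {
    let mut buf = [0u8; 16];
    buf[..8].copy_from_slice(a);
    buf[8..].copy_from_slice(b);
    h(&buf)
}

// Root of a balanced tree over the leaves (power-of-two leaf count assumed).
fn merkle_root(leaves: &[[u8; 8]]) -> [u8; 8] {
    let mut level: Vec<[u8; 8]> = leaves.to_vec();
    while level.len() > 1 {
        level = level.chunks(2).map(|p| hash_pair(&p[0], &p[1])).collect();
    }
    level[0]
}

// Sibling path for the leaf at `index`, bottom to top.
fn merkle_proof(leaves: &[[u8; 8]], mut index: usize) -> Vec<[u8; 8]> {
    let mut level: Vec<[u8; 8]> = leaves.to_vec();
    let mut path = Vec::new();
    while level.len() > 1 {
        path.push(level[index ^ 1]);
        level = level.chunks(2).map(|p| hash_pair(&p[0], &p[1])).collect();
        index /= 2;
    }
    path
}

// Recompute the root from a leaf and its sibling path, as the verifier would,
// and compare against the known root.
fn merkle_verify(root: &[u8; 8], leaf: [u8; 8], mut index: usize, path: &[[u8; 8]]) -> bool {
    let mut acc = leaf;
    for sibling in path {
        acc = if index % 2 == 0 {
            hash_pair(&acc, sibling)
        } else {
            hash_pair(sibling, &acc)
        };
        index /= 2;
    }
    acc == *root
}
```

In the terms used above, the leaves are the `segment_root`s of a super segment, `index` plays the
role of `segment_offset`, and the recomputed root is compared against `super_segment_root`.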
> **Comment on lines +191 to +221 (Owner):**
> Similarly, this pseudo-code really says nothing about how things are verified; the contents of
> those methods matter!
>
> We have neither the segment nor the super segment here. We only have a piece and a super segment
> root, so the signature is still the same as here, just with `segment_root` replaced with
> `super_segment_root`:
>
> ```rust
> /// Validate proof embedded within a piece produced by the archiver
> pub fn is_valid(&self, segment_root: &SegmentRoot, position: u32) -> bool {
>     let (record, &record_root, parity_chunks_root, record_proof) = self.split();
>     let source_record_merkle_tree_root = BalancedHashedMerkleTree::compute_root_only(record);
>     let record_merkle_tree_root = BalancedHashedMerkleTree::compute_root_only(&[
>         source_record_merkle_tree_root,
>         **parity_chunks_root,
>     ]);
>     if record_merkle_tree_root != *record_root {
>         return false;
>     }
>     record_root.is_valid(segment_root, record_proof, position)
> }
> ```
>
> This also likely indicates we should include the super segment index in the piece itself (the
> segment index, just like the position, could have been inferred from `piece_index` before, but
> not the super segment index). That is unless we do something like "we only create a super segment
> when we have N segments", which I don't think we want to do due to the delays it'll cause,
> especially early in the history of the blockchain.
>
> Alternatively, we may store the information necessary to map segment index to super segment
> index, where we may also store the number of segments in a super segment too if the storage
> increase is not a concern for light clients; those two together will take something like 11-12
> bytes of extra storage per super segment.

This verification process ensures that the provided proofs are cryptographically valid and that the
piece is indeed part of the global history.

## Genesis segment info

- Genesis segments are created as currently implemented for Subspace, with the difference that they
are already created as `SealedSegment`s in the beacon chain.

> TODO: @nazar-pc, do we need additional information and implementation details for this?