191 changes: 191 additions & 0 deletions SWIPs/swip-26.md
@@ -0,0 +1,191 @@
---
swip: 26
title: Standardised Chunk Type Framework
status: Draft
type: Standards Track
category: Core
author: mfw78 (@mfw78)
created: 2025-03-03
---

## Simple Summary
This SWIP introduces a standardised framework for defining chunk types in Swarm, improving security and interoperability through consistent type identification and validation.

## Abstract
This SWIP proposes a standardised framework for defining and processing chunk types in Swarm. By creating a formal type system for chunks, including content-addressed chunks (CAC) and single-owner chunks (SOC), we improve security, interoperability, and maintainability across the Swarm ecosystem. The proposal defines a structured approach to chunk identification, versioning, and validation. The key innovations are the formal definition of fixed-length, type-specific headers delivered alongside chunks, and the formal documentation of address determination and payload validation rules.

## Motivation
Swarm's storage layer is built around chunks as the fundamental unit of data. Currently, the system supports multiple chunk types but lacks standardised headers. This creates several issues:

1. **Ambiguous Processing**: Without explicit type information, chunk processing depends on implicit detection methods, leading to potential security vulnerabilities.

2. **Limited Extensibility**: Adding new chunk types requires changes to core validation logic, making it difficult to evolve the system.

> **Review comment (Member):** There is no new chunk type proposed at the moment.
>
> **Reply (Author, @mfw78, Mar 5, 2025):** Replica chunks come to mind as a natural new chunk type, instead of slicing this into the SOC chunks. Not communicating the type of a chunk at the wire level makes execution non-deterministic, which leaves open potential resource-starvation issues (e.g. a concerted actor can send chunks that are always SOC chunks and, on the assumption that most chunks are CAC chunks, cause a double spike in resource consumption, as all chunks must first go through the CAC check).
>
> **Review comment (Member, @significance, Apr 24, 2025):** Not sure I agree with this (core validation logic), but yes, standardising the chunk types will make it easier moving forward, and we should use SWIPs to define these.

3. **Inconsistent Validation**: Chunk validation logic is spread across multiple components, leading to potential inconsistencies.

4. **Type-Safety Gaps**: Without formal type definitions, runtime type errors can occur when processing chunks.

A standardised chunk type framework would address these issues by providing a consistent, extensible system for defining, identifying, and validating different chunk types.

## Specification

### Core Concepts

#### 1. Chunk Structure

A standardised chunk shall conceptually consist of:

1. **Header**: Metadata describing the chunk and its contents
   - Common Header: Information common to all chunk types (type, version)
   - Type-Specific Header: Additional fields specific to the chunk type
2. **Payload**: The actual chunk data

The chunk's address is not part of the chunk itself but is deterministically derived from the chunk's contents based on its type.

#### 2. Common Chunk Header

The common chunk header shall contain:

1. **Type**: The chunk type identifier (1 byte)
2. **Version**: The chunk format version (1 byte)

| Type ID | Name | Description |
|---------|------|-------------|
| 0x00 | CAC | Content-addressed chunk |
| 0x01 | SOC | Single-owner chunk |
| 0x02-0xFF | Reserved | Reserved for future chunk types |
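
As an illustration, the common header could be decoded as follows. This is a minimal sketch, not part of the specification: the names `ChunkType`, `CommonHeader`, and `parse_common_header` are hypothetical, and placing the type byte before the version byte is an assumption based on the order in which the fields are listed above.

```python
from dataclasses import dataclass
from enum import IntEnum


class ChunkType(IntEnum):
    """Type identifiers from the table above; 0x02-0xFF are reserved."""
    CAC = 0x00
    SOC = 0x01


@dataclass(frozen=True)
class CommonHeader:
    chunk_type: ChunkType
    version: int

    def encode(self) -> bytes:
        # Assumed layout: byte 0 is the type, byte 1 is the version.
        return bytes([self.chunk_type, self.version])


def parse_common_header(data: bytes) -> CommonHeader:
    """Read the 2-byte common header; reject short input and reserved types."""
    if len(data) < 2:
        raise ValueError("common header requires 2 bytes")
    try:
        chunk_type = ChunkType(data[0])
    except ValueError:
        raise ValueError(f"reserved chunk type 0x{data[0]:02x}") from None
    return CommonHeader(chunk_type, data[1])
```

Rejecting reserved identifiers at this point means an unknown type never reaches type-specific parsing.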

#### 3. Fixed-Length Type-Specific Headers

All type-specific headers MUST be of fixed length for their respective chunk types. This ensures that, at the wire level, the maximum size of a chunk is always known and predictable from the first 2 bytes (type and version).

Example header sizes:
- CAC header: 10 bytes (2 bytes common header + 8 bytes span)
- SOC header: 99 bytes (2 bytes common header + 32 bytes ID + 65 bytes signature)
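
The sketch below shows how a fixed header size per type and version lets a receiver split a chunk into header and payload without speculative parsing. The table `HEADER_SIZES` and the function `split_chunk` are hypothetical names; the sizes are the example values listed above, and version 0 is assumed for both types.

```python
from typing import Dict, Tuple

COMMON_HEADER_SIZE = 2  # type (1 byte) + version (1 byte)

# (type id, version) -> total header size in bytes, using the example sizes.
HEADER_SIZES: Dict[Tuple[int, int], int] = {
    (0x00, 0): COMMON_HEADER_SIZE + 8,        # CAC: span
    (0x01, 0): COMMON_HEADER_SIZE + 32 + 65,  # SOC: ID + signature
}


def split_chunk(data: bytes) -> Tuple[bytes, bytes]:
    """Split raw chunk bytes into (header, payload) using only the first 2 bytes."""
    if len(data) < COMMON_HEADER_SIZE:
        raise ValueError("chunk shorter than the common header")
    size = HEADER_SIZES.get((data[0], data[1]))
    if size is None:
        raise ValueError("unknown chunk type or version")
    if len(data) < size:
        raise ValueError("chunk shorter than its fixed-length header")
    return data[:size], data[size:]
```

Because the lookup depends only on the type and version bytes, a truncated or mistyped chunk is rejected before any type-specific field is read.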

### Address Calculation

The address of a chunk shall be deterministically calculated based on its type, version, and contents. We define the general address calculation function as:

$$\text{Address} = f_{\text{type}}(\text{header}, \text{payload})$$

Where $f_{\text{type}}$ is the type-specific address calculation function.

#### Generic Address Derivation Function

For any chunk type, the address derivation function can be formally defined as:

$$f_{\text{type}}(\text{header}, \text{payload}) = \mathcal{H}(g_{\text{type}}(\text{header}, \text{payload}))$$

Where:
- $\mathcal{H}$ is a cryptographic hash function (i.e. `keccak256`)
- $g_{\text{type}}$ is a type-specific data preparation function

Different chunk types will implement specific derivation functions based on their requirements.
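
The composition $f_{\text{type}} = \mathcal{H} \circ g_{\text{type}}$ can be sketched as a small higher-order function. Everything named here is hypothetical. Python's standard library does not provide `keccak256`, so `hashlib.sha3_256` is used as a stand-in; it produces different digests from `keccak256` because the padding differs, and a real implementation would inject a `keccak256` function instead. The preparation function `concat_prepare` is a toy example, not the preparation function of any real chunk type.

```python
import hashlib
from typing import Callable

PrepareFn = Callable[[bytes, bytes], bytes]   # g_type(header, payload)
HashFn = Callable[[bytes], bytes]             # H
AddressFn = Callable[[bytes, bytes], bytes]   # f_type(header, payload)


def stand_in_hash(data: bytes) -> bytes:
    # Stand-in only: SHA3-256 is NOT keccak256 (different padding).
    return hashlib.sha3_256(data).digest()


def make_address_fn(prepare: PrepareFn, hash_fn: HashFn = stand_in_hash) -> AddressFn:
    """Build f_type as the composition H(g_type(header, payload))."""
    def address(header: bytes, payload: bytes) -> bytes:
        return hash_fn(prepare(header, payload))
    return address


def concat_prepare(header: bytes, payload: bytes) -> bytes:
    # Toy g_type for illustration: hash the header followed by the payload.
    return header + payload


toy_address = make_address_fn(concat_prepare)
```

Keeping the hash function injectable separates the type-specific preparation step from the choice of hash, which mirrors the split between $g_{\text{type}}$ and $\mathcal{H}$ in the formula.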

### Chunk Type Specifications

The Swarm Specifications shall define the standardised format for each chunk type. Adding a new chunk type to the specifications requires:

1. Assignment of a unique type identifier
2. Definition of fixed-length type-specific header structure
3. Definition of payload structure
4. Specification of address calculation function $f_{\text{type}}$
5. Specification of validation requirements

These specifications ensure that all implementations handle chunks consistently and securely across the Swarm ecosystem.
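
One way to picture the five requirements is as the fields of a record that an implementation registers per chunk type. The sketch below is illustrative only: `ChunkTypeSpec` and `ChunkTypeRegistry` are hypothetical names, the payload structure is reduced to a maximum size, and versions are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ChunkTypeSpec:
    type_id: int                               # 1. unique type identifier
    header_size: int                           # 2. fixed-length type-specific header
    max_payload_size: int                      # 3. payload structure (size bound only)
    address: Callable[[bytes, bytes], bytes]   # 4. address calculation function
    validate: Callable[[bytes, bytes], bool]   # 5. validation requirements


class ChunkTypeRegistry:
    def __init__(self) -> None:
        self._specs: Dict[int, ChunkTypeSpec] = {}

    def register(self, spec: ChunkTypeSpec) -> None:
        """Add a chunk type, enforcing a 1-byte, unique identifier."""
        if not 0 <= spec.type_id <= 0xFF:
            raise ValueError("type identifier must fit in one byte")
        if spec.type_id in self._specs:
            raise ValueError(f"type 0x{spec.type_id:02x} is already registered")
        self._specs[spec.type_id] = spec

    def get(self, type_id: int) -> ChunkTypeSpec:
        try:
            return self._specs[type_id]
        except KeyError:
            raise ValueError(f"unknown chunk type 0x{type_id:02x}") from None
```

With such a registry, adding a chunk type amounts to registering one more record rather than editing shared validation code.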

### Type Processing

The chunk processing logic shall:

1. Receive the chunk type and version information from the wire protocol
2. Use the type and version to determine the expected fixed-length type-specific header size as defined in the Swarm Specifications
3. Verify that the received header matches the expected size for the given type
4. Fail fast if the header is malformed or incomplete
5. Extract the type-specific header fields
6. Calculate the chunk address using the type-specific address calculation function
7. Apply type-specific validation rules
8. Process the payload according to type-specific structure

This approach allows for early validation of chunk integrity based on protocol-level type information, reducing parsing errors and simplifying processing logic.
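
The steps above can be sketched as a single fail-fast function. This is a minimal sketch under several assumptions: the wire protocol delivers type, version, header, and payload as separate fields; `header` holds only the type-specific part (so 8 bytes for a CAC); the 4096-byte payload limit in the example rule is illustrative; and `hashlib.sha3_256` stands in for `keccak256`. The names `TypeRules`, `ChunkError`, and `process_chunk` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


class ChunkError(Exception):
    """Raised as soon as any processing step fails (fail fast)."""


@dataclass(frozen=True)
class TypeRules:
    header_size: int                           # type-specific header bytes
    address: Callable[[bytes, bytes], bytes]   # f_type(header, payload)
    validate: Callable[[bytes, bytes], bool]   # type-specific validation


def process_chunk(
    rules: Dict[Tuple[int, int], TypeRules],
    chunk_type: int,
    version: int,
    header: bytes,
    payload: bytes,
) -> Tuple[bytes, bytes]:
    """Return (address, payload) for a valid chunk, or raise ChunkError."""
    rule = rules.get((chunk_type, version))        # steps 1-2: look up expected size
    if rule is None:
        raise ChunkError("unknown chunk type or version")
    if len(header) != rule.header_size:            # steps 3-4: fail fast on size
        raise ChunkError("invalid header size")
    address = rule.address(header, payload)        # steps 5-6: derive the address
    if not rule.validate(header, payload):         # step 7: type-specific rules
        raise ChunkError("invalid chunk content")
    return address, payload                        # step 8: hand off for processing


def toy_address(header: bytes, payload: bytes) -> bytes:
    # Stand-in for the real derivation; SHA3-256 is not keccak256.
    return hashlib.sha3_256(header + payload).digest()


EXAMPLE_RULES = {
    (0x00, 0): TypeRules(8, toy_address, lambda h, p: len(p) <= 4096),
}
```

The size check runs before any hashing, so a malformed header costs the receiver only a dictionary lookup and a length comparison.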

#### Flowchart

The flowchart below illustrates the processing steps for a chunk:

```mermaid
flowchart TD
Start[Receive chunk via wire protocol] --> A[Protobuf decodes chunk type, version, header, and payload]
A --> B{Header size matches expected size for type?}
B -->|No| C[Fail: Invalid header size]
B -->|Yes| D[Extract type-specific header fields]
D --> E[Calculate chunk address using type-specific function]
E --> F{Validate chunk content}
F -->|Invalid| G[Fail: Invalid chunk content]
F -->|Valid| H[Process payload according to type-specific structure]
H --> I[Pass processed chunk to appropriate protocol handler]
I --> End[Protocol-specific processing]
```

## Rationale

The proposed standardised chunk type framework addresses several key issues in the current implementation:

1. **Type Ambiguity**: By explicitly encoding chunk types in the header, we eliminate ambiguity in chunk processing, enhancing security and reliability.

2. **Extensibility**: The formal specifications allow for future chunk types to be added in a standardised way without modifying core validation logic.

> **Review comment (Member):** Those new chunk types' address calculation functions should be implemented in our smart contracts as well, which still makes it difficult to integrate new types.


3. **Validation Consistency**: Centralising validation rules in the specifications ensures consistent enforcement across components and implementations.

4. **Memory Efficiency**: Fixed-length headers enable predictable memory allocation and reduce fragmentation.

5. **Parsing Efficiency**: Type-specific parsing paths reduce the need for speculative parsing, improving performance.

The design choices prioritise:
- Security through explicit typing and validation
- Efficiency through predictable memory allocation and fail-fast validation
- Extensibility through the standardised specification system
- Backward compatibility with existing chunk types

## Backwards Compatibility

This proposal maintains backward compatibility by:

1. Preserving existing chunk address calculation methods for current chunk types
2. Supporting current chunk formats with version 0 of each type
3. Allowing for gradual adoption of the type system

> **Review comment (Member):** For me, first, it seems like it needs a breaking change in the base protocol to handle type headers.
>
> **Reply (Author):** A localstore migration can be used to assign both type and version numbers to chunks contained within the localstore, assigning version 0 to the respective chunk types.

4. Providing a conversion layer between legacy and new chunk formats
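
A conversion layer of this kind might look like the sketch below. It assumes that the legacy serialisation of a chunk already begins with the fields that make up the type-specific header (the span for a CAC; the ID and signature for a SOC), so that conversion only adds or strips the 2-byte common header, and that the caller already knows the chunk's type. The function names `wrap_legacy` and `unwrap_to_legacy` are hypothetical.

```python
CAC = 0x00
SOC = 0x01
LEGACY_VERSION = 0  # current formats map to version 0 of each type


def wrap_legacy(type_id: int, legacy: bytes) -> bytes:
    """Prefix legacy chunk bytes with the common header (type, version 0)."""
    if type_id not in (CAC, SOC):
        raise ValueError("only CAC and SOC have a legacy format")
    return bytes([type_id, LEGACY_VERSION]) + legacy


def unwrap_to_legacy(chunk: bytes) -> bytes:
    """Strip the common header from a version 0 chunk for legacy consumers."""
    if len(chunk) < 2:
        raise ValueError("chunk shorter than the common header")
    if chunk[0] not in (CAC, SOC) or chunk[1] != LEGACY_VERSION:
        raise ValueError("no legacy representation for this type or version")
    return chunk[2:]
```

Under these assumptions the round trip is lossless, which is what lets addresses of existing chunks stay unchanged.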

## Test Cases

Test cases should include:

1. **Header Validation**: Tests that verify correct parsing of type-specific headers for different chunk types
2. **Address Calculation**: Tests that confirm proper address derivation for each chunk type
3. **Size Verification**: Tests that ensure fixed-length headers meet their size requirements
4. **Malformed Input**: Tests that verify proper rejection of malformed chunks
5. **Version Handling**: Tests for correct processing of different versions of the same chunk type

## Implementation

Implementation will proceed in phases:

1. Formalise the chunk type specifications for CAC and SOC in the Swarm Specifications
2. Implement type-aware chunk processing in the node software
3. Add validation framework for existing chunk types based on the specifications
4. Develop compatibility layer for processing legacy chunks

## Security Considerations

The standardised chunk type framework improves security through:

1. **Explicit Type Checking**: Reduces the risk of type confusion attacks
2. **Fixed-Length Headers**: Bound header sizes before parsing, reducing the risk of buffer overflow attacks
3. **Early Validation**: Enables fail-fast behaviour for malformed chunks
4. **Deterministic Addressing**: Ensures consistent and secure chunk addressing
5. **Versioned Security**: Allows security improvements via version updates

## Copyright Waiver

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).