Commit f494ecd

docs: ipip-512 for 128-byte identity cid limit
documents the 128-byte limit for identity cids in unixfs contexts, with rationale from community discussions and test fixtures
1 parent 3dd66f5 commit f494ecd

2 files changed

+158
-0
lines changed


src/ipips/ipip-0512.md

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
---
title: "IPIP-0512: Limit Identity CID Size to 128 Bytes in UnixFS Contexts"
date: 2025-01-09
ipip: proposal
editors:
  - name: Marcin Rataj
    github: lidel
    affiliation:
      name: Interplanetary Shipyard
      url: https://ipshipyard.com/
relatedIssues:
  - https://github.com/ipfs/boxo/pull/1018
  - https://github.com/multiformats/cid/issues/21
  - https://github.com/multiformats/multihash/issues/130
thanks:
  - name: Rod Vagg
    github: rvagg
  - name: Volker Mische
    github: vmx
  - name: Alex Potsides
    github: achingbrain
    affiliation:
      name: Interplanetary Shipyard
      url: https://ipshipyard.com/
order: 512
tags: ['ipips']
---

## Summary

This IPIP establishes a 128-byte limit on the digest size of identity CIDs (multihash code `0x00`) in UnixFS contexts, to prevent abuse and clarify appropriate usage boundaries.

## Motivation

Identity CIDs are unique in that they inline data directly into the CID itself rather than hashing it. Without clear limits, this creates several problems:

1. **Resource Exhaustion**: Poorly written clients could encode large payloads as identity CIDs and propagate them through the network, consuming bandwidth and resources without providing value.

2. **Security Vulnerabilities**: Identity CIDs provide no integrity verification. Because the CID *is* the data, a bit flip in the CID silently changes the content, and there is no hash to detect it. The larger the inlined payload, the more data is exposed to this kind of undetected corruption.

3. **Unclear Boundaries**: The ecosystem lacks clear guidelines on when identity CIDs are appropriate, leading to potential misuse.

4. **CIDs as Data Containers**: Without limits, identity CIDs could embed arbitrary amounts of data, effectively turning CIDs from content addresses into data containers.

As discussed in [ipfs/boxo#1018](https://github.com/ipfs/boxo/pull/1018), the community consensus is that large identity CIDs are problematic and a reasonable limit is needed.

## Detailed design

This IPIP adds a new section to the UnixFS specification documenting the 128-byte digest size limit for identity CIDs.

### Changes to UnixFS Specification

Add a new section, "Identity CID Size Limit", that specifies:

- Identity CIDs (multihash code `0x00`) are experimental and limited to a 128-byte digest size
- Implementations MUST NOT produce identity CIDs with digests exceeding 128 bytes
- Implementations MUST reject identity CIDs with digests exceeding 128 bytes when reading
- Implementations SHOULD automatically convert to regular (hashed) blocks if data modifications would push the digest over the limit
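
The read-side rule can be sketched in Python as follows. This is a minimal illustration, not the boxo or Helia code: it assumes the input is a binary CIDv1 (version, codec, multihash code and digest length, each an unsigned varint, followed by the digest), and the helper names `read_uvarint` and `check_identity_limit` are hypothetical.

```python
IDENTITY_CODE = 0x00        # multihash code for identity
MAX_IDENTITY_DIGEST = 128   # limit specified by this IPIP


def read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned varint at buf[pos]; return (value, next_position)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: last byte of the varint
            return value, pos
        shift += 7
        if shift >= 63:  # refuse varints longer than 9 bytes
            raise ValueError("varint too long")


def check_identity_limit(cid: bytes) -> None:
    """Raise ValueError if a binary CIDv1 carries an identity digest over the limit."""
    version, pos = read_uvarint(cid, 0)
    if version != 1:
        raise ValueError("this sketch only handles CIDv1")
    _codec, pos = read_uvarint(cid, pos)
    mh_code, pos = read_uvarint(cid, pos)
    digest_len, pos = read_uvarint(cid, pos)
    # Compare the declared length before using the payload, so an oversized
    # identity CID is rejected without copying or decoding its digest.
    if mh_code == IDENTITY_CODE and digest_len > MAX_IDENTITY_DIGEST:
        raise ValueError(
            f"identity digest is {digest_len} bytes; limit is {MAX_IDENTITY_DIGEST}"
        )
    if len(cid) - pos != digest_len:
        raise ValueError("digest length does not match the declared length")
```

Hashed CIDs (for example sha2-256) pass through unchanged; only multihash code `0x00` is subject to the limit.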

### Test Fixtures

Add an invalid test case for a 129-byte identity CID that implementations MUST reject.

## Design rationale

The 128-byte limit was chosen based on several factors:

1. **Alignment with Existing Constraints**: The limit matches `DefaultMaxDigestSize`, which the ecosystem already uses for cryptographic hashes. 128 bytes comfortably accommodates the digests of the longest popular hash functions (e.g., SHA-512 produces 64-byte digests) while preventing unbounded growth.

2. **Community Consensus**: Key maintainers expressed support for this limit:
   - [@rvagg](https://github.com/ipfs/boxo/pull/1018#issuecomment-3240647923): "128 seems reasonable to me. I'm happy to have them squished down their happy-path use to a size where they're more likely being used for their size-saving utility"
   - [@vmx](https://github.com/ipfs/boxo/pull/1018#issuecomment-3241779136): "I'm not a fan of large identity CIDs... 128 bytes sound reasonable to me"
   - [@achingbrain](https://github.com/ipfs/boxo/pull/1018#discussion_r2318132492): "It looks fine at first glance 👍" (confirming Helia compatibility)

3. **Practical Usage**: 128 bytes is sufficient for legitimate use cases (small inline data) while preventing abuse.

4. **Implementation Precedent**: This limit has been implemented and tested in [ipfs/boxo#1018](https://github.com/ipfs/boxo/pull/1018) and included in Kubo 0.38 RC1 for broader testing.

### User benefit

- **Protection from Resource Exhaustion**: Users are protected from malicious or poorly-written clients that might otherwise propagate large identity CIDs.
- **Clear Guidelines**: Developers have explicit boundaries for appropriate identity CID usage.
- **Consistent Behavior**: All conforming implementations will handle identity CIDs consistently.
- **No Wasted Resources**: Avoids pointless roundtrips in which a client sends an identity CID to a remote service only to receive the same inlined bytes back. The client already holds the data, so the network request can be skipped entirely.

### Compatibility

Identity CIDs have always been marked as experimental. This change does not affect users who kept the default settings in software like Kubo or Helia, which have never produced identity CIDs by default.

This is a breaking change only for existing identity CIDs with digest sizes exceeding 128 bytes. However:

- Existing valid identity CIDs (≤128 bytes) remain unaffected
- The change has been tested in Kubo 0.38 RC1 to gather feedback
- Most users are unaffected, as identity CIDs require explicit opt-in

Implementations upgrading to support this IPIP will need to:
1. Add validation to reject oversized identity CIDs when reading
2. Prevent creation of identity CIDs exceeding the limit
3. Consider automatic conversion to regular blocks when data grows
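
Steps 2 and 3 can be folded into one decision: inline the data only while it fits, otherwise hash it. The Python sketch below assumes CIDv1 with the `raw` codec (`0x55`) and sha2-256 (`0x12`) as the fallback hash. The function names are hypothetical, and a real implementation would also have to store the bytes as a block once it stops inlining them.

```python
import hashlib

RAW_CODEC = 0x55
IDENTITY_CODE = 0x00
SHA2_256_CODE = 0x12
MAX_IDENTITY_DIGEST = 128


def uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def raw_cid_for(data: bytes, inline: bool = True) -> bytes:
    """Return a binary CIDv1 for data.

    When inlining is enabled and the data fits in 128 bytes, the data itself
    becomes an identity digest; otherwise the CID falls back to sha2-256,
    i.e. a regular block that must be stored separately.
    """
    if inline and len(data) <= MAX_IDENTITY_DIGEST:
        multihash = uvarint(IDENTITY_CODE) + uvarint(len(data)) + data
    else:
        digest = hashlib.sha256(data).digest()
        multihash = uvarint(SHA2_256_CODE) + uvarint(len(digest)) + digest
    return uvarint(1) + uvarint(RAW_CODEC) + multihash
```

Appending a single byte to a 128-byte payload is enough to switch the result from an identity CID to a sha2-256 CID, which is the conversion described in step 3.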

### Security

This change improves security by:

1. **Preventing Unbounded Resource Consumption**: Limits the amount of data that can be inlined in CIDs
2. **Reducing Attack Surface**: Smaller identity CIDs reduce the impact of bit flip vulnerabilities
3. **Clear Security Boundaries**: Explicit limits help security audits and threat modeling
4. **Mitigating Known Vulnerabilities**: The go-car library previously had a vulnerability ([GHSA-9x4h-8wgm-8xfg](https://github.com/ipld/go-car/security/advisories/GHSA-9x4h-8wgm-8xfg)) where decoding user-controlled identity CIDs could cause excessive memory allocation, leading to denial of service. While go-car mitigated this by capping allocations at 1MiB, establishing a 128-byte limit at the UnixFS specification level ensures all implementations are protected from this class of vulnerabilities by default.

### Alternatives

Several alternatives were considered:

1. **No Limit**: Rejected due to resource exhaustion and abuse potential
2. **Smaller Limit (32-64 bytes)**: Would break more existing use cases
3. **Larger Limit (256+ bytes)**: As noted by @rvagg, "the higher you go, the harder it is to justify their use"
4. **Complete Deprecation**: Too disruptive; identity CIDs have legitimate uses for tiny data

## Test fixtures

### Valid Identity CID (128 bytes)

- CID: `bafkqbaabijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbee`
- Content: 128 'B' characters
- Expected: Implementations MUST accept this CID

### Invalid Identity CID (129 bytes)

- CID: `bafkqbaibifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqi`
- Content: 129 'A' characters
- Expected: Implementations MUST reject this CID with an appropriate error
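
Both fixtures follow directly from their content: a CIDv1 with the `raw` codec (`0x55`), the identity multihash (`0x00`), the content length as a varint, then the content itself, rendered in multibase base32 (prefix `b`). The Python sketch below regenerates such strings so they can be compared against the values above. It assumes the raw codec because the `bafk` prefix of both fixtures corresponds to it; the function name `identity_cid_string` is hypothetical.

```python
import base64


def uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def identity_cid_string(data: bytes) -> str:
    """CIDv1 + raw codec + identity multihash, as lowercase unpadded base32."""
    cid = uvarint(1) + uvarint(0x55) + uvarint(0x00) + uvarint(len(data)) + data
    return "b" + base64.b32encode(cid).decode("ascii").rstrip("=").lower()


valid = identity_cid_string(b"B" * 128)    # starts with "bafkqbaab"
invalid = identity_cid_string(b"A" * 129)  # starts with "bafkqbaib"
print(valid)
print(invalid)
```

The headers differ only in the length varint, `0x80 0x01` (128) versus `0x81 0x01` (129), which is why the two strings diverge at the seventh character after the `b` prefix (`bafkqbaab` vs `bafkqbaib`).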

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).

src/unixfs.md

Lines changed: 23 additions & 0 deletions
@@ -976,6 +976,28 @@ Common empty structures that implementations frequently encounter:

These CIDs appear frequently in UnixFS implementations and are often hardcoded for performance optimization.

### Identity CID Size Limit

:::warning
Identity CIDs (using multihash code `0x00`) are experimental and subject to strict size limitations.
:::

Identity CIDs embed data directly in the CID rather than referencing external blocks. They are useful for very small data that benefits from being stored inline, but in UnixFS contexts they are limited to prevent misuse:

- **Maximum digest size**: 128 bytes
- **Purpose**: Small inline data only, not general-purpose data containers

Implementations:
- **MUST NOT** produce identity CIDs with digest sizes exceeding 128 bytes
- **MUST** reject identity CIDs with digest sizes exceeding 128 bytes when reading
- **SHOULD** automatically convert identity CIDs to regular blocks if data modifications would push the digest size over the 128-byte limit
994+
995+
This limit ensures identity CIDs remain an optimization for tiny data rather than a way to embed arbitrary amounts of data directly in CIDs.
996+
997+
**Examples:**
998+
- Valid (128 bytes): `bafkqbaabijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbeeqscijbee` - Implementations MUST accept and convert to 128 'B' characters
999+
- Invalid (129 bytes): `bafkqbaibifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqi` - Implementations MUST reject
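
Accepting the valid example and rejecting the invalid one can be sketched as a decoder that returns the inlined bytes only when the declared digest size is within the limit. This Python sketch assumes a base32 (`b`-prefixed) CIDv1 string and uses hypothetical helper names (`read_uvarint`, `identity_content`); it is not taken from any existing implementation.

```python
import base64

IDENTITY_CODE = 0x00
MAX_IDENTITY_DIGEST = 128


def read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned varint at buf[pos]; return (value, next_position)."""
    value = shift = 0
    while pos < len(buf):
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: last byte of the varint
            return value, pos
        shift += 7
    raise ValueError("truncated varint")


def identity_content(cid_str: str) -> bytes:
    """Return the data inlined in an identity CID, enforcing the 128-byte limit."""
    if not cid_str.startswith("b"):
        raise ValueError("this sketch only handles multibase base32 ('b')")
    body = cid_str[1:].upper()
    # base64.b32decode expects RFC 4648 padding, which multibase omits.
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    version, pos = read_uvarint(raw, 0)
    _codec, pos = read_uvarint(raw, pos)
    mh_code, pos = read_uvarint(raw, pos)
    size, pos = read_uvarint(raw, pos)
    if version != 1 or mh_code != IDENTITY_CODE:
        raise ValueError("not a CIDv1 identity CID")
    if size > MAX_IDENTITY_DIGEST:
        raise ValueError(f"identity digest of {size} bytes exceeds {MAX_IDENTITY_DIGEST}")
    data = raw[pos:]
    if len(data) != size:
        raise ValueError("digest length does not match the declared size")
    return data
```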

### Symbolic Links

- Fixture: [`symlink.car`](https://github.com/ipfs/gateway-conformance/raw/refs/tags/v0.8.1/fixtures/path_gateway_unixfs/symlink.car)
@@ -1047,6 +1069,7 @@ These validate that implementations properly reject malformed or non-UnixFS dag-
- 💢 [`bafybeiahfgovhod2uvww72vwdgatl5r6qkoeegg7at2bghiokupfphqcku.dag-pb`](https://github.com/ipld/codec-fixtures/raw/381e762b85862b2bbdb6ef2ba140b3c505e31a44/fixtures/dagpb_simple_forms_2/bafybeiahfgovhod2uvww72vwdgatl5r6qkoeegg7at2bghiokupfphqcku.dag-pb) - Simple form variant 2, bytes: `120b0a0901550005000102030412100a09015500050001020304120362617212100a090155000500010203041203666f6f` (no UnixFS metadata)
- 💢 [`bafybeidrg2f6slbv4yzydqtgmsi2vzojajnt7iufcreynfpxndca4z5twm.dag-pb`](https://github.com/ipld/codec-fixtures/raw/381e762b85862b2bbdb6ef2ba140b3c505e31a44/fixtures/dagpb_simple_forms_3/bafybeidrg2f6slbv4yzydqtgmsi2vzojajnt7iufcreynfpxndca4z5twm.dag-pb) - Simple form variant 3, bytes: `120b0a09015500050001020304120e0a09015500050001020304120161120e0a09015500050001020304120161` (no UnixFS metadata)
- 💢 [`bafybeieube7zxmzoc5bgttub2aqofi6xdzimv5munkjseeqccn36a6v6j4.dag-pb`](https://github.com/ipld/codec-fixtures/raw/381e762b85862b2bbdb6ef2ba140b3c505e31a44/fixtures/dagpb_simple_forms_4/bafybeieube7zxmzoc5bgttub2aqofi6xdzimv5munkjseeqccn36a6v6j4.dag-pb) - Simple form variant 4, bytes: `120e0a09015500050001020304120161120e0a09015500050001020304120161` (no UnixFS metadata)
- 💢 `bafkqbaibifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqkbifaucqi` - Identity CID with 129-byte digest (exceeds [128-byte limit](#identity-cid-size-limit) for identity CIDs). Content: 129 'A' characters. Implementations MUST reject this CID as the digest exceeds the maximum allowed size for identity multihashes.
## Additional Testing Resources
