diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md
index 949e2b0bf..82750ef04 100644
--- a/src/http-gateways/trustless-gateway.md
+++ b/src/http-gateways/trustless-gateway.md
@@ -10,9 +10,21 @@ editors:
   - name: Marcin Rataj
     github: lidel
     url: https://lidel.org/
+    affiliation:
+      name: Protocol Labs
+      url: https://protocol.ai/
   - name: Henrique Dias
     github: hacdias
     url: https://hacdias.com/
+    affiliation:
+      name: Protocol Labs
+      url: https://protocol.ai/
+  - name: Hugo Valtier
+    github: Jorropo
+    url: https://jorropo.net/
+    affiliation:
+      name: Protocol Labs
+      url: https://protocol.ai/
 xref:
   - url
   - path-gateway
@@ -183,6 +195,22 @@ returned:
   returned to the client, the HTTP status code has already been sent to the client.
 
+### :dfn[skip-raw-blocks] (request query parameter)
+
+The optional `skip-raw-blocks` parameter is available only for CAR requests.
+
+It specifies whether blocks with the multicodec `raw` (`0x55`) MUST be present in
+the CAR response.
+
+It accepts two values:
+- `y`: Blocks with `raw` multicodec MUST NOT be returned.
+- `n`, or missing (unspecified): no-op, no special handling of `raw` blocks.
+
+When not specified, a gateway implementation MUST assume `n`.
+
+A Gateway MUST return HTTP error 400 Bad Request when `skip-raw-blocks=y` is
+sent for a content path with a root CID with the `raw` multicodec.
+
 # HTTP Response
 
 Below MUST be implemented **in addition** to "HTTP Response" of :cite[path-gateway].
@@ -212,10 +240,10 @@ The Body hash MUST match the Multihash from the requested CID.
 
 # CAR Responses (application/vnd.ipld.car)
 
-A CAR stream for the requested
-[application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
-content type (with optional `order` and `dups` params), path and optional
-`dag-scope` and `entity-bytes` URL parameters.
+A CAR stream ([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
+with optional `order` and `dups` content type parameters) for the requested
+content path (and optional `dag-scope`, `entity-bytes` and/or `skip-raw-blocks`
+URL parameters).
 
 ## CAR version
 
diff --git a/src/ipips/ipip-0445.md b/src/ipips/ipip-0445.md
new file mode 100644
index 000000000..07621ab65
--- /dev/null
+++ b/src/ipips/ipip-0445.md
@@ -0,0 +1,176 @@
+---
+title: "IPIP-0445: Option to Skip Raw Blocks in Gateway Responses"
+date: 2023-10-09
+ipip: open
+editors:
+  - name: Hugo Valtier
+    github: Jorropo
+    url: https://jorropo.net/
+    affiliation:
+      name: Protocol Labs
+      url: https://protocol.ai/
+  - name: Marcin Rataj
+    github: lidel
+    url: https://lidel.org/
+    affiliation:
+      name: Protocol Labs
+      url: https://protocol.ai/
+relatedIssues:
+  - https://github.com/ipfs/specs/issues/444
+order: 445
+tags: ['ipips']
+---
+
+## Summary
+
+Introduce a `skip-raw-blocks` flag for the :cite[trustless-gateway].
+
+## Motivation
+
+Allow clients to read a stream which contains only proofs in a bottom-heavy
+graph using the `raw` codec for its leaves.
+
+Useful for UnixFS features like webseeds
+([ipfs/specs#444](https://github.com/ipfs/specs/issues/444)), where metadata
+about a DAG is fetched from a trustless gateway, but the actual raw data can be
+fetched from any source that supports either the trustless gateway specification
+or plain HTTP Range Requests, allowing for trustless and verifiable data
+retrieval from plain HTTP (non-IPFS) data sources.
+
+## Detailed design
+
+The `skip-raw-blocks` URL query parameter on :cite[trustless-gateway]
+allows clients to download an entity, excluding blocks with the multicodec
+`raw` (`0x55`).
+
+- When set to `y`, the parameter instructs the gateway not to transmit
+  blocks referenced with a CID with the `raw` multicodec.
+- If set to `n`, or left unspecified, there is no special handling of `raw`
+  multicodec blocks (the existing default behavior remains the same).
+
+Importantly, unless explicitly specified as `y`, the default operational
+mode of the gateway MUST assume the value of `skip-raw-blocks` to be `n`.
+
+## Design rationale
+
+### User Benefit
+
+Implementing the `skip-raw-blocks` parameter offers several benefits to users:
+
+1. **Verification Flexibility:** Clients can verify out-of-band (OOB) received
+   files in their deserialized form without necessitating the transmission of
+   raw blocks from the gateway.
+
+2. **Incremental Download:** Clients can incrementally download files in
+   deserialized form from non-IPFS servers, allowing applications to share
+   distribution between IPFS and non-IPFS clients.
+
+3. **Efficient Block Discovery:** With the `skip-raw-blocks` option enabled,
+   clients can quickly discover numerous candidate blocks without being
+   bottlenecked by the gateway's transmission of raw blocks.
+
+4. **Non-IPFS HTTP Mirrors Become Useful:** Legacy data that is already exposed
+   over HTTP in deserialized form can now act as a source for specific block
+   byte ranges, without having to support any IPFS-specific APIs. Plain HTTP
+   Range Requests can be used for fetching the remaining raw block data, and the
+   metadata read via `skip-raw-blocks=y` is enough for a client to verify that the
+   remaining raw block byte ranges fetched from non-IPFS systems match the expected
+   CIDs.
+
+### Compatibility
+
+Setting the default value of the `skip-raw-blocks` parameter to `n` ensures
+backward compatibility with existing clients and systems that are unaware
+of this new flag.
+
+### Alternatives
+
+An alternative approach would be to request blocks individually.
+However, it adds extra round trips and more per-request HTTP overhead,
+and thus is undesirable.
+
+#### Why not `dag-scope=skip-raw-blocks`?
+
+The existing `dag-scope` parameter determines the overall range of blocks to retrieve,
+while `skip-raw-blocks` selectively filters specific blocks across all scopes and ranges.
+Combining them under one parameter would restrict their combined utility.
+
+For example:
+- If a client is streaming a video from a webseed and the user seeks through the
+  video, the client would send `dag-scope=entity&entity-bytes=42:1337`
+  with `skip-raw-blocks=y` to download the proofs for the required section of the
+  video, and then fetch the remaining raw data byte ranges from a faster CDN.
+- If a client is verifying an OOB transferred directory in deserialized form,
+  `dag-scope=all` with `skip-raw-blocks=y` makes sense.
+
+#### Why not a CAR content type parameter?
+
+The CAR content type's
+([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car))
+optional parameters like `order` and `dups` impact the way data is represented
+when returned as a CAR stream, but do not modify the scope of the data itself.
+They neither add nor subtract data from the response.
+
+The scope of the data is controlled by the URL content path and the optional
+`dag-scope` and `entity-bytes` URL parameters. This is where `skip-raw-blocks`
+belongs.
+
+This is not just a matter of aesthetics: the URL path and query parameters
+allow for caching of different subsets of a DAG in a way that is interoperable
+with existing HTTP tools and clients, and minimize the risk of caching an incomplete DAG
+response due to HTTP cache misconfiguration. Thanks to `skip-raw-blocks` being
+in the URL query, we ensure CAR responses without `raw` blocks will be cached
+under a different key than full responses (just like the already existing `dag-scope`
+and `entity-bytes`).
+
+#### Why not generic `skip-leaves` that skips all leaves, not just `raw` blocks?
+
+Prevention of amplification attacks and efficient server operation.
+
+By utilizing the `raw` (`0x55`) codec, servers can trivially determine whether
+to fetch or skip a block without having to fetch it to learn any new
+information.
+
+If we framed this feature around skipping all leaf nodes, that would require
+the server to fetch the leaves to learn if they have any child nodes. This would
+force the server to fetch data that is never returned to the client.
+
+Although `skip-raw-blocks` is more limited and not able to handle UnixFS files
+chunked without the `--raw-leaves` option, it allows both the client and server to
+trivially verify that a block must not be fetched. This prevents amplification
+issues, where a server might need to fetch orders of magnitude more data than
+the client receives when executing the request.
+
+## Security
+
+This IPIP does not impact the security model of the trustless gateway.
+
+## Test fixtures
+
+:::issue
+
+TODO: update below section with CIDs or CARs from conformance tests
+
+Scenarios we should check:
+- [ ] request for `/ipfs/cid` with `skip-raw-blocks=y` where CID has `raw` codec MUST return HTTP 400 (Bad Request)
+- [ ] reuse existing UnixFS DAG that has raw-leaves, request it with
+  `skip-raw-blocks=n`, confirm the response includes expected raw leaves' CIDs
+- [ ] create a new CAR fixture that only has non-raw blocks. Request it with
+  `skip-raw-blocks=y`, confirm the response includes expected CIDs and does not
+  include raw blocks referenced by parents.
+  - the important part is creating the CAR fixture by hand, and ensuring the raw blocks are
+    NEVER announced anywhere (generate the fixture with random data, add it to ipfs
+    with the raw-leaves option, then export the DAG without `raw` blocks using go-car's
+    [`filter`](https://github.com/ipld/go-car/tree/master/cmd/car#readme) or
+    similar).
+  - Why? This goes the extra mile, but ensures every conformant gateway
+    implementation is not doing useless work of fetching raw blocks which are
+    not required for fulfilling `skip-raw-blocks=y` requests.
+    We did a similar thing for `entity-bytes`, and it was the only way we could show
+    bugs in the Saturn project's cache implementation at the time.
+
+:::
+
+### Copyright
+
+Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
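
Illustration only, not part of the patch: the IPIP's `skip-leaves` rationale rests on the claim that a server can tell from the CID alone whether a block is `raw` (`0x55`), without fetching it. The Python sketch below shows one way that check could look. The helper names (`cid_codec`, `should_skip`, `status_for_request`) are hypothetical, only CIDv0 and base32 CIDv1 strings are handled, and the reading of "root CID" as the CID passed in is an assumption rather than something the spec text settles.

```python
import base64
from typing import Tuple

RAW_CODEC = 0x55     # multicodec code for `raw`
DAG_PB_CODEC = 0x70  # multicodec code for `dag-pb` (implied by CIDv0)


def read_uvarint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def cid_codec(cid: str) -> int:
    """Return the multicodec code of a CID given in string form.

    Assumption: only CIDv0 and base32 ("b"-prefixed) CIDv1 are supported.
    """
    if len(cid) == 46 and cid.startswith("Qm"):
        return DAG_PB_CODEC  # CIDv0 is always dag-pb
    if not cid.startswith("b"):
        raise ValueError("unsupported multibase prefix")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # multibase base32 omits padding
    raw = base64.b32decode(body)
    version, pos = read_uvarint(raw)
    if version != 1:
        raise ValueError("unsupported CID version")
    codec, _ = read_uvarint(raw, pos)
    return codec


def should_skip(cid: str, skip_raw_blocks: str = "n") -> bool:
    """Decide from the CID alone whether a block is left out of the CAR."""
    return skip_raw_blocks == "y" and cid_codec(cid) == RAW_CODEC


def status_for_request(root_cid: str, skip_raw_blocks: str = "n") -> int:
    """Return 400 when `skip-raw-blocks=y` targets a `raw` root CID, else 200."""
    if skip_raw_blocks == "y" and cid_codec(root_cid) == RAW_CODEC:
        return 400
    return 200
```

Because the decision needs only the bytes of the CID, neither function touches block data, which is the property the rationale relies on; a `skip-leaves` variant would instead have to load each candidate block to see whether it has children.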