From c1e121eb0c569246c13f877f587c20f5039aaed4 Mon Sep 17 00:00:00 2001
From: Jorropo <jorropo.pgm@gmail.com>
Date: Wed, 11 Oct 2023 06:18:01 +0200
Subject: [PATCH 1/4] ipip(0445): add skip-leaves

---
 src/http-gateways/trustless-gateway.md |  22 +++++-
 src/ipips/ipip-0445.md                 | 105 +++++++++++++++++++++++++
 2 files changed, 126 insertions(+), 1 deletion(-)
 create mode 100644 src/ipips/ipip-0445.md

diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md
index 949e2b0bf..cf7528743 100644
--- a/src/http-gateways/trustless-gateway.md
+++ b/src/http-gateways/trustless-gateway.md
@@ -214,7 +214,7 @@ The Body hash MUST match the Multihash from the requested CID.
 
 A CAR stream for the requested
 [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
-content type (with optional `order` and `dups` params), path and optional
+content type (with optional `order`, `dups` and `skip-leaves` params), path and optional
 `dag-scope` and `entity-bytes` URL parameters.
 
 ## CAR version
@@ -301,6 +301,26 @@ of their presence in the DAG or the value assigned to the "dups" parameter, as
 the raw data is already present in the parent block that links to the identity
 CID.
 
+## CAR `skip-leaves` (content type parameter)
+
+The `skip-leaves` parameter specifies whether blocks with the multicodec `raw`
+`0x55` must be sent.
+
+It accepts two values:
+- `y`: Blocks with `raw` multicodec MUST NOT be sent.
+- `n`, or unspecified: Blocks with `raw` multicodec MUST be sent.
+
+A gateway MUST NOT assume this field is `y` if unspecified.
+When not specified it always MUST be understood as `n`.
+
+:::note Notes for implementers
+
+A request which is rooted at a `raw` block and has `skip-leaves=y` does not
+make sense and SHOULD NOT be sent by clients, it is fair for servers to
+error in this situation.
+
+:::
+
 ## CAR format parameters and determinism
 
 The default header and block order in a CAR format is not specified by IPLD specifications.
diff --git a/src/ipips/ipip-0445.md b/src/ipips/ipip-0445.md
new file mode 100644
index 000000000..9e57d0306
--- /dev/null
+++ b/src/ipips/ipip-0445.md
@@ -0,0 +1,105 @@
+---
+title: "IPIP-0445: trustless gateway skip-leaves option"
+date: 2023-10-09
+ipip: open
+editors:
+  - name: Hugo VALTIER
+    github: Jorropo
+    url: https://jorropo.net/
+    affiliation:
+        name: Protocol Labs
+        url: https://protocol.ai/
+relatedIssues:
+  - https://github.com/ipfs/specs/issues/444
+order: 445
+tags: ['ipips']
+---
+
+## Summary
+
+Introduce `skip-leaves` flag for the :cite[trustless-gateway].
+
+## Motivation
+
+Allow clients to read a stream which only contain proofs in a bottom heavy
+graph using `raw` codec for it's leaves.
+
+Usefull with unixfs for features like webseeds [#444](https://github.com/ipfs/specs/issues/444).
+
+## Detailed design
+
+The `skip-leaves` CAR Content-Type parameter on :cite[trustless-gateway]
+allows clients to download an entity except blocks with the multicodec
+`raw` (`0x55`).
+
+- When set to `y`, the parameter instructs the gateway not to transmit
+  blocks tagged with the `raw` multicodec.
+- If set to `n`, or left unspecified, the gateway MUST transmit `raw`
+  multicodec blocks.
+
+Importantly, unless explicitly specified as `y`, the default operational
+mode of the gateway MUST assume the value of `skip-leaves` to be `n`.
+
+## Design rationale
+
+### User Benefit
+
+Implementing the `skip-leaves` parameter offers several benefits to users:
+
+1. **Verification Flexibility:** Clients can verify out-of-band (OOB) received
+   files in their deserialized form without necessitating the transmission of
+   raw blocks from the gateway.
+2. **Incremental Download:** Clients can incrementally download files in
+   deserialized forms from non-IPFS servers. Allowing applications to share
+   distribution for IPFS and non IPFS clients.
+3. **Efficient Block Discovery:** With the `skip-leaves` option enabled,
+   clients can quickly discover numerous candidate blocks without being
+   bottlenecked by the gateway's transmission of raw blocks.
+
+### Compatibility
+
+Setting the default value of the `skip-leaves` parameter to `n` ensures
+backward compatibility with existing clients and systems that are unaware
+of this new flag.
+
+### Prevention of Amplification Attacks and Efficient Server Operation
+
+By utilizing the `raw` (`0x55`) codec servers can trivially determine whether
+to fetch or skip a block without having to learn any new information.
+Although more limited and not able to handle unixfs file using dag-pb for their
+leaves, it allows both the client and server to trivially verify a block
+must not be fetched. Preventing issues of Amplification where a server could
+need to fetch multiple orders more data than the client when executing the
+request.
+
+### Why not `dag-scope=skip-leaves` ?
+
+The `dag-scope` parameter determines the overall range of blocks to retrieve,
+while `skip-leaves` selectively filters specific blocks within that range.
+Combining them under one parameter would restrict their combined utility.
+
+For example:
+- A client is streaming a video from a webseed and the user seeked through the
+  video, then the client would send `dag-scope=entity&entity-bytes=42:1337`
+  with `skip-leaves=y` to download the proofs for the required section of the
+  video.
+- A client is verifying an OOB transfered directory in deserialized form,
+  then `dag-scope=all` with  `skip-leaves=y` makes sense.
+
+### Alternatives
+
+An alternative approach would be to request blocks individually.
+However it adds extra round trips and more per HTTP request overhead
+and thus is undesireable.
+
+## Security
+
+None.
+
+## Test fixtures
+
+TODO
+
+### Copyright
+
+Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).

From 131b29d2c39c0f210d1f4449d9d416793350538c Mon Sep 17 00:00:00 2001
From: Marcin Rataj <lidel@lidel.org>
Date: Wed, 25 Oct 2023 20:04:00 +0200
Subject: [PATCH 2/4] ipip-445: rename to skip-raw-blocks URL param

+ basic editorials
---
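Notes for reviewers: a gateway can decide `skip-raw-blocks=y` from CIDs alone, because the multicodec is encoded in the CID itself. The Python sketch below is a minimal illustration only. The helpers `read_uvarint`, `cid_codec` and `filter_blocks` are hypothetical names, and only base32 (`b`-prefixed) CIDv1 strings are handled. CIDv0 `Qm...` CIDs are always `dag-pb` and are out of scope here. The sketch reads the codec varint from each CID and drops `raw` (`0x55`) entries from an ordered list of block CIDs without fetching any block.

```python
import base64
from typing import List, Tuple

RAW_CODEC = 0x55  # multicodec code for `raw`


def read_uvarint(buf: bytes, offset: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint at `offset`; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return value, offset
        shift += 7


def cid_codec(cid: str) -> int:
    """Return the multicodec of a base32 (multibase prefix 'b') CIDv1 string."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDv1 strings")
    body = cid[1:]
    body += "=" * (-len(body) % 8)  # b32decode insists on RFC 4648 padding
    data = base64.b32decode(body, casefold=True)
    version, offset = read_uvarint(data, 0)
    if version != 1:
        raise ValueError("expected a CIDv1")
    codec, _ = read_uvarint(data, offset)  # the multihash follows; not needed
    return codec


def filter_blocks(block_cids: List[str], skip_raw_blocks: bool) -> List[str]:
    """Preserve block order; drop `raw` blocks only when skip_raw_blocks is True."""
    if not skip_raw_blocks:
        return list(block_cids)  # `n` or unspecified: no special handling
    return [c for c in block_cids if cid_codec(c) != RAW_CODEC]
```

In a real gateway, this check would more likely run on the links of each parent block during DAG traversal. That way, `raw` children are never requested from the blockstore or from upstream peers, which is the amplification concern discussed in the IPIP.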
 src/http-gateways/trustless-gateway.md |  50 ++++-----
 src/ipips/ipip-0445.md                 | 140 ++++++++++++++++++-------
 2 files changed, 131 insertions(+), 59 deletions(-)
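On the client side, `skip-raw-blocks` composes with `dag-scope` and `entity-bytes` as plain URL query parameters, as in the webseed seek example in `ipip-0445.md`. The sketch below rests on a few assumptions. `car_request` and the `gateway.example` host are hypothetical. The CAR response is requested through the `Accept` header. The parameter is simply omitted when the client wants the default `n` behavior.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import quote, urlencode

CAR_MEDIA_TYPE = "application/vnd.ipld.car"


def car_request(gateway: str, content_path: str, *,
                dag_scope: Optional[str] = None,
                entity_bytes: Optional[str] = None,
                skip_raw_blocks: bool = False) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a trustless-gateway CAR request (hypothetical helper)."""
    params: Dict[str, str] = {}
    if dag_scope is not None:
        params["dag-scope"] = dag_scope
    if entity_bytes is not None:
        params["entity-bytes"] = entity_bytes
    if skip_raw_blocks:
        params["skip-raw-blocks"] = "y"  # leaving it out means `n`
    url = gateway.rstrip("/") + quote(content_path)  # quote() keeps '/' as-is
    if params:
        # Keep ':' readable in entity-bytes ranges such as 42:1337.
        url += "?" + urlencode(params, safe=":")
    return url, {"Accept": CAR_MEDIA_TYPE}
```

Each combination of these parameters yields a distinct URL. A URL-keyed HTTP cache therefore stores the proof-only response separately from the full one, which is the caching argument made in the IPIP.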

diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md
index cf7528743..2fcc9d895 100644
--- a/src/http-gateways/trustless-gateway.md
+++ b/src/http-gateways/trustless-gateway.md
@@ -183,6 +183,28 @@ returned:
     returned to the client, the HTTP status code has already been sent to the
     client.
 
+### :dfn[skip-raw-blocks] (request query parameter)
+
+The optional `skip-raw-blocks` parameter is available only for CAR requests.
+
+It specifies whether blocks with the `raw` multicodec (`0x55`) are omitted from
+the CAR response.
+
+It accepts two values:
+- `y`: Blocks with `raw` multicodec MUST NOT be returned.
+- `n`, or missing (unspecified): no-op, no special handling of `raw` blocks.
+
+When not specified a gateway implementation MUST assume `n`.
+
+:::note Notes for implementers
+
+A `skip-raw-blocks=y` request for a content path with `raw` root CID does not
+make sense and SHOULD NOT be sent by clients.
+
+A Gateway SHOULD return HTTP error 400 Bad Request
+
+:::
+
 # HTTP Response
 
 Below MUST be implemented **in addition** to "HTTP Response" of :cite[path-gateway].
@@ -212,10 +234,10 @@ The Body hash MUST match the Multihash from the requested CID.
 
 # CAR Responses (application/vnd.ipld.car)
 
-A CAR stream for the requested
-[application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
-content type (with optional `order`, `dups` and `skip-leaves` params), path and optional
-`dag-scope` and `entity-bytes` URL parameters.
+A CAR stream ([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
+with optional `order` and `dups` content type parameters) for the requested
+content path (and optional `dag-scope`, `entity-bytes` and/or `skip-raw-blocks`
+URL parameters).
 
 ## CAR version
 
@@ -301,26 +323,6 @@ of their presence in the DAG or the value assigned to the "dups" parameter, as
 the raw data is already present in the parent block that links to the identity
 CID.
 
-## CAR `skip-leaves` (content type parameter)
-
-The `skip-leaves` parameter specifies whether blocks with the multicodec `raw`
-`0x55` must be sent.
-
-It accepts two values:
-- `y`: Blocks with `raw` multicodec MUST NOT be sent.
-- `n`, or unspecified: Blocks with `raw` multicodec MUST be sent.
-
-A gateway MUST NOT assume this field is `y` if unspecified.
-When not specified it always MUST be understood as `n`.
-
-:::note Notes for implementers
-
-A request which is rooted at a `raw` block and has `skip-leaves=y` does not
-make sense and SHOULD NOT be sent by clients, it is fair for servers to
-error in this situation.
-
-:::
-
 ## CAR format parameters and determinism
 
 The default header and block order in a CAR format is not specified by IPLD specifications.
diff --git a/src/ipips/ipip-0445.md b/src/ipips/ipip-0445.md
index 9e57d0306..e1414eefb 100644
--- a/src/ipips/ipip-0445.md
+++ b/src/ipips/ipip-0445.md
@@ -1,14 +1,20 @@
 ---
-title: "IPIP-0445: trustless gateway skip-leaves option"
+title: "IPIP-0445: Option to Skip Raw Blocks in Gateway Responses"
 date: 2023-10-09
 ipip: open
 editors:
-  - name: Hugo VALTIER
+  - name: Hugo Valtier
     github: Jorropo
     url: https://jorropo.net/
     affiliation:
         name: Protocol Labs
         url: https://protocol.ai/
+  - name: Marcin Rataj
+    github: lidel
+    url: https://lidel.org/
+    affiliation:
+        name: Protocol Labs
+        url: https://protocol.ai/
 relatedIssues:
   - https://github.com/ipfs/specs/issues/444
 order: 445
@@ -17,88 +23,152 @@ tags: ['ipips']
 
 ## Summary
 
-Introduce `skip-leaves` flag for the :cite[trustless-gateway].
+Introduce a `skip-raw-blocks` URL query parameter for the :cite[trustless-gateway].
 
 ## Motivation
 
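-Allow clients to read a stream which only contain proofs in a bottom heavy
-graph using `raw` codec for it's leaves.
+Allow clients to read a stream which only contains proofs in a bottom-heavy
+graph that uses the `raw` codec for its leaves.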
 
-Usefull with unixfs for features like webseeds [#444](https://github.com/ipfs/specs/issues/444).
+This is useful for UnixFS features like webseeds
+([ipfs/specs#444](https://github.com/ipfs/specs/issues/444)), where metadata
+about a DAG is fetched from a trustless gateway, while the actual raw data can be
+fetched from any source that supports either the trustless gateway specification
+or plain HTTP Range Requests, allowing for trustless and verifiable data
+retrieval from plain HTTP (non-IPFS) data sources.
 
 ## Detailed design
 
-The `skip-leaves` CAR Content-Type parameter on :cite[trustless-gateway]
+The `skip-raw-blocks` URL query parameter on :cite[trustless-gateway]
 allows clients to download an entity except blocks with the multicodec
 `raw` (`0x55`).
 
 - When set to `y`, the parameter instructs the gateway not to transmit
-  blocks tagged with the `raw` multicodec.
-- If set to `n`, or left unspecified, the gateway MUST transmit `raw`
-  multicodec blocks.
+  blocks referenced by a CID with the `raw` multicodec.
+- If set to `n`, or left unspecified, there is no special handling of `raw`
+  multicodec blocks (the existing default behavior remains the same).
 
 Importantly, unless explicitly specified as `y`, the default operational
-mode of the gateway MUST assume the value of `skip-leaves` to be `n`.
+mode of the gateway MUST assume the value of `skip-raw-blocks` to be `n`.
 
 ## Design rationale
 
 ### User Benefit
 
-Implementing the `skip-leaves` parameter offers several benefits to users:
+Implementing the `skip-raw-blocks` parameter offers several benefits to users:
 
 1. **Verification Flexibility:** Clients can verify out-of-band (OOB) received
    files in their deserialized form without necessitating the transmission of
    raw blocks from the gateway.
+
 2. **Incremental Download:** Clients can incrementally download files in
-   deserialized forms from non-IPFS servers. Allowing applications to share
-   distribution for IPFS and non IPFS clients.
-3. **Efficient Block Discovery:** With the `skip-leaves` option enabled,
+   deserialized forms from non-IPFS servers, allowing applications to share
+   distribution for IPFS and non-IPFS clients.
+
+3. **Efficient Block Discovery:** With the `skip-raw-blocks` option enabled,
    clients can quickly discover numerous candidate blocks without being
    bottlenecked by the gateway's transmission of raw blocks.
 
+4. **Non-IPFS HTTP Mirrors Become Useful:** Legacy data that is already exposed
+   over HTTP in deserialized form can now act as a source for specific block
+   byte ranges, without having to support any IPFS-specific APIs. Plain HTTP
+   Range Requests can be used for fetching the remaining raw block data, and the
+   metadata read via `skip-raw-blocks=y` is enough for a client to verify that
+   the raw block byte ranges fetched from a non-IPFS system match the expected
+   CIDs.
+
 ### Compatibility
 
-Setting the default value of the `skip-leaves` parameter to `n` ensures
+Setting the default value of the `skip-raw-blocks` parameter to `n` ensures
 backward compatibility with existing clients and systems that are unaware
 of this new flag.
 
-### Prevention of Amplification Attacks and Efficient Server Operation
+### Alternatives
 
-By utilizing the `raw` (`0x55`) codec servers can trivially determine whether
-to fetch or skip a block without having to learn any new information.
-Although more limited and not able to handle unixfs file using dag-pb for their
-leaves, it allows both the client and server to trivially verify a block
-must not be fetched. Preventing issues of Amplification where a server could
-need to fetch multiple orders more data than the client when executing the
-request.
+An alternative approach would be to request blocks individually.
+However, it adds extra round trips and more per-request HTTP overhead
+and thus is undesirable.
 
-### Why not `dag-scope=skip-leaves` ?
+#### Why not `dag-scope=skip-raw-blocks`?
 
-The `dag-scope` parameter determines the overall range of blocks to retrieve,
-while `skip-leaves` selectively filters specific blocks within that range.
+The existing `dag-scope` parameter determines the overall range of blocks to retrieve,
+while `skip-raw-blocks` selectively filters specific blocks across all scopes and ranges.
 Combining them under one parameter would restrict their combined utility.
 
 For example:
-- A client is streaming a video from a webseed and the user seeked through the
+- A client is streaming a video from a webseed and the user seeks through the
   video, then the client would send `dag-scope=entity&entity-bytes=42:1337`
-  with `skip-leaves=y` to download the proofs for the required section of the
-  video.
-- A client is verifying an OOB transfered directory in deserialized form,
-  then `dag-scope=all` with  `skip-leaves=y` makes sense.
+  with `skip-raw-blocks=y` to download the proofs for the required section of the
+  video, and then fetch the remaining raw data byte ranges from a faster CDN.
+- A client verifying an OOB-transferred directory in deserialized form
+  would send `dag-scope=all` with `skip-raw-blocks=y`.
 
-### Alternatives
+#### Why not a CAR content type parameter?
 
-An alternative approach would be to request blocks individually.
-However it adds extra round trips and more per HTTP request overhead
-and thus is undesireable.
+The CAR content type's
+([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car))
+optional parameters, like `order` and `dups`, impact the way data is represented
+when returned as a CAR stream, but they do not modify the scope of the data
+itself: they neither add nor subtract data from the response.
+
+The scope of the data is controlled by the URL content path and the optional
+`dag-scope` and `entity-bytes` URL parameters. This is where `skip-raw-blocks`
+belongs.
+
+This is not just a matter of aesthetics: the URL path and query parameters
+allow for caching of different subsets of a DAG in a way that is interoperable
+with existing HTTP tools and clients, and minimize the risk of caching an
+incomplete DAG response due to HTTP cache misconfiguration. Because
+`skip-raw-blocks` is part of the URL query, CAR responses without `raw` blocks
+will be cached under a different key than full responses (just like the
+existing `dag-scope` and `entity-bytes` parameters).
+
+#### Why not a generic `skip-leaves` that skips all leaves, not just `raw` blocks?
+
+Prevention of amplification attacks and efficient server operation.
+
+By utilizing the `raw` (`0x55`) codec, servers can trivially determine whether
+to fetch or skip a block without having to fetch it to learn any new
+information.
+
+If we framed this feature around skipping all leaf nodes, that would require
+the server to fetch the leaves to learn if they have any child nodes. This would
+force the server to fetch data that is never returned to the client.
+
+Although `skip-raw-blocks` is more limited and not able to handle UnixFS files
+chunked without the `--raw-leaves` option, it allows both the client and server
+to trivially verify that a block must not be fetched. This prevents
+amplification issues, where a server could need to fetch orders of magnitude
+more data than the client receives when executing the request.
 
 ## Security
 
-None.
+This IPIP does not impact the security model of the trustless gateway.
 
 ## Test fixtures
 
-TODO
+:::issue
+
+TODO: update below section with CIDs or CARs from conformance tests
+
+Scenarios we should check:
+- [ ] reuse existing UnixFS DAG that has raw-leaves, request it with
+  `skip-raw-blocks=n`, confirm the response includes expected raw leaves' CIDs
+- [ ] create a new CAR fixture that only have non-raw blocks. Request it with
+  `skip-raw-blocks=y`, confirm the response includes expected CIDs and does not
+  include raw blocks referenced by parents.
+  - The important part is creating the CAR fixture by hand and ensuring the raw
+    blocks are NEVER announced anywhere: generate a fixture with random data,
+    add it to IPFS with the raw-leaves option, then export the DAG without `raw`
+    blocks (use go-car's [`filter`](https://github.com/ipld/go-car/tree/master/cmd/car#readme)
+    or similar).
+    - Why? This goes the extra mile, but ensures that no conformant gateway
+      implementation does the useless work of fetching raw blocks that are
+      not required for fulfilling `skip-raw-blocks=y` requests. We did a
+      similar thing for `entity-bytes`, and it was the only way we could expose
+      bugs in the Saturn project's cache implementation at the time.
+
+:::
 
 ### Copyright
 

From f96a92a5262fba7c9f788192aaea3f37ab0d8f06 Mon Sep 17 00:00:00 2001
From: Marcin Rataj <lidel@lidel.org>
Date: Wed, 25 Oct 2023 20:20:52 +0200
Subject: [PATCH 3/4] ipip-445: HTTP 400 on raw root cid

Ref.
https://github.com/ipfs/specs/pull/445#discussion_r1357342245
---
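The following is a minimal Python sketch of the request validation described in this patch, for illustration only. `parse_skip_raw_blocks` and `car_request_status` are hypothetical names. `root_codec` is assumed to be the multicodec of the CID at the start of the content path. Returning 400 for values other than `y` and `n` is this sketch's own choice, since the text only defines those two values.

```python
from typing import Optional
from urllib.parse import parse_qs

RAW_CODEC = 0x55  # multicodec code for `raw`


def parse_skip_raw_blocks(query_string: str) -> Optional[bool]:
    """True for `y`, False for `n` or a missing parameter, None for anything else."""
    values = parse_qs(query_string).get("skip-raw-blocks")
    if not values:
        return False  # unspecified MUST be understood as `n`
    # Assumption: if the parameter is repeated, the first occurrence wins.
    return {"y": True, "n": False}.get(values[0])


def car_request_status(root_codec: int, query_string: str) -> int:
    """Return 400 or 200 for the skip-raw-blocks checks only (other checks omitted)."""
    skip = parse_skip_raw_blocks(query_string)
    if skip is None:
        return 400  # assumption: values other than `y`/`n` are rejected
    if skip and root_codec == RAW_CODEC:
        return 400  # MUST: skip-raw-blocks=y with a raw root CID
    return 200
```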
 src/http-gateways/trustless-gateway.md | 10 ++--------
 src/ipips/ipip-0445.md                 |  1 +
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md
index 2fcc9d895..6ae33f2ad 100644
--- a/src/http-gateways/trustless-gateway.md
+++ b/src/http-gateways/trustless-gateway.md
@@ -196,14 +196,8 @@ It accepts two values:
 
 When not specified a gateway implementation MUST assume `n`.
 
-:::note Notes for implementers
-
-A `skip-raw-blocks=y` request for a content path with `raw` root CID does not
-make sense and SHOULD NOT be sent by clients.
-
-A Gateway SHOULD return HTTP error 400 Bad Request
-
-:::
+A Gateway MUST return HTTP error 400 Bad Request when `skip-raw-blocks=y` is
+sent for a content path with a root CID with the `raw` multicodec.
 
 # HTTP Response
 
diff --git a/src/ipips/ipip-0445.md b/src/ipips/ipip-0445.md
index e1414eefb..07621ab65 100644
--- a/src/ipips/ipip-0445.md
+++ b/src/ipips/ipip-0445.md
@@ -152,6 +152,7 @@ This IPIP does not impact security model of trustless gateway.
 TODO: update below section with CIDs or CARs from conformance tests
 
 Scenarios we should check:
+- [ ] request for `/ipfs/cid?skip-raw-blocks=y` where CID has `raw` codec MUST return HTTP 400 (Bad Request)
 - [ ] reuse existing UnixFS DAG that has raw-leaves, request it with
   `skip-raw-blocks=n`, confirm the response includes expected raw leaves' CIDs
 - [ ] create a new CAR fixture that only have non-raw blocks. Request it with

From ceb8b1d6fd7eee2ec04dd60978847aa20f861571 Mon Sep 17 00:00:00 2001
From: Marcin Rataj <lidel@lidel.org>
Date: Thu, 9 Nov 2023 04:23:14 +0100
Subject: [PATCH 4/4] chore: update editors

---
 src/http-gateways/trustless-gateway.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md
index 6ae33f2ad..82750ef04 100644
--- a/src/http-gateways/trustless-gateway.md
+++ b/src/http-gateways/trustless-gateway.md
@@ -10,9 +10,21 @@ editors:
   - name: Marcin Rataj
     github: lidel
     url: https://lidel.org/
+    affiliation:
+        name: Protocol Labs
+        url: https://protocol.ai/
   - name: Henrique Dias
     github: hacdias
     url: https://hacdias.com/
+    affiliation:
+        name: Protocol Labs
+        url: https://protocol.ai/
+  - name: Hugo Valtier
+    github: Jorropo
+    url: https://jorropo.net/
+    affiliation:
+        name: Protocol Labs
+        url: https://protocol.ai/
 xref:
   - url
   - path-gateway