Description
What is the issue with the Fetch Standard?
The `fetch()` spec allows browsers to perform decompression of HTTP responses in `fetch()` if an appropriate `Content-Encoding` header is set on the response. In this case, the `Response.prototype.body` stream no longer reflects the raw bytes (modulo protocol framing) received on the wire, but instead a processed version of the bytes after being passed through a decompression routine.
This decompression is meant to be transparent to users: they do not have to explicitly opt in or enable it. Further, they cannot even disable it (ref #1524).
Unfortunately, the decompression is currently not very transparent: given an arbitrary `Response` object, it is ambiguous whether its body has been decompressed or is still compressed.
This causes real world problems:
- it poses a hazard when implementers add new encodings for automatic decompression: a user who was previously manually decompressing responses with an unsupported content encoding can no longer tell whether they still need to perform decompression once a browser adds native support for that encoding
- proxies cannot tell which headers they need to send downstream (For compressed responses, omit `Content-Length` and `Content-Encoding` headers WinterTC55/fetch#23)
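The workaround proxies are pushed toward today can be sketched as follows (the helper name is hypothetical; this is not code from any of the linked projects). Because there is no way to know whether `fetch()` already decompressed the upstream body, the only safe option is to drop both headers unconditionally before forwarding:

```javascript
// Hypothetical proxy helper: strip the headers that may describe bytes
// the proxy no longer has, since fetch() gives no signal either way.
function sanitizeForDownstream(upstream) {
  const headers = new Headers(upstream.headers);
  headers.delete("content-length");
  headers.delete("content-encoding");
  return new Response(upstream.body, {
    status: upstream.status,
    statusText: upstream.statusText,
    headers,
  });
}

const out = sanitizeForDownstream(
  new Response("body bytes", {
    headers: { "Content-Encoding": "gzip", "X-Custom": "kept" },
  })
);
console.log(out.headers.get("content-encoding")); // null
console.log(out.headers.get("x-custom")); // "kept"
```

Note this throws away the headers even for responses that were never decompressed, which is exactly the information loss this proposal would make unnecessary.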
Proposal
I propose we strip out `Content-Length` (because it represents the content length prior to decompression) and `Content-Encoding` (because it represents the encoding prior to decompression) from `Response` headers when we perform automatic response body decompression in `fetch()`. I am not suggesting this affect responses created with `new Response()` or responses returned from `fetch()` that do not have automatic response body decompression performed.
Compatibility
I don't think this change will break any existing code, though it may skew some folks' monitoring tools. I base this assumption on the following thoughts:
- The `Content-Length` before decompression is meaningless if you only have the decompressed body. You cannot infer how long the real response is from the `Content-Length` in either gzip or br.
- The original `Content-Encoding` is not useful in combination with a decompressed body. The only use I can think of is monitoring use cases where you want to determine what percentage of your assets were served with compression (and with which compression).
Prior art
In the JavaScript space:
- Both Deno and Cloudflare implement this proposed fix to allow for the proxy use case mentioned above
In other programming languages:
- Go's `http` std lib module has auto decompression enabled by default. It strips out `Content-Length` and `Content-Encoding` when it performs decompression. It has a flag on the response to determine if auto decompression has taken place. See https://pkg.go.dev/net/http#Response.Uncompressed
- Rust's `reqwest` crate supports auto decompression and enables it by default for clients if the `gzip` or `brotli` compile-time flags are set. It strips out `Content-Length` and `Content-Encoding` when it performs decompression. It has no flag to check whether decompression has been performed. See https://docs.rs/reqwest/latest/reqwest/struct.ClientBuilder.html#method.gzip
- Python's `requests` does auto decompression by default, and sets `Content-Length` to the post-decompression content length. It does not remove the `Content-Encoding` header.
- Ruby's `Net::HTTP` does auto decompression by default, removing `Content-Encoding` and rewriting `Content-Length` to the length after decompression.