Automatic decompression should sanitize Content-Encoding and Content-Length headers from the response #1729

Open

Description

@lucacasonato

What is the issue with the Fetch Standard?

The fetch() spec allows browsers to decompress HTTP response bodies when an appropriate Content-Encoding header is set on the response. In this case, the Response.prototype.body stream no longer reflects the raw bytes (modulo protocol framing) received on the wire, but instead a processed version of those bytes after they have been passed through a decompression routine.

This decompression is meant to be transparent to users: they do not have to explicitly opt in or enable it. Further, they cannot even disable it (ref #1524).

Unfortunately, the decompression is currently not very transparent: given an arbitrary response object, it is ambiguous whether the Response's body has been decompressed or is still compressed.
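
A minimal sketch of the ambiguity, assuming an ordinary fetch() caller (the URL and the sizes in the comments are made up for illustration; this is not code from any particular implementation):

```ts
// After fetch(), the body stream has already been decompressed, but the
// headers still describe the compressed representation that was on the wire.
const res = await fetch("https://example.com/data.json"); // hypothetical URL
const body = await res.arrayBuffer();

console.log(res.headers.get("content-encoding")); // e.g. "gzip"
console.log(res.headers.get("content-length"));   // compressed size, e.g. "512"
console.log(body.byteLength);                     // decompressed size, e.g. 2048

// Given only this Response object, a consumer cannot tell whether `body`
// still needs to be decompressed or already has been.
```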

This causes real world problems:

  • it poses a hazard when implementers add new encodings for automatic decompression: a user who was previously decompressing a response with an unsupported content encoding by hand can no longer tell whether they still need to do so once a browser adds native support for that encoding
  • proxies cannot tell which headers they need to send downstream (see the sketch after this list; also "For compressed responses, omit Content-Length and Content-Encoding headers" WinterTC55/fetch#23)
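
To make the proxy problem concrete, here is a hedged sketch of a generic fetch-handler-style proxy (the handler shape is only illustrative, roughly what a Deno or Cloudflare Workers handler looks like):

```ts
async function proxy(req: Request): Promise<Response> {
  // The upstream body may have been transparently decompressed by fetch().
  const upstream = await fetch(req);

  // Naively forwarding the headers is hazardous today: Content-Encoding and
  // Content-Length still describe the compressed bytes, while upstream.body
  // is the decompressed stream, so a downstream client may misinterpret or
  // truncate the body.
  return new Response(upstream.body, {
    status: upstream.status,
    headers: upstream.headers,
  });
}
```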

Proposal

I propose we strip Content-Length (because it represents the content length prior to decompression) and Content-Encoding (because it represents the encoding prior to decompression) from Response headers when we perform automatic response body decompression in fetch(). I am not suggesting this affect responses created with new Response(), or responses returned from fetch() for which automatic response body decompression was not performed.
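
As an illustration only (the spec change would make fetch() itself responsible for this), a userland approximation of the proposed sanitization might look roughly like the following; sanitizeDecompressedResponse is a hypothetical helper name:

```ts
function sanitizeDecompressedResponse(res: Response): Response {
  const headers = new Headers(res.headers);
  // Drop the headers that describe the on-the-wire (compressed)
  // representation, since the body stream no longer matches them after
  // decompression.
  headers.delete("content-encoding");
  headers.delete("content-length");
  return new Response(res.body, {
    status: res.status,
    statusText: res.statusText,
    headers,
  });
}
```

Note that userland code cannot reliably apply something like this itself, because it cannot tell whether decompression actually happened; that is exactly why the stripping needs to happen inside fetch().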

Compatibility

I don't think this change will break any existing code. It may skew some folks' monitoring tools. I make this assumption based on the following thoughts:

  • The pre-decompression Content-Length is meaningless if you only have the decompressed body: for both gzip and br, you cannot infer how long the decompressed response is from that header.
  • The original Content-Encoding is not useful in combination with a decompressed body. The only use I can think of is monitoring use cases where you want to determine what percentage of your assets were served with compression (and with which compression).

Prior art

In the JavaScript space:

  • Both Deno and Cloudflare implement this proposed fix to allow for the proxy use case mentioned above

In other programming languages:

Labels: needs implementer interest, needs tests, topic: http
