Fix endless 5xx responses leading to pages #5392
Open
+4
−0
The root cause here is that Rails was returning responses to the reverse proxy in which the total size of all the headers exceeded 4k. nginx rejected these with an `upstream sent too big header while reading response header from upstream` error and served its default 502 Bad Gateway response instead.
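To make the 4k arithmetic concrete, here is a minimal Ruby sketch that approximates the size of a response head as nginx buffers it from upstream: the status line plus one `Name: value` line per header, each ending in CRLF, plus the terminating blank line. It assumes the limit being hit is nginx's `proxy_buffer_size`, whose default is one memory page (4k on most platforms). The helper names and the example header set are hypothetical and are not the app's real headers. The long path mirrors the `HelloWorld` × 395 path used in the curl test.

```ruby
# Assumption: the limit being hit is nginx's proxy_buffer_size, which
# defaults to one memory page (4k on most platforms).
ASSUMED_PROXY_BUFFER_SIZE = 4096

# Approximate bytes of an HTTP/1.1 response head: the status line,
# each "Name: value" line, and the final blank line, all ending in CRLF.
def header_block_bytes(status_line, headers)
  total = status_line.bytesize + 2
  headers.each do |name, value|
    total += name.bytesize + 2 + value.to_s.bytesize + 2 # "Name: value\r\n"
  end
  total + 2 # blank line terminating the header block
end

def fits_in_proxy_buffer?(status_line, headers, limit = ASSUMED_PROXY_BUFFER_SIZE)
  header_block_bytes(status_line, headers) <= limit
end

# Hypothetical header set for illustration only; the real response
# headers are not listed in this PR.
def example_headers(location)
  {
    "Location" => location,
    "Content-Type" => "text/html; charset=utf-8",
    "X-Request-Id" => "a" * 36,
    "X-Runtime" => "0.001234",
    "Cache-Control" => "no-cache"
  }
end

status = "HTTP/1.1 301 Moved Permanently"
short_location = "https://rubygems.org/"
long_location = "https://rubygems.org/" + "HelloWorld" * 395

puts header_block_bytes(status, example_headers(short_location)) # prints 205
puts header_block_bytes(status, example_headers(long_location))  # prints 4155
puts fits_in_proxy_buffer?(status, example_headers(long_location)) # prints false
```

With these made-up headers, the `Location` header alone (3,983 bytes on the wire) still fits under 4,096. Only the handful of ordinary headers around it pushes the head over the limit, which matches the "in combination with the other headers" failure mode.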
It took us dozens of hours of searching to find the root cause, because our nginx log parsing was throwing away all of the error logs. I only saw the error when tailing the container logs directly in k8s. I have since fixed the log pipeline so these errors show up as errors, with the error message.
As to how the Rails app was returning so many header bytes:

The `Redirector` middleware generates a 301 whenever it sees a host that does not match a predefined list. Notably, `api.rubygems.org` points to prod but is not in that list. The 301 includes a `Location` header with the correct host that preserves the rest of the path and query string. This means the `Location` header can be nearly as long as the incoming request line, which nginx caps at almost 8k bytes. Combined with the other headers returned by the Rails app, this was enough to exceed the default limit of 4k bytes.
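A minimal Rack-style sketch of the redirect behavior described above, showing why the size of the `Location` header tracks the request path. `CanonicalHostRedirect`, its allow-list, and the canonical host are hypothetical stand-ins, not the actual `Redirector` implementation. The sketch uses only the plain Rack env/response conventions, so it needs no gems.

```ruby
# Hypothetical sketch, not the actual Redirector middleware in this repo.
class CanonicalHostRedirect
  ALLOWED_HOSTS = %w[rubygems.org].freeze # assumed, illustrative allow-list
  CANONICAL_HOST = "rubygems.org"          # assumed redirect target

  def initialize(app)
    @app = app
  end

  def call(env)
    host = env["HTTP_HOST"].to_s.split(":").first
    return @app.call(env) if ALLOWED_HOSTS.include?(host)

    # Keep the full path and query string, so the Location header grows
    # with the request line instead of having a bounded size.
    location = +"https://#{CANONICAL_HOST}#{env['SCRIPT_NAME']}#{env['PATH_INFO']}"
    query = env["QUERY_STRING"].to_s
    location << "?#{query}" unless query.empty?

    # Rack 3 requires lowercase response header names.
    [301, { "location" => location, "content-type" => "text/html" }, ["Moved Permanently"]]
  end
end

inner_app = ->(_env) { [200, {}, ["ok"]] }
middleware = CanonicalHostRedirect.new(inner_app)

# A host outside the allow-list, with a path shaped like the curl test's.
env = {
  "HTTP_HOST" => "api.rubygems.org",
  "SCRIPT_NAME" => "",
  "PATH_INFO" => "/versions/" + "HelloWorld" * 395 + "/abcdef",
  "QUERY_STRING" => ""
}
status, headers, _body = middleware.call(env)
puts status                       # prints 301
puts headers["location"].bytesize # prints 3987
```

Because the path is copied verbatim, a roughly 4k request path produces a roughly 4k `Location` header before any other response headers are counted. Capping or dropping the preserved path would be one alternative design, at the cost of losing deep links on redirect.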
Verified that this fixes the 502s on staging by applying the change manually.
Tested via:

```bash
curl http://localhost:8080/versions/$(printf 'HelloWorld%.0s' {1..395})/abcdef \
  -H 'X-Forwarded-Proto: https' \
  -H 'X-Edge-Proto: https' \
  -H 'Host: indexs.rubygems.org' \
  -H 'Accept: sam/test' \
  -v
```
Signed-off-by: Samuel Giddins [email protected]