Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.59.0 Sends subgraph requests without Content-Length header #6503

Open
joemccall86 opened this issue Jan 2, 2025 · 7 comments · May be fixed by #6538
Open

v1.59.0 Sends subgraph requests without Content-Length header #6503

joemccall86 opened this issue Jan 2, 2025 · 7 comments · May be fixed by #6538
Assignees

Comments

@joemccall86
Copy link

joemccall86 commented Jan 2, 2025

Describe the bug

When upgrading from router v0.58.1 to v0.59.0, one of our subgraphs in Ruby could not parse the request. I enabled request/response logging in router and discovered a difference in the request behavior:

v1.58.1:

Request body to subgraph "jarvis" http.request.body=Body(Full(b"{\"query\":\"query insurancePlanInfoV2__jarvis__0{me{name}}\",\"operationName\":\"insurancePlanInfoV2__jarvis__0\"}")) apollo.subgraph.name=jarvis

v1.59.0:

Request body to subgraph "jarvis" http.request.body=Body(Streaming) apollo.subgraph.name=jarvis

Our subgraph doesn't support this "Streaming" mode. We do not have compression enabled for any of our subgraphs either. The only difference is the version of router.

To Reproduce

Steps to reproduce the behavior:

  1. Set up a subgraph that only responds to HTTP/1.1 non-streaming requests
  2. Submit the request
  3. The subgraph will respond with a 422, missing the field "query".

Expected behavior

The subgraph still receives the full body of the request

Output

If applicable, add output to help explain your problem.

Desktop (please complete the following information):

  • OS: macOS
  • Version 14.7.2

Additional context

Add any other context about the problem here.

@smyrick
Copy link
Member

smyrick commented Jan 2, 2025

@joemccall86 Thanks for raising the issue. Just to confirm, are you actually using Router v1.58.1 and upgrading to v1.59.0? There is no version 0.59.0

@joemccall86
Copy link
Author

joemccall86 commented Jan 2, 2025

Yes, sorry about that. Fixed the title and description.

@joemccall86 joemccall86 changed the title 0.59.0 Sends subgraph requests using Streaming instead of Full 1.59.0 Sends subgraph requests using Streaming instead of Full Jan 2, 2025
@goto-bus-stop
Copy link
Member

goto-bus-stop commented Jan 3, 2025

AFAIK the Full() vs Streaming difference should only be a difference in the in-memory representation used inside the router, not in the bytes sent over the network. (A "Full" body is also streamed over the network in the end.) So it's not necessarily the stream that is a culprit but it does indicate that the router is doing something different.

I don't recall anything in the changes we made in 1.59.0 that would cause changes to the subgraph requests. When you say the subgraph "could not parse" the request, do you have any more specifics? Are we looking at garbled non-JSON bytes, missing/wrong headers, anything like that, something else?

@joemccall86
Copy link
Author

I did some troubleshooting by looking at the subgraph request content:

v1.58.1

traefik-1  | [HTTP] 2025/01/03 18:47:17 172.18.0.5:56018 POST /graphql: 200 OK HTTP/2.0
traefik-1  |
traefik-1  | Request Headers:
traefik-1  | Accept: application/json, application/graphql-response+json
traefik-1  | Accept-Encoding: gzip, br, deflate
traefik-1  | Apollographql-Client-Name: apollo-explorer
traefik-1  | Authorization: Bearer <redacted>
traefik-1  | Content-Length: 105
traefik-1  | Content-Type: application/json
traefik-1  | Origin: https://studio.apollographql.com
traefik-1  | User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36
traefik-1  | X-Forwarded-Host: proxy:443
traefik-1  | X-Forwarded-Port: 443
traefik-1  | X-Forwarded-Proto: https
traefik-1  | X-Forwarded-Server: 0ec44f12802b
traefik-1  | X-Real-Ip: 172.18.0.5
traefik-1  | X-Replaced-Path: /app-upstream/graphql
traefik-1  |
traefik-1  | Request Body:
traefik-1  | {"query":"query insurancePlanInfoV2__jarvis__0{me{id}}","operationName":"insurancePlanInfoV2__jarvis__0"}
traefik-1  |
traefik-1  | Response Headers:
traefik-1  | Access-Control-Allow-Origin: *
traefik-1  | Access-Control-Expose-Headers: *
traefik-1  | Cache-Control: max-age=0, private, must-revalidate
traefik-1  | Content-Security-Policy: default-src 'self' 'unsafe-inline' *.uat.grandrounds.com *.uat.includedhealth.com www.google.com *.googleapis.com googleads.g.doubleclick.net *.stripe.com *.dicomgrid.com *.mxpnl.com api.amplitude.com api.rollbar.com cdnjs.cloudflare.com *.sendbird.com api2.amplitude.com/batch wss://*.sendbird.com staging.grandrounds.com d2rqoiy120x5nc.cloudfront.net grhc--uat.sandbox.my.salesforce.com; child-src 'self' *.dicomgrid.com js.stripe.com www.google.com *.youtube.com *.doubleclick.net *.grandrounds.com *.includedhealth.com *.vimeo.com *.qualtrics.com; font-src *; img-src * data:; script-src 'self' 'unsafe-inline' 'unsafe-eval' *.googleapis.com *.uat.grandrounds.com *.uat.includedhealth.com cdn.amplitude.com *.stripe.com cdnjs.cloudflare.com *.google.com d2rqoiy120x5nc.cloudfront.net
traefik-1  | Content-Type: application/json; charset=utf-8
traefik-1  | Date: Fri, 03 Jan 2025 18:47:18 GMT
traefik-1  | Etag: W/"39e3febbbc871092e25b04bb6b018ea1"
traefik-1  | Server: istio-envoy
traefik-1  | Set-Cookie: ahoy_visitor=bac7c3ee-1014-4bc4-8a7f-f77ae099fcb0; path=/; expires=Sun, 03 Jan 2027 18:47:18 GMT; secure; HttpOnly; SameSite=None,ahoy_visit=5a4790da-2a1e-4a15-9f18-b438b31943e0; path=/; expires=Fri, 03 Jan 2025 22:47:18 GMT; secure; HttpOnly; SameSite=None,ahoy_track=true; path=/; expires=Sun, 02 Feb 2025 18:47:18 GMT; secure; HttpOnly; SameSite=None
traefik-1  | Strict-Transport-Security: max-age=631138519
traefik-1  | Vary: Origin
traefik-1  | X-Content-Type-Options: nosniff
traefik-1  | X-Download-Options: noopen
traefik-1  | X-Envoy-Upstream-Service-Time: 170
traefik-1  | X-Frame-Options: SAMEORIGIN
traefik-1  | X-Meta-Request-Version: 0.6.0
traefik-1  | X-Permitted-Cross-Domain-Policies: none
traefik-1  | X-Request-Id: 7cd5cdc1-5329-96cf-a6a3-143a25f07dee
traefik-1  | X-Runtime: 0.158937
traefik-1  | X-Xss-Protection: 1; mode=block
traefik-1  |
traefik-1  | Response Content Length: 32
traefik-1  |
traefik-1  | Duration: 454.851 ms
traefik-1  |
traefik-1  | Response Body:
traefik-1  | {"data":{"me":{"id":"1198999"}}}
traefik-1  |

v1.59.0:

traefik-1  | [HTTP] 2025/01/03 18:45:52 172.18.0.4:56294 POST /graphql: 200 OK HTTP/2.0
traefik-1  |
traefik-1  | Request Headers:
traefik-1  | Accept: application/json, application/graphql-response+json
traefik-1  | Accept-Encoding: gzip, br, deflate
traefik-1  | Apollographql-Client-Name: apollo-explorer
traefik-1  | Authorization: Bearer <redacted>
traefik-1  | Content-Type: application/json
traefik-1  | Origin: https://studio.apollographql.com
traefik-1  | User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36
traefik-1  | X-Forwarded-Host: proxy:443
traefik-1  | X-Forwarded-Port: 443
traefik-1  | X-Forwarded-Proto: https
traefik-1  | X-Forwarded-Server: 0ec44f12802b
traefik-1  | X-Real-Ip: 172.18.0.4
traefik-1  | X-Replaced-Path: /app-upstream/graphql
traefik-1  |
traefik-1  | Request Body:
traefik-1  | {"query":"query insurancePlanInfoV2__jarvis__0 { me { id } }","operationName":"insurancePlanInfoV2__jarvis__0"}
traefik-1  |
traefik-1  | Response Headers:
traefik-1  | Access-Control-Allow-Origin: *
traefik-1  | Access-Control-Expose-Headers: *
traefik-1  | Cache-Control: max-age=0, private, must-revalidate
traefik-1  | Content-Security-Policy: default-src 'self' 'unsafe-inline' *.uat.grandrounds.com *.uat.includedhealth.com www.google.com *.googleapis.com googleads.g.doubleclick.net *.stripe.com *.dicomgrid.com *.mxpnl.com api.amplitude.com api.rollbar.com cdnjs.cloudflare.com *.sendbird.com api2.amplitude.com/batch wss://*.sendbird.com staging.grandrounds.com d2rqoiy120x5nc.cloudfront.net grhc--uat.sandbox.my.salesforce.com; child-src 'self' *.dicomgrid.com js.stripe.com www.google.com *.youtube.com *.doubleclick.net *.grandrounds.com *.includedhealth.com *.vimeo.com *.qualtrics.com; font-src *; img-src * data:; script-src 'self' 'unsafe-inline' 'unsafe-eval' *.googleapis.com *.uat.grandrounds.com *.uat.includedhealth.com cdn.amplitude.com *.stripe.com cdnjs.cloudflare.com *.google.com d2rqoiy120x5nc.cloudfront.net
traefik-1  | Content-Type: application/json; charset=utf-8
traefik-1  | Date: Fri, 03 Jan 2025 18:45:53 GMT
traefik-1  | Etag: W/"39e3febbbc871092e25b04bb6b018ea1"
traefik-1  | Server: istio-envoy
traefik-1  | Set-Cookie: ahoy_visitor=5df43dfe-42d1-4eb8-8088-6769ee334563; path=/; expires=Sun, 03 Jan 2027 18:45:53 GMT; secure; HttpOnly; SameSite=None,ahoy_visit=cc71e71e-5562-445e-9330-e9ec4b8d797b; path=/; expires=Fri, 03 Jan 2025 22:45:53 GMT; secure; HttpOnly; SameSite=None,ahoy_track=true; path=/; expires=Sun, 02 Feb 2025 18:45:53 GMT; secure; HttpOnly; SameSite=None
traefik-1  | Strict-Transport-Security: max-age=631138519
traefik-1  | Vary: Origin
traefik-1  | X-Content-Type-Options: nosniff
traefik-1  | X-Download-Options: noopen
traefik-1  | X-Envoy-Upstream-Service-Time: 105
traefik-1  | X-Frame-Options: SAMEORIGIN
traefik-1  | X-Meta-Request-Version: 0.6.0
traefik-1  | X-Permitted-Cross-Domain-Policies: none
traefik-1  | X-Request-Id: a5b79c2e-f582-9841-8f9c-0fe3132a1e55
traefik-1  | X-Runtime: 0.099038
traefik-1  | X-Xss-Protection: 1; mode=block
traefik-1  |
traefik-1  | Response Content Length: 32
traefik-1  |
traefik-1  | Duration: 176.099 ms
traefik-1  |
traefik-1  | Response Body:
traefik-1  | {"data":{"me":{"id":"1198999"}}}
traefik-1  |

A couple things I noticed:

  • Printing out the subgraph requests actually worked around the issue. I suspect this is due to the debugging code loading the entire request into memory so it can be printed before it is sent to the subgraph. I observed this both in the traefik proxy and when I printed the values via rhai script. This led me to my second discovery.
  • apollo-router v1.59.0 did not set the Content-Length header.

I suspect that our affected subgraph (which is a legacy monolith) is having a hard time processing requests without Content-Length.

@joemccall86 joemccall86 changed the title 1.59.0 Sends subgraph requests using Streaming instead of Full v1.59.0 Sends subgraph requests without Content-Length header Jan 3, 2025
@goto-bus-stop
Copy link
Member

Great, thanks for that investigation! So it is indeed related to the Full/Streaming difference. The Content-Length header is trivial to populate with a Full body (which has the entire body available beforehand as a simple buffer), but not with a streaming body as we don't process an entire streaming body before sending the headers to the subgraph. I filed a ticket internally so we can figure out what we want to do about this.

@JamesSlocumIH
Copy link

I filed a ticket internally

Hi! Is there a link to the ticket or some how we can track the progress of that ticket? Thanks!

@nmoutschen nmoutschen self-assigned this Jan 13, 2025
@nmoutschen
Copy link
Contributor

Hey @joemccall86!

Thanks a lot for filling this isuse and provide detailed logs! This helped a lot when writing a fix for this.

I've started work on this PR which should fix the issue you're experiencing. We'll just do some extra validation on our side before it's good to go in the next version of the router.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants