- 
                Notifications
    You must be signed in to change notification settings 
- Fork 405
Description
Synapse takes forever to send large responses. It takes us longer to send just the response than it does for us to process and encode_json_response in some cases.
Examples
98s to process the request and encode_json_response but the request isn't finished sending until 484s (8 minutes) which is 6.5 minutes of dead-time. The response size is 36MB
Jaeger trace: 4238bdbadd9f3077.json
 
59s to process and finishes after 199s. The response size is 36MB
Jaeger trace: 2149cc5e59306446.json
 
I've come across this before and it's not a new thing. For example in #13620 (original issue), I described as the "mystery gap at the end after we encode the JSON response (encode_json_response)" but never encountered it being this egregious.
It can also happen for small requests. 2s to process and finishes after 5s. The response size is 120KB
 
Investigation
@kegsay pointed out _write_bytes_to_request which runs after encode_json_response and has comments like "Write until there's backpressure telling us to stop." that definitely hint at some areas of interest.
synapse/synapse/http/server.py
Lines 869 to 873 in 03937a1
| with start_active_span("encode_json_response"): | |
| span = active_span() | |
| json_str = await defer_to_thread(request.reactor, encode, span) | |
| _write_bytes_to_request(request, json_str) | 
The JSON serialization is done in a background thread because it can block the reactor for many seconds. This part seems normal and fast (no problem).
But we also use _ByteProducer to send the bytes down to the client. Using a producer ensures we can send down all of the bytes to the client without hitting a 60s timeout (see context in comments below)
synapse/synapse/http/server.py
Lines 883 to 889 in d40bc27
| # The problem with dumping all of the response into the `Request` object at | |
| # once (via `Request.write`) is that doing so starts the timeout for the | |
| # next request to be received: so if it takes longer than 60s to stream back | |
| # the response to the client, the client never gets it. | |
| # | |
| # The correct solution is to use a Producer; then the timeout is only | |
| # started once all of the content is sent over the TCP connection. | 
This logic was added in:
- Iteratively encode JSON responses matrix-org/synapse#8013
- Switch the JSON byte producer from a pull to a push producer matrix-org/synapse#8116
Some extra time is expected as we're working with the reactor instead of blocking it but it seems like something isn't tuned optimally (chunk size, starting/stopping too much, etc)