DRIVERS-2884 Avoid connection churn when operations timeout #1675

prestonvasquez · 2024-10-14T21:06:45Z

This PR implements the design for connection pooling improvements described in DRIVERS-2884, based on the CSOT (Client-Side Operation Timeout) spec. It addresses connection churn caused by network timeouts during operations, especially in environments with low client-side timeouts and high latency.

When a connection is checked out after a network timeout, the driver now attempts to resume and complete reading any pending server response (instead of closing and discarding the connection). This may require multiple checkouts.
Each pending response read is subject to a cumulative 3-second static timeout. The timeout is refreshed after each successful read, acknowledging that progress is being made. If no data is read and the timeout is exceeded, the connection is closed.

To reduce unnecessary latency, if the timeout has expired while the connection was idle in the pool, a non-blocking single-byte read is performed; if no data is available, the connection is closed immediately.
This update introduces new CMAP events and logging messages (PendingResponseStarted, PendingResponseSucceeded, PendingResponseFailed) to improve observability of this path.
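The checkout-time behavior described above can be sketched roughly as follows. This is a minimal illustration only, not the spec's pseudocode: the names `PendingResponse` and `await_pending_response` and the three string outcomes are assumptions made for this example, and only the 3-second static timeout, the refresh after each successful read, and the non-blocking single-byte read come from the description.

```python
import socket
import time
from dataclasses import dataclass

PENDING_RESPONSE_TIMEOUT_S = 3.0  # static timeout quoted in the description


@dataclass
class PendingResponse:
    """Hypothetical per-connection state kept after a socket timeout."""
    remaining_bytes: int  # bytes of the server reply not yet read
    start: float          # monotonic time of the timeout or of the last successful read


def await_pending_response(sock, pending, checkout_timeout_s):
    """Try to drain a pending reply during checkout.

    Returns "ready" (reply fully discarded), "pending" (this checkout ran out
    of time but a later checkout may retry), or "closed" (caller must close).
    """
    now = time.monotonic
    deadline = now() + checkout_timeout_s

    if now() - pending.start >= PENDING_RESPONSE_TIMEOUT_S:
        # The static timeout expired while the connection sat idle in the
        # pool: probe with a non-blocking single-byte read instead of waiting.
        sock.setblocking(False)
        try:
            data = sock.recv(1)
        except BlockingIOError:
            return "closed"  # nothing buffered, give up immediately
        if not data:
            return "closed"  # peer closed the connection
        pending.remaining_bytes -= 1
        pending.start = now()  # progress was made, refresh the timeout

    while pending.remaining_bytes > 0:
        checkout_left = deadline - now()
        static_left = pending.start + PENDING_RESPONSE_TIMEOUT_S - now()
        # Whichever limit is tighter decides what a timeout means.
        on_timeout = "closed" if static_left <= checkout_left else "pending"
        budget = min(checkout_left, static_left)
        if budget <= 0:
            return on_timeout
        sock.settimeout(budget)
        try:
            data = sock.recv(pending.remaining_bytes)
        except socket.timeout:
            return on_timeout
        if not data:
            return "closed"
        pending.remaining_bytes -= len(data)
        pending.start = now()  # refresh after each successful read
    return "ready"
```

A "pending" result corresponds to the case where draining the response requires more than one checkout.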

Please complete the following before merging:

Update changelog.
Make sure there are generated JSON files from the YAML test files.
Test changes in at least one language driver. Go: GODRIVER-3173 Complete pending reads on conn checkout mongo-go-driver#1977
Test these changes against all server versions and topologies (including standalone, replica set, sharded clusters, and serverless).

source/client-side-operations-timeout/tests/connection-churn.yml

ShaneHarvey · 2025-04-17T22:23:24Z

source/client-side-operations-timeout/tests/connection-churn.yml

+    # after maxTimeMS, whereas mongod returns it after 
+    # max(blockTimeMS, maxTimeMS).  Until this ticket is resolved, these tests 
+    # will not pass on sharded clusters.
+    topologies: ["standalone", "replicaset"]


standalone -> single

ShaneHarvey · 2025-04-17T23:36:46Z

source/client-side-operations-timeout/tests/connection-churn.yml

+      - name: findOne
+        object: *collection
+        arguments:
+          timeoutMS: 50


In Python this timeout is too small and causes this find to fail before sending anything to the server. The same problem exists in the other tests too. Perhaps all of these tests should run a setup command (e.g. ping) to ensure a connection is created and available in the pool, then run the finds. What do you think?

codeowners-service-app · 2025-04-25T21:49:22Z

Assigned qingyang-hu for team dbx-spec-owners-csot because ShaneHarvey is out of office.

alcaeus

I'd like to wait until #1792 is merged to review the schema changes. From what I can see in the UTF specification, those changes look good.

alcaeus · 2025-04-30T12:29:44Z

source/unified-test-format/unified-test-format.md

@@ -3555,6 +3579,8 @@ other specs *and* collating spec changes developed in parallel or during the sam

 ## Changelog

+- 2025-04-25: **Schema version 1.24**


Can you please add what was changed here?

baileympearson · 2025-05-01T20:11:42Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

@@ -576,7 +576,106 @@ other threads from checking out [Connections](#connection) while establishing a
 Before a given [Connection](#connection) is returned from checkOut, it must be marked as "in use", and the pool's


I cannot comment on this in the diff. Line 236 in this file outlines a few states that a connection can be in. I think it makes sense to add the pending read state to this field.

Related to this - pending is already used to indicate a connection that has been created but not established. Can we choose a clear name for this new state? pending read seems like a clearer name

Another comment out-of-diff: should we allow drivers with background threads to close timed-out pending connections with the background thread? Or attempt the non-blocking read? This is a micro-optimization, but in the scenario where a connection might have timed out and needs to be closed, the background thread could close the connection instead of a checkout request.

Can we choose a clear name for this new state?

Absolutely, great point! "pending response" seems apt.

Should we allow drivers with background threads to close timed-out pending connections with the background thread?

IIUC this would require polling connections and then performing aliveness checks. This would not be more performant than awaiting a pending read on check-out.

Or attempt the non-blocking read?

From the design document:

The major flaw in this approach is that when an application runs two operations consecutively, it’s possible for 2 connections to be created in the same pool. For example:

    # Connect directly so we can assume only 1 pool.
    client = MongoClient(directConnection=True, timeoutMS=500)
    try:
        client.t.t.insert_one({})
    except PyMongoError as exc:
        if exc.timeout:
            print(f'Operation timed out: {exc}')
        else:
            raise
    # With background reads, this operation could require a 2nd connection.
    client.t.t.insert_one({})

In the above code, there should only be one connection in the pool at any given time because the operations are not run concurrently. A foreground reading approach guarantees this constraint. However, using background reads could result in two connections being opened. It is unacceptable for this code to open two pooled connections in this case because connections are expensive and a limited resource. Worse, the extra connection(s) will remain open forever by default as maxIdleTimeMS defaults to unlimited. Imagine a customer with 1000 such app servers, after this change they could end up using 1000 extra connections. At best this decreases performance and at worst it can cause a connection storm.

Another consideration is that even if the background approach is implemented, the foreground solution still needs to be implemented as well. So the background read approach adds additional implementation complexity.

baileympearson · 2025-05-01T20:16:09Z

source/connection-monitoring-and-pooling/tests/README.md

+
+#### Connection Aliveness Check Fails
+
+1. Initialize a mock TCP listener to simulate the server-side behavior. The listener should write at least 5 bytes to


Thoughts on adding a mock server to drivers-evergreen-tools for these tests? I could go either way - there are only two, so the burden on drivers isn't too great but it might be nice if drivers didn't need to worry about the mock server logic themselves.

I’m concerned that this solution will require drivers to spin up a server when trying to test locally. I’ve suggested DRIVERS-3183 to support raw-TCP connection test entities which will allow us to convert these prose tests to a unified spec test in the future.

baileympearson · 2025-05-02T15:12:23Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

@@ -576,7 +576,106 @@ other threads from checking out [Connections](#connection) while establishing a
 Before a given [Connection](#connection) is returned from checkOut, it must be marked as "in use", and the pool's
 availableConnectionCount MUST be decremented.

-```text
+If an operation times out the socket while awaiting a server response and CSOT is enabled and `maxTimeMS` was added to


Suggest: purely organizational, maybe a header would be nice here to help visually separate this section from the other checkIn information?

Suggested change

If an operation times out the socket while awaiting a server response and CSOT is enabled and `maxTimeMS` was added to

##### Awaiting Pending Read (CSOT-only)

baileympearson · 2025-05-02T15:14:21Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+The next time the connection is checked out, the driver MUST attempt to read and discard the remaining response from the
+socket. The workflow for this is as follows:
+
+- The connection MUST persist the current time recorded immediately after the original socket timeout, and this


Not opposed to having the logic in the connection, but we do have precedent for other connection-state-related actions in checkIn (destroying the connection, etc.). Would recording the start time make sense at checkIn instead of after the timeout in the driver's connection abstraction?

I have two thoughts on this. First, some drivers will likely add an object to the connection to maintain state for a pending response, for example:

    type pendingResponseState struct {
        remainingBytes int32
        requestID      int32
        start          time.Time
    }

    type connection struct {
        // pendingResponseState contains information required to attempt a pending read
        // in the event of a socket timeout for an operation that has appended
        // maxTimeMS to the wire message.
        pendingResponseState *pendingResponseState
    }

It would be more pragmatic to update the current time where remainingBytes and requestID are assigned (which is when the socket times out).

Additionally, we want to start this “countdown” ASAP in case the connection is “dead”: in such cases the “aliveness check” will be a non-blocking failure while awaiting a pending response. Delaying when we set the current time reduces the likelihood (albeit small) that the full 3-second pending response timeout has been exceeded while the connection remains idle in the pool.
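To illustrate the point about recording the start time at the moment of the socket timeout, here is a rough Python analogue of the Go struct above. It is only a sketch: the `Connection` wrapper and `read_reply` method are hypothetical names for this example, and the timeout is applied per `recv` call for simplicity.

```python
import socket
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingResponseState:
    remaining_bytes: int
    request_id: int
    start: float  # monotonic timestamp taken when the socket timed out


class Connection:
    """Hypothetical connection wrapper, for illustration only."""

    def __init__(self, sock: socket.socket):
        self.sock = sock
        self.pending: Optional[PendingResponseState] = None

    def read_reply(self, length: int, request_id: int, timeout_s: float) -> bytes:
        buf = bytearray()
        self.sock.settimeout(timeout_s)  # applies to each recv call
        try:
            while len(buf) < length:
                chunk = self.sock.recv(length - len(buf))
                if not chunk:
                    raise ConnectionError("peer closed the connection")
                buf += chunk
        except socket.timeout:
            # All three fields are assigned in the same place, so the
            # countdown starts as soon as the timeout is observed rather
            # than later at check-in.
            self.pending = PendingResponseState(
                remaining_bytes=length - len(buf),
                request_id=request_id,
                start=time.monotonic(),
            )
            raise
        return bytes(buf)
```

Setting `start` here rather than at check-in means any time the connection spends between the timeout and its return to the pool counts toward the pending response timeout.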

baileympearson · 2025-05-02T15:29:49Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

@@ -576,7 +576,106 @@ other threads from checking out [Connections](#connection) while establishing a
 Before a given [Connection](#connection) is returned from checkOut, it must be marked as "in use", and the pool's
 availableConnectionCount MUST be decremented.

-```text
+If an operation times out the socket while awaiting a server response and CSOT is enabled and `maxTimeMS` was added to
+the command, the driver MUST mark the connection as "pending" and record the current time in a way that can be updated.


So - I do understand this is the approach outlined in the design and that it is generally valuable to have example algorithms and pseudocode in the specification. However, I do think this approach makes assumptions about connection layer implementations and socket APIs that don't hold true for all drivers (at least in Node, they certainly don't). For example - Node's socket API is push-based and we collect chunks when they're available. Immediate reads do not make sense for our connection layer implementation and so the existing implementation of await_pending_response doesn't either.

I'm happy to work with you on the phrasing here but can we try to phrase these requirements in a way that outlines the requirements, and then outlines a particular implementation that satisfies the requirements? Ex: in Node, the socket API we use pushes chunks of data to us automatically (there is no read(n) method). We collect this into a buffer automatically when they're available. So when we implement these changes in Node, what we will likely do instead of this algorithm is:

on timeout, fail the current request and record the time of timeout but don't stop receiving data chunks from the socket.

set the socket's timeout to 3s (time out the socket if no chunks arrive in 3s)

in checkout, check if the pending connection has finished reading the response from the server. If it has, discard and continue. If not, calculate the wait time and wait. On success, proceed. On timeout, close the connection.

I think this approach still satisfies the goals of the spec changes but as the spec is currently written, our implementation would not be spec compliant. Thoughts?

IIUC in the Node case, bytes that have not been consumed will still sit on the socket after a timeout. So you would still need to read and discard any buffered data when checking out a connection that has been pinned to a pending response. The three bullet points you note do not conflict with the algorithm, AFAICT. I would be happy to troubleshoot this offline with you, though.
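The three steps proposed for a push-based socket API can be modeled as a small state machine. The sketch below is written in Python purely for illustration and is not Node code; `PushBasedConnection`, its method names, and the injectable `clock` are assumptions made for this example.

```python
import time

PENDING_RESPONSE_TIMEOUT_S = 3.0


class PushBasedConnection:
    """Models a connection whose runtime pushes chunks to a callback."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.remaining = 0         # bytes of the timed-out reply still expected
        self.last_activity = None  # time of the timeout or of the last chunk

    def on_operation_timeout(self, remaining_bytes):
        # Step 1: fail the request and record the time, but keep receiving.
        self.remaining = remaining_bytes
        self.last_activity = self._clock()

    def on_chunk(self, chunk):
        # Step 2: each arriving chunk counts as progress and resets the
        # 3-second inactivity window.
        if self.remaining > 0:
            self.remaining = max(0, self.remaining - len(chunk))
            self.last_activity = self._clock()

    def checkout_state(self):
        # Step 3: decide at checkout whether to proceed, wait, or close.
        if self.remaining == 0:
            return ("ready", 0.0)
        idle = self._clock() - self.last_activity
        if idle >= PENDING_RESPONSE_TIMEOUT_S:
            return ("close", 0.0)
        return ("wait", PENDING_RESPONSE_TIMEOUT_S - idle)
```

In this model the buffered chunks are simply discarded as they arrive, so checkout only has to inspect the counters rather than issue an explicit read.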

baileympearson · 2025-05-02T15:58:46Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+  connectionId: int64;
+
+  /**
+   *  The time it took to complete the pending read.


So long as data is still coming back from the socket in intervals of <3s, it is possible for the same connection to require multiple checkout requests to fully exhaust. So - is this duration the total time it took to read all of the data off of the socket (now() - time of timeout) or the amount of time that the checkout request waited on the final pending read wait?

(same comment for logging events)

I would anticipate this duration to be within the context of ConnectionPendingResponseStarted, i.e. 1 call to await_pending_response.

baileympearson · 2025-05-02T15:59:24Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+   */
+  connectionId: int64;
+
+  /**


Do you think it would be valuable to include a duration here as well, indicating how long the request waited for the pending read before failing?

(same comment for logging events)

I do think this is a good idea; it could highlight the case where you are trickling 1-byte aliveness checks but the response continues to time out while attempting to discard the input TCP buffer. Good call.
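One way to picture a duration scoped to a single call, on both the succeeded and failed events, is the sketch below. The event class names follow the ones mentioned in this PR; the field names other than the connection ID and duration, and the `timed_pending_read` helper, are assumptions for illustration rather than the spec's definitions.

```python
import time
from dataclasses import dataclass


@dataclass
class ConnectionPendingResponseStarted:
    connection_id: int


@dataclass
class ConnectionPendingResponseSucceeded:
    connection_id: int
    duration_s: float  # time spent in this one attempt only


@dataclass
class ConnectionPendingResponseFailed:
    connection_id: int
    duration_s: float  # how long this attempt waited before failing
    reason: str        # hypothetical field


def timed_pending_read(connection_id, drain, emit, clock=time.monotonic):
    """Run one pending-read attempt and emit events around it."""
    emit(ConnectionPendingResponseStarted(connection_id))
    started = clock()
    try:
        drain()
    except Exception as exc:
        emit(ConnectionPendingResponseFailed(
            connection_id, clock() - started, repr(exc)))
        raise
    emit(ConnectionPendingResponseSucceeded(connection_id, clock() - started))
```

With this shape, a connection that needs several checkouts to drain would produce one started event and one terminal event per attempt, each with its own duration.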

baileympearson · 2025-05-02T16:02:49Z

source/connection-monitoring-and-pooling/tests/README.md

+    `ConnectionPendingResponseFailed` events.
+3. Instantiate a connection pool using the mock listener’s address, ensuring readiness without error. Attach the event
+    monitor to observe the connection’s state.
+4. Check out a connection from the pool and initiate a read operation with an appropriate socket timeout (e.g, 10ms)


This comment is related to my other comment about a less socket-API-specific implementation. Is it possible to write these tests in a way that doesn't require explicit use of a read API? Node's connection layer doesn't expose a read method; we only expose command(), which performs write+read on the underlying socket.

So Node only has access to a round-trip API? Even in the non-public API? Could you not just discard the write half in the mock listener?

baileympearson · 2025-05-02T16:07:10Z

source/connection-monitoring-and-pooling/tests/README.md

+7. Verify that one event for each `ConnectionPendingResponseStarted` and `ConnectionPendingResponseFailed` was emitted.
+    Also verify that the fields were correctly set for each event.
+
+#### Connection Aliveness Check Succeeds


Can we add a test that demonstrates that multiple aliveness checks might be required to fully read the response from the socket? I'm imagining the mock server emitting chunks of data every second for longer than 6s (2 * the static timeout). Each checkout should fail, but we'll continue to read.

It’s unclear to me what the goal of this would be. Do you just want to make sure the driver does not prematurely close the connection if the aliveness check succeeds? We could use event monitoring for that instead.
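For reference, the kind of mock listener being discussed could look roughly like the following. This is a sketch under stated assumptions: the helper names are hypothetical, the intervals are scaled down from the 1-second chunks suggested above so it runs quickly, and the client side only counts how many time-limited attempts are needed rather than implementing the full 3-second static timeout.

```python
import socket
import threading
import time


def start_trickle_listener(total_bytes, interval_s):
    """Start a mock TCP listener that writes one byte every interval_s."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # let the OS pick a free port
    srv.listen(1)
    addr = srv.getsockname()

    def serve():
        conn, _ = srv.accept()
        with conn:
            for _ in range(total_bytes):
                time.sleep(interval_s)
                try:
                    conn.sendall(b"x")
                except OSError:
                    break  # client went away
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return addr


def drain_in_checkouts(addr, total_bytes, checkout_timeout_s):
    """Return how many simulated checkouts were needed to read everything."""
    remaining, checkouts = total_bytes, 0
    with socket.create_connection(addr) as sock:
        while remaining > 0:
            checkouts += 1
            deadline = time.monotonic() + checkout_timeout_s
            while remaining > 0:
                left = deadline - time.monotonic()
                if left <= 0:
                    break  # this checkout "fails"; the next one resumes
                sock.settimeout(left)
                try:
                    data = sock.recv(remaining)
                except socket.timeout:
                    break
                if not data:
                    raise ConnectionError("listener closed early")
                remaining -= len(data)
    return checkouts
```

Because the listener keeps making progress, no single attempt ever sees a long silent gap, yet more than one attempt is needed before the response is fully consumed.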

baileympearson · 2025-05-02T16:12:51Z

source/connection-monitoring-and-pooling/tests/logging/connection-logging.yml

+
+  - description: "force a pending response read, fail first try, succeed second try"
+    operations:
+      - name: createEntities


If possible, can we add a test that demonstrates that when the pending read checkout has no timeoutMS set, we use socket_timeout_ms (if it is <3s)?

Great catch! The Go Driver doesn’t support socket timeouts, which are technically a deprecated option. Perhaps @ShaneHarvey can opine. If we decide to add this test, would you mind implementing it, since the Go Driver has no way of verifying it?

prestonvasquez added 2 commits October 14, 2024 15:06

DRIVERS-2884 Add connection churn spec tests

0f12706

DRIVERS-2884 Update json

fe18120

prestonvasquez requested a review from ShaneHarvey October 14, 2024 21:13

ShaneHarvey reviewed Oct 15, 2024

prestonvasquez requested a review from ShaneHarvey October 21, 2024 18:15

prestonvasquez added 7 commits October 30, 2024 15:59

DRIVERS-2884 Clean up spec tests

05cc88b

Update CMAP to include foreground read

98c2a73

Update changelog

4827995

Add justification for CMAP update

234b729

Remove unecessary example

ccfbcf1

Use consistent keys

fed567b

Update timeouts

8840be4

ShaneHarvey requested changes Apr 17, 2025

prestonvasquez added 15 commits April 21, 2025 18:11

DRIVERS-2884 Resolve merge conflicts

c1bee3b

DRIVERS-2884 Update pending response unified spec tests

c0e5aee

DRIVERS-2884 Add UML and update wording

dde9e22

DRIVERS-2884 Remove uneeded text from code snippet

5e0305a

DRIVERS-2884 Add prose tests

496724c

DRIVERS-2884 Clean up presentation

258edf8

DRIVERS-2884 Add logs and events

d217d10

DRIVERS-2884 Add log part

cc8aec0

DRIVERS-2884 Add Q&A section

3d98039

DRIVERS-2884 Add changelog

07e75bd

DRIVERS-2884 Fix Markdown failures

8d9e71b

DRIVERS-2884 Update schema

5c68f77

DRIVERS-2884 Update schema w/ new connection events

e2653cb

DRIVERS-2884 Remove additional properties

00aa620

DRIVERS-2884 Remove ignoring extra events

b29d6cc

prestonvasquez marked this pull request as ready for review April 25, 2025 21:36

prestonvasquez requested a review from a team as a code owner April 25, 2025 21:36

prestonvasquez requested review from a team as code owners April 25, 2025 21:36

prestonvasquez requested review from alcaeus, stIncMale, baileympearson and ShaneHarvey and removed request for a team April 25, 2025 21:36

codeowners-service-app bot requested a review from qingyang-hu April 25, 2025 21:49

prestonvasquez added 3 commits April 25, 2025 15:50

DRIVERS-2884 Clean up tests

b04b340

DRIVERS-2884 Uncapitalize D in ID

40b302c

DRIVERS-2884 Another ID cleanup

4348b44

prestonvasquez removed the request for review from qingyang-hu April 25, 2025 22:08

DRIVERS-2884 Remove the word write

dd0dbe9

codeowners-service-app bot requested a review from qingyang-hu April 25, 2025 22:15

prestonvasquez added 2 commits April 25, 2025 16:15

DRIVERS-2884 Remove the word write

e43c466

Add punctuation

89754ce

prestonvasquez removed request for stIncMale and qingyang-hu April 29, 2025 18:42

alcaeus reviewed Apr 30, 2025

prestonvasquez mentioned this pull request Apr 30, 2025

GODRIVER-3173 Complete pending reads on conn checkout mongodb/mongo-go-driver#1977

Open

prestonvasquez added 3 commits April 30, 2025 16:59

Merge branch 'master' into DRIVERS-2884

bc893c6

DRIVERS-2884 Update schema latest

a3c00a4

DRIVERS-2884 Clarify schema bump

6a663eb

prestonvasquez requested a review from alcaeus April 30, 2025 23:02

codeowners-service-app bot requested a review from qingyang-hu May 2, 2025 07:53

baileympearson requested changes May 2, 2025
