chore: connection pipeline cache does not shrink #4491

kostasrim · 2025-01-21T10:52:08Z

Add test to show that pipeline cache won't shrink once it's filled if clients ping pong between async and sync dispatch

add period based shrinkage for pipeline cache
add tests

kostasrim · 2025-01-21T10:55:30Z

tests/dragonfly/connection_test.py

+    # pipeline_cache_bytes because it recycled too many messages, they won't gradually be released
+    # if one command (one connection out of `n` connections) dispatches async. Only 1 command out of
+    # n connections must be dispatched async and the pipeline won't gradually be released.
+    for i in range(30):


We can drain the pipeline cache bytes once we stop dispatching async. But with a large pool of connections, it takes only one command dispatching async to make us reset the counter internally. If this pattern continues, the size of the cache remains constant and is not released gradually.

Signed-off-by: kostas <[email protected]>

kostasrim · 2025-01-21T10:58:19Z

tests/dragonfly/connection_test.py

+    info = await good_client.info()
+
+    # Drained
+    assert info["pipeline_cache_bytes"] == 0


drained completely

adiholden · 2025-01-30T21:16:25Z

src/facade/dragonfly_connection.cc


 thread_local vector<Connection::PipelineMessagePtr> Connection::pipeline_req_pool_;

+class PipelineCacheSizePaceMaker {


nit: maybe PipelineWatermarkTracker

How about PipelineCacheSizeTracker? (Not strongly opinionated, so let me know which one you prefer!)

adiholden · 2025-01-30T21:16:52Z

src/facade/dragonfly_connection.cc


+class PipelineCacheSizePaceMaker {
+ public:
+  bool WatermarkReached(size_t pipeline_sz) {


nit: maybe CheckAndUpdateWatermark

adiholden · 2025-01-30T21:20:24Z

tests/dragonfly/connection_test.py

+
+
+@dfly_args({"proactor_threads": 1})
+async def test_pipeline_cache_size(df_factory):


please add some comments on this test

adiholden · 2025-01-30T21:30:36Z

The main purpose of the pipeline cache is to reduce the number of allocations.
I would like to see in a test that if we have one or several connections running commands in a pipeline, we utilize the cache in an optimal way: while the commands are executing the cache does not repeatedly grow and shrink, and when we finish execution the cache shrinks.

adiholden · 2025-01-30T21:32:05Z

Also, let's try to think about when this algorithm does not perform well.

kostasrim · 2025-02-07T10:54:16Z

@adiholden

The main purpose of the pipeline cache is to reduce the number of allocations.
I would like to see in a test that if we have one or several connections running commands in a pipeline, we utilize the cache in an optimal way: while the commands are executing the cache does not repeatedly grow and shrink, and when we finish execution the cache shrinks.

Well, as long as we are doing only async dispatches, we won't release the pipeline cache at all. So for the part of your question about "one or several connections running commands in a pipeline", the answer is simply that we won't ever shrink. This is simple to prove (just by looking at the code), but I added a very small test case just in case (which I will push).

Keep in mind that this behaviour was and still is the same: we consider shrinking the cache only when connections dispatch synchronously. Before, however, we decided whether to shrink the cache based on a constant factor, which was problematic because N connections with at least a single async dispatch every N messages would result in an underutilized cache (it would never shrink).

Also, let's try to think about when this algorithm does not perform well.

Great question! A few thoughts. The current approach is: given a sampling window, synchronous dispatches poll the size of the cache and track its minimum size within that window. If that minimum is non-zero and the sampling window is over, an element is released from the cache.

One thing is that the rate at which we shrink the cache is now constant, since we can pop 1 element at the end of each sampling window. So, for example, with 10ms sampling windows we can pop at most 100 items per second from the cache. This was not the case before, where a storm of synchronous commands would aggressively shrink the cache (proportional to the weight/number of sync messages).
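To make the arithmetic concrete, here is a small sketch of the bound. The helper names `MaxReleased` and `MinDrainTime` are made up for illustration and do not exist in the codebase; the only assumption taken from the discussion is that at most one element is released per sampling window.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: upper bound on the number of elements released during
// `period` when at most one element is popped per sampling window.
constexpr std::size_t MaxReleased(std::chrono::milliseconds period,
                                  std::chrono::milliseconds window) {
  return static_cast<std::size_t>(period / window);
}

// Hypothetical helper: lower bound on the time needed to fully drain a cache
// holding `items` elements at one release per window.
constexpr std::chrono::milliseconds MinDrainTime(std::size_t items,
                                                 std::chrono::milliseconds window) {
  return window * static_cast<std::chrono::milliseconds::rep>(items);
}
```

With 10ms windows, `MaxReleased(1000ms, 10ms)` is 100, and a cache holding 10,000 messages needs at least 100 seconds of sync-only traffic before it is empty.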

Also:

min_ = std::min(min_, pipeline_sz);
if (elapsed < std::chrono::milliseconds(10)) { // <---- This SHOULD really be a flag
  return false;
}

const size_t max = Limits::max();
const bool watermark_reached = (min_ > 0);
min_ = max;
last_check_ = Clock::now();

return watermark_reached;
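For anyone who wants to play with the fragment above outside the server, here is a minimal self-contained sketch of the same idea. It is not the class from this PR: `WindowMinTracker` and `ShouldShrink` are hypothetical names, and the current time is passed in explicitly (an assumption made here so the behaviour can be exercised deterministically) instead of calling `Clock::now()` internally.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for the tracker discussed in this PR.
class WindowMinTracker {
 public:
  using Clock = std::chrono::steady_clock;

  WindowMinTracker(Clock::duration window, Clock::time_point start)
      : window_(window), window_start_(start) {}

  // Called on every synchronous dispatch with the current cache size.
  // Returns true only when the sampling window is over and the cache was
  // never observed empty during it, meaning one element may be released.
  bool ShouldShrink(std::size_t cache_size, Clock::time_point now) {
    min_ = std::min(min_, cache_size);
    if (now - window_start_ < window_) {
      return false;  // still sampling inside the current window
    }
    const bool never_empty = min_ > 0;
    // Start a fresh window.
    min_ = std::numeric_limits<std::size_t>::max();
    window_start_ = now;
    return never_empty;
  }

 private:
  Clock::duration window_;
  Clock::time_point window_start_;
  std::size_t min_ = std::numeric_limits<std::size_t>::max();
};
```

A caller would construct it with the window length (10ms in the fragment) and pop one element from the cache whenever `ShouldShrink` returns true.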

Polling can be unreliable. With multiple connections all dispatching pipelines and at least 1 sync dispatch per sampling window, there is a chance (maybe a high one) that we shrink the pipeline cache on every window. Imagine a pipeline just got executed, the messages got recycled into the cache, and the connection fiber just preempted for IO. Now another connection dispatches synchronously; the cache is non-empty, so the minimum is non-zero and we remove an element. The next fiber dispatches another pipeline and we allocate back what we deallocated a step ago. If what I described happens at least once every window, then we ping-pong between growing and shrinking the cache. I am not sure, though, how big of an impact this has for these kinds of workloads.
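A toy model of that worst case, to get a feel for the cost. Everything here is an assumption for illustration: the cache is reduced to a counter, each window contains exactly one pipeline followed by one sync dispatch, and the sync dispatch only ever samples the cache right after messages were recycled. `SimulatePingPong` and `ChurnStats` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>

struct ChurnStats {
  std::size_t allocations = 0;  // messages that had to be freshly allocated
  std::size_t releases = 0;     // messages released by the shrink logic
};

// Simulates `windows` sampling windows, each with one pipeline of
// `pipeline_len` messages and one sync dispatch at the end of the window.
ChurnStats SimulatePingPong(std::size_t pipeline_len, std::size_t windows) {
  ChurnStats stats;
  std::size_t cached = 0;  // messages currently sitting in the cache
  for (std::size_t w = 0; w < windows; ++w) {
    // Async dispatch: reuse cached messages and allocate the remainder.
    const std::size_t reused = std::min(cached, pipeline_len);
    stats.allocations += pipeline_len - reused;
    cached -= reused;
    // Pipeline executed: every message is recycled back into the cache.
    cached += pipeline_len;
    // Sync dispatch at the end of the window: the cache was non-empty at
    // every sample, so one element is released.
    if (cached > 0) {
      --cached;
      ++stats.releases;
    }
  }
  return stats;
}
```

Under these assumptions the steady state costs one release plus one re-allocation per window: with a pipeline of 8 messages over 100 windows, that is 8 initial allocations plus 99 re-allocations.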

Lastly, a flag (which can be set at runtime via config set), plus maybe something else to increase how many items we release in one step, is maybe enough to adjust a datastore to the workload's needs.

kostasrim self-assigned this Jan 21, 2025

kostasrim changed the title ~~chore: connection pipeline cache grows without shrinking~~ chore: connection pipeline cache does not shrink Jan 21, 2025

chore: connection pipeline cache grows without shrinking

33d4a49

Signed-off-by: kostas <[email protected]>

kostasrim force-pushed the kpr4 branch from b5122b2 to 33d4a49 Compare January 21, 2025 10:58

kostasrim added 2 commits January 30, 2025 12:21

Merge branch 'main' into kpr4

5696c66

add pipeline pacemaker

aa4eb3d

adiholden reviewed Jan 30, 2025

comments + small fixes

0cbdc1a

kostasrim added 2 commits February 18, 2025 12:57

Merge branch 'main' into kpr4

95df3ea

comments

20b405b

kostasrim requested a review from adiholden February 18, 2025 11:14

adiholden approved these changes Feb 18, 2025

kostasrim merged commit a918c52 into main Feb 18, 2025
10 checks passed

kostasrim deleted the kpr4 branch February 18, 2025 12:29

romange reviewed Feb 18, 2025

4 participants

