chore: connection pipeline cache does not shrink #4491
# pipeline_cache_bytes because it recycled too many messages, they won't gradually be released
# if one command (one connection out of `n` connections) dispatches async. Only 1 command out of
# n connections must be dispatched async and the pipeline won't gradually be released.
for i in range(30):
We can drain the pipeline cache bytes once we stop dispatching async. But on a large pool of connections, it only takes one command dispatching async for the counter to be reset internally. If this pattern continues, the size of the cache will remain constant and will not be released gradually.
Signed-off-by: kostas <[email protected]>
tests/dragonfly/connection_test.py
Outdated
info = await good_client.info()

# Drained
assert info["pipeline_cache_bytes"] == 0
drained completely
src/facade/dragonfly_connection.cc
Outdated
@@ -316,6 +314,36 @@ QueueBackpressure& GetQueueBackpressure() {

thread_local vector<Connection::PipelineMessagePtr> Connection::pipeline_req_pool_;

class PipelineCacheSizePaceMaker {
nit: maybe PipelineWatermarkTracker
How about PipelineCacheSizeTracker? (Not strongly opinionated, so let me know which one you prefer!)
src/facade/dragonfly_connection.cc
Outdated
@@ -316,6 +314,36 @@ QueueBackpressure& GetQueueBackpressure() {

thread_local vector<Connection::PipelineMessagePtr> Connection::pipeline_req_pool_;

class PipelineCacheSizePaceMaker {
 public:
  bool WatermarkReached(size_t pipeline_sz) {
nit: maybe CheckAndUpdateWatermark
@dfly_args({"proactor_threads": 1})
async def test_pipeline_cache_size(df_factory):
please add some comments on this test
The main purpose of the pipeline cache is to reduce the number of allocations.
Also, let's try to think about when this algorithm does not perform well.
Well, as long as we are doing only async dispatches, we won't release the pipeline cache at all. So, to the part of your question "if we have one or several connections running commands in a pipeline", the answer is simply that we won't ever shrink. This is simple to prove (just by looking at the code), but I added a very small test case just in case (which I will push). Keep in mind that this behaviour was and still is the same: we consider shrinking the cache only when connections dispatch synchronously. Before, however, we decided to shrink the cache based on a constant factor, which was problematic because N connections with at least a single async dispatch every N messages would result in an underutilized cache (it would never shrink).
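To illustrate why a counter-based policy can get stuck, here is a minimal Python sketch reconstructed only from the description in this thread. The class name, the method names and the exact threshold (more than N consecutive sync dispatches before releasing one entry) are assumptions for illustration, not the actual Dragonfly code.

```python
class CounterBasedShrinker:
    """Hypothetical model of the previous policy: release one cached entry only
    after more than `num_connections` sync dispatches with no async dispatch
    in between. Names and threshold are assumptions, not Dragonfly's code."""

    def __init__(self, num_connections: int):
        self.num_connections = num_connections
        self.release_weight = 0

    def on_async_dispatch(self) -> None:
        # A single async dispatch resets the counter.
        self.release_weight = 0

    def on_sync_dispatch(self) -> bool:
        # Returns True when one entry should be released from the cache.
        self.release_weight += 1
        if self.release_weight > self.num_connections:
            self.release_weight = 0
            return True
        return False


def run_pattern(num_connections, cache_size, rounds, sync_per_round, async_per_round):
    """Replay `rounds` rounds of sync dispatches followed by async dispatches
    and return the number of entries left in the (modelled) cache."""
    shrinker = CounterBasedShrinker(num_connections)
    for _ in range(rounds):
        for _ in range(sync_per_round):
            if cache_size > 0 and shrinker.on_sync_dispatch():
                cache_size -= 1
        for _ in range(async_per_round):
            shrinker.on_async_dispatch()
    return cache_size


# Ping-pong: 4 sync dispatches, then 1 async dispatch, repeated.
stuck = run_pattern(num_connections=4, cache_size=10, rounds=100,
                    sync_per_round=4, async_per_round=1)
# Sync-only traffic: the counter is never reset.
drained = run_pattern(num_connections=4, cache_size=10, rounds=100,
                      sync_per_round=4, async_per_round=0)
```

Under these assumptions, the ping-pong pattern never lets the counter exceed the threshold, so `stuck` keeps all 10 entries, while sync-only traffic drains the cache completely.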
Great question! A few thoughts. The current approach is: given a sampling window, synchronous dispatches poll the size of the cache and track its minimum size within that window. If that minimum is nonzero when the sampling window ends, one element is released from the cache. One consequence is that the rate at which we shrink the cache is now constant, since we can pop 1 element at the end of each sampling window. For example, with 10ms sampling windows, we can pop at most 100 items per second from the cache. This was not the case before, where a storm of synchronous commands would aggressively shrink the cache (proportional to the weight/number of sync messages). Also:
Polling can be unreliable. With multiple connections all dispatching pipelines and at least 1 sync dispatch per sampling window, there is a chance (maybe a high one) that we shrink the pipeline cache on every window. Imagine a pipeline just got executed, the messages got recycled into the cache, and the connection fiber just preempted for IO. Now another connection dispatched
Lastly, a flag (which can be set at runtime via config set), plus maybe something else to increase how many items we release in one step, may be enough to adjust a datastore to the workload needs.
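The sampling-window approach discussed in this thread can be sketched roughly as follows. This is a simplified Python model, not the C++ class from the PR: the class and method names borrow from the reviewers' naming suggestions, and the explicit `now_ms` argument, the 10 ms default window and the `simulate` helper are assumptions made so the example is deterministic.

```python
class PipelineCacheSizeTracker:
    """Hypothetical model: track the minimum cache size seen by sync
    dispatches within a sampling window; release one entry per window
    if that minimum never dropped to zero."""

    def __init__(self, window_ms: int = 10):
        self.window_ms = window_ms
        self.window_start_ms = 0
        self.min_size = None  # minimum cache size observed in this window

    def check_and_update_watermark(self, now_ms: int, cache_size: int) -> bool:
        # Called on every synchronous dispatch with the current cache size.
        if self.min_size is None:
            self.min_size = cache_size
        else:
            self.min_size = min(self.min_size, cache_size)

        if now_ms - self.window_start_ms < self.window_ms:
            return False  # window still open, keep sampling

        # Window is over: shrink only if the cache was never empty in it.
        should_shrink = self.min_size > 0
        self.window_start_ms = now_ms
        self.min_size = None
        return should_shrink


def simulate(cache_size, dispatch_times_ms, window_ms=10):
    """Feed sync dispatches at the given times; return (final size, released)."""
    tracker = PipelineCacheSizeTracker(window_ms)
    released = 0
    for now_ms in dispatch_times_ms:
        if tracker.check_and_update_watermark(now_ms, cache_size):
            cache_size -= 1
            released += 1
    return cache_size, released


# One sync dispatch per millisecond for 30 ms, starting with 5 cached entries.
final_size, released = simulate(5, range(1, 31))
```

In this model at most one entry is released per window regardless of how many sync dispatches arrive, which is the constant shrink rate described above. A window in which any sample saw an empty cache releases nothing.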
Add test to show that pipeline cache won't shrink once it's filled if clients ping pong between async and sync dispatch
Proves #4461