chore: connection pipeline cache does not shrink #4491

Merged
merged 6 commits into from
Feb 18, 2025

kostasrim commented Jan 21, 2025

Add a test to show that the pipeline cache won't shrink once it's filled if clients ping-pong between async and sync dispatch.

  • add period-based shrinkage for the pipeline cache
  • add tests

Proves #4461

@kostasrim kostasrim self-assigned this Jan 21, 2025
@kostasrim kostasrim changed the title from "chore: connection pipeline cache grows without shrinking" to "chore: connection pipeline cache does not shrink" Jan 21, 2025
# pipeline_cache_bytes because it recycled too many messages, they won't gradually be released
# if one command (one connection out of `n` connections) dispatches async. Only 1 command out of
# n connections must be dispatched async and the pipeline won't gradually be released.
for i in range(30):
Contributor Author

We can drain the pipeline cache bytes once we stop dispatching async. But with a large pool of connections, it takes only one command dispatched async for the counter to be reset internally. If this pattern continues, the size of the cache will remain constant and will not be released gradually.

info = await good_client.info()

# Drained
assert info["pipeline_cache_bytes"] == 0
Contributor Author

drained completely

@@ -316,6 +314,36 @@ QueueBackpressure& GetQueueBackpressure() {

thread_local vector<Connection::PipelineMessagePtr> Connection::pipeline_req_pool_;

class PipelineCacheSizePaceMaker {
Collaborator

nit: maybe PipelineWatermarkTracker

Contributor Author

How about PipelineCacheSizeTracker? (Not strongly opinionated, so let me know which one you prefer!)

@@ -316,6 +314,36 @@ QueueBackpressure& GetQueueBackpressure() {

thread_local vector<Connection::PipelineMessagePtr> Connection::pipeline_req_pool_;

class PipelineCacheSizePaceMaker {
public:
bool WatermarkReached(size_t pipeline_sz) {
Collaborator

nit: maybe CheckAndUpdateWatermark


@dfly_args({"proactor_threads": 1})
async def test_pipeline_cache_size(df_factory):
Collaborator

please add some comments on this test

@adiholden
Collaborator

The main purpose of the pipeline cache is to reduce the number of allocations.
I would like to see a test showing that if we have one or several connections running commands in a pipeline, we utilize the cache optimally: while the commands are executing, the cache does not repeatedly grow and shrink, and when execution finishes, the cache shrinks.

@adiholden
Collaborator

Also, let's try to think about when this algorithm does not perform well.

@kostasrim
Contributor Author

kostasrim commented Feb 7, 2025

@adiholden

> The main purpose of the pipeline cache is to reduce the number of allocations.
> I would like to see a test showing that if we have one or several connections running commands in a pipeline, we utilize the cache optimally: while the commands are executing, the cache does not repeatedly grow and shrink, and when execution finishes, the cache shrinks.

Well, as long as we are doing only async dispatches, we won't release the pipeline cache at all. So for the part of your question about "one or several connections running commands in a pipeline", the answer is simply that we won't ever shrink. This is simple to prove (just by looking at the code), but I added a very small test case just in case (which I will push).

Keep in mind that this behaviour was and still is the same: we consider shrinking the cache only when connections dispatch synchronously. Before, however, we decided whether to shrink the cache based on a constant factor, which was problematic because N connections with at least a single async dispatch every N messages would result in an underutilized cache (it would never shrink).

> Also, let's try to think about when this algorithm does not perform well.

Great question! A few thoughts. The current approach is: given a sampling window, synchronous dispatches poll the size of the cache and track its minimum size within that window. If that minimum is nonzero when the sampling window ends, one element is released from the cache.
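
To make the sampling-window idea concrete, here is a minimal Python model of it. It is only a sketch: the real logic lives in the C++ connection code, and the class name, method name, and the injected `now_ms` clock below are assumptions chosen for illustration (the names borrow from suggestions made in this review, not from the merged code).

```python
import math


class PipelineCacheSizeTracker:
    """Illustrative Python model of the sampling logic; not the C++ implementation."""

    def __init__(self, window_ms=10, now_ms=0):
        self.window_ms = window_ms    # sampling window length (10 ms in the quoted snippet)
        self.last_check_ms = now_ms   # when the current window started
        self.min_size = math.inf      # smallest cache size polled in this window

    def check_and_update_watermark(self, pipeline_sz, now_ms):
        """Called on every synchronous dispatch; True means 'release one element'."""
        self.min_size = min(self.min_size, pipeline_sz)
        if now_ms - self.last_check_ms < self.window_ms:
            return False              # window still open, keep sampling
        watermark_reached = self.min_size > 0
        self.min_size = math.inf      # start a fresh window
        self.last_check_ms = now_ms
        return watermark_reached


# Usage: one sync dispatch per millisecond for 40 ms against an idle cache of 3 messages.
tracker = PipelineCacheSizeTracker(window_ms=10)
cache = ["msg"] * 3
releases = 0
for now_ms in range(1, 41):
    if tracker.check_and_update_watermark(len(cache), now_ms):
        cache.pop()
        releases += 1
```

In this run the cache loses one message at 10 ms, 20 ms, and 30 ms. The window ending at 40 ms only ever polls an empty cache, so its minimum is zero and nothing is released.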

One thing is that the rate at which we shrink the cache is now constant, since we can pop only one element at the end of each sampling window. For example, with 10 ms sampling windows we can pop at most 100 items from the cache per second. This was not the case before, where a storm of synchronous commands would aggressively shrink the cache (proportional to the weight/number of sync messages).

Also:

min_ = std::min(min_, pipeline_sz);
if (elapsed < std::chrono::milliseconds(10)) { // <---- This SHOULD really be a flag
  return false;
}

const size_t max = Limits::max();
const bool watermark_reached = (min_ > 0);
min_ = max;
last_check_ = Clock::now();

return watermark_reached;

Polling can be unreliable. With multiple connections all dispatching pipelines and at least one sync dispatch per sampling window, there is a chance (maybe a high one) that we shrink the cache on every window. Imagine a pipeline just got executed, the messages got recycled into the cache, and the connection fiber just got preempted for IO. Now another connection dispatches synchronously; the cache is non-empty, so the minimum is nonzero and we remove an element. The next fiber dispatches another pipeline and we allocate back what we deallocated a step ago. If what I described happens at least once every window, then we kind of ping-pong the growth/shrinkage of the cache. I am not sure, though, how big of an impact this has for this kind of workload.
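
The ping-pong scenario can be reproduced with a small deterministic simulation. This is a hedged sketch under strong simplifying assumptions: exactly one pipeline and one synchronous poll per window, with the poll always landing right after the pipeline's messages are recycled. The `Tracker` class and `simulate` function are hypothetical names for this illustration and do not come from the Dragonfly code base.

```python
import math


class Tracker:
    """Minimal model of the windowed minimum check (illustrative only)."""

    def __init__(self, window_ms):
        self.window_ms = window_ms
        self.last_check_ms = 0
        self.min_size = math.inf

    def check(self, cache_size, now_ms):
        self.min_size = min(self.min_size, cache_size)
        if now_ms - self.last_check_ms < self.window_ms:
            return False
        reached = self.min_size > 0
        self.min_size = math.inf
        self.last_check_ms = now_ms
        return reached


def simulate(windows, pipeline_len=4, window_ms=10):
    """Return (allocations, releases) after the given number of sampling windows."""
    tracker = Tracker(window_ms)
    cache_size = allocations = releases = 0
    for w in range(1, windows + 1):
        now_ms = w * window_ms
        # Connection A dispatches a pipeline: reuse cached messages, allocate the rest.
        reused = min(cache_size, pipeline_len)
        allocations += pipeline_len - reused
        cache_size -= reused
        # The pipeline finishes and its messages are recycled into the cache.
        cache_size += pipeline_len
        # Connection B dispatches synchronously while A is preempted on IO,
        # so the only size it ever polls is the full, just-recycled cache.
        if tracker.check(cache_size, now_ms):
            cache_size -= 1
            releases += 1
    return allocations, releases


allocations, releases = simulate(windows=5)
```

With these assumptions the first window allocates the whole pipeline, and every later window pays one release followed by one re-allocation. That is the churn described above. Whether real workloads hit this pattern often enough to matter is not something this model can answer.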

Lastly, a flag that can be set at runtime via config set (plus maybe something else to increase how many items we release in one step) may be enough to adjust a datastore to the workload's needs.

@kostasrim kostasrim requested a review from adiholden February 18, 2025 11:14
@kostasrim kostasrim merged commit a918c52 into main Feb 18, 2025
10 checks passed
@kostasrim kostasrim deleted the kpr4 branch February 18, 2025 12:29