
Connection pipeline cache will grow without shrinking #4461

Open
kostasrim opened this issue Jan 15, 2025 · 1 comment
Labels: bug (Something isn't working)

kostasrim (Contributor) commented Jan 15, 2025

While investigating an RSS vs. used-memory gap in a datastore, we saw that Dragonfly's pipeline cache bytes were high while the dispatch queue was empty:

pipeline_cache_bytes:3756692020
dispatch_queue_bytes:0

@kostasrim's initial investigation leads:
We have a corner case that can be described as follows.

We call ShrinkPipelinePool(), which "gradually releases the pipeline messages in the cache", and the way we do that is:

if (free_req_release_weight > stats_->num_conns) {
  // Enough "weight" has accumulated across this thread's connections,
  // so release one cached message from the pipeline pool.
  pipeline_req_pool_.pop_back();
}

The problem is that each time one of the connections dispatches asynchronously, we reset free_req_release_weight to 0 (free_req_release_weight is a thread-local variable).

So a workload can dispatch a lot of commands asynchronously, making the cache grow large, and then we can end up in this weird corner-case loop:

Let's say we have n connections:

  1. n - 1 connections dispatch synchronously, and the call to ShrinkPipelinePool() does nothing (the weight must exceed the number of connections before we shrink).
  2. Only one connection dispatches asynchronously -> resets free_req_release_weight to 0 -> now the weight must again reach n before the cache can shrink.

Steps 1 and 2 can loop endlessly, depending on the workload. From my understanding, only one connection out of n needs to do this, and we won't shrink as long as it keeps happening.

With a large connection pool, a single "bad actor" is enough to cause this endless loop, and I guess the probability of hitting it increases with the number of connections, since it becomes more likely that at least one of them dispatches asynchronously. A small simulation of the scenario is sketched below.
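A minimal, self-contained simulation of this corner case (hypothetical code, not taken from Dragonfly; the names mirror the snippet above, and the dispatch pattern is the one described in steps 1-2):

// Hypothetical simulation of the scenario above -- not Dragonfly code.
#include <cstdio>
#include <deque>

int main() {
  const unsigned num_conns = 100;              // n connections on this thread
  std::deque<int> pipeline_req_pool(1000, 0);  // cache that we would like to shrink
  unsigned free_req_release_weight = 0;        // thread-local in the real code

  for (int round = 0; round < 10000; ++round) {
    // Step 1: n - 1 connections dispatch synchronously; each bumps the weight
    // and tries to shrink, but the threshold (> num_conns) is never reached.
    for (unsigned c = 0; c + 1 < num_conns; ++c) {
      ++free_req_release_weight;
      if (free_req_release_weight > num_conns && !pipeline_req_pool.empty())
        pipeline_req_pool.pop_back();
    }
    // Step 2: one connection dispatches asynchronously and resets the weight.
    free_req_release_weight = 0;
  }

  // Prints 1000: the pool never shrank, no matter how many rounds we run.
  std::printf("pool size after 10000 rounds: %zu\n", pipeline_req_pool.size());
  return 0;
}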

romange (Collaborator) commented Jan 23, 2025

if (free_req_release_weight > stats_->num_conns) is used to pace the shrinkage, but we never check how many items in the pool are actually used. Another approach could be to use a watermark: periodically measure the maximum number of pending items over some period, or equivalently the minimum number of items left in pipeline_req_pool_ during that period, and if that number is greater than 0, i.e. the pool was not fully utilised, pop an element from it.
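A rough sketch of that watermark idea (hypothetical names and structure, assuming some periodic housekeeping hook exists; this is an illustration of the approach, not a patch against connection.cc):

// Hypothetical sketch of the watermark-based shrink -- not an actual patch.
#include <algorithm>
#include <cstddef>
#include <deque>

struct PipelineCacheSketch {
  std::deque<int> pipeline_req_pool_;
  size_t min_pool_size_ = 0;  // low watermark within the current period

  // Called whenever a request object is taken from the pool.
  void Borrow() {
    if (!pipeline_req_pool_.empty())
      pipeline_req_pool_.pop_back();
    min_pool_size_ = std::min(min_pool_size_, pipeline_req_pool_.size());
  }

  // Called whenever a request object is returned to the pool.
  void Return(int req) {
    pipeline_req_pool_.push_back(req);
  }

  // Called periodically (e.g. once per second). If the pool was never fully
  // drained during the period (low watermark > 0), it is over-provisioned,
  // so drop one element; then start a new measurement period.
  void PeriodicShrink() {
    if (min_pool_size_ > 0 && !pipeline_req_pool_.empty())
      pipeline_req_pool_.pop_back();
    min_pool_size_ = pipeline_req_pool_.size();
  }
};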
