Connection pool size configuration #281
Open
muzarski wants to merge 12 commits into scylladb:master from muzarski:core-connections-per-host-shard
We 0-initialize deprecated fields, so we retain the binary compatibility with cpp-driver.
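As a rough illustration of the zero-initialization idea, the sketch below uses a hypothetical C-compatible struct. `StatsSketch`, its field names, and `fill_stats` are assumptions made for this example and do not claim to match the real cpp-driver header; the point is only that deprecated fields stay in the layout and are explicitly set to 0.

```rust
/// Hypothetical C-compatible stats struct; the field names are illustrative
/// and do not claim to match the real cpp-driver header.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct StatsSketch {
    total_connections: u64,
    /// Deprecated in this sketch: kept only so the layout does not change.
    deprecated_available_connections: u64,
    /// Deprecated in this sketch: kept only so the layout does not change.
    deprecated_water_mark: u64,
}

/// Fills the supported field and leaves the deprecated ones explicitly zeroed,
/// so a C caller never reads uninitialized memory.
fn fill_stats(total_connections: u64) -> StatsSketch {
    StatsSketch {
        total_connections,
        ..StatsSketch::default()
    }
}
```

Keeping the deprecated fields in place preserves field offsets and the struct size, which is what a caller compiled against the old layout relies on.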
Since we implemented metrics, we can enable the HeartbeatFailed test with some adjustments to log filtering. This test seems to fail under valgrind, so I enabled it alongside the other tests that cannot be run under valgrind.
Note: The original test seems to be flaky for Cassandra. The following scenario occurred:
1. node2 is paused
2. the keepaliver notifies the pool refiller about that
3. the refiller removes the connection to node2 (metrics::total_connections -= 1)
4. in the test, we read get_metrics().total_connections < initial_connections and leave the loop
5. the refiller tries to open a connection again (metrics::total_connections += 1)
6. we read get_metrics().total_connections again and expect it to be less than initial_connections, but it is not.
To combat this, I adjusted the test so that the same metrics snapshot is used both to leave the loop and to make the assertion. With this change, the aforementioned "unlucky" scenario cannot happen.
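The single-snapshot idea can be sketched as follows. This is a minimal simulation, not the actual test code: `MetricsSnapshot`, `ScriptedMetrics`, and `wait_for_connection_drop` are hypothetical names, and the scripted values stand in for the refiller removing and then reopening a connection.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the driver's metrics snapshot.
#[derive(Clone, Copy, Debug, PartialEq)]
struct MetricsSnapshot {
    total_connections: u64,
}

/// Hypothetical metrics source that replays a scripted sequence of values,
/// mimicking the refiller removing and then reopening a connection.
struct ScriptedMetrics {
    values: Vec<u64>,
    next: Cell<usize>,
}

impl ScriptedMetrics {
    fn new(values: Vec<u64>) -> Self {
        Self { values, next: Cell::new(0) }
    }

    /// Each call returns the next scripted value; the last one repeats forever.
    fn get_metrics(&self) -> MetricsSnapshot {
        let i = self.next.get().min(self.values.len() - 1);
        self.next.set(self.next.get() + 1);
        MetricsSnapshot { total_connections: self.values[i] }
    }
}

/// Polls until the connection count drops below `initial`, and returns the
/// very snapshot that ended the loop so the caller asserts on it directly.
fn wait_for_connection_drop(
    metrics: &ScriptedMetrics,
    initial: u64,
    max_polls: usize,
) -> Option<MetricsSnapshot> {
    for _ in 0..max_polls {
        let snapshot = metrics.get_metrics();
        if snapshot.total_connections < initial {
            return Some(snapshot);
        }
    }
    None
}
```

Asserting on the returned snapshot, rather than calling `get_metrics()` a second time, removes the window in which the refiller can reopen the connection between the loop exit and the assertion.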
To be quite honest, I thought we would be able to enable more tests, but:
1. StatsConnections: requires `cass_cluster_set_num_threads_io`, `cass_cluster_set_core_connections_per_host` and `cass_cluster_set_constant_reconnect`.
2. ErrorsConnectionTimeouts: requires `cass_cluster_set_core_connections_per_host`.
3. SpeculativeExecutionRequests: requires `cass_session_get_speculative_execution_metrics`.
4. Requests: this one is interesting. It turns out that cpp-driver stores latency stats as microseconds, while rust-driver stores them as milliseconds. Because of that, the mean and median latencies are rounded to 0 (at least on my machine). The test expects them to be greater than 0, which makes sense assuming the driver collects the stats with microsecond precision. I'm not sure how to address this one. Is there any way to force latencies of at least 1ms in the test?
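To make the rounding effect in point 4 concrete, here is a small sketch. It assumes each sample is truncated to whole milliseconds before aggregation; `mean_millis` and `mean_micros` are hypothetical helpers, not driver functions.

```rust
use std::time::Duration;

/// Mean of the samples after truncating each one to whole milliseconds.
fn mean_millis(samples: &[Duration]) -> u64 {
    let total: u128 = samples.iter().map(|d| d.as_millis()).sum();
    (total / samples.len() as u128) as u64
}

/// Mean of the same samples kept at microsecond precision.
fn mean_micros(samples: &[Duration]) -> u64 {
    let total: u128 = samples.iter().map(|d| d.as_micros()).sum();
    (total / samples.len() as u128) as u64
}
```

For sub-millisecond samples such as 200, 350 and 800 microseconds, the millisecond mean is 0 while the microsecond mean is 450, so a `> 0` check only holds at the finer precision.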
Without this, valgrind complains about access to uninitialized memory.
Why does this test not work without adjustments? Because rust-driver collects latencies with millisecond granularity. As a result, most of the latencies measured during the tests in a local setup are rounded to 0ms, so we need to simulate higher latencies during the test. There is one piece of user-controlled code that rust-driver executes between the start and end time measurements, namely `HistoryListener::log_attempt_start`. This is where we can add a sleep to simulate higher latencies in a local setup. And so, I implemented `SleepingHistoryListener`, which does just that. In addition, I implemented the testing API to set such a listener on the statement. The "Requests" test is adjusted accordingly, and enabled. Note: Since all latencies during the test in a local setup are now expected to be around 1ms, I loosened the stddev check to be `>= 0` instead of `> 0`.
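The mechanism can be sketched in isolation. The trait below is a hypothetical, simplified stand-in for the real `HistoryListener` (which has more methods and different signatures), and `measure_request` only imitates a driver that times the hook together with the request; none of these names come from the PR.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical, simplified stand-in for the hook the driver calls at the
/// start of every attempt; the real `HistoryListener` trait has more methods.
trait AttemptListener {
    fn log_attempt_start(&self);
}

/// Listener that sleeps inside the hook to inflate the measured latency.
struct SleepingListener {
    delay: Duration,
}

impl AttemptListener for SleepingListener {
    fn log_attempt_start(&self) {
        thread::sleep(self.delay);
    }
}

/// Mimics a driver that measures latency around the listener call and the
/// request itself (the request is a no-op here).
fn measure_request(listener: &dyn AttemptListener) -> Duration {
    let start = Instant::now();
    listener.log_attempt_start();
    // ... the actual request would be sent and awaited here ...
    start.elapsed()
}

/// Population standard deviation of latencies given in whole milliseconds.
fn stddev_millis(latencies_ms: &[u64]) -> f64 {
    let n = latencies_ms.len() as f64;
    let mean = latencies_ms.iter().sum::<u64>() as f64 / n;
    let variance = latencies_ms
        .iter()
        .map(|&l| (l as f64 - mean).powi(2))
        .sum::<f64>()
        / n;
    variance.sqrt()
}
```

Because the sleep happens between the two time measurements, the recorded latency is at least the configured delay. When every sample then rounds to the same millisecond value, the standard deviation is exactly 0, which is why a `>= 0` check is the safe one.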
I haven't set the default value yet. This will be done once cass_cluster_set_core_connections_per_shard is introduced and implemented (later in this PR).
This will serve as an extension to cpp API. The default connection pool size is 1 per shard.
This tests `cass_cluster_set_core_connections_per_shard` as well as `cass_session_get_metrics`.
4b4ec8e
to
f70522d
Compare
Fixes: #240
Ref: #132
Depends on: #280 (I need metrics support to implement one integration test).
Implements:
- `cass_cluster_set_core_connections_per_host`
- `cass_cluster_set_core_connections_per_shard` (as an extension to cpp-driver API)

Integration tests
Existing ErrorsConnectionTimeouts metrics test
I don't know who implemented this test, but (IMO) it's just wrong. So my understanding of the intentions is:
If those were the intentions, the `EXPECT_GE(2u, metrics.errors.connection_timeouts);` assertion is wrong - it should be the other way around. Currently it expects `2 >= connection_timeouts`.
OK, so let's say we fix it. It still won't work with the Rust driver. The `Session` object will not even be built - we will fail to open a control connection (because of timeouts) and fail to fetch the metadata. In particular, we won't even be able to call `Session::get_metrics()` - `cass_session_get_metrics` will return early with an error log message.
Thus, I'm not enabling this test.
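The direction of the assertion can be spelled out with two plain predicates. This is only a sketch: `at_most_two` and `at_least_two` are hypothetical names, and it assumes the intended check was that at least two connection timeouts were recorded.

```rust
/// Mirrors the existing check: passes when `2 >= connection_timeouts`.
fn at_most_two(connection_timeouts: u64) -> bool {
    2 >= connection_timeouts
}

/// The reversed check: passes only when at least two timeouts were recorded.
fn at_least_two(connection_timeouts: u64) -> bool {
    connection_timeouts >= 2
}
```

With zero recorded timeouts the existing form still passes, while the reversed form fails, so the existing form cannot detect a run in which no timeout was triggered at all.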
Fun fact: in my local setup, this test passes. Even though the connect timeout is low, it is not triggered. As a result, `metrics.errors.connection_timeouts` is 0, which means that the aforementioned assertion passes as well.

My new StatsShardConnections metrics test
I implemented a simple test where we configure the pool size to be 2 connections per shard (`cass_cluster_set_core_connections_per_shard`). It checks whether all connections are registered in the metrics. The expected value is at least `nr_hosts * nr_shards * 2`.
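The expected lower bound is simple arithmetic, sketched below with hypothetical helper names (`expected_min_connections`, `pool_is_filled`); the real test reads the counts from the cluster and the metrics snapshot.

```rust
/// Lower bound on open connections when every shard of every host gets
/// `connections_per_shard` connections.
fn expected_min_connections(nr_hosts: u64, nr_shards: u64, connections_per_shard: u64) -> u64 {
    nr_hosts * nr_shards * connections_per_shard
}

/// The check a test could make against a metrics snapshot, with the pool
/// size fixed at 2 connections per shard as in the description above.
fn pool_is_filled(total_connections: u64, nr_hosts: u64, nr_shards: u64) -> bool {
    total_connections >= expected_min_connections(nr_hosts, nr_shards, 2)
}
```

For example, 3 hosts with 2 shards each give a lower bound of 12 connections; the comparison is `>=` rather than `==` because the description only promises "at least" that many.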
Pre-review checklist
[ ] I have implemented Rust unit tests for the features/changes introduced.
`.github/workflows/build.yml` in `gtest_filter`.
`.github/workflows/cassandra.yml` in `gtest_filter`.