Skip to content

cluster: tracing config options #199

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

muzarski
Copy link
Contributor

@muzarski muzarski commented Oct 24, 2024

Implemented:

  • cass_cluster_set_tracing_consistency
  • cass_cluster_set_tracing_max_wait_time
  • cass_cluster_set_tracing_retry_wait_time

Set the defaults for this options as well.

Notice, that we needed to make use of a small hack, since rust-driver does not expose the option to set tracing info fetch timeout. Instead, it allows user to define maximum number of retries. This can be computed based on timeout and retry interval provided by user via cpp-rust-driver API.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • PR description sums up the changes and reasons why they should be introduced.
  • [ ] I have implemented Rust unit tests for the features/changes introduced.`
  • [ ] I have enabled appropriate tests in .github/workflows/build.yml in gtest_filter.
  • [ ] I have enabled appropriate tests in .github/workflows/cassandra.yml in gtest_filter.

According to the comment in rust-driver:
// Apparently, Consistency can be set to Serial or LocalSerial in SELECT statements
// to make them use Paxos.
Implemented `cass_cluster_set_tracing_consistency`.
Set the default ( * <b>Default:</b> CASS_CONSISTENCY_ONE).
Implemented `cass_cluster_set_tracing_retry_wait_time`.
Set the default (3ms).
rust-driver does not expose such config option. However,
it does expose maximum number of retries for tracing info fetch.

Having tracing info fetch timeout and interval (between each retry),
we can compute the number of retries that should be performed
to fetch the tracing info.
@muzarski muzarski self-assigned this Oct 24, 2024
@muzarski muzarski requested review from Lorak-mmk and wprzytula and removed request for Lorak-mmk October 24, 2024 15:58
@@ -855,6 +855,8 @@ impl TryFrom<CassConsistency> for Consistency {
CassConsistency::CASS_CONSISTENCY_LOCAL_QUORUM => Ok(Consistency::LocalQuorum),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in commit name: CassConsistency: adjust converion to serial variants of Consistency -> conversion

Comment on lines +858 to +859
CassConsistency::CASS_CONSISTENCY_LOCAL_SERIAL => Ok(Consistency::LocalSerial),
CassConsistency::CASS_CONSISTENCY_SERIAL => Ok(Consistency::Serial),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug fix! Let's prioritise such things more, instead of packing them into PRs with new features. Bugs should be fixed ASAP!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a driver still in alpha packing a bugfix in a small PR like this is imo perfectly acceptable.

@@ -159,6 +167,19 @@ pub fn build_session_builder(
session_builder = session_builder.user(username, password)
}

// Compute the number of retries for tracing info fetch
// based on the timeout and interval provided by user.
let tracing_info_fetch_attemps = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: attemps -> attempts

// Compute the number of retries for tracing info fetch
// based on the timeout and interval provided by user.
let tracing_info_fetch_attemps = {
let attemps = cluster.tracing_max_wait_time.as_millis()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment on lines +439 to +448
#[no_mangle]
pub unsafe extern "C" fn cass_cluster_set_tracing_max_wait_time(
cluster_raw: *mut CassCluster,
max_wait_time_ms: c_uint,
) {
let cluster = ptr_to_ref_mut(cluster_raw);

cluster.tracing_max_wait_time = Duration::from_millis(max_wait_time_ms.into());
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a unit test for setting all tracing parameters?

Comment on lines 45 to 48
const DEFAULT_KEEPALIVE_TIMEOUT: Duration = Duration::from_secs(60);
// - tracing info fetch interval is 3 millis
const DEFAULT_TRACING_INFO_FETCH_INTERVAL: Duration = Duration::from_millis(3);
// - tracing consistency is ONE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ 3ms is the defualt in Rust Driver. Is that the default in cpp-driver too?

Comment on lines +170 to +182
// Compute the number of retries for tracing info fetch
// based on the timeout and interval provided by user.
let tracing_info_fetch_attemps = {
let attemps = cluster.tracing_max_wait_time.as_millis()
/ session_builder
.config
.tracing_info_fetch_interval
.as_millis();

NonZeroU32::new(attemps as u32).unwrap_or_else(|| NonZeroU32::new(1).unwrap())
};
session_builder = session_builder.tracing_info_fetch_attempts(tracing_info_fetch_attemps);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ 🤔💭 What is the semantics of cass_cluster_set_tracing_max_wait_time in cpp-driver?
If it puts a time limit on whole tracing fetch, then this is very different to what you did.
Rust Driver, afaict, doesn't have a timeout for a single attempt - only a limit on number of attempts.
So a single attempt can take arbitrarily long time, and you assume they are instant.

Maybe a correct solution here is to wrap get_tracing_info calls into tokio::time::timeout?
Actually, what is the API in cpp-driver to retrieve tracing info? I don't see any call to get_tracing_info.
And if there are none, then what those parameters even do?

📌 As a side note, maybe we should add a timeout for single attempt to Rust Driver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any call to get_tracing_info. And if there are none, then what those parameters even do?

Great observation - I was sure that there was already an implemented method which fetches the tracing info. There is no such method (even in cassandra.h) - cpp-driver docs say:

**Note**: The driver does not return the actual tracing data for the request. The
application itself must use the returned tracing identifier to query the tables.

## Configuration

By default, when tracing is enabled, the driver will wait for the query's tracing
data to become available in the server-side tables before setting the request's
future. The amount of time it will wait, retry, and the consistency level of the
tracing data can be controls by setting `CassCluster` configuration options.

So this is something we have to implement in cpp-rust-driver.

@muzarski muzarski added this to the 0.5 milestone Apr 13, 2025
@muzarski muzarski added the P1 P1 priority item - very important label Apr 13, 2025
@muzarski muzarski marked this pull request as draft April 30, 2025 13:57
@dkropachev dkropachev added P3 P3 item - no one uses this most probably, but in the end it's better to implement it and removed P1 P1 priority item - very important labels May 15, 2025
@muzarski muzarski removed this from the 0.5 milestone May 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 P3 item - no one uses this most probably, but in the end it's better to implement it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants