Refactor S3 retries to be in alignment with the AWS SDK.#5747
Refactor S3 retries to be in alignment with the AWS SDK.#5747teo-tsirpanis wants to merge 7 commits intomainfrom
Conversation
|
LGTM but I'd like @ihnorton to confirm we're ok to do this right now. |
410ffd0 to
b6cbe16
Compare
b6cbe16 to
e57322f
Compare
ypatia
left a comment
There was a problem hiding this comment.
Let's also add a description as this is a behavior change that can have impact.
| const std::string Config::VFS_S3_CA_FILE = ""; | ||
| const std::string Config::VFS_S3_CA_PATH = ""; | ||
| const std::string Config::VFS_S3_CONNECT_TIMEOUT_MS = "10800"; | ||
| const std::string Config::VFS_S3_CONNECT_MAX_TRIES = "5"; |
There was a problem hiding this comment.
Why remove the default for max tries rather than setting it to 10?
There was a problem hiding this comment.
I removed it in order to give the AWS SDK the opportunity to determine the max tries through environment variables and profile settings.
| if (s3_params_.connect_max_tries_.has_value()) { | ||
| retry_strategy = Aws::Client::InitRetryStrategy( | ||
| s3_params_.connect_max_tries_.value(), s3_params_.retry_strategy_); | ||
| } else { | ||
| retry_strategy = Aws::Client::InitRetryStrategy(s3_params_.retry_strategy_); | ||
| } |
There was a problem hiding this comment.
| if (s3_params_.connect_max_tries_.has_value()) { | |
| retry_strategy = Aws::Client::InitRetryStrategy( | |
| s3_params_.connect_max_tries_.value(), s3_params_.retry_strategy_); | |
| } else { | |
| retry_strategy = Aws::Client::InitRetryStrategy(s3_params_.retry_strategy_); | |
| } | |
| retry_strategy = Aws::Client::InitRetryStrategy( | |
| s3_params_.connect_max_tries_.value_or(10), s3_params_.retry_strategy_); |
There was a problem hiding this comment.
We can't do that; the overload with the one parameter tries to retrieve the max retries from environment variables and profile configuration.
| , connect_scale_factor_(config.get<int64_t>( | ||
| "vfs.s3.connect_scale_factor", Config::must_find)) | ||
| , connect_max_tries_(config.get<int64_t>("vfs.s3.connect_max_tries")) | ||
| , has_connect_scale_factor_( |
tiledb/sm/filesystem/s3.cc
Outdated
|
|
||
| virtual long CalculateDelayBeforeNextRetry( | ||
| const Aws::Client::AWSError<Aws::Client::CoreErrors>& error, | ||
| long attemptedRetries) const { |
There was a problem hiding this comment.
override on the rest of the methods?
| } else if (param == "vfs.s3.connect_max_tries") { | ||
| RETURN_NOT_OK(utils::parse::convert(value, &vint64)); | ||
| } else if (param == "vfs.s3.connect_scale_factor") { | ||
| } else if (param == "vfs.s3.connect_max_tries" && !value.empty()) { |
There was a problem hiding this comment.
maybe we can add sanity check for the allowed values in retry_strategy as well?
There was a problem hiding this comment.
I prefer we don't, and pass the value to the AWS SDK as-is. It can handle invalid values.
| // SSOCredentialsProvider | ||
| "TooManyRequestsException"}, | ||
| s3_params_.connect_max_tries_); | ||
| s3_params_.connect_max_tries_.value_or(10)); |
There was a problem hiding this comment.
why not just rely on the parameter default?
There was a problem hiding this comment.
We can, but I thought of letting the config option influence retries in authentication requests as well.
|
Updated description. |
This PR updates the retry logic of the S3 VFS to be in alignment with the AWS SDK. We added a new config option to set the retry strategy (which defaults to standard), and update the default value of
vfs.s3.connect_max_triesto be empty, which lets the AWS SDK pick the value (from environment variables or profile configuration).Because the AWS SDK's retry strategies don't support setting a scale factor, the
vfs.s3.connect_scale_factorconfig option was deprecated and is no longer supported.TYPE: FEATURE
DESC: Added
vfs.s3.retry_strategythat allows customizing the AWS SDK retry strategy to use for requests to S3. Defaults tostandard.TYPE: BREAKING_BEHAVIOR
DESC: The
vfs.s3.connect_scale_factorconfig option is no longer supported.TYPE: BREAKING_BEHAVIOR
DESC: The
vfs.s3.connect_max_triesconfig option defaults to an empty string, which lets the AWS SDK determine the maximum number of retries.