Reduce number of metadata transaction retries #17808

Merged: 1 commit, Mar 17, 2025

@kfaraz kfaraz commented Mar 17, 2025

Description

Metadata transactions initiated by IndexerSQLMetadataStorageCoordinator and MetadataTaskStorage
typically hold the TaskLockbox.giant lock for the duration of the transaction.
On a transient failure, the transaction is retried up to 10 times, which typically takes more than 5 minutes.

As a result, the failure surfaces very late, and in the meantime the Overlord threads that need the giant lock are stuck.
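
To get a feel for how the retry count drives the time spent holding the lock, here is a rough sketch of the worst-case sleep budget. It assumes a doubling backoff that starts at 1 second and is capped at 60 seconds; these values and the `BackoffBudget` class are illustrative assumptions, not taken from this PR. It counts only the sleeps between attempts, so the time spent inside each failed attempt comes on top.

```java
// Illustrative only: the backoff parameters below are assumptions, not Druid's actual values.
final class BackoffBudget {

    // Total sleep time in milliseconds if every attempt fails.
    // There is one sleep after each failed attempt except the last one.
    static long worstCaseSleepMillis(int maxTries, long baseSleepMillis, long maxSleepMillis) {
        long total = 0;
        long sleep = baseSleepMillis;
        for (int failedAttempt = 1; failedAttempt < maxTries; failedAttempt++) {
            total += Math.min(sleep, maxSleepMillis);
            // Double the sleep for the next retry, up to the cap.
            sleep = Math.min(sleep * 2, maxSleepMillis);
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed backoff: 1 s base, doubling, capped at 60 s.
        System.out.println("10 tries: " + worstCaseSleepMillis(10, 1_000, 60_000) + " ms");
        System.out.println(" 3 tries: " + worstCaseSleepMillis(3, 1_000, 60_000) + " ms");
    }
}
```

Under these assumptions, 10 tries sleep for about 4 minutes in total, while 3 tries sleep for only a few seconds, which is why a lower limit releases the lock so much sooner.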

Changes

  • Reduce quiet retries to 2
  • Reduce max retries to 3. It is best to fail early and noisily. If a retry is needed, it should be done at a higher layer.
  • Include retry messages where possible
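
The retry policy described in the list above can be sketched roughly as follows. This is a minimal standalone illustration, not the code changed in this PR: the `BoundedRetry` class, its parameters, and the reading of "quiet" as "do not log a warning for the first few failures" are all assumptions made for the example.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical helper for illustration; it is not part of Druid.
final class BoundedRetry {

    // Runs the task up to maxTries times. Failures on the first quietTries
    // attempts are retried without a warning; later failures are logged
    // with the exception message before retrying.
    static <T> T retry(
            Callable<T> task,
            Predicate<Throwable> isTransient,
            int quietTries,
            int maxTries,
            long baseSleepMillis
    ) throws Exception {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be at least 1");
        }
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                return task.call();
            } catch (Exception e) {
                // Fail early and noisily: out of tries, or not a transient error.
                if (attempt >= maxTries || !isTransient.test(e)) {
                    throw e;
                }
                if (attempt > quietTries) {
                    System.err.println("Attempt " + attempt + " of " + maxTries
                            + " failed, retrying: " + e.getMessage());
                }
                // Simple doubling backoff between attempts.
                Thread.sleep(baseSleepMillis << (attempt - 1));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = retry(
                () -> {
                    if (++calls[0] < 3) {
                        throw new java.io.IOException("transient failure");
                    }
                    return "committed";
                },
                e -> e instanceof java.io.IOException,
                2,   // quiet tries, as in this PR
                3,   // max tries, as in this PR
                10L  // assumed base sleep, kept short for the demo
        );
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With a low limit like this, a caller that still needs a retry can make that decision at a higher layer, ideally after the lock has been released.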

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@cryptoe cryptoe merged commit 6eff0e7 into apache:master Mar 17, 2025
75 checks passed

cryptoe commented Mar 17, 2025

Changes LGTM. Missed clicking the approve button.


kfaraz commented Mar 17, 2025

Thanks for the review, @cryptoe!

@kfaraz kfaraz deleted the improve_retry_logging branch March 17, 2025 15:01