Storage lock increase wait time and add artificial slow writes #2497


Open · wants to merge 19 commits into master

Conversation


@ognyanstoimenov ognyanstoimenov commented Jul 24, 2025

Reference Issues/PRs

What does this implement or fix?

This continues #2359, which was reverted. It is the same change but without the check for slow writes, as that check was causing the lock to never be taken on inherently slow storages.

Diff against the reverted PR: symbol_list_slow_writes_tests...symbol_list_slow_writes_tests_after_revert

Changelog is similar to #2359:

  • Storage lock wait increased to 1000ms
  • Locking gives up on slow writes
  • StorageFailureSimulator extended to simulate slow writes
  • Stress tests (some are removed from #2359, Symbol list and storage lock improvements)
  • Refactoring the storage lock to share common code
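The "simulate slow writes" item above can be pictured as a write wrapper that injects latency before delegating to the real storage. A minimal sketch, assuming a hypothetical `SlowWriteSimulator` class and an in-memory map standing in for the storage (neither is ArcticDB's actual `StorageFailureSimulator` API):

```cpp
#include <chrono>
#include <string>
#include <thread>
#include <unordered_map>

// Hypothetical latency-injecting wrapper: every write sleeps for a
// configurable delay first, letting tests exercise lock behaviour on
// artificially slow storages.
class SlowWriteSimulator {
public:
    explicit SlowWriteSimulator(std::chrono::milliseconds delay) : delay_(delay) {}

    // Delay, then perform the real write into the backing map.
    void write(std::unordered_map<std::string, std::string>& storage,
               const std::string& key, const std::string& value) {
        std::this_thread::sleep_for(delay_);  // simulated slow write
        storage[key] = value;
    }

private:
    std::chrono::milliseconds delay_;
};
```

A test fixture would construct the simulator with increasing delays to probe the point at which lock acquisition starts timing out.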

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@ognyanstoimenov ognyanstoimenov added patch Small change, should increase patch version no-release-notes This PR shouldn't be added to release notes. and removed no-release-notes This PR shouldn't be added to release notes. labels Jul 24, 2025
map_of_##LABEL[boost::to_upper_copy<std::string>(label)] = val; \
} \
\
TYPE get_##LABEL(const std::string& label, TYPE default_val) const { \
std::lock_guard<std::mutex> lock(mutex_); \
Collaborator
Why do we need this guard? We call these get_ methods all the time, often in fairly performance sensitive blocks (which perhaps we shouldn't), so I'm a bit concerned about the performance impact

Collaborator
I agree these locks shouldn't be needed. It could only be useful for tests which are setting and getting values from multiple threads. Such tests need better design.

Some general notes, not suggesting we do any of this:

The mutex lock without contention is quite cheap, but in this use case there could be contention from multiple readers, which might explain the failing ASV benchmark. There exist single-writer, multiple-reader primitives which can resolve this.

Still, even just hashing the string every time and accessing the hashmap can be expensive, but that should again be on the order of tens of nanoseconds (especially if the compiler is smart enough to hash the string at compile time).
I do not think we really use the ConfigsMap in settings where 100 nanoseconds matter.
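The single-writer, multiple-reader primitive mentioned above exists in the standard library as `std::shared_mutex`. A sketch of how it could be applied to a ConfigsMap-like structure (the `ConfigStore` class and its methods are illustrative, not ArcticDB's actual implementation):

```cpp
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Readers take a shared lock and proceed concurrently with each other;
// only writers take the exclusive lock, blocking everyone.
class ConfigStore {
public:
    void set_int(const std::string& label, int val) {
        std::unique_lock lock(mutex_);  // exclusive: blocks readers and writers
        map_[label] = val;
    }

    int get_int(const std::string& label, int default_val) const {
        std::shared_lock lock(mutex_);  // shared: readers don't block each other
        auto it = map_.find(label);
        return it == map_.end() ? default_val : it->second;
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::string, int> map_;
};
```

Note that a shared lock is still not free under heavy read traffic; it only removes reader-reader contention, not the atomic operations on the lock word itself.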


@poodlewars poodlewars left a comment


Perhaps this benchmark regression is real. Will be good to get to the bottom of it.

 | Change   | Before [91d18651]    | After [66842fe3]    |   Ratio | Benchmark (Parameter)                                                             |
|----------|----------------------|---------------------|---------|-----------------------------------------------------------------------------------|
| +        | 168±2ms              | 194±4ms             |    1.16 | basic_functions.BatchBasicFunctions.time_read_batch_with_date_ranges(25000, 500)  |
| +        | 188±2ms              | 218±5ms             |    1.16 | basic_functions.BatchBasicFunctions.time_read_batch_with_date_ranges(50000, 500)  |
| +        | 351±4ms              | 404±8ms             |    1.15 | basic_functions.BatchBasicFunctions.time_read_batch_with_date_ranges(25000, 1000) |


(0.3, 10, 50),
(0.3, 100, 300),
(0.3, 300, 500),
(0.3, 700, 1200)
Collaborator

I think with the current setup the last fixture parameter will bring enough slowness to introduce a genuine test failure.

I do think this test is useful for when we eventually do something about the symbol list, so we shouldn't remove it but should probably xfail it.

Also, could you (after xfail-ing) run the real storage tests to confirm this test indeed fails, and so will prove itself useful for future symbol list development?

@ognyanstoimenov
Collaborator Author

The lock was indeed the problem for the ASV degradation (with lock vs without lock), so I've removed it.

My main concern is that this relies on us being disciplined enough not to write anything to the ConfigsMap in a multithreaded context and to only read from it, with all writes done upfront.
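The "write upfront, then read lock-free" discipline described above can be sketched as a two-phase pattern (illustrative, not ArcticDB code): all writes happen single-threaded before any reader thread starts, so concurrent reads afterwards need no synchronisation on the map itself.

```cpp
#include <atomic>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Phase 2 helper: n_threads readers access the same config entry
// concurrently and sum what they see. The map is never modified here,
// so the read path needs no lock; only the accumulator is atomic.
int sum_reads(const std::unordered_map<std::string, int>& config, int n_threads) {
    std::atomic<int> total{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i)
        threads.emplace_back([&] { total += config.at("StorageLock.WaitMs"); });
    for (auto& t : threads)
        t.join();
    return total.load();
}
```

Phase 1 is simply populating the map before `sum_reads` is called; the risk the comment raises is that nothing in the type system enforces this ordering, so a stray `set_` call from a worker thread would be a silent data race.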

Labels
patch Small change, should increase patch version
3 participants