forked from ClickHouse/ClickHouse
Antalya: Cache the list objects operation on object storage using a TTL + prefix matching cache implementation #743
Open
arthurpassos wants to merge 29 commits into antalya from list_objects_cache
Commits
3dc33f3 draft immpl (arthurpassos)
7e182c7 remove garbage (arthurpassos)
ef985c9 use starts_with (arthurpassos)
43c1383 fix a few obvious bugs (arthurpassos)
989cfe0 some other fixes (arthurpassos)
157450d another fix (arthurpassos)
d4af4ae Merge branch 'antalya' into list_objects_cache (arthurpassos)
727b64d remove unused method (arthurpassos)
68cbad7 some other fixes (arthurpassos)
00f58b3 some other fixes (arthurpassos)
67ccaf0 use steady_clock instead of system_clock (arthurpassos)
2b37e0c small fixes (arthurpassos)
0d6e343 Dont return null in case of exact match with stale entry, perform lin… (arthurpassos)
0f605c4 add some metrics (arthurpassos)
3ed2349 implement cache clear command (arthurpassos)
f345b33 remove ifdef from parquet (arthurpassos)
9242843 add stateless tests (arthurpassos)
4e19b09 add unit tests (arthurpassos)
7a6eaec Merge branch 'antalya' into list_objects_cache (arthurpassos)
8c4ea48 rename ttl argument and member variable (arthurpassos)
74b980c new settings history (arthurpassos)
4f55a75 make the setting false by default so other tests are not affected (arthurpassos)
303ee27 add ref file (arthurpassos)
e6b379e remove cachedglobiterator in favor of expensive copy. I think I'll re… (arthurpassos)
d7b50f4 simplify things a bit (arthurpassos)
6cfa510 add some more tests (arthurpassos)
be8c6a1 make cache return a copy instead of a pointer, we don't want modifica… (arthurpassos)
b60cb95 update ut (arthurpassos)
d91bf00 improve prefix matching by implementing search with time complexity o… (arthurpassos)
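The commit "use steady_clock instead of system_clock" is relevant to TTL bookkeeping: `std::chrono::system_clock` follows wall time and can jump when the clock is adjusted, while `std::chrono::steady_clock` is monotonic. Below is a minimal sketch of deadline-based expiry on a steady clock. The `Entry` type and the `makeEntry` and `isStale` helpers are hypothetical names for illustration and are not part of this PR, which stores the deadline in the cache key.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// steady_clock is monotonic by definition, so deadlines never move backwards.
static_assert(Clock::is_steady, "TTL deadlines need a monotonic clock");

// Hypothetical entry type: only the expiry deadline is modelled here.
struct Entry
{
    Clock::time_point expires_at;
};

// Builds an entry that expires `ttl` after `now`.
inline Entry makeEntry(Clock::time_point now, std::chrono::seconds ttl)
{
    assert(ttl.count() >= 0);
    return Entry{now + ttl};
}

// An entry is stale once its deadline lies strictly before `now`,
// which mirrors the `expires_at < now()` comparison used in the diff.
inline bool isStale(const Entry & entry, Clock::time_point now)
{
    return entry.expires_at < now;
}
```

Passing `now` as a parameter, rather than calling `Clock::now()` inside the helpers, is a choice made for this sketch so that expiry can be checked deterministically.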
@@ -0,0 +1,196 @@
#include <Storages/Cache/ObjectStorageListObjectsCache.h>
#include <Common/TTLCachePolicy.h>
#include <Common/ProfileEvents.h>
#include <boost/functional/hash.hpp>

namespace ProfileEvents
{
    extern const Event ObjectStorageListObjectsCacheHits;
    extern const Event ObjectStorageListObjectsCacheMisses;
    extern const Event ObjectStorageListObjectsCacheExactMatchHits;
    extern const Event ObjectStorageListObjectsCachePrefixMatchHits;
}

namespace DB
{

template <typename Key, typename Mapped, typename HashFunction, typename WeightFunction, typename IsStaleFunction>
class ObjectStorageListObjectsCachePolicy : public TTLCachePolicy<Key, Mapped, HashFunction, WeightFunction, IsStaleFunction>
{
public:
    using BasePolicy = TTLCachePolicy<Key, Mapped, HashFunction, WeightFunction, IsStaleFunction>;
    using typename BasePolicy::MappedPtr;
    using typename BasePolicy::KeyMapped;
    using BasePolicy::cache;

    ObjectStorageListObjectsCachePolicy()
        : BasePolicy(std::make_unique<NoCachePolicyUserQuota>())
    {
    }

    std::optional<KeyMapped> getWithKey(const Key & key) override
    {
        if (const auto it = cache.find(key); it != cache.end())
        {
            if (!IsStaleFunction()(it->first))
            {
                return std::make_optional<KeyMapped>({it->first, it->second});
            }
            // found a stale entry, remove it but don't return. We still want to perform the prefix matching search
            BasePolicy::remove(it->first);
        }

        if (const auto it = findBestMatchingPrefixAndRemoveExpiredEntries(key); it != cache.end())
        {
            return std::make_optional<KeyMapped>({it->first, it->second});
        }

        return std::nullopt;
    }

private:
    auto findBestMatchingPrefixAndRemoveExpiredEntries(Key key)
    {
        while (!key.prefix.empty())
        {
            if (const auto it = cache.find(key); it != cache.end())
            {
                if (IsStaleFunction()(it->first))
                {
                    BasePolicy::remove(it->first);
                }
                else
                {
                    return it;
                }
            }

            key.prefix.pop_back();
        }

        return cache.end();
    }
};

ObjectStorageListObjectsCache::Key::Key(
    const String & bucket_,
    const String & prefix_,
    const std::chrono::steady_clock::time_point & expires_at_,
    std::optional<UUID> user_id_)
    : bucket(bucket_), prefix(prefix_), expires_at(expires_at_), user_id(user_id_) {}

bool ObjectStorageListObjectsCache::Key::operator==(const Key & other) const
{
    return bucket == other.bucket && prefix == other.prefix;
}

size_t ObjectStorageListObjectsCache::KeyHasher::operator()(const Key & key) const
{
    std::size_t seed = 0;

    boost::hash_combine(seed, key.bucket);
    boost::hash_combine(seed, key.prefix);

    return seed;
}

bool ObjectStorageListObjectsCache::IsStale::operator()(const Key & key) const
{
    return key.expires_at < std::chrono::steady_clock::now();
}

size_t ObjectStorageListObjectsCache::WeightFunction::operator()(const Value & value) const
{
    std::size_t weight = 0;

    for (const auto & object : value)
    {
        weight += object->relative_path.capacity() + sizeof(ObjectMetadata);
    }

    return weight;
}

ObjectStorageListObjectsCache::ObjectStorageListObjectsCache()
    : cache(std::make_unique<ObjectStorageListObjectsCachePolicy<Key, Value, KeyHasher, WeightFunction, IsStale>>())
{
}

void ObjectStorageListObjectsCache::set(
    const std::string & bucket,
    const std::string & prefix,
    const std::shared_ptr<Value> & value)
{
    const auto key = Key{bucket, prefix, std::chrono::steady_clock::now() + std::chrono::seconds(ttl_in_seconds)};

    cache.set(key, value);
}

void ObjectStorageListObjectsCache::clear()
{
    cache.clear();
}

std::optional<ObjectStorageListObjectsCache::Value> ObjectStorageListObjectsCache::get(const String & bucket, const String & prefix, bool filter_by_prefix)
{
    const auto input_key = Key{bucket, prefix};
    auto pair = cache.getWithKey(input_key);

    if (!pair)
    {
        ProfileEvents::increment(ProfileEvents::ObjectStorageListObjectsCacheMisses);
        return {};
    }

    ProfileEvents::increment(ProfileEvents::ObjectStorageListObjectsCacheHits);

    if (pair->key == input_key)
    {
        ProfileEvents::increment(ProfileEvents::ObjectStorageListObjectsCacheExactMatchHits);
        return *pair->mapped;
    }

    ProfileEvents::increment(ProfileEvents::ObjectStorageListObjectsCachePrefixMatchHits);

    if (!filter_by_prefix)
    {
        return *pair->mapped;
    }

    Value filtered_objects;

    filtered_objects.reserve(pair->mapped->size());

    for (const auto & object : *pair->mapped)
    {
        if (object->relative_path.starts_with(input_key.prefix))
        {
            filtered_objects.push_back(object);
        }
    }

    return filtered_objects;
}

void ObjectStorageListObjectsCache::setMaxSizeInBytes(std::size_t size_in_bytes_)
{
    cache.setMaxSizeInBytes(size_in_bytes_);
}

void ObjectStorageListObjectsCache::setMaxCount(std::size_t count)
{
    cache.setMaxCount(count);
}

void ObjectStorageListObjectsCache::setTTL(std::size_t ttl_in_seconds_)
{
    ttl_in_seconds = ttl_in_seconds_;
}

ObjectStorageListObjectsCache & ObjectStorageListObjectsCache::instance()
{
    static ObjectStorageListObjectsCache instance;
    return instance;
}

}
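The lookup strategy in `getWithKey` and the filtering step in `get` can be read in isolation: probe the hash map with the requested prefix, drop the last character on each miss, evict expired entries along the way, then narrow a hit on a shorter prefix down to the objects under the requested prefix. Below is a minimal standalone sketch of that idea, not the PR's implementation. It assumes plain path strings instead of object metadata and leaves the bucket out of the key. `Listing`, `PrefixCache`, `put`, `findBestPrefix` and `filterByPrefix` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a cached listing; the PR caches object metadata, not bare paths.
struct Listing
{
    std::vector<std::string> paths;
    Clock::time_point expires_at;
};

// Keyed by prefix only; the bucket part of the real key is left out for brevity.
using PrefixCache = std::unordered_map<std::string, Listing>;

// Stores `paths` under `prefix`, valid until `now + ttl`.
inline void put(PrefixCache & cache, const std::string & prefix, std::vector<std::string> paths,
                Clock::time_point now, std::chrono::seconds ttl)
{
    assert(ttl.count() > 0);
    cache.insert_or_assign(prefix, Listing{std::move(paths), now + ttl});
}

// Returns the longest cached prefix of `prefix` that has not expired, erasing
// expired entries met along the way. Returns std::nullopt when nothing matches.
inline std::optional<std::string> findBestPrefix(PrefixCache & cache, std::string prefix, Clock::time_point now)
{
    while (!prefix.empty())
    {
        if (auto it = cache.find(prefix); it != cache.end())
        {
            if (it->second.expires_at < now)
                cache.erase(it); // stale: drop it and keep shortening
            else
                return prefix;
        }
        prefix.pop_back();
    }
    return std::nullopt;
}

// Keeps only the paths that start with the requested prefix.
inline std::vector<std::string> filterByPrefix(const std::vector<std::string> & paths, const std::string & prefix)
{
    std::vector<std::string> out;
    for (const auto & path : paths)
        if (path.compare(0, prefix.size(), prefix) == 0)
            out.push_back(path);
    return out;
}
```

In this sketch, a request for `data/2024/` can be served from a live entry cached under `data/`, with `filterByPrefix` removing paths outside `data/2024/`. The number of probes is bounded by the length of the requested prefix, and each probe is one hash lookup.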
Does caching work only on Parquet files, or generally on any S3 ListObjects request?
Ah, copy and paste issues. Should be any :D
Done
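On the Parquet question: in the diff, `Key::operator==` and `KeyHasher` use only `bucket` and `prefix`, so the key carries nothing format-specific, and the expiry deadline does not take part in identity either. Below is a minimal sketch of that key design. `ListKey`, `ListKeyHash` and `makeLookupKey` are hypothetical names, and the mixing step is a simple stand-in for `boost::hash_combine`, not a copy of it.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type; field names follow the diff, the rest is assumed.
struct ListKey
{
    std::string bucket;
    std::string prefix;
    std::chrono::steady_clock::time_point expires_at{};
};

// Identity is (bucket, prefix); the deadline is deliberately ignored.
inline bool operator==(const ListKey & lhs, const ListKey & rhs)
{
    return lhs.bucket == rhs.bucket && lhs.prefix == rhs.prefix;
}

struct ListKeyHash
{
    std::size_t operator()(const ListKey & key) const
    {
        // Hash exactly the fields that operator== compares.
        std::size_t seed = std::hash<std::string>{}(key.bucket);
        // Illustrative mixing step standing in for boost::hash_combine.
        seed ^= std::hash<std::string>{}(key.prefix) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        return seed;
    }
};

// Builds a probe key for lookups; its deadline is a default value and is never compared.
inline ListKey makeLookupKey(const std::string & bucket, const std::string & prefix)
{
    assert(!bucket.empty());
    return ListKey{bucket, prefix, {}};
}
```

With this shape, a probe key built without a deadline still finds an entry stored with one, which is how `get` in the diff can look up `Key{bucket, prefix}` against keys created in `set`.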