forked from ClickHouse/ClickHouse
Antalya: Cache the list objects operation on object storage using a TTL + prefix matching cache implementation #743
Merged
55 commits
3dc33f3 draft immpl (arthurpassos)
7e182c7 remove garbage (arthurpassos)
ef985c9 use starts_with (arthurpassos)
43c1383 fix a few obvious bugs (arthurpassos)
989cfe0 some other fixes (arthurpassos)
157450d another fix (arthurpassos)
d4af4ae Merge branch 'antalya' into list_objects_cache (arthurpassos)
727b64d remove unused method (arthurpassos)
68cbad7 some other fixes (arthurpassos)
00f58b3 some other fixes (arthurpassos)
67ccaf0 use steady_clock instead of system_clock (arthurpassos)
2b37e0c small fixes (arthurpassos)
0d6e343 Dont return null in case of exact match with stale entry, perform lin… (arthurpassos)
0f605c4 add some metrics (arthurpassos)
3ed2349 implement cache clear command (arthurpassos)
f345b33 remove ifdef from parquet (arthurpassos)
9242843 add stateless tests (arthurpassos)
4e19b09 add unit tests (arthurpassos)
7a6eaec Merge branch 'antalya' into list_objects_cache (arthurpassos)
8c4ea48 rename ttl argument and member variable (arthurpassos)
74b980c new settings history (arthurpassos)
4f55a75 make the setting false by default so other tests are not affected (arthurpassos)
303ee27 add ref file (arthurpassos)
e6b379e remove cachedglobiterator in favor of expensive copy. I think I'll re… (arthurpassos)
d7b50f4 simplify things a bit (arthurpassos)
6cfa510 add some more tests (arthurpassos)
be8c6a1 make cache return a copy instead of a pointer, we don't want modifica… (arthurpassos)
b60cb95 update ut (arthurpassos)
d91bf00 improve prefix matching by implementing search with time complexity o… (arthurpassos)
c6e53a1 Merge branch 'antalya' into list_objects_cache (arthurpassos)
f1c3591 Merge branch 'antalya' into list_objects_cache (arthurpassos)
14973d2 Merge branch 'antalya' into list_objects_cache (arthurpassos)
55ac0bc Merge branch 'antalya' into list_objects_cache (arthurpassos)
e0e19a2 idraft impl of authorization aware cache (arthurpassos)
45af8a5 rename setting (arthurpassos)
7266d92 delete copy/move constructors and assignment operators (arthurpassos)
8e78b28 azure impl and fix aws (arthurpassos)
2ed102d docs (arthurpassos)
28bfcfb fix tests (arthurpassos)
aab089c Merge branch 'antalya' into list_objects_cache (arthurpassos)
dd5934e incorporate comments on tests (arthurpassos)
0f5057e remove unused code (arthurpassos)
27c4dea Revert "remove unused code" (arthurpassos)
d789d1e Revert "incorporate comments on tests" (arthurpassos)
fef71c0 Revert "docs" (arthurpassos)
6bfcb86 Revert "azure impl and fix aws" (arthurpassos)
f68725a Revert "idraft impl of authorization aware cache" (arthurpassos)
7597da0 Merge branch 'antalya' into list_objects_cache (arthurpassos)
e7940af Merge branch 'antalya' into list_objects_cache (arthurpassos)
f863a6e use description as part of the key (arthurpassos)
cbfe36d Reapply "incorporate comments on tests" (arthurpassos)
9092aba remove unused code (arthurpassos)
057b0b5 add supportsListObjectsCache to enable this cache for s3-like (minio,… (arthurpassos)
49748c9 increase default cache size (arthurpassos)
96cf2d2 fix weight funciton (arthurpassos)
@@ -0,0 +1,210 @@
#include <Storages/Cache/ObjectStorageListObjectsCache.h>
#include <Common/TTLCachePolicy.h>
#include <Common/ProfileEvents.h>
#include <boost/functional/hash.hpp>

namespace ProfileEvents
{
    extern const Event ObjectStorageListObjectsCacheHits;
    extern const Event ObjectStorageListObjectsCacheMisses;
    extern const Event ObjectStorageListObjectsCacheExactMatchHits;
    extern const Event ObjectStorageListObjectsCachePrefixMatchHits;
}

namespace DB
{

template <typename Key, typename Mapped, typename HashFunction, typename WeightFunction, typename IsStaleFunction>
class ObjectStorageListObjectsCachePolicy : public TTLCachePolicy<Key, Mapped, HashFunction, WeightFunction, IsStaleFunction>
{
public:
    using BasePolicy = TTLCachePolicy<Key, Mapped, HashFunction, WeightFunction, IsStaleFunction>;
    using typename BasePolicy::MappedPtr;
    using typename BasePolicy::KeyMapped;
    using BasePolicy::cache;

    ObjectStorageListObjectsCachePolicy()
        : BasePolicy(std::make_unique<NoCachePolicyUserQuota>())
    {
    }

    std::optional<KeyMapped> getWithKey(const Key & key) override
    {
        if (const auto it = cache.find(key); it != cache.end())
        {
            if (!IsStaleFunction()(it->first))
            {
                return std::make_optional<KeyMapped>({it->first, it->second});
            }
            // found a stale entry, remove it but don't return. We still want to perform the prefix matching search
            BasePolicy::remove(it->first);
        }

        if (const auto it = findBestMatchingPrefixAndRemoveExpiredEntries(key); it != cache.end())
        {
            return std::make_optional<KeyMapped>({it->first, it->second});
        }

        return std::nullopt;
    }

private:
    auto findBestMatchingPrefixAndRemoveExpiredEntries(Key key)
    {
        while (!key.prefix.empty())
        {
            if (const auto it = cache.find(key); it != cache.end())
            {
                if (IsStaleFunction()(it->first))
                {
                    BasePolicy::remove(it->first);
                }
                else
                {
                    return it;
                }
            }

            key.prefix.pop_back();
        }

        return cache.end();
    }
};

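The fallback in `findBestMatchingPrefixAndRemoveExpiredEntries` can be illustrated outside ClickHouse. The following is a minimal sketch, not the PR's code: it assumes a plain `std::unordered_map` keyed by the prefix string in place of `TTLCachePolicy`, and `Listing`, `ListingCache`, `findBestMatchingPrefix` and `prefixFallbackExample` are hypothetical names. It probes the requested prefix, shortens it one character at a time, erases expired entries it meets, and returns the first live one.

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a cached listing: object paths plus an expiry time.
struct Listing
{
    std::vector<std::string> paths;
    Clock::time_point expires_at;
};

using ListingCache = std::unordered_map<std::string, Listing>;

// Probe the requested prefix, then ever shorter ones, dropping one character per step.
// Expired entries met on the way are erased; the first live entry wins.
// Returns the matched prefix, or std::nullopt if nothing usable is cached.
std::optional<std::string> findBestMatchingPrefix(ListingCache & cache, std::string prefix, Clock::time_point now)
{
    while (!prefix.empty())
    {
        if (auto it = cache.find(prefix); it != cache.end())
        {
            if (it->second.expires_at < now)
                cache.erase(it); // stale: evict and keep shortening
            else
                return prefix;
        }
        prefix.pop_back();
    }
    return std::nullopt;
}

// Usage: the stale "data/2024/" entry is skipped and evicted; the live "data/" entry answers.
inline void prefixFallbackExample()
{
    const auto now = Clock::now();
    ListingCache cache;
    cache["data/"] = Listing{{"data/2024/a.parquet", "data/2025/b.parquet"}, now + std::chrono::seconds(60)};
    cache["data/2024/"] = Listing{{"data/2024/a.parquet"}, now - std::chrono::seconds(1)};

    const auto hit = findBestMatchingPrefix(cache, "data/2024/a", now);
    assert(hit && *hit == "data/");
    assert(cache.count("data/2024/") == 0);
}
```

A lookup costs at most one hash probe per character of the requested prefix, regardless of how many entries are cached. As in the diff, the empty prefix is never probed.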
ObjectStorageListObjectsCache::Key::Key(
    const String & storage_description_,
    const String & bucket_,
    const String & prefix_,
    const std::chrono::steady_clock::time_point & expires_at_,
    std::optional<UUID> user_id_)
    : storage_description(storage_description_), bucket(bucket_), prefix(prefix_), expires_at(expires_at_), user_id(user_id_) {}

bool ObjectStorageListObjectsCache::Key::operator==(const Key & other) const
{
    return storage_description == other.storage_description && bucket == other.bucket && prefix == other.prefix;
}

size_t ObjectStorageListObjectsCache::KeyHasher::operator()(const Key & key) const
{
    std::size_t seed = 0;

    boost::hash_combine(seed, key.storage_description);
    boost::hash_combine(seed, key.bucket);
    boost::hash_combine(seed, key.prefix);

    return seed;
}

bool ObjectStorageListObjectsCache::IsStale::operator()(const Key & key) const
{
    return key.expires_at < std::chrono::steady_clock::now();
}

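Equality and hashing for `Key` use only `storage_description`, `bucket` and `prefix`; `expires_at` and `user_id` travel with the key but do not affect its identity. A rough standalone sketch of that design follows. It assumes a hypothetical `CacheKey` struct that omits `user_id`, and a hand-written mixing step standing in for `boost::hash_combine`, so the hash values will not necessarily match what Boost produces.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

// Hypothetical simplified key; the Key in the diff also carries an optional user_id.
struct CacheKey
{
    std::string storage_description;
    std::string bucket;
    std::string prefix;
    Clock::time_point expires_at; // bookkeeping only, deliberately left out of identity

    bool operator==(const CacheKey & other) const
    {
        return storage_description == other.storage_description && bucket == other.bucket && prefix == other.prefix;
    }
};

struct CacheKeyHasher
{
    std::size_t operator()(const CacheKey & key) const
    {
        std::size_t seed = 0;
        combine(seed, key.storage_description);
        combine(seed, key.bucket);
        combine(seed, key.prefix);
        return seed;
    }

private:
    // A common hash-mixing step, written out so the sketch does not need Boost.
    static void combine(std::size_t & seed, const std::string & value)
    {
        seed ^= std::hash<std::string>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
};

// Usage: two keys that differ only in expiry address the same cache slot.
inline void keyIdentityExample()
{
    const auto now = Clock::now();
    const CacheKey fresh{"s3", "bucket", "data/", now + std::chrono::seconds(30)};
    const CacheKey expired{"s3", "bucket", "data/", now - std::chrono::seconds(30)};

    std::unordered_map<CacheKey, int, CacheKeyHasher> slots;
    slots.emplace(fresh, 1);
    assert(fresh == expired);
    assert(slots.count(expired) == 1);
}
```

Keeping the expiry out of the identity is what lets a lookup built without a meaningful `expires_at` find an entry that was stored with one.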
size_t ObjectStorageListObjectsCache::WeightFunction::operator()(const Value & value) const
{
    std::size_t weight = 0;

    for (const auto & object : value)
    {
        const auto object_metadata = object->metadata;
        weight += object->relative_path.capacity() + sizeof(object_metadata);

        // variable size
        if (object_metadata)
        {
            weight += object_metadata->etag.capacity();
            weight += object_metadata->attributes.size() * (sizeof(std::string) * 2);

            for (const auto & [k, v] : object_metadata->attributes)
            {
                weight += k.capacity() + v.capacity();
            }
        }
    }

    return weight;
}

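The weight function estimates how many bytes a cached listing occupies so that the size limit can be enforced. A simplified sketch of the same accounting is shown here, under the assumption that an object is a path plus optional metadata with an etag and string attributes; `ObjectMetadataSketch`, `ObjectInfoSketch`, `ListingSketch`, `estimateWeight` and `weightExample` are hypothetical names, not ClickHouse types.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the object types iterated in the diff.
struct ObjectMetadataSketch
{
    std::string etag;
    std::map<std::string, std::string> attributes;
};

struct ObjectInfoSketch
{
    std::string relative_path;
    std::optional<ObjectMetadataSketch> metadata;
};

using ListingSketch = std::vector<std::shared_ptr<ObjectInfoSketch>>;

// Approximate footprint of one cached listing, in bytes.
std::size_t estimateWeight(const ListingSketch & listing)
{
    std::size_t weight = 0;
    for (const auto & object : listing)
    {
        // Fixed part: the path buffer plus the metadata holder itself.
        weight += object->relative_path.capacity() + sizeof(object->metadata);

        if (object->metadata)
        {
            // Variable part: the etag buffer, then two std::string headers
            // and two character buffers per attribute.
            weight += object->metadata->etag.capacity();
            weight += object->metadata->attributes.size() * (sizeof(std::string) * 2);
            for (const auto & [name, value] : object->metadata->attributes)
                weight += name.capacity() + value.capacity();
        }
    }
    return weight;
}

// Usage: an empty listing weighs nothing; one object weighs at least its path plus the holder.
inline void weightExample()
{
    ListingSketch listing;
    assert(estimateWeight(listing) == 0);

    auto object = std::make_shared<ObjectInfoSketch>();
    object->relative_path = "data/2024/a.parquet";
    listing.push_back(object);
    // capacity() is never smaller than size(), so this is a safe lower bound.
    assert(estimateWeight(listing) >= object->relative_path.size() + sizeof(object->metadata));
}
```

The result is an estimate: the sketch ignores allocator overhead and map node bookkeeping, so the real footprint is somewhat larger.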
ObjectStorageListObjectsCache::ObjectStorageListObjectsCache()
    : cache(std::make_unique<ObjectStorageListObjectsCachePolicy<Key, Value, KeyHasher, WeightFunction, IsStale>>())
{
}

void ObjectStorageListObjectsCache::set(
    const Key & key,
    const std::shared_ptr<Value> & value)
{
    auto key_with_ttl = key;
    key_with_ttl.expires_at = std::chrono::steady_clock::now() + std::chrono::seconds(ttl_in_seconds);

    cache.set(key_with_ttl, value);
}

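`set` stamps the expiry once, at insertion time, and `IsStale` later compares that stamp with the current time. Both use `std::chrono::steady_clock`, which is monotonic, so wall-clock adjustments cannot lengthen or shorten an entry's lifetime. A minimal sketch of the two steps, with hypothetical free functions `expiresAt` and `isStale` in place of the member functions:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

using Clock = std::chrono::steady_clock;

// The expiry is computed once, when the entry is stored.
Clock::time_point expiresAt(Clock::time_point stored_at, std::size_t ttl_in_seconds)
{
    return stored_at + std::chrono::seconds(ttl_in_seconds);
}

// An entry is stale once its expiry lies strictly in the past,
// mirroring the `<` comparison in the diff.
bool isStale(Clock::time_point expires_at, Clock::time_point now)
{
    return expires_at < now;
}

// Usage: an entry stored with a 30 second TTL is live at once and stale after 31 seconds.
inline void ttlExample()
{
    const auto stored_at = Clock::now();
    const auto expiry = expiresAt(stored_at, 30);
    assert(!isStale(expiry, stored_at));
    assert(isStale(expiry, stored_at + std::chrono::seconds(31)));
}
```

Passing the current time as a parameter is a choice made for this sketch so the behaviour can be checked without sleeping; the diff calls `steady_clock::now()` directly.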
void ObjectStorageListObjectsCache::clear()
{
    cache.clear();
}

std::optional<ObjectStorageListObjectsCache::Value> ObjectStorageListObjectsCache::get(const Key & key, bool filter_by_prefix)
{
    const auto pair = cache.getWithKey(key);

    if (!pair)
    {
        ProfileEvents::increment(ProfileEvents::ObjectStorageListObjectsCacheMisses);
        return {};
    }

    ProfileEvents::increment(ProfileEvents::ObjectStorageListObjectsCacheHits);

    if (pair->key == key)
    {
        ProfileEvents::increment(ProfileEvents::ObjectStorageListObjectsCacheExactMatchHits);
        return *pair->mapped;
    }

    ProfileEvents::increment(ProfileEvents::ObjectStorageListObjectsCachePrefixMatchHits);

    if (!filter_by_prefix)
    {
        return *pair->mapped;
    }

    Value filtered_objects;

    filtered_objects.reserve(pair->mapped->size());

    for (const auto & object : *pair->mapped)
    {
        if (object->relative_path.starts_with(key.prefix))
        {
            filtered_objects.push_back(object);
        }
    }

    return filtered_objects;
}

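When `get` is answered by a broader cached prefix and `filter_by_prefix` is set, the cached listing is narrowed to the paths that start with the requested prefix. A small sketch of that narrowing step, assuming plain path strings instead of object pointers; `filterByPrefix` and `filterExample` are hypothetical names, and `compare` stands in for `starts_with` so the sketch also compiles before C++20.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Keep only the cached paths that begin with the requested prefix.
std::vector<std::string> filterByPrefix(const std::vector<std::string> & cached_paths, const std::string & prefix)
{
    std::vector<std::string> filtered;
    filtered.reserve(cached_paths.size()); // upper bound, as in the diff

    for (const auto & path : cached_paths)
    {
        // Equivalent to path.starts_with(prefix) in C++20.
        if (path.compare(0, prefix.size(), prefix) == 0)
            filtered.push_back(path);
    }
    return filtered;
}

// Usage: a listing cached for "data/" answers a request for "data/2024/".
inline void filterExample()
{
    const std::vector<std::string> cached = {"data/2024/a.parquet", "data/2024/b.parquet", "data/2025/c.parquet"};
    const auto narrowed = filterByPrefix(cached, "data/2024/");
    assert(narrowed.size() == 2);
    assert(narrowed[0] == "data/2024/a.parquet");
}
```

The filter is linear in the size of the cached listing, which is the price of serving a narrower request from a broader entry.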
void ObjectStorageListObjectsCache::setMaxSizeInBytes(std::size_t size_in_bytes_)
{
    cache.setMaxSizeInBytes(size_in_bytes_);
}

void ObjectStorageListObjectsCache::setMaxCount(std::size_t count)
{
    cache.setMaxCount(count);
}

void ObjectStorageListObjectsCache::setTTL(std::size_t ttl_in_seconds_)
{
    ttl_in_seconds = ttl_in_seconds_;
}

ObjectStorageListObjectsCache & ObjectStorageListObjectsCache::instance()
{
    static ObjectStorageListObjectsCache instance;
    return instance;
}

}
Does caching work only for Parquet files, or for any S3 ListObjects request?
Ah, copy and paste issues. Should be any :D
Done