
Conversation

arthurpassos
Collaborator

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Cache for ListObjects calls

Documentation entry for user-facing changes

@arthurpassos arthurpassos changed the title draft immpl Cache the list objects operation on object storage using a TTL + prefix matching cache implementation Apr 17, 2025
@arthurpassos
Collaborator Author

arthur :) SELECT date, count()
FROM s3('s3://aws-public-blockchain/v1.0/btc/transactions/*/*.parquet', NOSIGN)
WHERE date between '2025-01-01' and '2025-01-31'
GROUP BY date ORDER BY date
SETTINGS use_hive_partitioning=1, use_object_storage_list_objects_cache=0;

SELECT
    date,
    count()
FROM s3('s3://aws-public-blockchain/v1.0/btc/transactions/*/*.parquet', NOSIGN)
WHERE (date >= '2025-01-01') AND (date <= '2025-01-31')
GROUP BY date
ORDER BY date ASC
SETTINGS use_hive_partitioning = 1, use_object_storage_list_objects_cache = 0

Query id: 29d096ab-0297-43a3-8844-b83b6a7856fb

    ┌─date───────┬─count()─┐
 1. │ 2025-01-01 │  292213 │
 2. │ 2025-01-02 │  402440 │
 3. │ 2025-01-03 │  409341 │
 4. │ 2025-01-04 │  432302 │
 5. │ 2025-01-05 │  433954 │
 6. │ 2025-01-06 │  366260 │
 7. │ 2025-01-07 │  352121 │
 8. │ 2025-01-08 │  399976 │
 9. │ 2025-01-09 │  534013 │
10. │ 2025-01-10 │  408769 │
11. │ 2025-01-11 │  361190 │
12. │ 2025-01-12 │  380525 │
13. │ 2025-01-13 │  408248 │
14. │ 2025-01-14 │  352684 │
15. │ 2025-01-15 │  354014 │
16. │ 2025-01-16 │  375439 │
17. │ 2025-01-17 │  425661 │
18. │ 2025-01-18 │  360666 │
19. │ 2025-01-19 │  388509 │
20. │ 2025-01-20 │  350291 │
21. │ 2025-01-21 │  324412 │
22. │ 2025-01-22 │  432369 │
23. │ 2025-01-23 │  326010 │
24. │ 2025-01-24 │  369243 │
25. │ 2025-01-25 │  338988 │
26. │ 2025-01-26 │  309651 │
27. │ 2025-01-27 │  332102 │
28. │ 2025-01-28 │  305953 │
29. │ 2025-01-29 │  355332 │
30. │ 2025-01-30 │  335134 │
31. │ 2025-01-31 │  328684 │
    └────────────┴─────────┘

31 rows in set. Elapsed: 4.080 sec. Processed 11.55 million rows, 0.00 B (2.83 million rows/s., 0.00 B/s.)
Peak memory usage: 3.17 MiB.
arthur :) SELECT date, count()
FROM s3('s3://aws-public-blockchain/v1.0/btc/transactions/*/*.parquet', NOSIGN)
WHERE date between '2025-01-01' and '2025-01-31'
GROUP BY date ORDER BY date
SETTINGS use_hive_partitioning=1, use_object_storage_list_objects_cache=1;

SELECT
    date,
    count()
FROM s3('s3://aws-public-blockchain/v1.0/btc/transactions/*/*.parquet', NOSIGN)
WHERE (date >= '2025-01-01') AND (date <= '2025-01-31')
GROUP BY date
ORDER BY date ASC
SETTINGS use_hive_partitioning = 1, use_object_storage_list_objects_cache = 1

Query id: 4afafbfb-eb93-4b96-8c4f-4a94f723805c

    ┌─date───────┬─count()─┐
 1. │ 2025-01-01 │  292213 │
 2. │ 2025-01-02 │  402440 │
 3. │ 2025-01-03 │  409341 │
 4. │ 2025-01-04 │  432302 │
 5. │ 2025-01-05 │  433954 │
 6. │ 2025-01-06 │  366260 │
 7. │ 2025-01-07 │  352121 │
 8. │ 2025-01-08 │  399976 │
 9. │ 2025-01-09 │  534013 │
10. │ 2025-01-10 │  408769 │
11. │ 2025-01-11 │  361190 │
12. │ 2025-01-12 │  380525 │
13. │ 2025-01-13 │  408248 │
14. │ 2025-01-14 │  352684 │
15. │ 2025-01-15 │  354014 │
16. │ 2025-01-16 │  375439 │
17. │ 2025-01-17 │  425661 │
18. │ 2025-01-18 │  360666 │
19. │ 2025-01-19 │  388509 │
20. │ 2025-01-20 │  350291 │
21. │ 2025-01-21 │  324412 │
22. │ 2025-01-22 │  432369 │
23. │ 2025-01-23 │  326010 │
24. │ 2025-01-24 │  369243 │
25. │ 2025-01-25 │  338988 │
26. │ 2025-01-26 │  309651 │
27. │ 2025-01-27 │  332102 │
28. │ 2025-01-28 │  305953 │
29. │ 2025-01-29 │  355332 │
30. │ 2025-01-30 │  335134 │
31. │ 2025-01-31 │  328684 │
    └────────────┴─────────┘

31 rows in set. Elapsed: 0.040 sec. Processed 11.55 million rows, 0.00 B (287.50 million rows/s., 0.00 B/s.)
Peak memory usage: 844.09 KiB.

arthur :) 

@arthurpassos
Collaborator Author

laptop@arthur:~/work/altinity/list_objects_cache$ ./cmake-build-release/programs/clickhouse benchmark -i 10 --cumulative -q "SELECT date, count()
FROM s3('s3://aws-public-blockchain/v1.0/btc/transactions/*/*.parquet', NOSIGN)
WHERE date between '2025-01-01' and '2025-01-31'
GROUP BY date ORDER BY date
SETTINGS use_hive_partitioning=1, use_object_storage_list_objects_cache=0;"

Queries executed: 10.

localhost:9000, queries: 10, QPS: 0.389, RPS: 4487684.448, MiB/s: 0.000, result RPS: 12.049, result MiB/s: 0.000.

0%		2.363 sec.	
10%		2.379 sec.	
20%		2.382 sec.	
30%		2.391 sec.	
40%		2.402 sec.	
50%		2.410 sec.	
60%		2.410 sec.	
70%		2.451 sec.	
80%		2.458 sec.	
90%		3.159 sec.	
95%		3.212 sec.	
99%		3.212 sec.	
99.9%		3.212 sec.	
99.99%		3.212 sec.	

laptop@arthur:~/work/altinity/list_objects_cache$ ./cmake-build-release/programs/clickhouse benchmark -i 10 --cumulative -q "SELECT date, count()
FROM s3('s3://aws-public-blockchain/v1.0/btc/transactions/*/*.parquet', NOSIGN)
WHERE date between '2025-01-01' and '2025-01-31'
GROUP BY date ORDER BY date
SETTINGS use_hive_partitioning=1, use_object_storage_list_objects_cache=1;"
Loaded 1 queries.

Queries executed: 10.

localhost:9000, queries: 10, QPS: 33.280, RPS: 384262109.406, MiB/s: 0.000, result RPS: 1031.666, result MiB/s: 0.028.

0%		0.015 sec.	
10%		0.015 sec.	
20%		0.015 sec.	
30%		0.016 sec.	
40%		0.017 sec.	
50%		0.017 sec.	
60%		0.017 sec.	
70%		0.018 sec.	
80%		0.018 sec.	
90%		0.018 sec.	
95%		0.018 sec.	
99%		0.018 sec.	
99.9%		0.018 sec.	
99.99%		0.018 sec.	

:)

{
if (const auto it = cache.find(key); it != cache.end())
{
if (IsStaleFunction()(it->first))
Collaborator Author

This case is interesting: we find an exact match, but it has expired. Should we try to find a prefix match, or simply refresh the expired entry?

Collaborator

Well, there can be a more up-to-date prefix entry, so why not try to reuse it

Collaborator Author

The only reason not to is that this entry would then cease to exist: it would never be cached again, and lookups for it would fall back to a linear prefix search forever.

Actually, not forever: if the more up-to-date prefix entry gets evicted and this query is performed again, the entry would reappear.

But I think you are right.
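
To make the trade-off concrete, here is a minimal sketch of a TTL + prefix-matching lookup (hypothetical names and structure, not the actual implementation): an expired exact match falls through to the longest still-fresh prefix entry, whose listing is then filtered down to the requested prefix.

// Minimal sketch of a TTL + prefix-matching cache (hypothetical, for illustration only).
#include <chrono>
#include <map>
#include <optional>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Entry
{
    std::vector<std::string> objects;   // cached ListObjects result
    Clock::time_point inserted_at;      // used for TTL expiration
};

class ListObjectsCache
{
public:
    explicit ListObjectsCache(std::chrono::seconds ttl_) : ttl(ttl_) {}

    void set(const std::string & prefix, std::vector<std::string> objects)
    {
        cache[prefix] = Entry{std::move(objects), Clock::now()};
    }

    /// Exact match first; if it is missing or stale, fall back to the longest
    /// non-stale cached prefix of `prefix` (a linear scan, for clarity).
    std::optional<std::vector<std::string>> get(const std::string & prefix) const
    {
        if (auto it = cache.find(prefix); it != cache.end() && !isStale(it->second))
            return it->second.objects;

        const Entry * best = nullptr;
        std::size_t best_length = 0;
        for (const auto & [key, entry] : cache)
        {
            if (!isStale(entry) && prefix.starts_with(key) && key.size() >= best_length)
            {
                best = &entry;
                best_length = key.size();
            }
        }
        if (!best)
            return std::nullopt;

        /// A prefix entry is a superset: keep only the objects under the requested prefix.
        std::vector<std::string> filtered;
        for (const auto & object : best->objects)
            if (object.starts_with(prefix))
                filtered.push_back(object);
        return filtered;
    }

private:
    bool isStale(const Entry & entry) const { return Clock::now() - entry.inserted_at > ttl; }

    std::chrono::seconds ttl;
    std::map<std::string, Entry> cache;
};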

{
throw Exception(
ErrorCodes::BAD_ARGUMENTS,
"Using glob iterator with path without globs is not allowed (used path: {})",
Collaborator

Shouldn't this be LOGICAL_ERROR?
This looks like a branch of code that cannot be reached normally (the user does not select which iterator to use manually).

Collaborator Author

I agree, it should probably be LOGICAL_ERROR. But:

This is mostly a copy and paste from the existing GlobIterator.

I might refactor this to avoid duplication. For now, this is just a draft implementation.

Even if I refactor this, I would opt for keeping parity with the existing code and upstream. This will make reviews and merges with upstream easier.
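
For reference, the variant being suggested would look roughly like this (a sketch; `path` is a stand-in for the argument that is elided in the quoted diff):

/// Sketch only: LOGICAL_ERROR marks a branch that user input should never be able to reach.
throw Exception(
    ErrorCodes::LOGICAL_ERROR,
    "Using glob iterator with path without globs is not allowed (used path: {})",
    path);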

@svb-alt svb-alt added the antalya-25.2.2 Planned for 25.2.2 release label Apr 18, 2025
@@ -6108,6 +6108,9 @@ Limit for hosts used for request in object storage cluster table functions - azu
Possible values:
- Positive integer.
- 0 — All hosts in cluster.
)", EXPERIMENTAL) \
DECLARE(Bool, use_object_storage_list_objects_cache, true, R"(
Member

Please add it to src/Core/SettingsChangesHistory.cpp.
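
For reference, an entry in SettingsChangesHistory.cpp typically looks like the line below (a sketch; the target version, previous default, and description are illustrative, not the actual values):

/// Hypothetical entry; the format is {"version", {{"setting", old_default, new_default, "reason"}}}.
{"25.2", {{"use_object_storage_list_objects_cache", false, true, "Cache the results of ListObjects requests to object storage."}}},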

cache.setMaxCount(count);
}

void ObjectStorageListObjectsCache::setTTL(std::size_t ttl_)
Member

Is it in seconds, milliseconds, minutes, or hours?

Collaborator Author

In seconds; I will rename the argument to make that explicit.
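
Something along these lines would document the unit at the call site (a sketch; the actual member layout may differ):

/// Sketch only: renaming the parameter makes it clear the TTL is given in seconds.
void ObjectStorageListObjectsCache::setTTL(std::size_t ttl_in_seconds)
{
    ttl = ttl_in_seconds;
}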

@@ -435,6 +436,16 @@ BlockIO InterpreterSystemQuery::execute()
break;
#else
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "The server was compiled without the support for Parquet");
#endif
}
case Type::DROP_OBJECT_STORAGE_LIST_OBJECTS_CACHE:
Member
@Enmk Enmk Apr 18, 2025

Does caching work only on Parquet files, or generally on any S3 ListObjects request?

Collaborator Author

Ah, a copy-and-paste issue. It should be any :D

Collaborator Author

Done

@arthurpassos
Collaborator Author

There's a problem: IObjectStorage::getName returns the same name for both S3 and GCS, and the ObjectStorageType enumeration also lumps them together. The only thing I could find that differentiates the two is the Definition structs in src/TableFunctions/TableFunctionObjectStorage.h, and those do not get passed down to IObjectStorage.

This means that if a bucket with the same name exists in both GCS and AWS and the same access_key and secret_key are used (unlikely in the common case, but possible when they are empty), there will be cache conflicts.

I need to fix this, but I am freaking pissed off right now.

@arthurpassos
Collaborator Author

The only functional difference between this version and the last reviewed one is that IObjectStorage::getDescription is now part of the cache key.

The reason is to avoid collisions between different storage providers in the remote scenario where a bucket with the same name, containing the same directories, exists in multiple object storage providers (e.g. AWS S3, GCS, etc.).

IObjectStorage::getDescription is implemented for the following classes:

S3ObjectStorage (covers AWS S3, GCS and MinIO) - returns S3::URI::endpoint; examples below:

Input - s3://aws-public-blockchainn/v1.0/btc/transactions/**.parquet
Return - https://s3.amazonaws.com

Input - http://aws-public-blockchainn.s3.amazonaws.com/v1.0/btc/transactions/date=2025-01*/*.parquet
Return - http://s3.amazonaws.com

Input - https://storage.googleapiss.com/clickhouse-public-datasets/nyc-taxi/trips_{0..1}.gz
Return - https://storage.googleapiss.com

AzureObjectStorage - returns AzureObjectStorage::description

Input:
'DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite1:10000/devstoreaccount1/;'

Return - http://azurite1:10000/devstoreaccount1/

Input:
http://azurite1:10000/devstoreaccount1

Return - http://azurite1:10000/devstoreaccount1

Input:
'BlobEndpoint=https://clickhousedocstest.blob.core.windows.net/;SharedAccessSignature=sp=r&st=2025-01-29T14:58:11Z&se=2025-01-29T22:58:11Z&spr=https&sv=2022-11-02&sr=c&sig=Ac2U0xl4tm%2Fp7m55IilWl1yHwk%2FJG0Uk6rMVuOiD0eE%3D'

Return - https://clickhousedocstest.blob.core.windows.net?se=2025-01-29T22:58:11Z&sig=Ac2U0xl4tm%2Fp7m55IilWl1yHwk%2FJG0Uk6rMVuOiD0eE%3D&sp=r&spr=https&sr=c&st=2025-01-29T14:58:11Z&sv=2022-11-02

HDFSObjectStorage - returns url

WebObjectStorage - returns url

LocalObjectStorage - returns blockid device?

The ones without an example I have not had time to test, sorry. I need to work on a presentation for tomorrow.
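
To illustrate how including the description avoids cross-provider collisions, a rough sketch of the key composition (hypothetical struct, not the actual code):

/// Hypothetical cache key: adding IObjectStorage::getDescription() (e.g.
/// "https://s3.amazonaws.com" vs "https://storage.googleapis.com") keeps entries
/// from different providers apart even when bucket and path prefix are identical.
struct ListObjectsCacheKey
{
    std::string storage_description;
    std::string bucket;
    std::string prefix;

    bool operator==(const ListObjectsCacheKey &) const = default;
};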

ianton-ru previously approved these changes May 7, 2025

@ianton-ru ianton-ru left a comment

LGTM

@Enmk Enmk merged commit 60282da into antalya May 9, 2025
257 of 322 checks passed
arthurpassos pushed a commit that referenced this pull request May 23, 2025
Antalya: Cache the list objects operation on object storage using a TTL + prefix matching cache implementation
@svb-alt svb-alt added antalya-25.6 port-antalya PRs to be ported to all new Antalya releases and removed antalya-25.6 labels Jul 14, 2025