Member

@Enmk Enmk commented Sep 9, 2025

Amalgamation of multiple related PRs (in that order):

#933

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Lazy load metadata for DataLake. (#742 by @ianton-ru)

#938

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Parquet Metadata caching (#795 by @arthurpassos)

#931

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Cache for ListObjects calls (#743 by @arthurpassos)

#1005

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

ClickHouse now supports compressed metadata.json files for Iceberg. Fixes ClickHouse#70874. (ClickHouse#81451 by @arthurpassos)


closes: #938, #931, #1005, #933

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Enmk and others added 15 commits July 25, 2025 10:03
…oad_metadata' into ports/25.6/amalgamation_of_metadata_prs
…orts/25.6.5/795_parquet_metadata_caching' into ports/25.6/amalgamation_of_metadata_prs
…bjects_object_storage_cache' into ports/25.6/amalgamation_of_metadata_prs
…eberg' into ports/25.6/amalgamation_of_metadata_prs

github-actions bot commented Sep 9, 2025

Workflow [PR], commit [e35c16e]

if (local_context->getSettingsRef()[Setting::use_object_storage_list_objects_cache] && object_storage->supportsListObjectsCache())
{
    auto & cache = ObjectStorageListObjectsCache::instance();
    ObjectStorageListObjectsCache::Key cache_key {object_storage->getDescription(), configuration->getNamespace(), configuration->getRawPath().cutGlobs(false)};
Member Author

Here and below, I'm not entirely sure whether configuration->getRawPath().cutGlobs(false) is the right replacement for what used to be configuration->getPathWithoutGlobs(). @arthurpassos please take a look.

Collaborator

I think you should pass true for the supports_partial_prefix argument of Path::cutGlobs.

Collaborator

This is object storage, so partial prefixes are supported. Consider the following path: root/key1=val1/year=202{1..9}.

We need root/key1=val1/year=202, but passing false yields root/key1=val1/ instead.
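
To make the difference concrete, here is a minimal standalone sketch of the two behaviours described above. The function name cutGlobsSketch and the set of glob characters ("*?{") are assumptions made for illustration; this is not the actual Path::cutGlobs implementation, whose body is not shown in this thread.

```cpp
#include <string>

// Hypothetical helper that mimics the behaviour discussed in the review.
// It is NOT ClickHouse's Path::cutGlobs.
std::string cutGlobsSketch(const std::string & path, bool supports_partial_prefix)
{
    // Assumption: '*', '?' and '{' are the characters that start a glob.
    const auto first_glob = path.find_first_of("*?{");
    if (first_glob == std::string::npos)
        return path; // No globs: the whole path is the prefix.

    // Object storage can list by an arbitrary key prefix,
    // so everything before the first glob character is usable.
    if (supports_partial_prefix)
        return path.substr(0, first_glob);

    // Otherwise fall back to the last complete "directory" before the glob.
    const auto last_slash = path.rfind('/', first_glob);
    if (last_slash == std::string::npos)
        return "";
    return path.substr(0, last_slash + 1);
}
```

With the path from the comment, cutGlobsSketch(path, true) keeps the year=202 part of the key, while cutGlobsSketch(path, false) stops at the last slash. If the real function behaves the same way, the narrower prefix would presumably also give a more specific list-objects cache key.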

else
object_info->metadata = object_storage->getObjectMetadata(path);
}
object_info->loadMetadata(object_storage, query_settings.ignore_non_existent_file);
Member Author

@Enmk Enmk Sep 9, 2025

In the original PRs this was just object_info->loadMetadata(object_storage), but the signature has since changed, so I passed query_settings.ignore_non_existent_file to fill the new argument. Again, I'm not sure that's the right thing to do (@arthurpassos please check this and the other calls to object_info->loadMetadata too).
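
For reviewers weighing this choice, the sketch below shows one plausible reading of what a lazy loadMetadata with an ignore_non_existent_file flag could do. Every type here (ObjectMetadata, FakeObjectStorage, ObjectInfo) is a hypothetical stand-in, and the semantics of the flag are an assumption; the real signature and behaviour live in the ClickHouse sources and are not shown in this thread.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; these are not the ClickHouse types.
struct ObjectMetadata
{
    std::size_t size_bytes = 0;
};

struct FakeObjectStorage
{
    std::map<std::string, ObjectMetadata> objects;

    // Returns std::nullopt when the object does not exist.
    std::optional<ObjectMetadata> tryGetObjectMetadata(const std::string & path) const
    {
        auto it = objects.find(path);
        if (it == objects.end())
            return std::nullopt;
        return it->second;
    }
};

struct ObjectInfo
{
    std::string path;
    std::optional<ObjectMetadata> metadata;

    // Fetches metadata on first use only (lazy loading).
    // Assumed semantics: a missing object returns false when the flag is set
    // and throws otherwise.
    bool loadMetadata(const FakeObjectStorage & storage, bool ignore_non_existent_file)
    {
        if (metadata)
            return true; // Already loaded, no second request.

        metadata = storage.tryGetObjectMetadata(path);
        if (metadata)
            return true;

        if (ignore_non_existent_file)
            return false;

        throw std::runtime_error("Object does not exist: " + path);
    }
};
```

Under these assumptions, passing the query-level setting means a missing file is skipped quietly only when the user asked for that, which is the behaviour worth confirming at each call site.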

Member Author

Enmk commented Sep 9, 2025

All stateless test failures seem to be caused by changes introduced in this PR.

03322_check_count_for_parquet_in_s3
03036_reading_s3_archives
02245_s3_support_read_nested_column
03377_object_storage_list_objects_cache
03363_hive_style_partition
03299_parquet_object_storage_metadata_cache
02496_storage_s3_profile_events
02495_s3_filter_by_file
02480_s3_support_wildcard
02302_s3_file_pruning

It looks like either the required data files are missing, or the tests failed to produce them (MinIO access problem?).

@arthurpassos Please take a look

@Enmk Enmk merged commit abb3c0b into antalya-25.6.5 Sep 10, 2025
132 of 135 checks passed