feat: introduce column oriented segment into #17653
# Conflicts:
#	src/query/service/src/interpreters/common/table_option_validation.rs
#	src/query/service/src/interpreters/interpreter_table_create.rs
#	src/query/storages/fuse/src/statistics/mod.rs
After this PR, if I create a new table (or alter an old table), the new
src/query/storages/fuse/src/operations/mutation/mutator/compact_task_builder.rs
Yes, but by default, newly created tables still use the old format. The new segment format will only be used if
👍 LGTM. Although the implementation of this new feature is not yet complete, it has drawn a clear boundary with the existing components. The effects observed during preliminary testing also align with expectations. Left some comments as a memo for potential improvements (if necessary) in subsequent PRs. As for the format of the new segment and the granularity of the cache, I feel we still need to refine these further. Let's discuss this in detail offline.
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
This PR introduces the column-oriented segment format, which is designed to be more compact in memory and storage. Segments can be read with projection, so only a small amount of metadata needs to be read and cached, which is beneficial for wide tables.
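To make the projection idea concrete, here is a minimal Python sketch. It is not Databend's implementation or on-disk format: the `ColumnStats` type, the dictionary layouts, and all function names are hypothetical, chosen only to show why a column-oriented layout lets a reader load statistics for just the queried columns of a wide table.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-block statistics for one column."""
    min_value: int
    max_value: int


# Row-oriented layout (assumed): one entry per block, holding stats for every column.
RowSegment = List[Dict[str, ColumnStats]]
# Column-oriented layout (assumed): one array per column, indexed by block.
ColumnSegment = Dict[str, List[ColumnStats]]


def to_column_oriented(blocks: RowSegment) -> ColumnSegment:
    """Pivot per-block metadata into one array of stats per column."""
    columns: ColumnSegment = {}
    for block in blocks:
        for name, stats in block.items():
            columns.setdefault(name, []).append(stats)
    return columns


def project(segment: ColumnSegment, wanted: List[str]) -> ColumnSegment:
    """Keep only the per-column arrays that a query actually needs."""
    return {name: segment[name] for name in wanted}


def stats_loaded(segment: ColumnSegment) -> int:
    """Count how many ColumnStats entries would be read and cached."""
    return sum(len(per_block) for per_block in segment.values())


def make_wide_segment(num_blocks: int, num_columns: int) -> RowSegment:
    """Build synthetic metadata for a wide table (illustrative values only)."""
    return [
        {f"c{col}": ColumnStats(block * 10, block * 10 + 9) for col in range(num_columns)}
        for block in range(num_blocks)
    ]


if __name__ == "__main__":
    row_segment = make_wide_segment(num_blocks=4, num_columns=1000)
    column_segment = to_column_oriented(row_segment)
    projected = project(column_segment, ["c0", "c7"])
    print(stats_loaded(column_segment))  # stats for every column of every block
    print(stats_loaded(projected))       # stats for the two queried columns only
```

In this toy model, a query touching two of 1000 columns loads 8 entries instead of 4000. With a row-oriented layout, every block entry carries statistics for all columns, so the whole segment has to be read and cached.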
The new segment format can be used by specifying segment_format = 'column_oriented' when a table is created:

For tables with the old segment format, you can use the alter statement to change their segment format to column-oriented. This operation is not costly, as it only rewrites all segments of the table to the column-oriented format and generates a new snapshot:

This PR supports copy into/insert/select for tables with column-oriented segment format. compact/recluster/mutation/table_function will be supported in subsequent PRs.

Performance and Cache Effect Test
tga_week is a wide table with over 1000 columns. A medium-sized warehouse was used to execute three queries consecutively:
Cache Observation
With the column-oriented segment format, the cache state after executing the three queries was as follows:
After Q1 completed, all 560 segments of the source table were cached on 2 nodes, and each node's column-oriented segment cache occupied about 200 MB.
Q2 and Q3 executed without cache misses.
With the old segment format, the cache state after executing the three queries was as follows:
After Q1 completed, each node cached only about 50 segments, which occupied 1 GB of cache capacity.
Q2 and Q3 did not hit the cache.
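The cache figures above imply a rough per-segment footprint, which the following sketch works out. Two points are assumptions rather than statements from the PR: 1 GB is taken as 1024 MB, and because the description does not say whether the 560 segments were split between the two nodes or fully cached on each, both readings are computed.

```python
def mb_per_segment(cache_mb: float, segments: float) -> float:
    """Average cache footprint of one segment, in MB."""
    return cache_mb / segments


# Figures reported in the cache observation above.
OLD_CACHE_MB = 1024.0      # "1 GB" per node, assuming 1 GB = 1024 MB
OLD_SEGMENTS_PER_NODE = 50
NEW_CACHE_MB = 200.0       # per node
TOTAL_SEGMENTS = 560
NODES = 2

old_mb = mb_per_segment(OLD_CACHE_MB, OLD_SEGMENTS_PER_NODE)

# Reading A (assumption): segments are split evenly, 280 per node.
new_mb_split = mb_per_segment(NEW_CACHE_MB, TOTAL_SEGMENTS / NODES)
# Reading B (assumption): each node caches all 560 segments.
new_mb_full = mb_per_segment(NEW_CACHE_MB, TOTAL_SEGMENTS)

if __name__ == "__main__":
    print(f"old format:       {old_mb:.2f} MB per segment")
    print(f"column-oriented:  {new_mb_split:.2f} MB (split) or {new_mb_full:.2f} MB (full copy)")
    print(f"reduction factor: {old_mb / new_mb_split:.1f}x to {old_mb / new_mb_full:.1f}x")
```

Under either reading, the column-oriented cache entry comes out well over an order of magnitude smaller per segment, which is consistent with all 560 segments fitting in cache while the old format held only about 50 per node.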
Execution Time
Tests
Type of change