feat: introduce column oriented segment #17653

SkyFan2002 merged 65 commits into databendlabs:main on Apr 8, 2025

Member

@SkyFan2002 SkyFan2002 commented Mar 26, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

This PR introduces a column-oriented segment format that is designed to be more compact in memory and storage. Segment reads support projection, so only a small amount of metadata needs to be read and cached, which benefits wide tables.

The new segment format can be used by specifying segment_format = 'column_oriented' when a table is created:

create table t(c int) segment_format = 'column_oriented';

For tables with the old segment format, you can use the alter statement to change their segment format to column-oriented. This operation is not costly, as it only rewrites all segments of the table to the column-oriented format and generates a new snapshot:

alter table t SET OPTIONS (segment_format = 'column_oriented');

This PR supports COPY INTO, INSERT, and SELECT for tables with the column-oriented segment format. Support for compact, recluster, mutation, and table functions will follow in subsequent PRs.

Performance and Cache Effect Test

tga_week is a wide table with over 1000 columns. A medium-sized warehouse was used to execute the following three queries consecutively:

SELECT `account`, `alliance_id`, `gofrepay_delta_value`, `gofrepay_total`, `gofrepay_limit`, `#event_time` FROM tga_week WHERE `#account_id` = '44430115' AND "#event_name" = 'order_process' AND "$part_date" >= '2024-09-01' AND "$part_date" < '2024-09-08';

SELECT `account`, `alliance_id`, `gofrepay_delta_value`, `gofrepay_total`, `gofrepay_limit`, `#event_time` FROM tga_week WHERE `#account_id` = '59771289' AND "#event_name" = 'order_process' AND "$part_date" >= '2024-09-01' AND "$part_date" < '2024-09-08';

SELECT `account`, `alliance_id`, `gofrepay_delta_value`, `gofrepay_total`, `gofrepay_limit`, `#event_time` FROM tga_week WHERE `#account_id` = '60618710' AND "#event_name" = 'order_process' AND "$part_date" >= '2024-09-01' AND "$part_date" < '2024-09-08';

Cache Observation

With the column-oriented segment format, the cache state after executing the three queries is as follows:

After Q1 completed, all 560 segments of the source table were cached across the 2 nodes, with each node's column-oriented segment cache occupying about 200MB.

Q2 and Q3 executed without cache misses.

---after Q1
cloudapp@(pr-17653-medium)/poc> select * from system.caches where name like '%col%';

SELECT * FROM system.caches WHERE (name LIKE '%col%')

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│          node          │                    name                   │ num_items │    size   │  capacity  │  unit  │ access │   hit  │  miss  │
│         String         │                   String                  │   UInt64  │   UInt64  │   UInt64   │ String │ UInt64 │ UInt64 │ UInt64 │
├────────────────────────┼───────────────────────────────────────────┼───────────┼───────────┼────────────┼────────┼────────┼────────┼────────┤
│ lBBP4DJ2iIcWmQPcIcA2L3 │ memory_cache_column_oriented_segment_info │       280 │ 225866972 │ 1073741824 │ bytes  │    280 │      0 │    280 │
│ 6p9gDJuzXDIKaOG26q09l3 │ memory_cache_column_oriented_segment_info │       280 │ 227165189 │ 1073741824 │ bytes  │    280 │      0 │    280 │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
2 rows read in 0.011 sec. Processed 20 rows, 2.82 KiB (1.82 thousand rows/s, 256.21 KiB/s)

---after Q2
cloudapp@(pr-17653-medium)/poc> select * from system.caches where name like '%col%';

SELECT * FROM system.caches WHERE (name LIKE '%col%')

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│          node          │                    name                   │ num_items │    size   │  capacity  │  unit  │ access │   hit  │  miss  │
│         String         │                   String                  │   UInt64  │   UInt64  │   UInt64   │ String │ UInt64 │ UInt64 │ UInt64 │
├────────────────────────┼───────────────────────────────────────────┼───────────┼───────────┼────────────┼────────┼────────┼────────┼────────┤
│ lBBP4DJ2iIcWmQPcIcA2L3 │ memory_cache_column_oriented_segment_info │       280 │ 225866972 │ 1073741824 │ bytes  │    560 │    280 │    280 │
│ 6p9gDJuzXDIKaOG26q09l3 │ memory_cache_column_oriented_segment_info │       280 │ 227165189 │ 1073741824 │ bytes  │    560 │    280 │    280 │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
2 rows read in 0.012 sec. Processed 20 rows, 2.82 KiB (1.67 thousand rows/s, 234.86 KiB/s)

---after Q3
cloudapp@(pr-17653-medium)/poc> select * from system.caches where name like '%col%';

SELECT * FROM system.caches WHERE (name LIKE '%col%')

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│          node          │                    name                   │ num_items │    size   │  capacity  │  unit  │ access │   hit  │  miss  │
│         String         │                   String                  │   UInt64  │   UInt64  │   UInt64   │ String │ UInt64 │ UInt64 │ UInt64 │
├────────────────────────┼───────────────────────────────────────────┼───────────┼───────────┼────────────┼────────┼────────┼────────┼────────┤
│ lBBP4DJ2iIcWmQPcIcA2L3 │ memory_cache_column_oriented_segment_info │       280 │ 225866972 │ 1073741824 │ bytes  │    840 │    560 │    280 │
│ 6p9gDJuzXDIKaOG26q09l3 │ memory_cache_column_oriented_segment_info │       280 │ 227165189 │ 1073741824 │ bytes  │    840 │    560 │    280 │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
2 rows read in 0.013 sec. Processed 20 rows, 2.82 KiB (1.54 thousand rows/s, 216.80 KiB/s)
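
The `access`, `hit`, and `miss` columns appear to be cumulative counters, so the per-query behaviour is the difference between consecutive snapshots. The following is a minimal sketch of that arithmetic for one node, assuming the counters started at zero before Q1; the `CacheCounters` type and `per_query_deltas` helper are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple


class CacheCounters(NamedTuple):
    """Cumulative counters as shown in one row of system.caches."""
    access: int
    hit: int
    miss: int


def per_query_deltas(snapshots: List[CacheCounters]) -> List[CacheCounters]:
    """Convert cumulative snapshots into per-query counter deltas."""
    deltas = []
    prev = CacheCounters(0, 0, 0)  # assumption: counters were zero before Q1
    for snap in snapshots:
        deltas.append(
            CacheCounters(
                snap.access - prev.access,
                snap.hit - prev.hit,
                snap.miss - prev.miss,
            )
        )
        prev = snap
    return deltas


# One node's counters after Q1, Q2, and Q3 with the column-oriented format.
column_oriented = [
    CacheCounters(280, 0, 280),
    CacheCounters(560, 280, 280),
    CacheCounters(840, 560, 280),
]

hit_rates = [d.hit / d.access for d in per_query_deltas(column_oriented)]
for name, rate in zip(("Q1", "Q2", "Q3"), hit_rates):
    print(f"{name}: hit rate {rate:.0%}")
```

Under that assumption, Q1 populates the cache (every access is a miss), and Q2 and Q3 are served entirely from it, which matches the statement that they executed without cache misses.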

With the old segment format, the cache state after executing the three queries is as follows:

After Q1 completed, each node cached only about 50 segments, which filled the 1GB cache capacity.

Q2 and Q3 did not hit the cache.

---after Q1
SELECT * FROM system.caches WHERE (name LIKE '%segment%')

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│          node          │                name               │ num_items │    size    │  capacity  │  unit  │ access │   hit  │  miss  │
│         String         │               String              │   UInt64  │   UInt64   │   UInt64   │ String │ UInt64 │ UInt64 │ UInt64 │
├────────────────────────┼───────────────────────────────────┼───────────┼────────────┼────────────┼────────┼────────┼────────┼────────┤
│ gP5jCLC1bntng2pPnXMWL4 │ memory_cache_compact_segment_info │        56 │ 1073197762 │ 1073741824 │ bytes  │    296 │      0 │    296 │
│ RI2lLeA8M2wNHwCg8sWa1  │ memory_cache_compact_segment_info │        50 │ 1059966530 │ 1073741824 │ bytes  │    296 │      0 │    296 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
2 rows read in 0.011 sec. Processed 18 rows, 2.51 KiB (1.64 thousand rows/s, 228.43 KiB/s)

---after Q2
cloudapp@(main-medium)/poc> select * from system.caches where name like '%segment%';

SELECT * FROM system.caches WHERE (name LIKE '%segment%')

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│          node          │                name               │ num_items │    size    │  capacity  │  unit  │ access │   hit  │  miss  │
│         String         │               String              │   UInt64  │   UInt64   │   UInt64   │ String │ UInt64 │ UInt64 │ UInt64 │
├────────────────────────┼───────────────────────────────────┼───────────┼────────────┼────────────┼────────┼────────┼────────┼────────┤
│ RI2lLeA8M2wNHwCg8sWa1  │ memory_cache_compact_segment_info │        50 │ 1060739093 │ 1073741824 │ bytes  │    576 │      0 │    576 │
│ gP5jCLC1bntng2pPnXMWL4 │ memory_cache_compact_segment_info │        56 │ 1073652728 │ 1073741824 │ bytes  │    576 │      0 │    576 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
2 rows read in 0.012 sec. Processed 18 rows, 2.51 KiB (1.5 thousand rows/s, 209.39 KiB/s)

---after Q3
SELECT * FROM system.caches WHERE (name LIKE '%segment%')

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│          node          │                name               │ num_items │    size    │  capacity  │  unit  │ access │   hit  │  miss  │
│         String         │               String              │   UInt64  │   UInt64   │   UInt64   │ String │ UInt64 │ UInt64 │ UInt64 │
├────────────────────────┼───────────────────────────────────┼───────────┼────────────┼────────────┼────────┼────────┼────────┼────────┤
│ RI2lLeA8M2wNHwCg8sWa1  │ memory_cache_compact_segment_info │        50 │ 1060978149 │ 1073741824 │ bytes  │    856 │      0 │    856 │
│ gP5jCLC1bntng2pPnXMWL4 │ memory_cache_compact_segment_info │        55 │ 1058029365 │ 1073741824 │ bytes  │    856 │      0 │    856 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
2 rows read in 0.011 sec. Processed 18 rows, 2.51 KiB (1.64 thousand rows/s, 228.43 KiB/s)
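
To see why the old format cannot keep the whole table's segments cached, it helps to compare the average cache footprint per segment in the two runs. The sketch below uses the `num_items`, `size`, and `capacity` values of one node after Q1 from each run. Note that the column-oriented entries hold only the projected metadata for this query, so this compares cache footprint for this workload, not on-disk segment size. All variable names are illustrative.

```python
# Values read from one node's row in system.caches after Q1.
NEW_ITEMS, NEW_SIZE = 280, 225_866_972    # memory_cache_column_oriented_segment_info
OLD_ITEMS, OLD_SIZE = 56, 1_073_197_762   # memory_cache_compact_segment_info
CAPACITY = 1_073_741_824                  # bytes (1 GiB) in both runs
SEGMENTS_PER_NODE = 280                   # 560 segments spread over 2 nodes

MIB = 1024 * 1024

new_avg = NEW_SIZE / NEW_ITEMS            # average bytes per cached segment, new format
old_avg = OLD_SIZE / OLD_ITEMS            # average bytes per cached segment, old format
footprint_ratio = old_avg / new_avg

# How many old-format segments fit into the cache at this average size.
old_fit = int(CAPACITY // old_avg)
old_coverage = old_fit / SEGMENTS_PER_NODE

print(f"new format: {new_avg / MIB:.2f} MiB per cached segment")
print(f"old format: {old_avg / MIB:.2f} MiB per cached segment")
print(f"old/new footprint ratio: {footprint_ratio:.1f}x")
print(f"old format fits {old_fit} of {SEGMENTS_PER_NODE} segments ({old_coverage:.0%})")
```

With these numbers the old format fills the 1 GiB cache after a few dozen segments, so each query's working set of 280 segments per node evicts itself before it can be reused, while the column-oriented entries for all 280 segments use only a fraction of the capacity.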

Execution Time

Query   Old format   New format
Q1      8.693s       9.130s
Q2      7.309s       0.391s
Q3      6.120s       0.369s
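
The relative effect of the cache is easier to read as a speedup factor (old time divided by new time). A small sketch using only the timings reported above:

```python
# (old format seconds, new format seconds) per query, as reported above.
timings = {
    "Q1": (8.693, 9.130),
    "Q2": (7.309, 0.391),
    "Q3": (6.120, 0.369),
}

# Speedup > 1 means the new format was faster; < 1 means it was slower.
speedups = {query: old / new for query, (old, new) in timings.items()}

for query, factor in speedups.items():
    print(f"{query}: {factor:.2f}x")
```

Q1 runs against a cold cache in both cases and is slightly slower with the new format, while Q2 and Q3 are more than an order of magnitude faster once the segment metadata is cached.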

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@SkyFan2002 SkyFan2002 added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Mar 29, 2025
Contributor

Docker Image for PR

  • tag: pr-17653-5516cbb-1743232372

note: this image tag is only available for internal use.

@SkyFan2002 SkyFan2002 added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Mar 29, 2025
Contributor

Docker Image for PR

  • tag: pr-17653-618d55e-1743256724

note: this image tag is only available for internal use.

@SkyFan2002 SkyFan2002 marked this pull request as ready for review April 3, 2025 07:26
@SkyFan2002 SkyFan2002 requested review from zhyass, dqhl76, dantengsky and sundy-li and removed request for zhyass and dqhl76 April 3, 2025 07:26
@BohuTANG
Member

BohuTANG commented Apr 3, 2025

After this PR, if I create a new table (or alter an old table), will the new column_oriented segment_format be used?

@SkyFan2002
Member Author

segment_format = 'column_oriented'

Yes, but by default, newly created tables still use the old format. The new segment format will only be used if segment_format = 'column_oriented' is explicitly specified. Moreover, the operations supported by the new format are still limited — only COPY INTO, INSERT, and QUERY are supported — so it is not yet ready for use in production environments.

@dantengsky
Member

👍

LGTM. Although the implementation of this new feature is not yet complete, it has outlined a clear boundary with the existing components. The "effects" observed during preliminary testing also align with expectations.

Left some comments as a memo for potential improvements (if necessary) in subsequent PRs.

As for the format of the new segment and the granularity of the cache, I feel we still need to refine these further. Let’s discuss this in detail offline.

@SkyFan2002 SkyFan2002 merged commit 4466521 into databendlabs:main Apr 8, 2025
146 of 148 checks passed
Labels
ci-cloud Build docker image for cloud test pr-feature this PR introduces a new feature to the codebase
5 participants