Skip to content

Columnar in logging dataflows #30883

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jan 22, 2025
Merged

Conversation

antiguru
Copy link
Member

@antiguru antiguru commented Dec 20, 2024

Convert logging dataflows to columnar

The aim of this PR is to convert some of the logging dataflows to use columnar data on dataflow edges, wherever it makes sense to do so. It introduces building blocks that we need to move columnar data across dataflow edges, and feed them into merge batchers to create arrangements from columnar data.

The PR is rather large, and it is best viewed file-by-file. The rough structure is:

  • containers in timely-util to support columnar data on dataflow edges and as input to arrangements.
  • Each of the log dataflows separately.
  • Improvement of the consolidate_pact function.
  • Introduction of consolidate_and_pack to simplify adding new introspection sources.

The goal of this PR is to show how we can use columnar data in Materialize as a replacement for vectors. It doesn't yet use any columnar data in rendering of LIR plans.

The PR doesn't touch the dataflow edges from the demux operator to calling consolidate_and_pack because the edges use a ConsolidatingContainerBuilder, which is more efficient for vector-based containers (and in fact, lacks an implementation for any other container).

Part of MaterializeInc/database-issues#3748.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@antiguru antiguru force-pushed the converting_cb branch 4 times, most recently from 91d67b6 to 1e7aee2 Compare January 10, 2025 13:18
@antiguru antiguru force-pushed the converting_cb branch 7 times, most recently from f6ce511 to c63241e Compare January 20, 2025 09:09
@antiguru antiguru marked this pull request as ready for review January 20, 2025 09:11
@antiguru antiguru requested a review from a team as a code owner January 20, 2025 09:11
@antiguru antiguru changed the title WIP columnar in logging Columnar in logging dataflows Jan 20, 2025
Copy link
Contributor

@teskje teskje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Releasing a batch of comments.

Copy link
Contributor

@teskje teskje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second batch of comments!

Copy link
Contributor

@teskje teskje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last batch of comments!

I noticed that the compute logging dataflow is the only one that doesn't send Columns all the way through but switches to Vec after row packing (actually uses Vec for most of the demux output streams too). What's the reason for that?

Comment on lines -172 to +175
pub mapping: Box<[(LirId, LirMetadata)]>,
pub mapping: Vec<(LirId, LirMetadata)>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary because of a fundamental limitation or just an impl in columnar that's missing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a corresponding implementation in columnar. We need to think more about how to encode Box<T> in columnar, but I don't want to block on that discussion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a Box<[T]> more difficult to support in columnar than a Vec<T>? Just out of curiosity, not wanting to block seems reasonable to me!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's not, but it increases the amount of implementations we have to maintain. An idea was to have an implementation for Box<T> and for [T], but that might be difficult due to Sized requirements.

Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
Requires to specify the exchange function as a proper function instead of a
closure due convince Rust of the correct lifetimes.

Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
Copy link
Contributor

@teskje teskje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Signed-off-by: Moritz Hoffmann <[email protected]>
@antiguru
Copy link
Member Author

Thanks for the reviews!

@antiguru antiguru enabled auto-merge (squash) January 22, 2025 19:10
@antiguru antiguru disabled auto-merge January 22, 2025 19:10
@antiguru antiguru enabled auto-merge (squash) January 22, 2025 19:11
@antiguru antiguru merged commit a077232 into MaterializeInc:main Jan 22, 2025
79 checks passed
@antiguru antiguru deleted the converting_cb branch January 22, 2025 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants