Columnar in logging dataflows #30883
Releasing a batch of comments.
Second batch of comments!
Last batch of comments!
I noticed that the compute logging dataflow is the only one that doesn't send `Column`s all the way through but switches to `Vec` after row packing (and actually uses `Vec` for most of the demux output streams too). What's the reason for that?
- pub mapping: Box<[(LirId, LirMetadata)]>,
+ pub mapping: Vec<(LirId, LirMetadata)>,
Is this necessary because of a fundamental limitation or just an `impl` in `columnar` that's missing?
We don't have a corresponding implementation in `columnar`. We need to think more about how to encode `Box<T>` in `columnar`, but I don't want to block on that discussion.
Is a `Box<[T]>` more difficult to support in `columnar` than a `Vec<T>`? Just out of curiosity, not wanting to block seems reasonable to me!
No, it's not, but it increases the number of implementations we have to maintain. One idea was to have an implementation for `Box<T>` and one for `[T]`, but that might be difficult due to `Sized` requirements.
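To make the trade-off concrete, here is a minimal sketch of the `Box<[T]>` to `Vec<T>` switch using only the standard library. `LirId`, `LirMetadata`, and `LirMapping` below are hypothetical stand-ins, not the real Materialize types, and the sketch does not use the `columnar` crate at all; it only shows that the two representations convert into each other cheaply, which is why the change does not need to block on a `Box<T>` encoding.

```rust
// Hypothetical stand-ins for the real types; only the shape matters here.
#[derive(Clone, Debug, PartialEq)]
pub struct LirId(pub u64);

#[derive(Clone, Debug, PartialEq)]
pub struct LirMetadata {
    pub operator: String,
}

/// Hypothetical event payload after the change: a `Vec` instead of a boxed slice.
#[derive(Clone, Debug, PartialEq)]
pub struct LirMapping {
    pub mapping: Vec<(LirId, LirMetadata)>,
}

/// Wraps a boxed slice in the `Vec`-based payload.
/// `into_vec` reuses the existing allocation, so no element is cloned.
pub fn from_boxed(mapping: Box<[(LirId, LirMetadata)]>) -> LirMapping {
    LirMapping {
        mapping: mapping.into_vec(),
    }
}

/// Converts back to a boxed slice.
/// `into_boxed_slice` drops any excess capacity the vector may hold.
pub fn into_boxed(event: LirMapping) -> Box<[(LirId, LirMetadata)]> {
    event.mapping.into_boxed_slice()
}
```

The `Sized` concern mentioned above comes from `[T]` being an unsized type: a generic implementation for `Box<T>` would need to admit `T: ?Sized` to cover `Box<[T]>`, which is presumably harder to express than a dedicated `Vec<T>` implementation.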
Signed-off-by: Moritz Hoffmann <[email protected]>
Requires specifying the exchange function as a proper function instead of a closure to convince Rust of the correct lifetimes.
Signed-off-by: Moritz Hoffmann <[email protected]>
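A rough illustration of the pattern this commit message describes, assuming a simplified setting: `RecordRef`, `assign_workers`, and `route_by_key` are hypothetical names, not Timely or Materialize APIs. The routing callback must work for a borrowed view of any lifetime (a higher-ranked bound), and a named function spells that lifetime out in its signature. In this simplified form a closure would likely compile too; the inference problems tend to show up when the argument type is a lifetime-parameterized associated type, as with columnar references.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical borrowed view of one record inside a columnar container.
#[derive(Clone, Copy)]
pub struct RecordRef<'a> {
    pub key: &'a str,
    pub value: u64,
}

/// Hypothetical stand-in for an exchange pact: computes a target worker per record.
/// The bound is higher-ranked: `route` must accept a view with any lifetime.
pub fn assign_workers<F>(records: &[(String, u64)], peers: u64, route: F) -> Vec<u64>
where
    F: for<'a> Fn(RecordRef<'a>) -> u64,
{
    records
        .iter()
        .map(|(key, value)| {
            let view = RecordRef { key: key.as_str(), value: *value };
            route(view) % peers
        })
        .collect()
}

/// A named function states the lifetime explicitly in its signature,
/// so the compiler does not have to infer a higher-ranked closure type.
pub fn route_by_key<'a>(record: RecordRef<'a>) -> u64 {
    let mut hasher = DefaultHasher::new();
    record.key.hash(&mut hasher);
    hasher.finish()
}
```

Passing `route_by_key` by name, as in `assign_workers(&records, peers, route_by_key)`, satisfies the `for<'a>` bound directly.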
LGTM!
Thanks for the reviews!
Convert logging dataflows to columnar
The aim of this PR is to convert some of the logging dataflows to use columnar data on dataflow edges, wherever it makes sense to do so. It introduces building blocks that we need to move columnar data across dataflow edges, and feed them into merge batchers to create arrangements from columnar data.
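For readers unfamiliar with the idea, the following is a minimal sketch of what "columnar data" means compared to a `Vec<(u64, String)>` of rows. It is an illustration under assumptions, not the `columnar` crate's API: the `Columns` type and its methods are hypothetical. Each field lives in its own contiguous buffer, and all strings share one byte buffer indexed by end offsets, so a container needs a handful of allocations instead of one per row.

```rust
/// Hypothetical column-oriented container for `(u64, String)` records.
#[derive(Debug, Default)]
pub struct Columns {
    /// One entry per record.
    pub ids: Vec<u64>,
    /// End offset of each record's string within `bytes`.
    pub bounds: Vec<usize>,
    /// Concatenated UTF-8 bytes of all strings.
    pub bytes: Vec<u8>,
}

impl Columns {
    /// Appends a record by copying its fields into the column buffers.
    pub fn push(&mut self, id: u64, name: &str) {
        self.ids.push(id);
        self.bytes.extend_from_slice(name.as_bytes());
        self.bounds.push(self.bytes.len());
    }

    /// Number of records stored.
    pub fn len(&self) -> usize {
        self.ids.len()
    }

    /// Returns a borrowed view of the record at `index`; panics if out of bounds.
    pub fn get(&self, index: usize) -> (u64, &str) {
        let start = if index == 0 { 0 } else { self.bounds[index - 1] };
        let end = self.bounds[index];
        // Only whole strings are appended, so every range is valid UTF-8.
        let name = std::str::from_utf8(&self.bytes[start..end])
            .expect("strings were pushed as valid UTF-8");
        (self.ids[index], name)
    }
}
```

Reading a record yields a borrowed `(u64, &str)` rather than an owned `(u64, String)`, so consumers work with lifetime-carrying references into the container.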
The PR is rather large, and it is best viewed file-by-file. The rough structure is:
- `consolidate_pact` function.
- `consolidate_and_pack` to simplify adding new introspection sources.

The goal of this PR is to show how we can use columnar data in Materialize as a replacement for vectors. It doesn't yet use any columnar data in rendering of LIR plans.
The PR doesn't touch the dataflow edges from the demux operator to the `consolidate_and_pack` calls because those edges use a `ConsolidatingContainerBuilder`, which is more efficient for vector-based containers (and, in fact, lacks an implementation for any other container).

Part of MaterializeInc/database-issues#3748.
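As background on why consolidation pairs naturally with vectors, here is a minimal sketch of consolidating `(data, time, diff)` updates held in a `Vec`. The `consolidate` function is a hypothetical illustration of the general technique (sort, sum diffs of equal keys, drop zeros), not the code behind `ConsolidatingContainerBuilder`.

```rust
/// Consolidates `(data, time, diff)` updates: sorts them by `(data, time)`,
/// sums the diffs of equal pairs, and drops entries whose diff sums to zero.
pub fn consolidate<D: Ord, T: Ord>(mut updates: Vec<(D, T, i64)>) -> Vec<(D, T, i64)> {
    // Sorting makes all updates for the same `(data, time)` adjacent.
    updates.sort_by(|a, b| (&a.0, &a.1).cmp(&(&b.0, &b.1)));

    let mut result: Vec<(D, T, i64)> = Vec::with_capacity(updates.len());
    for (data, time, diff) in updates {
        // Fold the diff into the previous entry when the key matches.
        let merged = match result.last_mut() {
            Some(last) if last.0 == data && last.1 == time => {
                last.2 += diff;
                true
            }
            _ => false,
        };
        if !merged {
            result.push((data, time, diff));
        }
    }

    // Updates that cancel out carry no information.
    result.retain(|update| update.2 != 0);
    result
}
```

Sorting and compacting in place relies on cheap element moves and swaps, which a `Vec` of owned tuples provides; that is one plausible reason the vector-based path stays in place on those edges.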
Checklist
`$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.