[ntuple] Make `RNTupleJoinProcessor` composable #18224
6a984c6 to 9ec9623
Test Results 18 files 18 suites 4d 9h 49m 34s ⏱️ Results for commit affbe33. ♻️ This comment has been updated with latest results.
9ec9623 to d2ea410
The four cleanup commits at the end are great! Some comments for consideration on the entry handling.
How does it fail? (hopefully 'loudly' :) ).
This is needed to properly handle join tables for chains of RNTuples.
Because it only returns the first entry index it encounters, subsequent entry mappings (and partitions) are skipped, giving a (potentially significant) performance improvement when only one entry index (without any further constraints) is required.
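To illustrate the early exit described above, here is a minimal sketch. It is not ROOT's join table code: `ToyPartition` and `FirstEntryIndex` are hypothetical names, and the layout (one hash map per partition, mapping a join value to its entry indexes) is an assumption made only for this example.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one join table partition:
// join value -> all entry indexes that carry this value.
using ToyPartition = std::unordered_map<std::uint64_t, std::vector<std::uint64_t>>;

// Return the first entry index found for `joinValue`, or std::nullopt if no
// partition contains it. Partitions after the first hit are never inspected.
std::optional<std::uint64_t>
FirstEntryIndex(const std::vector<ToyPartition> &partitions, std::uint64_t joinValue)
{
   for (const auto &partition : partitions) {
      auto it = partition.find(joinValue);
      if (it != partition.end() && !it->second.empty())
         return it->second.front(); // early exit: skip remaining mappings and partitions
   }
   return std::nullopt;
}
```

A lookup that needs every matching entry index would have to visit all partitions; the saving comes from stopping at the first hit.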
As a first approximation, use the default partition. This might be changed as things get further optimized.
d2ea410 to d7e3e59
With an exception, is that loud enough :D? I've added a test for it as well: root/tree/ntuple/test/ntuple_processor.cxx Lines 375 to 391 in d7e3e59
Thanks a lot! Some minor comments for now
This in turn also makes the whole `RNTupleProcessor` engine composable, i.e., it is now possible to create chains of joins, joins of chains, chains of chains, etc. This initial implementation has the restriction that joins in which values are missing from the auxiliary data set are unsupported and will result in an exception. Proper support for these scenarios will be added later.
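The composition idea can be sketched with a toy model. Everything here is hypothetical and deliberately independent of ROOT: `ToyProcessor`, `ToySingle`, `ToyChain`, `ToyJoin`, `MakeChain` and `Row` are illustration-only names, not the `RNTupleProcessor` API. The sketch assumes that every processor exposes the same small interface, so chains and joins can wrap any other processor, and that a join throws when the auxiliary side has no matching key.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy model only: hypothetical classes, NOT ROOT's RNTupleProcessor API.
struct Row {
   int key;      // join key
   double value; // payload
};

class ToyProcessor {
public:
   virtual ~ToyProcessor() = default;
   virtual std::size_t GetNEntries() const = 0;
   virtual Row Load(std::size_t idx) const = 0;
};

// Leaf: a single in-memory data set.
class ToySingle final : public ToyProcessor {
   std::vector<Row> fRows;

public:
   explicit ToySingle(std::vector<Row> rows) : fRows(std::move(rows)) {}
   std::size_t GetNEntries() const override { return fRows.size(); }
   Row Load(std::size_t idx) const override { return fRows.at(idx); }
};

// Vertical composition: entries of the inner processors, one after the other.
class ToyChain final : public ToyProcessor {
   std::vector<std::unique_ptr<ToyProcessor>> fInner;

public:
   explicit ToyChain(std::vector<std::unique_ptr<ToyProcessor>> inner) : fInner(std::move(inner)) {}
   std::size_t GetNEntries() const override
   {
      std::size_t n = 0;
      for (const auto &p : fInner)
         n += p->GetNEntries();
      return n;
   }
   Row Load(std::size_t idx) const override
   {
      for (const auto &p : fInner) {
         if (idx < p->GetNEntries())
            return p->Load(idx);
         idx -= p->GetNEntries();
      }
      throw std::out_of_range("entry index beyond end of chain");
   }
};

// Horizontal composition: look up the primary entry's key in the auxiliary processor.
class ToyJoin final : public ToyProcessor {
   std::unique_ptr<ToyProcessor> fPrimary;
   std::unique_ptr<ToyProcessor> fAux;
   std::map<int, std::size_t> fIndex; // join key -> first matching auxiliary entry

public:
   ToyJoin(std::unique_ptr<ToyProcessor> primary, std::unique_ptr<ToyProcessor> aux)
      : fPrimary(std::move(primary)), fAux(std::move(aux))
   {
      for (std::size_t i = 0; i < fAux->GetNEntries(); ++i)
         fIndex.emplace(fAux->Load(i).key, i); // emplace keeps the first entry per key
   }
   std::size_t GetNEntries() const override { return fPrimary->GetNEntries(); }
   Row Load(std::size_t idx) const override
   {
      Row row = fPrimary->Load(idx);
      auto it = fIndex.find(row.key);
      if (it == fIndex.end())
         throw std::runtime_error("no matching auxiliary entry"); // missing values are unsupported
      row.value += fAux->Load(it->second).value; // toy "join": add the auxiliary payload
      return row;
   }
};

// Helper: build a chain of two processors (both may themselves be chains or joins).
std::unique_ptr<ToyProcessor> MakeChain(std::unique_ptr<ToyProcessor> a, std::unique_ptr<ToyProcessor> b)
{
   std::vector<std::unique_ptr<ToyProcessor>> inner;
   inner.push_back(std::move(a));
   inner.push_back(std::move(b));
   return std::make_unique<ToyChain>(std::move(inner));
}
```

Because `ToyChain` and `ToyJoin` only depend on the `ToyProcessor` interface, a join of chains or a chain of joins needs no extra code; for example, `ToyJoin(MakeChain(...), ...)` joins a chain against an auxiliary data set and throws on a primary key that the auxiliary side lacks.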
With the refactored join processor and the addition of `REntry::Reset`, this friendship is not necessary anymore.
It is only still called in `RNTupleSingleProcessor::Connect`, so the implementation has been moved into this method.
An individual RNTuple is now fully contained in the `RNTupleSingleProcessor`, removing the need to (re)connect fields. This makes the `RFieldContext` class redundant.
It is not used by the `RNTupleChainProcessor` or the `RNTupleJoinProcessor` anymore.
This was already implemented for the other processor subclasses, but not yet for this one.
d7e3e59 to 4792c94
Add a description of the behaviour of this method in case multiple entry indexes exist.
Congrats on this last step towards making the RNTupleProcessor composable! Maybe consider simplifying the commit history before merging.
Thanks for addressing my comments!
This PR adds the possibility to compose `RNTupleJoinProcessor`s from existing processor objects. Similar functionality is already in place for the `RNTupleChainProcessor` (see #17393), so with these additions the `RNTupleProcessor` is (almost*) fully composable.

*One caveat: the case where an auxiliary processor in the join is a join itself is not yet properly handled. This will be handled in a follow-up PR.