Skip to content

[ntuple] Make RNTupleJoinProcessor composable #18224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Apr 29, 2025

Conversation

enirolf
Copy link
Contributor

@enirolf enirolf commented Apr 1, 2025

This PR adds the possibility to compose RNTupleJoinProcessors from existing processor objects. Similar functionality is already in place for the RNTupleChainProcessor (see #17393), so with these additions the RNTupleProcessor is (almost*) fully composable.

*One caveat: the case where an auxiliary processor in the join is a join itself is not yet properly handled. This will be handled in a follow-up PR.

@enirolf enirolf self-assigned this Apr 1, 2025
@enirolf enirolf force-pushed the ntuple-processor-join-composition branch from 6a984c6 to 9ec9623 Compare April 1, 2025 15:28
Copy link

github-actions bot commented Apr 1, 2025

Test Results

    18 files      18 suites   4d 9h 49m 34s ⏱️
 2 731 tests  2 731 ✅ 0 💤 0 ❌
47 722 runs  47 722 ✅ 0 💤 0 ❌

Results for commit affbe33.

♻️ This comment has been updated with latest results.

@enirolf enirolf added clean build Ask CI to do non-incremental build on PR and removed clean build Ask CI to do non-incremental build on PR labels Apr 3, 2025
@enirolf enirolf force-pushed the ntuple-processor-join-composition branch from 9ec9623 to d2ea410 Compare April 3, 2025 08:19
@enirolf enirolf marked this pull request as ready for review April 3, 2025 08:20
@enirolf enirolf requested a review from jblomer as a code owner April 3, 2025 08:20
Copy link
Member

@hahnjo hahnjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The four cleanup commits at the end are great! Some comments for consideration on the entry handling.

@pcanal
Copy link
Member

pcanal commented Apr 21, 2025

*One caveat: the case where an auxiliary processor in the join is a join itself is not yet properly handled. This will be handled in a follow-up PR.

How does it fail? (hopefully 'loudly' :) ).

enirolf added 3 commits April 23, 2025 09:57
This is needed to properly handle join tables for chains of RNTuples.
Becasue it only returns the first entry index it encounters, subsequent
entry mappings (and partitions) are skipped, giving a (potentially
significant) performance improvement when only one entry index (without
any further constraints) is required.
As a first approximation, use the default partition. This might be
changed as things get further optimized.
@enirolf enirolf force-pushed the ntuple-processor-join-composition branch from d2ea410 to d7e3e59 Compare April 23, 2025 08:11
@enirolf
Copy link
Contributor Author

enirolf commented Apr 23, 2025

*One caveat: the case where an auxiliary processor in the join is a join itself is not yet properly handled. This will be handled in a follow-up PR.

How does it fail? (hopefully 'loudly' :) ).

With an exception, is that loud enough :D? I've added a test for it as well:

TEST_F(RNTupleProcessorTest, JoinedJoinComposedAuxiliary)
{
auto primaryProc = RNTupleProcessor::Create({fNTupleNames[0], fFileNames[0]});
std::vector<std::unique_ptr<RNTupleProcessor>> auxProcsIntermediate;
auxProcsIntermediate.emplace_back(RNTupleProcessor::Create({fNTupleNames[2], fFileNames[2]}, "ntuple_aux2"));
std::vector<std::unique_ptr<RNTupleProcessor>> auxProcs;
auxProcs.emplace_back(RNTupleProcessor::CreateJoin(RNTupleProcessor::Create({fNTupleNames[1], fFileNames[1]}),
std::move(auxProcsIntermediate), {"i"}));
try {
auto proc = RNTupleProcessor::CreateJoin(std::move(primaryProc), std::move(auxProcs), {});
} catch (const ROOT::RException &err) {
EXPECT_THAT(err.what(), testing::HasSubstr("auxiliary RNTupleJoinProcessors are currently not supported"));
}
}

Copy link
Member

@vepadulano vepadulano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! Some minor comments for now

enirolf added 8 commits April 24, 2025 11:06
In turn also makes the whole `RNTupleProcessor` engine composable, i.e.,
it is now possible to create chains of joins, joins of chains, chains of
chains, etc.

In this initial implementation has the restriction that joins where
values that are missing in the auxiliary data set are unsupported and
will result in an exception. Proper support for these scenarios will be
added later.
With the refactored join processor and the addition of `REntry::Reset`,
this friendship is not necessary anymore.
It is only still called in `RNTupleSingleProcessor::Connect`, so the
implementation has been moved into this method.
An individual RNTuple is now fully contained in the
`RNTupleSingleProcessor`, removing the need to (re)connect fields.
This makes the `RFieldContext` class redudant.
It is not used by the `RNTupleChainProcessor` or the
`RNTupleJoinProcessor` anymore.
This was already implemented for the other processor subclasses, but not
yet for this one.
@enirolf enirolf force-pushed the ntuple-processor-join-composition branch from d7e3e59 to 4792c94 Compare April 24, 2025 09:10
@enirolf enirolf requested review from hahnjo and vepadulano April 24, 2025 09:11
Add a description of the behaviour of this method in case multiple entry
indexes exist.
Copy link
Member

@vepadulano vepadulano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats for this last step towards making the RNTupleProcessor composable! Maybe consider simplifying the commit history before merging

Copy link
Member

@hahnjo hahnjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments!

@enirolf enirolf merged commit 53eab5a into root-project:master Apr 29, 2025
22 checks passed
@enirolf enirolf deleted the ntuple-processor-join-composition branch April 29, 2025 07:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants