[ntuple] Add partitioning to `RNTupleJoinTable` #17919

enirolf · 2025-03-07T16:32:00Z

With this addition, the RNTupleJoinTable can be split up into several mappings from join values to entry numbers, according to some (numeric) partition key. Using a partition key is optional; a default partition key is used when none has been specified by the user.

The RNTupleJoinTable now has an inner class, REntryMapping, which takes over the role the RNTupleJoinTable itself had previously, i.e., a mapping from join field values to entry numbers:

entryMapping
|- {x_0, y_0} -> 42
|- {x_1, y_1} -> 99
|- {x_2, y_2} -> 12
|- ...
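As a rough illustration of the entry mapping drawn above, here is a minimal sketch. `EntryMappingSketch`, `Insert` and `Find` are hypothetical names chosen for this example and are not the API of `REntryMapping`; the sketch also assumes that join values have already been cast to 64-bit integers and that several entries may share the same combination of join values.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for illustration; these are not ROOT's actual types.
using JoinValue = std::uint64_t;
using EntryNumber = std::uint64_t;

class EntryMappingSketch {
   // Key: one value per join field, e.g. {x, y}.
   // Value: every entry number that carries this combination of join values.
   std::map<std::vector<JoinValue>, std::vector<EntryNumber>> fMapping;

public:
   void Insert(const std::vector<JoinValue> &joinValues, EntryNumber entry)
   {
      assert(!joinValues.empty()); // a join needs at least one join field
      fMapping[joinValues].push_back(entry);
   }

   // Returns nullptr when no entry matches the given join values.
   const std::vector<EntryNumber> *Find(const std::vector<JoinValue> &joinValues) const
   {
      const auto it = fMapping.find(joinValues);
      return it == fMapping.end() ? nullptr : &it->second;
   }
};
```

An ordered map keyed on the full vector of join values keeps the sketch short; a hash-based container would be an equally valid choice.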

The join table itself now instead provides a mapping from partition keys to (a collection of) these entry mappings:

joinTable
|- 0
   |- entryMapping_0
      |- {x_0, y_0} -> 42
      |- ...
|- 4
   |- entryMapping_0
      |- {x_0, y_0} -> 99
      |- ...
   |- entryMapping_1
      |- ...
|- ...
|- kDefaultPartitionKey
   |- entryMapping_0
      |- {x_0, y_0} -> 12
      |- ...
   |- ...
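The two-level structure in the diagram above can be sketched as a map from partition keys to lists of entry mappings. All names below (`JoinTableSketch`, `AddMapping`, `Lookup`, `kDefaultKeySketch`) are hypothetical, and the value used for the default key is an assumption made only for this example; the real `kDefaultPartitionKey` may be defined differently.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are not ROOT's actual types.
using JoinValues = std::vector<std::uint64_t>; // one value per join field, e.g. {x, y}
using EntryNumbers = std::vector<std::uint64_t>;
using PartitionKey = std::uint64_t;
using EntryMappingSketch = std::map<JoinValues, EntryNumbers>;

// Assumed sentinel for "no partition key given"; the real constant's value may differ.
constexpr PartitionKey kDefaultKeySketch = std::numeric_limits<PartitionKey>::max();

class JoinTableSketch {
   std::unordered_map<PartitionKey, std::vector<EntryMappingSketch>> fPartitions;

public:
   void AddMapping(EntryMappingSketch mapping, PartitionKey key = kDefaultKeySketch)
   {
      fPartitions[key].push_back(std::move(mapping));
   }

   // Collects matches from every entry mapping stored under the given partition key.
   EntryNumbers Lookup(const JoinValues &joinValues, PartitionKey key = kDefaultKeySketch) const
   {
      EntryNumbers result;
      const auto partition = fPartitions.find(key);
      if (partition == fPartitions.end())
         return result; // unknown partition: no matches
      for (const auto &mapping : partition->second) {
         const auto match = mapping.find(joinValues);
         if (match != mapping.end())
            result.insert(result.end(), match->second.begin(), match->second.end());
      }
      return result;
   }
};

// Usage: partition 4 holds two entry mappings, the default partition holds one.
inline void JoinTableSketchExample()
{
   EntryMappingSketch first, second, fallback;
   first[JoinValues{1, 2}] = EntryNumbers{99};
   second[JoinValues{1, 2}] = EntryNumbers{7};
   fallback[JoinValues{1, 2}] = EntryNumbers{12};

   JoinTableSketch table;
   table.AddMapping(first, 4);
   table.AddMapping(second, 4);
   table.AddMapping(fallback); // no key given: goes to the default partition

   assert((table.Lookup({1, 2}, 4) == EntryNumbers{99, 7}));
   assert((table.Lookup({1, 2}) == EntryNumbers{12}));
   assert(table.Lookup({1, 2}, 0).empty()); // partition 0 was never filled
}
```

Returning an empty result for an unknown partition is a choice made for this sketch only; it says nothing about what the real lookup functions return in that case.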

One partition can contain multiple entry mappings for two reasons:

1. It is needed when only the default partition key is used (in other words, when no partitions are used), since every page source that is added then ends up in the same partition.
2. Less relevant for now, but we could foresee cases where the partition keys are based on some (metadata) attribute that is shared across more than one page source.

The most immediate use case of the partitioning approach added in this PR is that the RNTupleJoinTable itself is no longer restricted to one page source (this restriction now applies to REntryMapping instead). This is useful for the integration into the RNTupleProcessor, where we want to be able to create joins of chains of RNTuples and have to deal with more than one page source (see also #17132).

As a side effect, the state management of the join table and the notion of lazy building have changed. There is no single isBuilt state anymore; the Add function eagerly builds the mapping for the provided page source and adds it to the join table. As such, the responsibility of deciding whether to build the join table eagerly or lazily moves to the application using the join table (i.e., by strategically calling Add).
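To make the shift in responsibility concrete, here is a small sketch of an eager `Add` combined with caller-side laziness. `SourceSketch`, `EagerJoinTableSketch`, `HasPartition` and `LazyLookup` are hypothetical names, the "page source" is reduced to a single column of join values, and using the partition key as an index into the list of sources is an assumption made purely for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a page source: one join-field value per entry.
struct SourceSketch {
   std::vector<std::uint64_t> fJoinColumn;
};

class EagerJoinTableSketch {
   using Mapping = std::map<std::uint64_t, std::vector<std::uint64_t>>;
   std::unordered_map<std::uint64_t, std::vector<Mapping>> fPartitions;

public:
   // Eager: the whole source is scanned right here; there is no separate
   // build step and no "is built" flag to keep track of.
   void Add(const SourceSketch &source, std::uint64_t partitionKey)
   {
      Mapping mapping;
      for (std::size_t entry = 0; entry < source.fJoinColumn.size(); ++entry)
         mapping[source.fJoinColumn[entry]].push_back(entry);
      fPartitions[partitionKey].push_back(std::move(mapping));
   }

   bool HasPartition(std::uint64_t partitionKey) const { return fPartitions.count(partitionKey) > 0; }

   std::vector<std::uint64_t> Lookup(std::uint64_t joinValue, std::uint64_t partitionKey) const
   {
      std::vector<std::uint64_t> result;
      const auto partition = fPartitions.find(partitionKey);
      if (partition == fPartitions.end())
         return result;
      for (const auto &mapping : partition->second) {
         const auto match = mapping.find(joinValue);
         if (match != mapping.end())
            result.insert(result.end(), match->second.begin(), match->second.end());
      }
      return result;
   }
};

// Caller-side laziness: a source is indexed only when its partition is first queried.
// Using the partition key as an index into `sources` is an assumption of this sketch.
inline std::vector<std::uint64_t> LazyLookup(EagerJoinTableSketch &table, const std::vector<SourceSketch> &sources,
                                             std::uint64_t partitionKey, std::uint64_t joinValue)
{
   assert(partitionKey < sources.size());
   if (!table.HasPartition(partitionKey))
      table.Add(sources[partitionKey], partitionKey);
   return table.Lookup(joinValue, partitionKey);
}
```

An application that prefers eager building would instead call `Add` for every source up front and skip the `HasPartition` check.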

github-actions · 2025-03-07T18:49:51Z

Test Results

20 files 20 suites 5d 0h 3m 12s ⏱️
2 726 tests 2 725 ✅ 0 💤 1 ❌
52 591 runs 52 590 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit 14daddb.

♻️ This comment has been updated with latest results.

vepadulano

Nice improvement! Some minor comments for now

tree/ntuple/v7/inc/ROOT/RNTupleJoinTable.hxx

tree/ntuple/v7/test/ntuple_join_table.cxx

silverweed

A couple of minor comments

tree/ntuple/v7/inc/ROOT/RNTupleJoinTable.hxx

hahnjo

I left many comments. The answer to some of them may well be that it's motivated by future changes. In that case please feel free to simply resolve the thread(s).

tree/ntuple/v7/src/RNTupleJoinTable.cxx

tree/ntuple/v7/inc/ROOT/RNTupleJoinTable.hxx

tree/ntuple/v7/test/ntuple_join_table.cxx

tree/ntuple/v7/inc/ROOT/RNTupleJoinTable.hxx

hahnjo

LGTM in principle, thanks for making all the changes. I have one final question about the semantics of GetPartitionedEntryIndexes, and what it should return in case there is no index in a partition.

tree/ntuple/v7/src/RNTupleJoinTable.cxx

With this addition, the join table can be split up into several mappings from join values to entry numbers, according to some (numeric) partition key. It has several use cases, but the immediate one is that the join table is no longer restricted to one page source. This is useful for the integration into the `RNTupleProcessor`, where we want to be able to create joins of chains of RNTuples and have to deal with more than one page source. As a side effect, the state management of the join table and the notion of lazy building have changed. There is no single `isBuilt` state anymore; the `Add` function eagerly builds the mapping for the provided page source and adds it to the join table. As such, the responsibility of deciding whether to build the join table eagerly or lazily moves to the application using the join table (i.e., by strategically calling `Add`).

vepadulano

Nice improvements, thanks!

enirolf added the in:RNTuple label Mar 7, 2025

enirolf requested review from hahnjo, pcanal, silverweed and vepadulano March 7, 2025 16:32

enirolf self-assigned this Mar 7, 2025

enirolf force-pushed the ntuple-join-table-partitions branch 2 times, most recently from 6834e5a to 7ebbde6 March 7, 2025 16:41

enirolf force-pushed the ntuple-join-table-partitions branch from 7ebbde6 to 4cc1604 March 10, 2025 14:05

vepadulano requested changes Mar 10, 2025

enirolf force-pushed the ntuple-join-table-partitions branch from 4cc1604 to 8e2228d March 10, 2025 14:58

enirolf marked this pull request as ready for review March 10, 2025 15:47

enirolf requested a review from jblomer as a code owner March 10, 2025 15:47

enirolf requested a review from vepadulano March 10, 2025 15:48

silverweed reviewed Mar 11, 2025

tree/ntuple/v7/inc/ROOT/RNTupleJoinTable.hxx (outdated, resolved)

tree/ntuple/v7/inc/ROOT/RNTupleJoinTable.hxx (outdated, resolved)

enirolf force-pushed the ntuple-join-table-partitions branch from 8e2228d to cdcfb92 March 11, 2025 14:06

enirolf requested a review from silverweed March 11, 2025 14:06

hahnjo reviewed Mar 17, 2025

enirolf force-pushed the ntuple-join-table-partitions branch from cdcfb92 to 130714e March 18, 2025 14:21

hahnjo reviewed Mar 18, 2025

tree/ntuple/v7/inc/ROOT/RNTupleJoinTable.hxx (outdated, resolved)

tree/ntuple/v7/inc/ROOT/RNTupleJoinTable.hxx (outdated, resolved)

tree/ntuple/v7/inc/ROOT/RNTupleJoinTable.hxx (resolved)

enirolf force-pushed the ntuple-join-table-partitions branch 2 times, most recently from 1e7f395 to 3a3fa99 March 19, 2025 10:26

enirolf requested a review from hahnjo March 19, 2025 13:23

hahnjo reviewed Mar 19, 2025

tree/ntuple/v7/src/RNTupleJoinTable.cxx (outdated, resolved)

enirolf added 4 commits March 19, 2025 15:59

[ntuple][NFC] Rename fieldNames to joinFieldNames

156f69e

[ntuple][NFC] Rename joinFieldValues to castJoinValues

d3e8131

[ntuple][NFC] Rename NTupleJoinValue_t to JoinValue_t

14daddb

Since it is declared within `RNTupleJoinTable`, it is clear that this type belongs to RNTuple from the context.

enirolf force-pushed the ntuple-join-table-partitions branch from 3a3fa99 to 14daddb March 19, 2025 15:01

vepadulano approved these changes Mar 20, 2025

hahnjo approved these changes Mar 20, 2025

enirolf merged commit 9b0dc8b into root-project:master Mar 21, 2025
38 of 44 checks passed

enirolf deleted the ntuple-join-table-partitions branch March 21, 2025 15:05
