-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ntuple] Split primary and auxiliary models in RNTupleJoinProcessor::CreateJoin
#17936
base: master
Are you sure you want to change the base?
Conversation
Test Results 19 files 19 suites 5d 10h 30m 59s ⏱️ For more details on these failures, see this check. Results for commit 1965191. ♻️ This comment has been updated with latest results. |
std::vector<RNTupleOpenSpec> auxNTuples{{fNTupleNames[1], fFileNames[1]}, {fNTupleNames[2], fFileNames[2]}}; | ||
|
||
auto joinModel = RNTupleProcessor::CreateJoinModel(std::move(primaryModel), std::move(auxModels), auxNTuples); | ||
|
||
auto i = joinModel->GetDefaultEntry().GetPtr<int>("i"); | ||
auto x = joinModel->GetDefaultEntry().GetPtr<float>("x"); | ||
auto y = joinModel->GetDefaultEntry().GetPtr<std::vector<float>>(fNTupleNames[1] + ".y"); | ||
auto z = joinModel->GetDefaultEntry().GetPtr<float>(fNTupleNames[2] + ".z"); | ||
|
||
auto proc = RNTupleProcessor::CreateJoin({fNTupleNames[0], fFileNames[0]}, auxNTuples, {"i"}, std::move(joinModel)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface boils down to:
auto primaryModel = RNTupleModel::Create();
std::vector<std::unique_ptr<RNTupleModel>> auxModels;
... fill auxModels ...
std::vector<RNTupleOpenSpec> auxNTuples;
... fill auxNTuples ...
auto joinModel = RNTupleProcessor::CreateJoinModel(std::move(primaryModel), std::move(auxModels), auxNTuples);
auto proc = RNTupleProcessor::CreateJoin({fNTupleNames[0], fFileNames[0]}, auxNTuples, {"i"}, std::move(joinModel));
Feels a bit redundant and/or disjointed (pun not really intended). In particular in the last 2 lines auxNTuples
is repeated and (I assume) must be in the exact same state (ie. be the same) during both calls. What happens if you pass 2 completely different vector to both? Couldn't the CreateJoin
extract the auxNTuples
from the result of CreateJoinModel
(in which case the risk of wrong call would be eliminated).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are valid points. I wanted to move away from the current interface where the model of the primary model is bunched together with the ones of the aux ntuples, to reflect also the splitting of the RNTupleOpenSpec
s themselves. So the alternative approach I thought of is to have the same pattern (i.e., primaryModel, {auxModels...}
) and then only internally call CreateModel
. I didn't go in this direction because it would the number of arguments to CreateJoin
even longer, but it does remove this redundancy and risk of user error, so maybe I should go with that after all..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Philippe that if we can reduce the redundancy on the user side we should go for it, even if it means making the function signature longer (maybe we should pass an Options
struct to it?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in this particular case using an Options
struct will be just as if not more cumbersome, but I will change the signature to take primaryModel, {auxModels...}
then 👍.
cc09ee6
to
1c01b4e
Compare
RNTupleJoinProcessor::Create
RNTupleJoinProcessor::CreateJoin
1c01b4e
to
af05f1e
Compare
435ae83
to
be24fe8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One final comment when looking at the code again; we probably want to test a mixture of bare and non-bare models...
To reflect the interface changes introduced in rev d5b4931, instead of a single list of models (one per primary/auxiliary ntuple), a distinction is made between the models of the primary and auxiliary ntuples. Same as before, internally they are combined into one singular model used by the processor.
be24fe8
to
1965191
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM! (should have a second approval)
To reflect the interface changes introduced in #17964, instead of a single list of models (one per primary/auxiliary ntuple), a distinction is made between the models of the primary and auxiliary ntuples.
Same as before, internally they are combined into one singular model used by the processor. This model has to have the schemas of auxiliary RNTuples as untyped record fields, where the name of the record field is the name of the RNTuple.
A new addition is the added support for partial model specification in the list of auxiliary models. This means that users can pass a null pointer in the position of the corresponding ntuple, and then only for that ntuple, the processor will infer the model from the on-disk descriptor.
Helps move forward #17132.