Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ntuple] Extend RNTupleOpenSpec to allow creating processors from a TDirectory #17934

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

enirolf
Copy link
Contributor

@enirolf enirolf commented Mar 10, 2025

This PR changes RNTupleOpenSpec to accept a TDirectory (or one of its descendants) instead of a plain file name, and create a RNTupleProcessor from there. This provides more flexibility for creating RNTupleProcessor objects. Among others, it enables processing RNTuples stored in subdirectories and TMemFiles.

One caveat is that the use of std::variant is not very ergonomic in Python. After discussing different solutions with @vepadulano, we concluded that the specification would/should not be used for anything beyond creating RNTupleProcessors, and that public access to its data members is not strictly required. It has therefore been changed to a write-only class, and the RNTupleProcessor creation factory arguments have been updated accordingly.

@enirolf enirolf self-assigned this Mar 10, 2025
@enirolf enirolf marked this pull request as ready for review March 10, 2025 11:30
@enirolf enirolf requested a review from jblomer as a code owner March 10, 2025 11:30
Copy link

github-actions bot commented Mar 10, 2025

Test Results

    18 files      18 suites   4d 10h 2m 41s ⏱️
 2 725 tests  2 724 ✅ 0 💤 1 ❌
47 351 runs  47 350 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit 78c02dc.

♻️ This comment has been updated with latest results.

@enirolf enirolf force-pushed the ntuple-processor-tmemfile branch from 3bd92d7 to 8629d5b Compare March 10, 2025 13:22
@enirolf enirolf requested a review from couet as a code owner March 10, 2025 13:22
@@ -39,9 +39,11 @@ struct RNTupleProcessorEntryLoader;
/// Used to specify the underlying RNTuples in RNTupleProcessor
struct RNTupleOpenSpec {
std::string fNTupleName;
std::string fStorage;
std::variant<std::string, TDirectory *> fStorage;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to discuss a bit the public interface of this. I am not sure it's going to be easily usable from Python, which might or might not be what we want.

@hahnjo
Copy link
Member

hahnjo commented Mar 11, 2025

As an alternative interface, maybe we can accept a RNTuple anchor directly (only for RNTupleSingleProcessor?) and require the user to compose that?

@enirolf
Copy link
Contributor Author

enirolf commented Mar 11, 2025

As an alternative interface, maybe we can accept a RNTuple anchor directly (only for RNTupleSingleProcessor?) and require the user to compose that?

I'm not very fond of that idea because 1) it would put more burden on the user and 2) I don't know to what extent that would make things nicer in Python. Of course when it turns out that this is a desired interface by users we can add it, but for now I would prefer not to extend the interface.

I discussed possible solutions with @vepadulano, and ultimately we settled on making RNTupleOpenSpec write-only (and making them pass-by-value in the factories). We anyways don't expect users do anything with the spec except create processors and this removes the possibility of having a not-easy-to-use-in-Python variant exposed. Again, only time will tell if this is actually what we want, but it at least keeps the current interface consistent.

enirolf added 4 commits March 11, 2025 14:18
... and its subclasses. The option to specificy the location of an
ntuple as a `TDirectory` instead of a plain file name provides more
flexibility for creating `RNTupleProcessor` objects. Among others, it
supports processing ntuples stored in subdirectories and `TMemFile`s.
The use of `std::variant` is not very ergonimic in Python. After
discussion different solutions, it was concluded that the specification
would/should not be used for anything beyond creating
`RNTupleProcessor`s, and that public access to its data members is not
strictly required.
@enirolf enirolf force-pushed the ntuple-processor-tmemfile branch from 8629d5b to 78c02dc Compare March 11, 2025 13:42
@enirolf enirolf changed the title [ntuple] Enable creating RNTupleProcessor from TDirectory [ntuple] Extend RNTupleOpenSpec to allow creating processors from a TDirectory Mar 11, 2025
@enirolf enirolf requested a review from vepadulano March 11, 2025 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants