
Copilot
Contributor

@Copilot Copilot AI commented Aug 21, 2025

This PR implements the podio:input_collections parameter for JEventSourcePODIO, modeled after the existing podio:output_collections parameter in JEventProcessorPODIO. This feature allows filtering which collections are loaded from PODIO input files using both exact collection names and regex patterns.

Problem

Previously, JEventSourcePODIO always loaded every available collection from the input files. This was limiting when the goal was to:

  • Process real data that doesn't contain MC truth collections (like MCParticles)
  • Optimize performance by loading only the needed collections
  • Reduce memory usage for specific analysis tasks

Solution

Added a new podio:input_collections parameter that:

  • Accepts a comma-separated list of collection names and regex patterns
  • Supports regex patterns like EcalBarrel.* to match multiple collections efficiently
  • If not set, loads all collections (preserves backward compatibility)
  • If set, only loads collections matching the specified names/patterns
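
The matching behavior described in this list can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `SplitPatterns` and `ResolvePatterns` are hypothetical, and it assumes each entry is applied as a full-match `std::regex` against the collection names found in the file, so that an exact name simply matches itself.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a comma-separated parameter value into entries,
// skipping empty items (so an unset/empty parameter yields no patterns).
std::vector<std::string> SplitPatterns(const std::string& value) {
  std::vector<std::string> patterns;
  std::istringstream stream(value);
  std::string item;
  while (std::getline(stream, item, ',')) {
    if (!item.empty()) {
      patterns.push_back(item);
    }
  }
  return patterns;
}

// Hypothetical helper: resolve names and regex patterns against the
// collections that are actually available in the input file.
std::set<std::string> ResolvePatterns(const std::vector<std::string>& patterns,
                                      const std::vector<std::string>& available) {
  // No patterns: keep everything (backward-compatible default).
  if (patterns.empty()) {
    return std::set<std::string>(available.begin(), available.end());
  }
  std::set<std::string> resolved;
  for (const auto& pattern : patterns) {
    const std::regex re(pattern);
    for (const auto& name : available) {
      // regex_match requires the whole name to match the pattern.
      if (std::regex_match(name, re)) {
        resolved.insert(name);
      }
    }
  }
  return resolved;
}
```

Under these assumptions, `ResolvePatterns(SplitPatterns("EventHeader,.*RawHits"), available)` keeps `EventHeader` plus every available collection whose name ends in `RawHits`, and drops the rest.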

Usage Examples

# Load all collections (default behavior)
eicrecon input.edm4hep.root

# Load specific collections by exact name
eicrecon -Ppodio:input_collections=EventHeader,EcalBarrelScFiRawHits input.edm4hep.root

# Load collections using regex patterns (more convenient)
eicrecon -Ppodio:input_collections=EventHeader,EcalBarrel.*,HcalBarrel.*,SiBarrel.* input.edm4hep.root

# Load only RawHits collections for reconstruction-only pipeline
eicrecon -Ppodio:input_collections=.*RawHits input_with_rawhits.edm4eic.root

Implementation Details

  • Added ResolveInputCollections() method that converts regex patterns to actual collection names
  • Implemented collection filtering with O(log n) lookups per collection against the resolved set of collection names
  • Added thread-safe pattern resolution using std::call_once for first-event processing
  • Added informative logging to show filtering behavior
  • Maintains full backward compatibility when parameter is not set
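
As a rough sketch of how one-time, thread-safe resolution and set-based lookup could fit together: the class below is hypothetical (the names `InputCollectionFilter`, `ShouldLoad`, and `ResolveCount` are not taken from this PR), and it assumes the list of available collections is only known once the first event is seen.

```cpp
#include <cassert>
#include <mutex>
#include <regex>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical filter: resolves patterns once, then answers lookups
// from an ordered set (O(log n) per lookup).
class InputCollectionFilter {
public:
  explicit InputCollectionFilter(std::vector<std::string> patterns)
      : m_patterns(std::move(patterns)) {}

  // Returns true if the named collection should be loaded.
  // 'available' is only consulted on the first call.
  bool ShouldLoad(const std::string& name,
                  const std::vector<std::string>& available) {
    // std::call_once guarantees Resolve runs exactly once,
    // even if several threads reach this point concurrently.
    std::call_once(m_resolved_flag, [&] { Resolve(available); });
    // An empty pattern list means "load everything".
    return m_patterns.empty() || m_resolved.count(name) > 0;
  }

  // Exposed only so the sketch can show that resolution happens once.
  int ResolveCount() const { return m_resolve_count; }

private:
  void Resolve(const std::vector<std::string>& available) {
    ++m_resolve_count;
    for (const auto& pattern : m_patterns) {
      const std::regex re(pattern);
      for (const auto& name : available) {
        if (std::regex_match(name, re)) {
          m_resolved.insert(name);
        }
      }
    }
  }

  std::vector<std::string> m_patterns;
  std::set<std::string> m_resolved;
  std::once_flag m_resolved_flag;
  int m_resolve_count = 0;
};
```

After the first call, the resolved set is never modified again, so later calls only read from it. The actual implementation may structure this differently.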

Testing

Updated eicrecon-input-collections-test job in GitHub Actions to test a two-stage processing pipeline:

  1. Stage 1: Run eicrecon to create all output collections ending with RawHits from simulation input
  2. Stage 2: Run eicrecon using only RawHits collections as input (testing the input filtering feature) and create all default output collections
  3. Verification: Use podio-dump --category events to list the final output collections

This testing approach better simulates real data processing workflows where only detector hits are available as input, demonstrating the practical value of the input filtering feature for reconstruction-only pipelines.

This addresses the need for real data processing workflows where MC truth information is not available, while providing a user-friendly regex interface for specifying collection patterns instead of exhaustive lists.


@Copilot Copilot AI changed the title from "[WIP] Add a parameter podio:input_collections to the JEventSourcePODIO factory in src/services/io/podio that is modeled on the behavior of podio:output_collections in the JEventProcessorPODIO factory, and which adds only those enabled collections to the ..." to "Add podio:input_collections parameter to filter collections from input files" Aug 21, 2025
Copilot finished work on behalf of wdconinc August 21, 2025 17:14
@Copilot Copilot AI requested a review from wdconinc August 21, 2025 17:14
@wdconinc wdconinc added the "pre-commit.ci autofix" label (Enable pre-commit.ci autofixes even for bot accounts) Aug 21, 2025
@pre-commit-ci pre-commit-ci bot removed the "pre-commit.ci autofix" label (Enable pre-commit.ci autofixes even for bot accounts) Aug 21, 2025
@Copilot Copilot AI requested a review from wdconinc August 21, 2025 17:57
Copilot finished work on behalf of wdconinc August 21, 2025 17:57
@Copilot Copilot AI changed the title from "Add podio:input_collections parameter to filter collections from input files" to "Add podio:input_collections parameter with regex support to filter collections from input files" Aug 21, 2025
Copilot finished work on behalf of wdconinc August 21, 2025 18:03
@wdconinc wdconinc added the "pre-commit.ci autofix" label (Enable pre-commit.ci autofixes even for bot accounts) Aug 21, 2025
@pre-commit-ci pre-commit-ci bot removed the "pre-commit.ci autofix" label (Enable pre-commit.ci autofixes even for bot accounts) Aug 21, 2025
…llections from input files (fix: iwyu) (#2029)

This PR applies the include-what-you-use fixes as suggested by
https://github.com/eic/EICrecon/actions/runs/17135243469.
Please merge this PR into the branch
`copilot/fix-dfbd7245-eb25-4079-86b3-071bd236aed8`
to resolve failures in PR #2026.

Auto-generated by [create-pull-request][1]

[1]: https://github.com/peter-evans/create-pull-request

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@wdconinc
Contributor

@copilot I merged the include-what-you-use fixes.

Copilot finished work on behalf of wdconinc August 21, 2025 18:44
@wdconinc
Contributor

@copilot Please add to the eicrecon-input-collections-test check a command (e.g. podio-dump --category events) that lists the output collections that were able to be created from the limited set of input collections.

Contributor Author

Copilot AI commented Aug 21, 2025

@copilot Please add to the eicrecon-input-collections-test check a command (e.g. podio-dump --category events) that lists the output collections that were able to be created from the limited set of ...

Added podio-dump --category events command to the eicrecon-input-collections-test check in commit d6a847b. This command will list the output collections that were created from the limited set of input collections, providing visibility into what collections are successfully processed when using the input filtering feature.

Copilot finished work on behalf of wdconinc August 21, 2025 18:49
Contributor

@wdconinc wdconinc left a comment

Run over the entire input file in eicrecon-input-collections-test.

Contributor

@wdconinc wdconinc left a comment

Don't limit output collections.

@wdconinc
Contributor

@copilot Let's modify the eicrecon-input-collections-test to do the following instead (and keep it as a new job, with the same name):

  1. Run eicrecon to create all output collections that end with RawHits, similar to eicrecon-two-stage-running.
  2. Run eicrecon in a second step that uses only the input collections that end with RawHits and creates all default output collections.
  3. Keep the podio-dump summary of which output was written.

Contributor Author

Copilot AI commented Aug 21, 2025

@copilot Let's modify the eicrecon-input-collections-test to do the following instead (and keep it as a new job, with the same name):

  1. Run eicrecon to create all output collections that end with Ra...

Restructured the eicrecon-input-collections-test to follow the two-stage pattern as requested in commit 59dd624:

  1. Stage 1: Creates all output collections ending with RawHits using -Ppodio:output_collections=.*RawHits
  2. Stage 2: Uses only RawHits collections as input with -Ppodio:input_collections=.*RawHits and creates all default output collections
  3. Summary: Kept the podio-dump --category events command to list final output collections

This better simulates real data processing pipelines where only detector hits are available as input, similar to the eicrecon-two-stage-running workflow.

@Copilot Copilot AI requested a review from wdconinc August 21, 2025 23:24
Copilot finished work on behalf of wdconinc August 21, 2025 23:24
@Copilot Copilot AI temporarily deployed to github-pages August 22, 2025 13:31 Inactive