Batch: Process-parallel directory scan and initial file read #426

Open
mlange05 wants to merge 10 commits into main from naml-scheduler-parallel-for-real

@mlange05 (Collaborator) commented Nov 4, 2024

Initial implementation of simple parallelism in the batch-processing scheduler. This PR refactors the two "trivially parallel" steps of the Scheduler initialisation (scanning the source directories and parsing each full Sourcefile into IR) to run on Python's built-in ProcessPoolExecutor. It also adds a dummy SerialExecutor implementation that exposes the same interface but runs serially in the original process, so the existing behaviour remains available.
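As an illustration of the "same interface" idea, here is a minimal sketch of a serial executor built on the standard `concurrent.futures.Executor` base class. It is an assumption about the general shape, not the implementation in this PR; the `square` helper exists only for the demonstration.

```python
from concurrent.futures import Executor, Future


class SerialExecutor(Executor):
    """Illustrative stand-in: runs every task immediately in the calling process."""

    def submit(self, fn, /, *args, **kwargs):
        # Execute eagerly and wrap the outcome in an already-resolved Future,
        # so callers can treat it like any other executor.
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future


def square(x):
    # Hypothetical workload used only for this demonstration
    return x * x


# The inherited Executor.map() is implemented in terms of submit(),
# so it works without further code.
with SerialExecutor() as executor:
    results = list(executor.map(square, [1, 2, 3]))
    single = executor.submit(square, 4).result()
```

Because only `submit` is overridden, calling code that uses `map`, `submit` or the context-manager protocol does not need to know which executor it was given.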

In a little more detail:

  • Strictly separate Sourcefile creation from Item creation and remove get_or_create_file_item_from_path
  • Add an executor object to the Scheduler that falls back to the provided SerialExecutor if num_workers=0 is selected, and otherwise creates a ProcessPoolExecutor(max_workers=num_workers).
  • Invoke Sourcefile.from_path on the executor, passing the assembled frontend_args
  • Perform the full file parse (Sourcefile.make_complete) using the executor.map functionality over source objects and parser args, before updating the corresponding Item with the returned copy of the source
  • Pickle-safety fixes for sym.Cast objects and Module AST objects
  • Increase log-level for Scheduler enrichment to INFO, as it can now become quite dominant during the setup phase
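
The executor selection and the map-then-update pattern from the list above can be sketched roughly as follows. Everything here is a simplified assumption: `FakeSource`, `make_complete`, `create_executor` and `parse_all` are hypothetical names, items are plain dicts, and the real Scheduler code will differ.

```python
from concurrent.futures import Executor, Future, ProcessPoolExecutor
from dataclasses import dataclass, replace
from itertools import repeat


class SerialExecutor(Executor):
    """Illustrative serial fallback with the Executor interface."""

    def submit(self, fn, /, *args, **kwargs):
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future


@dataclass
class FakeSource:
    """Hypothetical stand-in for a source file object."""
    path: str
    complete: bool = False


def make_complete(source, parser_args):
    # A worker process only sees a pickled copy of `source`, so the
    # completed object has to be returned rather than mutated in place.
    return replace(source, complete=True)


def create_executor(num_workers):
    # num_workers=0 keeps everything on the calling process
    if num_workers == 0:
        return SerialExecutor()
    return ProcessPoolExecutor(max_workers=num_workers)


def parse_all(items, parser_args, num_workers=0):
    sources = [item["source"] for item in items]
    with create_executor(num_workers) as executor:
        completed = list(executor.map(make_complete, sources, repeat(parser_args)))
    # Put the returned copies back on the items they came from
    for item, source in zip(items, completed):
        item["source"] = source
    return items


items = [{"source": FakeSource("a.F90")}, {"source": FakeSource("b.F90")}]
parse_all(items, parser_args={"frontend": "hypothetical"}, num_workers=0)
```

With `num_workers > 0`, the call should sit under an `if __name__ == "__main__":` guard on platforms that start workers by spawning a fresh interpreter.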

Performance

To test performance, I've mimicked the H24-dev plan generation (without explicitly provided header paths), but locally enabled full source parses in the plan step. With the new ProcessPoolExecutor and a low number of worker processes, the process-pipe-and-serialisation mechanics add significant overhead, but with a moderately larger number of processes we still get a reasonable quality-of-life improvement.
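
To make the serialisation cost concrete: every argument and return value that crosses the process boundary goes through `pickle`. The sketch below round-trips a nested structure and checks picklability; the dict-based tree is a hypothetical stand-in for an IR, and `round_trip` and `is_picklable` are illustrative helpers, not Loki functions.

```python
import pickle


def round_trip(obj):
    """Serialise and deserialise, as happens when crossing a process boundary."""
    payload = pickle.dumps(obj)
    return pickle.loads(payload), len(payload)


def is_picklable(obj):
    """Return True if pickle can serialise the object."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Hypothetical IR-like tree: one module node with many routine children
tree = {
    "name": "module",
    "children": [{"name": f"routine_{i}", "children": []} for i in range(1000)],
}

copy, num_bytes = round_trip(tree)
print(f"payload size: {num_bytes} bytes, same object: {copy is tree}")

# Objects such as lambdas cannot be pickled by the standard pickle module
print(is_picklable(tree), is_picklable(lambda: 0))
```

The worker always operates on a copy, and the size of the payload grows with the size of the parsed tree, which is one plausible source of the overhead seen at low worker counts.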

Sequential, equivalent to previous:

$ loki-transform.py plan --mode idem --config arpifs/loki_physics.config -s arpifs/phys_ec/ -s surf/external/ -s surf/module --plan-file=../../my_plan.cmake --num-workers=0
[Loki] Creating CMake plan file from config: arpifs/loki_physics.config
[Loki::Scheduler] Scheduler:: Initial file parse in 14.63s
...
[Loki::Scheduler] Performed initial source scan in 21.46s
[Loki::Scheduler] Performed full source parse in 79.78s
[Loki::Scheduler] Enriched call tree in 0.53s
[Loki] Scheduler writing CMake plan: ../../my_plan.cmake

Sequential, but through ProcessPoolExecutor:

$ loki-transform.py plan --mode idem --config arpifs/loki_physics.config -s arpifs/phys_ec/ -s surf/external/ -s surf/module --plan-file=../../my_plan.cmake --num-workers=1
[Loki] Creating CMake plan file from config: arpifs/loki_physics.config
[Loki::Scheduler] Scheduler:: Initial file parse in 13.06s
...
[Loki::Scheduler] Performed initial source scan in 20.50s
[Loki::Scheduler] Performed full source parse in 204.73s
[Loki::Scheduler] Enriched call tree in 20.53s
[Loki] Scheduler writing CMake plan: ../../my_plan.cmake

And with 12 worker processes:

(loki_env) [naml@ac6-102 ifs-source]$ loki-transform.py plan --mode idem --config arpifs/loki_physics.config -s arpifs/phys_ec/ -s surf/external/ -s surf/module --plan-file=../../my_plan.cmake --num-workers=12
[Loki] Creating CMake plan file from config: arpifs/loki_physics.config
[Loki::Scheduler] Scheduler:: Initial file parse in 2.11s
...
[Loki::Scheduler] Performed initial source scan in 9.57s
[Loki::Scheduler] Performed full source parse in 36.45s
[Loki::Scheduler] Enriched call tree in 20.57s
[Loki] Scheduler writing CMake plan: ../../my_plan.cmake
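
For a quick comparison, the ratios between the three runs above can be worked out as follows. The timings are copied from the log excerpts; the phase labels and the `speedup` helper are only illustrative.

```python
# Timings in seconds, copied from the three log excerpts above,
# keyed by the value passed to --num-workers.
timings = {
    "initial file parse": {0: 14.63, 1: 13.06, 12: 2.11},
    "initial source scan": {0: 21.46, 1: 20.50, 12: 9.57},
    "full source parse": {0: 79.78, 1: 204.73, 12: 36.45},
    "call tree enrichment": {0: 0.53, 1: 20.53, 12: 20.57},
}


def speedup(phase, workers, baseline=0):
    """Ratio of baseline time to the time with `workers`; above 1 means faster."""
    return timings[phase][baseline] / timings[phase][workers]


for phase in timings:
    print(f"{phase:22s} 1 worker: {speedup(phase, 1):5.2f}x   "
          f"12 workers: {speedup(phase, 12):5.2f}x")
```

With these numbers the full source parse is roughly 2.2x faster with 12 workers, while a single worker process is roughly 2.6x slower than the serial path. The call-tree enrichment goes from 0.53s to about 20.5s as soon as the process pool is used.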

@mlange05 mlange05 requested a review from reuterbal November 4, 2024 14:20

github-actions bot commented Nov 4, 2024

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/426/index.html

@mlange05 mlange05 force-pushed the naml-scheduler-parallel-for-real branch 3 times, most recently from 3109cf1 to 31aba1f Compare February 24, 2026 15:34
@mlange05 mlange05 force-pushed the naml-scheduler-parallel-for-real branch from 4eddd7c to 607afdf Compare February 24, 2026 19:24
@mlange05 mlange05 force-pushed the naml-scheduler-parallel-for-real branch from 607afdf to 617a234 Compare February 25, 2026 08:46
@mlange05 mlange05 force-pushed the naml-scheduler-parallel-for-real branch from 617a234 to bf53164 Compare February 25, 2026 09:51
@mlange05 mlange05 marked this pull request as ready for review February 25, 2026 12:04
@mlange05 (Collaborator, Author) commented:

Ok, this has now been rebased onto the latest main, and some of the performance concerns observed earlier have disappeared. An updated breakdown of the timings can be found below. I'm aware this is only a first step towards further consolidation of the parallel / queuing utilities in the loki package, but it's a functional start that can already bring real-world benefit. I would appreciate feedback on the general design, so paging Dr. @reuterbal for a full review.

# Sequential executor (everything on main process)
(loki_env) [naml@aa6-198 ifs-source]$ loki-transform.py plan --mode idem --config cmake/loki.config -s arpifs/ -s surf/ --plan-file=../../my_plan.cmake --num-workers=0
[Loki] Creating CMake plan file from config: cmake/loki.config
...
[Loki::Scheduler] Scheduler:: Initial file parse in 46.88s
[Loki::Scheduler] Performed initial source scan in 58.09s
[Loki-transform] Applying custom pipeline idem from config:
...
[Loki::Scheduler] Applied transformation <IdemTransformation> in 0.17s
[Loki::Scheduler] Applied transformation <ModuleWrapTransformation> in 0.19s
...
[Loki::Scheduler] Scheduler:: Initial file parse in 33.60s
[Loki::Scheduler] Performed initial source scan in 39.81s
[Loki::Scheduler] Applied transformation <DependencyTransformation> in 0.22s
...
[Loki::Scheduler] Scheduler:: Initial file parse in 34.12s
[Loki::Scheduler] Performed initial source scan in 39.50s
[Loki::Scheduler] Applied transformation <FileWriteTransformation> in 0.05s
[Loki] Scheduler writing CMake plan: ../../my_plan.cmake
[Loki::Scheduler] Applied transformation <CMakePlanTransformation> in 0.07s


# Single-process executor (pickling from main to worker process)
(loki_env) [naml@aa6-198 ifs-source]$ loki-transform.py plan --mode idem --config cmake/loki.config -s arpifs/ -s surf/ --plan-file=../../my_plan.cmake --num-workers=1
[Loki] Creating CMake plan file from config: cmake/loki.config
...
[Loki::Scheduler] Scheduler:: Initial file parse in 52.03s
[Loki::Scheduler] Performed initial source scan in 67.39s
[Loki-transform] Applying custom pipeline idem from config:
...
[Loki::Scheduler] Applied transformation <IdemTransformation> in 0.17s
[Loki::Scheduler] Applied transformation <ModuleWrapTransformation> in 0.19s
...
[Loki::Scheduler] Scheduler:: Initial file parse in 37.94s
[Loki::Scheduler] Performed initial source scan in 43.49s
[Loki::Scheduler] Applied transformation <DependencyTransformation> in 0.23s
...
[Loki::Scheduler] Scheduler:: Initial file parse in 38.62s
[Loki::Scheduler] Performed initial source scan in 44.18s
[Loki::Scheduler] Applied transformation <FileWriteTransformation> in 0.05s
[Loki] Scheduler writing CMake plan: ../../my_plan.cmake
[Loki::Scheduler] Applied transformation <CMakePlanTransformation> in 0.07s


# Multi-process executor (3 worker processes)
(loki_env) [naml@aa6-198 ifs-source]$ loki-transform.py plan --mode idem --config cmake/loki.config -s arpifs/ -s surf/ --plan-file=../../my_plan.cmake --num-workers=3
[Loki] Creating CMake plan file from config: cmake/loki.config
...
[Loki::Scheduler] Scheduler:: Initial file parse in 22.23s
[Loki::Scheduler] Performed initial source scan in 37.73s
[Loki-transform] Applying custom pipeline idem from config:
...
[Loki::Scheduler] Applied transformation <IdemTransformation> in 0.17s
[Loki::Scheduler] Applied transformation <ModuleWrapTransformation> in 0.19s
...
[Loki::Scheduler] Scheduler:: Initial file parse in 23.44s
[Loki::Scheduler] Performed initial source scan in 28.90s
[Loki::Scheduler] Applied transformation <DependencyTransformation> in 0.22s
...
[Loki::Scheduler] Scheduler:: Initial file parse in 22.28s
[Loki::Scheduler] Performed initial source scan in 27.81s
[Loki::Scheduler] Applied transformation <FileWriteTransformation> in 0.05s
[Loki] Scheduler writing CMake plan: ../../my_plan.cmake
[Loki::Scheduler] Applied transformation <CMakePlanTransformation> in 0.07s
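
Since each run above logs three scheduler passes, it can help to sum the repeated phases per run. The sketch below does this for the 3-worker log with a regular expression; the pattern is an assumption based only on the line format visible in these excerpts, and `sum_timings` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches lines such as "[Loki::Scheduler] Performed initial source scan in 37.73s"
# and tolerates the extra "Scheduler:: " prefix seen on some lines.
LINE = re.compile(r"\[Loki::Scheduler\] (?:Scheduler:: )?(.+?) in ([0-9.]+)s$")


def sum_timings(log_text):
    """Sum the reported seconds per phase label across all matching lines."""
    totals = defaultdict(float)
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            totals[match.group(1)] += float(match.group(2))
    return dict(totals)


# Scan and parse lines copied from the 3-worker run above
log_3_workers = """
[Loki::Scheduler] Scheduler:: Initial file parse in 22.23s
[Loki::Scheduler] Performed initial source scan in 37.73s
[Loki::Scheduler] Scheduler:: Initial file parse in 23.44s
[Loki::Scheduler] Performed initial source scan in 28.90s
[Loki::Scheduler] Scheduler:: Initial file parse in 22.28s
[Loki::Scheduler] Performed initial source scan in 27.81s
"""

totals = sum_timings(log_3_workers)
for label, seconds in totals.items():
    print(f"{label}: {seconds:.2f}s")
```

Applying the same helper to the serial and single-worker logs gives directly comparable per-run totals.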
