Batch: Process-parallel directory scan and initial file read by mlange05 · Pull Request #426 · ecmwf-ifs/loki

mlange05 · 2024-11-04T14:20:17Z

Initial implementation of simple parallelism in the batch-processing scheduler. This PR refactors the two "trivially parallel" steps of the Scheduler initialisation (scanning the source directories and parsing the full Sourcefile into IR) to use Python's built-in ProcessPoolExecutor. It also adds a dummy SerialExecutor implementation that exposes the same interface but runs serially in the original process, thus keeping the existing functionality available.
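For illustration, a serial executor with a pool-compatible interface could look roughly like the sketch below. This is not the code from this PR: the class body and the `make_executor` helper are assumptions made for the example, and only the `num_workers=0` convention is taken from the description.

```python
from concurrent.futures import Executor, Future, ProcessPoolExecutor


class SerialExecutor(Executor):
    """Run every submitted call immediately in the calling process."""

    def submit(self, fn, /, *args, **kwargs):
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            # Report the failure through the future, as a real pool would
            future.set_exception(exc)
        return future


def make_executor(num_workers=0):
    """Hypothetical helper: choose the executor from the worker count."""
    if num_workers > 0:
        return ProcessPoolExecutor(max_workers=num_workers)
    return SerialExecutor()


# The inherited Executor.map and context-manager support are built on submit
with make_executor(num_workers=0) as executor:
    squares = list(executor.map(pow, [1, 2, 3], [2, 2, 2]))
```

Because `Executor.map` is implemented in terms of `submit`, overriding `submit` alone is enough for callers to treat both executors interchangeably.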

In a little more detail:

Strictly separate Sourcefile creation from Item creation and remove get_or_create_file_item_from_path
Add an executor object to the Scheduler that falls back to the provided SerialExecutor if num_workers=0 is selected, and otherwise creates a ProcessPoolExecutor(max_workers=num_workers).
Invoke Sourcefile.from_path on the executor, passing the assembled frontend_args
Perform the full file parse (Sourcefile.make_complete) using the executor.map functionality over source objects and parser args, then store the returned copy of each source on the corresponding Item
Pickle-safety fixes for sym.Cast objects and Module AST objects
Increase log-level for Scheduler enrichment to INFO, as it can now become quite dominant during the setup phase
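The reason the returned source has to be stored back on the Item is that a worker process only ever sees a copy. A minimal sketch of that pattern, using hypothetical stand-in classes (`FakeSource`, `FakeItem`) and a toy executor that deep-copies arguments and results to mimic the process boundary without spawning processes:

```python
import copy
from dataclasses import dataclass
from itertools import repeat


@dataclass
class FakeSource:
    """Hypothetical stand-in for a Sourcefile."""
    path: str
    complete: bool = False


@dataclass
class FakeItem:
    """Hypothetical stand-in for a scheduler Item."""
    name: str
    source: FakeSource


def make_complete(source, frontend_args):
    """Worker-side function: finish the parse and hand the source back."""
    source.complete = True
    return source


class CopyingExecutor:
    """Toy executor: copies inputs and outputs like a process pool would."""

    def map(self, fn, *iterables):
        for args in zip(*iterables):
            yield copy.deepcopy(fn(*copy.deepcopy(args)))


def parse_items(executor, items, frontend_args):
    """Map the parse over all sources and re-attach the returned copies."""
    sources = [item.source for item in items]
    results = executor.map(make_complete, sources, repeat(frontend_args))
    for item, parsed in zip(items, results):
        item.source = parsed  # the original object was never modified
```

With a real ProcessPoolExecutor the copies are made by pickling, so the mapped function must be importable at module level and both its arguments and its return value must be picklable.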

Performance

To test performance, I've mimicked the H24-dev plan generation (without explicitly provided header paths), but locally enabled full source parses in the plan step. When adding the new ProcessPoolExecutor but keeping the number of processes low, we can see a significant overhead from the process-pipe-and-serialisation mechanics, but by increasing the number of processes somewhat we can still get to a reasonable quality-of-life improvement.

Sequential, equivalent to previous:

$ loki-transform.py plan --mode idem --config arpifs/loki_physics.config -s arpifs/phys_ec/ -s surf/external/ -s surf/module --plan-file=../../my_plan.cmake --num-workers=0
[Loki] Creating CMake plan file from config: arpifs/loki_physics.config
[Loki::Scheduler] Scheduler:: Initial file parse in 14.63s
...
[Loki::Scheduler] Performed initial source scan in 21.46s
[Loki::Scheduler] Performed full source parse in 79.78s
[Loki::Scheduler] Enriched call tree in 0.53s
[Loki] Scheduler writing CMake plan: ../../my_plan.cmake

Sequential, but through ProcessPoolExecutor:

$ loki-transform.py plan --mode idem --config arpifs/loki_physics.config -s arpifs/phys_ec/ -s surf/external/ -s surf/module --plan-file=../../my_plan.cmake --num-workers=1
[Loki] Creating CMake plan file from config: arpifs/loki_physics.config
[Loki::Scheduler] Scheduler:: Initial file parse in 13.06s
...
[Loki::Scheduler] Performed initial source scan in 20.50s
[Loki::Scheduler] Performed full source parse in 204.73s
[Loki::Scheduler] Enriched call tree in 20.53s
[Loki] Scheduler writing CMake plan: ../../my_plan.cmake

And with 12 build processes:

(loki_env) [naml@ac6-102 ifs-source]$ loki-transform.py plan --mode idem --config arpifs/loki_physics.config -s arpifs/phys_ec/ -s surf/external/ -s surf/module --plan-file=../../my_plan.cmake --num-workers=12
[Loki] Creating CMake plan file from config: arpifs/loki_physics.config
[Loki::Scheduler] Scheduler:: Initial file parse in 2.11s
...
[Loki::Scheduler] Performed initial source scan in 9.57s
[Loki::Scheduler] Performed full source parse in 36.45s
[Loki::Scheduler] Enriched call tree in 20.57s
[Loki] Scheduler writing CMake plan: ../../my_plan.cmake
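To put the three runs side by side, the relative speed-up against the serial run can be computed directly from the logged "initial source scan" and "full source parse" timings above. The snippet is only arithmetic on those numbers; the dictionary layout is an arbitrary choice for the example.

```python
# Timings in seconds, copied from the three log excerpts above,
# keyed by the --num-workers value used for the run
timings = {
    0: {"scan": 21.46, "parse": 79.78},
    1: {"scan": 20.50, "parse": 204.73},
    12: {"scan": 9.57, "parse": 36.45},
}

baseline = timings[0]

# Speed-up relative to the serial run; values below 1.0 are slowdowns
speedup = {
    workers: {step: round(baseline[step] / seconds, 2)
              for step, seconds in run.items()}
    for workers, run in timings.items()
}

for workers, steps in speedup.items():
    print(f"num_workers={workers}: scan x{steps['scan']}, parse x{steps['parse']}")
```

This gives roughly 0.39x for the full parse with a single worker process and about 2.2x for both steps with 12 workers. Note that the call-tree enrichment time (0.53s serially, about 20.5s with the pool) is not included in these ratios.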

github-actions · 2024-11-04T14:23:20Z

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/426/index.html


mlange05 · 2026-02-25T12:13:21Z

Ok, this has now been rebased over latest main and some of the performance worries observed before have now disappeared. An updated breakdown of the timings can be found below. I'm aware this is only the first step towards some further consolidation of parallel / queuing utilities in the loki package, but it's a functional start that already can bring some real-world benefit. I would appreciate some feedback on general design, so paging Dr. @reuterbal for full review.

# Sequential executor (everything on main process)
(loki_env) [naml@aa6-198 ifs-source]$ loki-transform.py plan --mode idem --config cmake/loki.config -s arpifs/ -s surf/ --plan-file=../../my_plan.cmake --num-workers=0
[Loki] Creating CMake plan file from config: cmake/loki.config
...
[Loki::Scheduler] Scheduler:: Initial file parse in 46.88s
[Loki::Scheduler] Performed initial source scan in 58.09s
[Loki-transform] Applying custom pipeline idem from config:
...
[Loki::Scheduler] Applied transformation <IdemTransformation> in 0.17s
[Loki::Scheduler] Applied transformation <ModuleWrapTransformation> in 0.19s
...
[Loki::Scheduler] Scheduler:: Initial file parse in 33.60s
[Loki::Scheduler] Performed initial source scan in 39.81s
[Loki::Scheduler] Applied transformation <DependencyTransformation> in 0.22s
...
[Loki::Scheduler] Scheduler:: Initial file parse in 34.12s
[Loki::Scheduler] Performed initial source scan in 39.50s
[Loki::Scheduler] Applied transformation <FileWriteTransformation> in 0.05s
[Loki] Scheduler writing CMake plan: ../../my_plan.cmake
[Loki::Scheduler] Applied transformation <CMakePlanTransformation> in 0.07s


# Single-process executor (pickling from main to worker process)
(loki_env) [naml@aa6-198 ifs-source]$ loki-transform.py plan --mode idem --config cmake/loki.config -s arpifs/ -s surf/ --plan-file=../../my_plan.cmake --num-workers=1
[Loki] Creating CMake plan file from config: cmake/loki.config
...
[Loki::Scheduler] Scheduler:: Initial file parse in 52.03s
[Loki::Scheduler] Performed initial source scan in 67.39s
[Loki-transform] Applying custom pipeline idem from config:
...
[Loki::Scheduler] Applied transformation <IdemTransformation> in 0.17s
[Loki::Scheduler] Applied transformation <ModuleWrapTransformation> in 0.19s
...
[Loki::Scheduler] Scheduler:: Initial file parse in 37.94s
[Loki::Scheduler] Performed initial source scan in 43.49s
[Loki::Scheduler] Applied transformation <DependencyTransformation> in 0.23s
...
[Loki::Scheduler] Scheduler:: Initial file parse in 38.62s
[Loki::Scheduler] Performed initial source scan in 44.18s
[Loki::Scheduler] Applied transformation <FileWriteTransformation> in 0.05s
[Loki] Scheduler writing CMake plan: ../../my_plan.cmake
[Loki::Scheduler] Applied transformation <CMakePlanTransformation> in 0.07s


# Multi-process executor (3 worker processes)
(loki_env) [naml@aa6-198 ifs-source]$ loki-transform.py plan --mode idem --config cmake/loki.config -s arpifs/ -s surf/ --plan-file=../../my_plan.cmake --num-workers=3
[Loki] Creating CMake plan file from config: cmake/loki.config
...
[Loki::Scheduler] Scheduler:: Initial file parse in 22.23s
[Loki::Scheduler] Performed initial source scan in 37.73s
[Loki-transform] Applying custom pipeline idem from config:
...
[Loki::Scheduler] Applied transformation <IdemTransformation> in 0.17s
[Loki::Scheduler] Applied transformation <ModuleWrapTransformation> in 0.19s
...
[Loki::Scheduler] Scheduler:: Initial file parse in 23.44s
[Loki::Scheduler] Performed initial source scan in 28.90s
[Loki::Scheduler] Applied transformation <DependencyTransformation> in 0.22s
...
[Loki::Scheduler] Scheduler:: Initial file parse in 22.28s
[Loki::Scheduler] Performed initial source scan in 27.81s
[Loki::Scheduler] Applied transformation <FileWriteTransformation> in 0.05s
[Loki] Scheduler writing CMake plan: ../../my_plan.cmake
[Loki::Scheduler] Applied transformation <CMakePlanTransformation> in 0.07s

mlange05 requested a review from reuterbal November 4, 2024 14:20

mlange05 force-pushed the naml-scheduler-parallel-for-real branch 3 times, most recently from 3109cf1 to 31aba1f Compare February 24, 2026 15:34

Batch: Separate initial source read from item creation during scan

c6ed635

mlange05 force-pushed the naml-scheduler-parallel-for-real branch from 4eddd7c to 607afdf Compare February 24, 2026 19:24

mlange05 added 4 commits February 25, 2026 08:46

Batch: Simple parallelisation of initial source scan

7bdda1e

Loki-transform: Add num_workers argument to plan and convert

29e40b6

Module: Only drop _ast if we have it

2c0a7a9

Scheduler: Use and keep one single ProcessPoolExecutor object

caab0fb

mlange05 force-pushed the naml-scheduler-parallel-for-real branch from 607afdf to 617a234 Compare February 25, 2026 08:46

mlange05 added 5 commits February 25, 2026 09:50

Expression: Fix constructor of Cast symbols and fix pickling

2cda301

Also adds small draft test for expression pickling.
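As a rough idea of what such pickle-safety fixes and round-trip tests involve, the sketch below drops a non-picklable `_ast` attribute during pickling, but only when it exists. `ModuleLike` and `roundtrip` are hypothetical names for this example and do not reproduce Loki's Module class or the test added in this commit.

```python
import pickle
import threading


class ModuleLike:
    """Hypothetical stand-in for an IR node that may hold a parser AST."""

    def __init__(self, name, ast=None):
        self.name = name
        if ast is not None:
            self._ast = ast  # may be an object that cannot be pickled

    def __getstate__(self):
        state = self.__dict__.copy()
        # Only drop the AST if we have it; pop with a default never raises
        state.pop("_ast", None)
        return state


def roundtrip(obj):
    """Serialise and deserialise, as happens when crossing a process pool."""
    return pickle.loads(pickle.dumps(obj))


# A lock stands in for an unpicklable frontend AST
original = ModuleLike("my_module", ast=threading.Lock())
clone = roundtrip(original)
```

The clone keeps `name` but carries no `_ast`, while the original object in the parent process is left untouched.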

Scheduler: Process initial full parse in parallel

b9709f6

Batch: Add a SerialExecutor for serial mode with compatible API

71aa93e

Batch: Log Scheduler-level enrichment at INFO level

8251219

Expression: Adjust use of Cast to appease linter gods

bf53164

mlange05 force-pushed the naml-scheduler-parallel-for-real branch from 617a234 to bf53164 Compare February 25, 2026 09:51

mlange05 marked this pull request as ready for review February 25, 2026 12:04

mlange05 wants to merge 10 commits into main from naml-scheduler-parallel-for-real
