
more asynchronous pydra - different from how it's currently implemented #487

Open

Description

@satra

yesterday i ran a workflow that made me think about how pydra could handle it:

  • (shell) use dcm2niix to convert groups of dicoms
  • (singularity) run kwyk using singularity + docker on a gpu (here i optimized to run multiple files through the same GPU process, but this could be parallelized if the configuration allowed it)
  • (python) use nibabel to do some computation on the 3 images that resulted from the first two steps.
  • (python) use pandas to do some aggregation
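the four steps above can be sketched as a plain-python pipeline. this is a minimal sketch only: the helper names (`convert_dicoms`, `run_kwyk`, etc.) are hypothetical stand-ins for the real dcm2niix / singularity / nibabel / pandas calls, stubbed out so the shape of the dataflow is visible.

```python
def convert_dicoms(subject):
    # (shell) dcm2niix would convert a group of dicoms to nifti here
    return f"{subject}.nii.gz"

def run_kwyk(nifti):
    # (singularity) kwyk segmentation on a gpu would run here
    return f"{nifti}.kwyk.nii.gz"

def compute_stats(images):
    # (python) nibabel-based computation on the resulting images
    return {"n_images": len(images)}

def aggregate(stats_per_subject):
    # (python) pandas-style aggregation across subjects
    return {"n_subjects": len(stats_per_subject)}

subjects = ["sub-01", "sub-02", "sub-03"]
results = []
for sub in subjects:
    nifti = convert_dicoms(sub)
    seg = run_kwyk(nifti)
    results.append(compute_stats([nifti, seg]))
summary = aggregate(results)
```

written this way the pipeline is strictly sequential per subject, which is exactly the shape the rest of this issue argues against.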

i realized as i went that i was doing this in parts. the kwyk process was running in the background in a single process, generating outputs, while i got summaries from the next two steps iteratively on however many subjects had run through. so the main process had not concluded, but it was effectively emitting info that i could use in downstream tasks, which were themselves caching and updating as new triggers came in.
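the "downstream summaries update as each subject finishes" pattern can be sketched with `concurrent.futures` rather than anything pydra-specific; `kwyk_job` here is a hypothetical stand-in for the long-running gpu process.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def kwyk_job(subject):
    # stand-in for the long-running background process producing one output
    return subject, len(subject)

subjects = [f"sub-{i:02d}" for i in range(1, 6)]
summaries = {}
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(kwyk_job, s) for s in subjects]
    # as_completed yields results in finish order, not submission order,
    # so the downstream step can cache/update as each result lands
    for fut in as_completed(futures):
        subject, value = fut.result()
        summaries[subject] = value
```

the key point is that the aggregation loop runs while later jobs are still in flight, instead of blocking on the whole batch.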

i suspect as we deal with really large datasets, some sort of asynchronous execution with message passing would be nice to have.
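one way to picture the message-passing version: each stage is a coroutine reading from an inbox queue and posting to the next stage's queue, with `None` as a shutdown sentinel. this is an illustrative asyncio sketch, not a proposal for pydra's actual API.

```python
import asyncio

async def stage(inbox, outbox, fn):
    # consume messages until the sentinel, forwarding results downstream
    while True:
        msg = await inbox.get()
        if msg is None:
            if outbox is not None:
                await outbox.put(None)  # propagate shutdown
            break
        result = fn(msg)
        if outbox is not None:
            await outbox.put(result)

async def main():
    q1, q2 = asyncio.Queue(), asyncio.Queue()
    results = []
    t1 = asyncio.create_task(stage(q1, q2, lambda m: m + "-converted"))
    t2 = asyncio.create_task(stage(q2, None, results.append))
    for sub in ["sub-01", "sub-02"]:
        await q1.put(sub)
    await q1.put(None)
    await asyncio.gather(t1, t2)
    return results

results = asyncio.run(main())
```

with queues between stages, a slow producer (like the kwyk step) naturally overlaps with downstream consumers, and new subjects can be injected while the pipeline is running.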

perhaps this is out of scope, but i wanted to at least consider the possibility and what it would mean architecturally.

Metadata

    Assignees: no one assigned
    Labels: question (further information is requested)
    Type: no type
    Projects: status v1.1-v1.2
    Milestone: no milestone
    Relationships: none yet
    Development: no branches or pull requests