
more asynchronous pydra - different from how it's currently implemented #487

Open

Description

@satra

yesterday i ran a workflow that made me think about how pydra could handle it:

  • (shell) use dcm2niix to convert groups of dicoms
  • (singularity) run kwyk using singularity + docker on a gpu (here i optimized to run multiple files through the same GPU process, but this could be parallelized if the configuration allowed it)
  • (python) use nibabel to do some computation on the 3 images that resulted from the first two steps.
  • (python) use pandas to do some aggregation
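the four steps above can be sketched as a plain-python pipeline. this is a minimal sketch only: the helper names (`convert_dicoms`, `run_kwyk`, etc.) are hypothetical stand-ins for the real dcm2niix / singularity / nibabel / pandas calls, stubbed out so the shape of the dataflow is visible.

```python
def convert_dicoms(subject):
    # (shell) dcm2niix would convert a group of dicoms to nifti here
    return f"{subject}.nii.gz"

def run_kwyk(nifti):
    # (singularity) kwyk segmentation on a gpu would run here
    return f"{nifti}.kwyk.nii.gz"

def compute_stats(images):
    # (python) nibabel-based computation on the resulting images
    return {"n_images": len(images)}

def aggregate(stats_per_subject):
    # (python) pandas-style aggregation across subjects
    return {"n_subjects": len(stats_per_subject)}

subjects = ["sub-01", "sub-02", "sub-03"]
results = []
for sub in subjects:
    nifti = convert_dicoms(sub)
    seg = run_kwyk(nifti)
    results.append(compute_stats([nifti, seg]))
summary = aggregate(results)
```

written this way the pipeline is strictly sequential per subject, which is exactly the shape the rest of this issue argues against.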

i realized as i went that i was doing this in parts. the kwyk process was running in the background in a single process, generating outputs, while i got summaries from the next two steps iteratively on however many subjects had run through. so the main process had not concluded, but it was effectively emitting info that i could use in downstream tasks, which were themselves caching and updating as new triggers came in.
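the "downstream summaries update as each subject finishes" pattern can be sketched with `concurrent.futures` rather than anything pydra-specific; `kwyk_job` here is a hypothetical stand-in for the long-running gpu process.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def kwyk_job(subject):
    # stand-in for the long-running background process producing one output
    return subject, len(subject)

subjects = [f"sub-{i:02d}" for i in range(1, 6)]
summaries = {}
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(kwyk_job, s) for s in subjects]
    # as_completed yields results in finish order, not submission order,
    # so the downstream step can cache/update as each result lands
    for fut in as_completed(futures):
        subject, value = fut.result()
        summaries[subject] = value
```

the key point is that the aggregation loop runs while later jobs are still in flight, instead of blocking on the whole batch.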

i suspect as we deal with really large datasets, some sort of asynchronous execution with message passing would be nice to have.
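one way to picture the message-passing version: each stage is a coroutine reading from an inbox queue and posting to the next stage's queue, with `None` as a shutdown sentinel. this is an illustrative asyncio sketch, not a proposal for pydra's actual API.

```python
import asyncio

async def stage(inbox, outbox, fn):
    # consume messages until the sentinel, forwarding results downstream
    while True:
        msg = await inbox.get()
        if msg is None:
            if outbox is not None:
                await outbox.put(None)  # propagate shutdown
            break
        result = fn(msg)
        if outbox is not None:
            await outbox.put(result)

async def main():
    q1, q2 = asyncio.Queue(), asyncio.Queue()
    results = []
    t1 = asyncio.create_task(stage(q1, q2, lambda m: m + "-converted"))
    t2 = asyncio.create_task(stage(q2, None, results.append))
    for sub in ["sub-01", "sub-02"]:
        await q1.put(sub)
    await q1.put(None)
    await asyncio.gather(t1, t2)
    return results

results = asyncio.run(main())
```

with queues between stages, a slow producer (like the kwyk step) naturally overlaps with downstream consumers, and new subjects can be injected while the pipeline is running.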

perhaps this is out of scope, but i wanted to at least consider the possibility and what it would mean architecturally.

Metadata

    Assignees: no one assigned
    Labels: question (further information is requested)
    Type: no type
    Projects: status v1.1-v1.2
    Milestone: no milestone
    Relationships: none yet
    Development: no branches or pull requests