Description
yesterday i ran a workflow that made me think about how pydra could handle it:
- (shell) use dcm2niix to convert groups of dicoms
- (singularity) run kwyk via singularity (from its docker image) on a gpu (here i optimized by pushing multiple files through the same GPU process, but this could be parallelized if the configuration allowed it)
- (python) use nibabel to do some computation on the 3 images that resulted from the first two steps.
- (python) use pandas to do some aggregation (a rough sketch of these steps follows below)
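to make the shape of this concrete, here is a plain-python sketch of those four steps; the file names, the kwyk/singularity invocation, and the nibabel computation are placeholders rather than the exact commands i ran:

```python
# a rough, sequential sketch of the four steps outside of pydra; paths, the
# kwyk invocation, and output names are placeholders, not the real commands.
import subprocess
from pathlib import Path

import nibabel as nib
import pandas as pd


def convert_dicoms(dicom_dir: Path, out_dir: Path) -> None:
    # step 1 (shell): dcm2niix converts a directory of dicoms to nifti
    subprocess.run(
        ["dcm2niix", "-o", str(out_dir), "-f", "anat", str(dicom_dir)],
        check=True,
    )


def run_kwyk(t1_path: Path, out_dir: Path) -> None:
    # step 2 (singularity): run the kwyk container on a gpu (--nv);
    # kwyk.sif and its arguments stand in for the real image and cli
    subprocess.run(
        ["singularity", "run", "--nv", "kwyk.sif",
         str(t1_path), str(out_dir / "kwyk")],
        check=True,
    )


def summarize_images(image_paths):
    # step 3 (python): nibabel-based computation over the resulting images;
    # mean voxel intensity per image is just a stand-in for the real computation
    return {p.name: float(nib.load(str(p)).get_fdata().mean()) for p in image_paths}


def aggregate(per_subject_summaries):
    # step 4 (python): pandas aggregation across subjects
    return pd.DataFrame(per_subject_summaries).describe()
```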
i realized as i went that i was effectively running this in parts. the kwyk process was running in the background as a single process, generating outputs, while i iteratively computed the summaries from the next two steps on however many subjects had already run through. so the main process had not concluded, but i was effectively emitting info that i could use in downstream tasks, which were themselves caching and updating as new results came in.
i suspect as we deal with really large datasets, some sort of asynchronous execution with message passing would be nice to have.
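as a strawman for what that could feel like, here is a minimal asyncio sketch: an upstream producer emits per-subject results onto a queue as they finish, and a downstream aggregator refreshes a pandas summary on every message instead of waiting for the whole run to conclude. the producer just fabricates numbers to stand in for kwyk outputs, so this is an illustration of the pattern, not a proposal for pydra's actual API.

```python
# message-passing sketch: downstream aggregation updates as upstream results arrive
import asyncio
import random

import pandas as pd


async def producer(queue: asyncio.Queue, n_subjects: int) -> None:
    # stand-in for the long-running kwyk process emitting one result per subject
    for i in range(n_subjects):
        await asyncio.sleep(random.uniform(0.1, 0.3))  # pretend gpu work
        await queue.put({"subject": f"sub-{i:02d}", "mean_intensity": random.random()})
    await queue.put(None)  # sentinel: no more results


async def aggregator(queue: asyncio.Queue) -> pd.DataFrame:
    rows = []
    while True:
        msg = await queue.get()
        if msg is None:
            break
        rows.append(msg)
        # the summary is refreshed on every message, long before the
        # upstream process has finished all subjects
        running = pd.DataFrame(rows)
        print(f"after {len(rows)} subjects: mean = {running['mean_intensity'].mean():.3f}")
    return pd.DataFrame(rows)


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    _, table = await asyncio.gather(producer(queue, 5), aggregator(queue))
    print(table)


if __name__ == "__main__":
    asyncio.run(main())
```

in a real engine the queue could be any message bus, and the aggregator could be a cached task that is re-triggered per message rather than re-run from scratch.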
perhaps this is out of scope, but i wanted to at least consider the possibility and what it would mean architecturally.