-
Notifications
You must be signed in to change notification settings - Fork 310
Description
Summary
This issue proposes decoupling fMRIPrep from sMRIPrep by adopting a precomputed-first strategy:
- fMRIPrep first checks if all sMRIPrep fit derivatives are already present (via
--derivatives). If so, it populates buffer nodes and skips all sMRIPrep computation. - If the fit derivatives are incomplete, fMRIPrep falls back to calling
init_anat_fit_wfexactly as today, passing any partial cache asprecomputed=.
This would remove ~200 lines of fragile inline orchestration from init_single_subject_wf, make the sMRIPrep output contract explicit, and enable the "run sMRIPrep once, fMRIPrep many times" workflow for multi-task/multi-session datasets.
Current coupling: three layers
Layer 1: Workflow embedding (tight)
fMRIPrep imports init_anat_fit_wf() and embeds it directly into its Nipype DAG (call site at L353-L371), passing 16 configuration kwargs. The returned pe.Workflow is a sub-workflow of fMRIPrep's subject graph — they share the same execution context.
Additionally, clean_datasinks() patches every DataSink in the entire workflow tree to remove out_path_base, forcing all sMRIPrep outputs into fMRIPrep's output directory. This is a maintenance hazard: any new DataSink added in sMRIPrep is silently patched. It is called from L617 and L941.
Layer 2: Transform-stage orchestration (~200 lines — the main cost)
Beyond init_anat_fit_wf, fMRIPrep imports and assembles ~10 sMRIPrep transform-stage workflows inline at level == 'full'. These are the most fragile lines in fMRIPrep — they break whenever sMRIPrep changes workflow signatures:
| Code block | Lines | What it does |
|---|---|---|
| Template iterator + standard-space volumes | 29 | init_template_iterator_wf + init_ds_anat_volumes_wf — resample T1w/mask/dseg/tpms into each standard space |
| Surface derivatives + outputs | 39 | init_surface_derivatives_wf + init_ds_surfaces_wf + init_ds_surface_metrics_wf + init_ds_fs_segs_wf — inflated surfaces, curvature, aparc, aseg |
| CIFTI morphometrics pipeline | 111 | init_gifti_morphometrics_wf + init_hcp_morphometrics_wf + init_morph_grayords_wf + init_resample_surfaces_wf + init_ds_grayord_metrics_wf + init_ds_surfaces_wf |
Layer 3: Utility imports (loose, acceptable)
Other sMRIPrep imports across fMRIPrep:
| File | Line | Import |
|---|---|---|
base.py:100 |
stringify_sessions |
|
base.py:271 |
collect_derivatives (as collect_anat_derivatives) |
|
base.py:505 |
TemplateFlowSelect |
|
bold/base.py:678 |
init_resample_surfaces_wf (for non-CIFTI surface outputs) |
|
bold/resampling.py:884,961 |
smriprep.data (atlas ROI files) |
|
interfaces/reports.py:43 |
ReconAll (for reports) |
|
interfaces/bids.py:184 |
stringify_sessions |
What fMRIPrep consumes from sMRIPrep
Every output field flowing from anat_fit_wf.outputnode into downstream workflows (connection block at L882-L900 for BOLD, plus L427-L445 for standard space, L474-L498 for surfaces, L556-L614 for CIFTI):
Tier 1 — Core volumetric (always needed):
t1w_preproc, t1w_mask, t1w_dseg, t1w_tpms, t1w_valid_list
Tier 2 — FreeSurfer / surfaces (when run_reconall):
subjects_dir, subject_id, fsnative2t1w_xfm, white, pial, midthickness, thickness, sulc, curv, anat_ribbon, cortex_mask, sphere_reg_fsLR/sphere_reg_msm
Tier 3 — Standard space and CIFTI (at level='full'):
template, anat2std_xfm, std2anat_xfm, std_t1w, std_mask, midthickness_fsLR, curv_fsLR, thickness_fsLR, sulc_fsLR
Critical observation: There is no formal contract specification. The coupling is encoded purely in Nipype connection tuples scattered across base.py. A field rename in sMRIPrep silently breaks fMRIPrep.
Proposed architecture
The two-phase design
Phase 1 — Precomputed-first lookup: fMRIPrep checks for existing sMRIPrep derivatives (via --derivatives or --anat-derivatives). Using smriprep.utils.bids.collect_derivatives(), it populates an anatomical_cache dict. If the cache contains all fields required by the current configuration (level, run_reconall, spaces, cifti_output, msm_sulc), fMRIPrep skips init_anat_fit_wf entirely and feeds the cache into buffer nodes.
Phase 2 — Fallback to sMRIPrep: If the cache is incomplete, fMRIPrep calls init_anat_fit_wf as today, with the partial cache as precomputed=. This preserves the single-command user experience — fmriprep /data /out participant continues to work without pre-running sMRIPrep.
Targeted edits in init_single_subject_wf
1. Cache collection + validation (enhance L269-L283)
Currently ~15 lines collecting the cache. Add a validation function (~40 lines) that checks whether the cache satisfies the current configuration:
anatomical_cache = collect_and_validate_anat_cache(
derivatives=config.execution.derivatives,
subject_id=subject_id,
session_id=session_id,
spaces=spaces,
freesurfer=freesurfer,
msm_sulc=msm_sulc,
cifti_output=config.workflow.cifti_output,
level=config.workflow.level,
)
anat_cache_complete = _is_anat_cache_complete(anatomical_cache, ...)Validate outputs, not settings: if the right files exist in the right spaces, they are usable regardless of the parameters that produced them.
2. Conditional anat_fit_wf (replace L353-L371)
if anat_cache_complete:
# Buffer-only: no sMRIPrep computation
anat_fit_wf = _init_anat_buffer_wf(anatomical_cache, name='anat_fit_wf')
else:
# Fallback: run sMRIPrep, passing partial cache
from smriprep.workflows.anatomical import init_anat_fit_wf
anat_fit_wf = init_anat_fit_wf(
...same 16 kwargs...,
precomputed=anatomical_cache,
)The _init_anat_buffer_wf function (~30 lines) returns a trivial workflow whose outputnode mirrors init_anat_fit_wf's outputnode, populated from the cache.
3. REMOVE inline transform-stage orchestration (L413-L614)
This is the big win. These ~200 lines assemble 10+ sMRIPrep workflows with intricate connections. Under decoupling, all of these become sMRIPrep's responsibility when run standalone. fMRIPrep would only retain:
- The
select_MNI2009c_xfmKeySelect — feeds into BOLD carpetplots (L920-L925) - The
select_MNI6_xfm/select_MNI6_tplnodes — feed into BOLD CIFTI resampling (L932-L934)
These are not anatomical outputs — they are transform selectors that thread spatial mappings into the functional pipeline. They should stay.
4. REMOVE clean_datasinks() (L1022-L1027)
With sMRIPrep writing to its own derivatives directory, there is no need to patch out_path_base. fMRIPrep would instead add sMRIPrep's output as a DatasetLink for BIDS-URI provenance. Remove both call sites at L617 and L941.
5. BOLD connections stay unchanged (L882-L939)
The connections from anat_fit_wf.outputnode to bold_wf.inputnode remain identical. The buffer workflow presents the same outputnode interface as the full init_anat_fit_wf.
6. Remaining sMRIPrep imports to address
| File | Import | Action |
|---|---|---|
bold/base.py:678 |
init_resample_surfaces_wf (for non-CIFTI surface outputs) |
Could be buffered if sMRIPrep emits midthickness_{template}_{density} |
bold/resampling.py:884,961 |
smriprep.data (atlas ROI files for CIFTI) |
Atlas ROIs should be moved to a shared location or looked up from the sMRIPrep derivatives |
interfaces/reports.py:43 |
ReconAll (for reports) |
This is a lightweight interface import; acceptable to keep |
interfaces/bids.py:184 |
stringify_sessions |
Utility; acceptable to keep |
Code simplification assessment
| Section | Current lines | After decoupling | Change |
|---|---|---|---|
| Cache collection + validation | ~15 | ~60 | +45 (new validation + buffer function) |
anat_fit_wf call |
~19 | ~25 | +6 (conditional + fallback) |
| Transform-stage orchestration (L413-L614) | ~200 | ~0 | -200 |
clean_datasinks() + calls |
~8 | ~0 | -8 |
| sMRIPrep import block (L173-L185) | ~13 | ~3 | -10 (keep init_anat_fit_wf, collect_derivatives, stringify_sessions) |
| Net | ~255 | ~88 | -167 lines, -27% of init_single_subject_wf |
The 200 most complex lines are eliminated — the ones that assemble 10+ sMRIPrep workflows with intricate connection tuples that break whenever sMRIPrep's function signatures change.
Prerequisites (changes needed in sMRIPrep)
-
sMRIPrep standalone must emit ALL outputs fMRIPrep needs — including surface derivatives, fsLR-resampled surfaces, CIFTI morphometrics, and cortex masks. Currently, fMRIPrep builds these transform-stage workflows itself because
init_anat_preproc_wfdoes not cover everything. -
Add
--levelflag to sMRIPrep CLI —--level minimal(fit only, fast, deterministic, reusable) vs--level full(fit + all transforms). This mirrors fMRIPrep's existing level gating. -
Expand
collect_derivatives()to cover all fields — includingmidthickness_fsLR,sphere_reg_fsLR/sphere_reg_msm,cortex_mask,anat_ribbon, CIFTI morphometric dscalars. -
Standardize FreeSurfer output path — fMRIPrep's BBR registration (
init_bbreg_wf) needs a liveSUBJECTS_DIRwith the full reconstruction (not just output surfaces). sMRIPrep should place this at a standard path (sourcedata/freesurfer/). -
Version stamp in
dataset_description.json—GeneratedBywith sMRIPrep version and container info for compatibility checking.
Risks and mitigations
| Risk | Mitigation |
|---|---|
| Version mismatch between sMRIPrep producer and fMRIPrep consumer | Validate outputs, not settings. If the required files exist, they are usable. Check GeneratedBy in dataset_description.json for compatibility warnings. |
Configuration divergence (e.g., sMRIPrep run without MNI152NLin2009cAsym but fMRIPrep needs it for carpetplots) |
_is_anat_cache_complete() checks that needed normalizations/surfaces exist. If missing, fail with a clear error or fall back to running sMRIPrep. |
| FreeSurfer directory completeness | collect_derivatives() validates the FreeSurfer dir exists and contains key files (mri/brain.mgz, surf/lh.white, etc.) |
| Single-command UX regression | The fallback path ensures fmriprep /data /out participant continues to work without pre-running sMRIPrep. Decoupling is an optimization, not a requirement. |
| Transform-stage boundary is fuzzy (surface derivatives involve computation, not just file writes) | sMRIPrep owns these computations when run standalone. The contract is "provide these BIDS-Derivatives files," not "run these specific workflows." |
Recommended phasing
| Phase | Scope | Work |
|---|---|---|
| 1 | sMRIPrep only | Add --level, emit all outputs fMRIPrep needs standalone, expand collect_derivatives(), standardize FreeSurfer path |
| 2 | fMRIPrep only | Add _is_anat_cache_complete() + _init_anat_buffer_wf(), make anat_fit_wf conditional, remove inline transform orchestration (L413-L614), remove clean_datasinks() (L1022-L1027), add DatasetLink for sMRIPrep |
| 3 | Integration | Integration tests with synthetic sMRIPrep derivative fixtures, contract documentation, user-facing documentation for the two-step workflow |
Phase 1 is backwards-compatible: fMRIPrep continues to work unchanged until Phase 2 is implemented.