[ENH] CompCor enhancement #2878

Merged: 42 commits, Apr 29, 2019

Commits (the diff below shows changes from 21 of the 42 commits)
329c74d
add variance-driven component selection, return component metadata
rciric Jan 19, 2019
17f3e12
expose metadata to interface, fix component selection for multiple masks
rciric Jan 19, 2019
114e6d4
propagate failure mode if provided
rciric Jan 19, 2019
6f4fc19
allow mask naming in metadata
rciric Jan 19, 2019
4d2208e
add contributor
rciric Jan 19, 2019
bfbde82
include component index in metadata
rciric Jan 19, 2019
0373879
update autotests and make naming consistent
rciric Jan 19, 2019
2c551d0
(CompCor) more intuitive interface following review from @effigies
rciric Jan 20, 2019
a53cd46
manually set `num_components` in test
rciric Jan 20, 2019
b811d47
manually set `num_components` in test
rciric Jan 20, 2019
2743189
Merge branch 'master' of https://github.com/rciric/nipype
rciric Jan 21, 2019
577e395
add unit test for variance_threshold condition
rciric Jan 21, 2019
66c7540
provide mask name to circumvent test failure
rciric Jan 21, 2019
0bb0096
(CompCor) try using an OrderedDict for metadata
rciric Jan 21, 2019
94bea4a
first-pass refactor CompCor to SimpleInterface
rciric Feb 4, 2019
addb0e9
return metadata for all components regardless of retention criterion
rciric Feb 6, 2019
b04c9ca
@oesteban: limit np array use, clean up conditionals, remove invalid obj
rciric Feb 8, 2019
e957e87
less np array use; unique names for dropped components
rciric Feb 9, 2019
797801e
ensure absolute path to components file
rciric Feb 9, 2019
67a3276
(CompCor) try BaseInterface
rciric Feb 9, 2019
fe430f5
ensure absolute path to components file
rciric Feb 9, 2019
1625bdb
update per @oesteban 's review
rciric Feb 15, 2019
9afb3f5
assign output to _results
rciric Feb 15, 2019
689d064
assign output to _results
rciric Feb 15, 2019
f390bc6
some fixes
oesteban Feb 16, 2019
ad3d440
testing pickling of variance_threshold
oesteban Feb 16, 2019
fd41b74
``traits.Range`` cannot be pickled with traits>=5 and python 2.7
oesteban Feb 16, 2019
01a78ec
Merge pull request #1 from oesteban/rciric-patch-1
rciric Feb 16, 2019
a742c9c
pacify codacy
rciric Feb 16, 2019
518a489
revert unnecessary squeeze, correct docs
rciric Feb 21, 2019
deceb95
revise in accordance with @effigies review
rciric Feb 22, 2019
fa64907
revise in accordance with @effigies review
rciric Feb 23, 2019
e6dfe7d
ensure s is defined, support NaN failure mode with empty mask
rciric Mar 1, 2019
27ed03f
filter handles empty masks, use `squeeze_image`
rciric Mar 27, 2019
422c04c
Merge branch 'master' of https://github.com/nipy/nipype
rciric Mar 27, 2019
144fca3
Merge branch 'master' of https://github.com/nipy/nipype
rciric Mar 28, 2019
82a25c2
default to old behaviour for temporal filters
rciric Mar 28, 2019
4c1af8a
Merge branch 'master' into master
effigies Apr 11, 2019
79e840d
integrate @effigies review comments
rciric Apr 19, 2019
1b1b6fa
propagate retention status to metadata; use list instead of generator…
rciric Apr 19, 2019
89ba3b4
Merge branch 'master' of https://github.com/rciric/nipype
rciric Apr 19, 2019
b80a3d7
update unit test to include new metadata field
rciric Apr 19, 2019
5 changes: 5 additions & 0 deletions .zenodo.json
@@ -301,6 +301,11 @@
"name": "Liem, Franz",
"orcid": "0000-0003-0646-4810"
},
{
"affiliation": "Stanford University",
"name": "Ciric, Rastko",
"orcid": "0000-0001-6347-7939"
},
{
"affiliation": "UniversityHospital Heidelberg, Germany",
"name": "Kleesiek, Jens"
220 changes: 169 additions & 51 deletions nipype/algorithms/confounds.py
@@ -8,6 +8,11 @@
absolute_import)
from builtins import range

# Py2 compat: http://python-future.org/compatible_idioms.html#collections-counter-and-ordereddict
from future import standard_library
standard_library.install_aliases()
from collections import OrderedDict

import os
import os.path as op

@@ -19,7 +19,8 @@
from ..external.due import BibTeX
from ..interfaces.base import (traits, TraitedSpec, BaseInterface,
                               BaseInterfaceInputSpec, File, isdefined,
                               InputMultiPath, OutputMultiPath,
                               SimpleInterface)
from ..utils import NUMPY_MMAP
from ..utils.misc import normalize_mc_params

@@ -386,11 +392,30 @@ class CompCorInputSpec(BaseInterfaceInputSpec):
requires=['mask_files'],
desc=('Position of mask in `mask_files` to use - '
'first is the default.'))
mask_names = traits.List(traits.Str,
desc='Names for provided masks (for printing into metadata). '
'If provided, it must be as long as the final mask list '
'(after any merge and indexing operations).')
components_file = traits.Str(
'components_file.txt',
usedefault=True,
desc='Filename to store physiological components')
num_components = traits.Either('all', traits.Range(low=1),
xor=['variance_threshold'],
desc='Number of components to return from the decomposition. If '
'`num_components` is `all`, then all components will be '
'retained.')
# 6 for BOLD, 4 for ASL
# automatically instantiated to 6 in CompCor below if neither
# `num_components` nor `variance_threshold` is defined (for
# backward compatibility)
variance_threshold = traits.Float(xor=['num_components'],
desc='Select the number of components to be returned automatically '
'based on their ability to explain variance in the dataset. '
'`variance_threshold` is a fractional value between 0 and 1; '
'the number of components retained will be equal to the minimum '
'number of components necessary to explain the provided '
'fraction of variance in the masked time series.')
pre_filter = traits.Enum(
'polynomial',
'cosine',
@@ -418,6 +443,8 @@ class CompCorInputSpec(BaseInterfaceInputSpec):
'unspecified')
save_pre_filter = traits.Either(
traits.Bool, File, desc='Save pre-filter basis as text file')
save_metadata = traits.Either(
traits.Bool, File, desc='Save component metadata as text file')
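    # Example usage of the mutually exclusive selection criteria above
    # (an illustrative sketch; file names are hypothetical):
    #
    #     ccor = CompCor()
    #     ccor.inputs.realigned_file = 'func.nii.gz'
    #     ccor.inputs.mask_files = ['noise_mask.nii.gz']
    #     ccor.inputs.save_metadata = True   # also write component metadata
    #     ccor.inputs.num_components = 6     # classic fixed-count behaviour
    #
    # or, because the two traits are declared xor, set *only*
    #
    #     ccor.inputs.variance_threshold = 0.5
    #
    # to retain the fewest components explaining >= 50% of the variance.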
ignore_initial_volumes = traits.Range(
low=0,
usedefault=True,
@@ -433,9 +460,10 @@ class CompCorOutputSpec(TraitedSpec):
components_file = File(
exists=True, desc='text file containing the noise components')
pre_filter_file = File(desc='text file containing high-pass filter basis')
metadata_file = File(desc='text file containing component metadata')


class CompCor(SimpleInterface):
"""
Interface with core CompCor computation, used in aCompCor and tCompCor

@@ -548,9 +576,22 @@ def _run_interface(self, runtime):
'{} cannot detect repetition time from image - '
'Set the repetition_time input'.format(self._header))

if isdefined(self.inputs.variance_threshold):
components_criterion = self.inputs.variance_threshold
elif isdefined(self.inputs.num_components):
components_criterion = self.inputs.num_components
else:
components_criterion = 6
IFLOGGER.warning('`num_components` and `variance_threshold` are '
'not defined. Setting number of components to 6 '
'for backward compatibility. Please set either '
'`num_components` or `variance_threshold`, as '
'this feature may be deprecated in the future.')

components, filter_basis, metadata = compute_noise_components(
imgseries.get_data(), mask_images, components_criterion,
self.inputs.pre_filter, degree, self.inputs.high_pass_cutoff, TR,
self.inputs.failure_mode, self.inputs.mask_names)

if skip_vols:
old_comp = components
@@ -561,16 +602,25 @@ def _run_interface(self, runtime):

components_file = os.path.join(os.getcwd(),
self.inputs.components_file)
components_header = self._make_headers(components.shape[1])
np.savetxt(
components_file,
components,
fmt=b"%.10f",
delimiter='\t',
header='\t'.join(components_header),
comments='')
self._results['components_file'] = os.path.abspath(
self.inputs.components_file)

save_pre_filter = self.inputs.save_pre_filter
if save_pre_filter:
if isinstance(save_pre_filter, bool):
pre_filter_file = os.path.abspath('pre_filter.tsv')
else:
pre_filter_file = save_pre_filter
self._results['pre_filter_file'] = pre_filter_file
if self.inputs.pre_filter and save_pre_filter:
ftype = {
'polynomial': 'Legendre',
'cosine': 'Cosine'
@@ -597,29 +647,37 @@ def _run_interface(self, runtime):
header='\t'.join(header),
comments='')

save_metadata = self.inputs.save_metadata
if save_metadata:
if isinstance(save_metadata, bool):
metadata_file = os.path.abspath('component_metadata.tsv')
else:
metadata_file = save_metadata
components_names = np.empty(len(metadata['mask']),
dtype='object_')
retained = np.where(metadata['retained'])
not_retained = np.where(np.logical_not(metadata['retained']))
components_names[retained] = components_header
components_names[not_retained] = ([
'dropped{}'.format(i) for i in range(len(not_retained[0]))])
self._results['metadata_file'] = metadata_file
            with open(metadata_file, 'w') as f:
                # write all six columns, including retention status
                f.write('{}\t{}\t{}\t{}\t{}\t{}\n'.format(
                    'component', *list(metadata.keys())))
                for i in zip(components_names, *metadata.values()):
                    f.write('{0[0]}\t{0[1]}\t{0[2]:.10f}\t'
                            '{0[3]:.10f}\t{0[4]:.10f}\t{0[5]}\n'.format(i))
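            # The resulting TSV has one row per component (retained or
            # dropped) with columns: component, mask, singular_value,
            # variance_explained, cumulative_variance_explained, retained.
            # It could be loaded, e.g. (illustrative, not part of this PR):
            #     import pandas as pd
            #     md = pd.read_csv(metadata_file, sep='\t')
            #     retained_only = md[md['retained']]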

return runtime

def _process_masks(self, mask_images, timeseries=None):
return mask_images


def _make_headers(self, num_col):
header = self.inputs.header_prefix if \
isdefined(self.inputs.header_prefix) else self._header
headers = ['{}{:02d}'.format(header, i) for i in range(num_col)]
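        # e.g. a header prefix of 'aCompCor' yields component names
        # ['aCompCor00', 'aCompCor01', ...]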
return headers


class ACompCor(CompCor):
@@ -1139,15 +1197,30 @@ def combine_mask_files(mask_files, mask_method=None, mask_index=None):
return [img]


def compute_noise_components(imgseries, mask_images, components_criterion=0.5,
filter_type=False, degree=0, period_cut=128,
repetition_time=None, failure_mode='error',
mask_names=''):
"""Compute the noise components from the imgseries for each mask

Parameters
----------
    imgseries: 4D numpy array
        Time series data to be decomposed; the CompCor interface passes in
        the result of `img.get_data()`, i.e. an array of voxels by time.
    mask_images: list
        List of nibabel images. Time series data from `imgseries` is
        subset according to the spatial extent of each mask, and the
        subset data is then decomposed using principal component
        analysis. Masks should be coextensive with either anatomical or
        tSTD noise ROIs.
    components_criterion: float
        Number of noise components to return. If this is a decimal value
        between 0 and 1, then `compute_noise_components` will instead
        return the smallest number of components necessary to explain the
        indicated fraction of variance. If `components_criterion` is
        `all`, then all components will be returned.
filter_type: str
Type of filter to apply to time series before computing
noise components.
'polynomial' - Legendre polynomial basis
'cosine' - Discrete cosine (DCT) basis
False - None (mean-removal only)
@@ -1158,16 +1231,24 @@ def compute_noise_components(imgseries, mask_images, num_components,
    period_cut: float
        Minimum period (in seconds) for the DCT high-pass filter.
    repetition_time: float
        Time (in seconds) between volume acquisitions.
    failure_mode: str
        Action to take if the decomposition fails: if set to 'error', an
        exception is raised; otherwise NaN components are returned.
    mask_names: list
        Names for the provided masks, used to label rows of the component
        metadata. Defaults to mask indices if not provided.

Returns
-------
components: numpy array
Numpy array containing the requested set of noise components
basis: numpy array
Numpy array containing the (non-constant) filter regressors
metadata: OrderedDict{str: numpy array}
Dictionary of eigenvalues, fractional explained variances, and
cumulative explained variances.
"""
    components = None
    basis = np.array([])
    if components_criterion == 'all':
        components_criterion = -1
    if not mask_names:
        mask_names = range(len(mask_images))
    for i, img in zip(mask_names, mask_images):
        mask = img.get_data().astype(np.bool).squeeze()
if imgseries.shape[:3] != mask.shape:
raise ValueError(
'Inputs for CompCor, timeseries and mask, do not have '
@@ -1186,35 +1267,72 @@ def compute_noise_components(imgseries, mask_images, num_components,
voxel_timecourses, repetition_time, period_cut)
elif filter_type in ('polynomial', False):
# from paper:
# "The constant and linear trends of the columns in the matrix M were
# removed [prior to ...]"
# "The constant and linear trends of the columns in the matrix M
# were removed [prior to ...]"
voxel_timecourses, basis = regress_poly(degree, voxel_timecourses)

# "Voxel time series from the noise ROI (either anatomical or tSTD) were
# placed in a matrix M of size Nxm, with time along the row dimension
# and voxels along the column dimension."
# "Voxel time series from the noise ROI (either anatomical or tSTD)
# were placed in a matrix M of size Nxm, with time along the row
# dimension and voxels along the column dimension."
M = voxel_timecourses.T

# "[... were removed] prior to column-wise variance normalization."
M = M / _compute_tSTD(M, 1.)

# "The covariance matrix C = MMT was constructed and decomposed into its
# principal components using a singular value decomposition."
# "The covariance matrix C = MMT was constructed and decomposed into
# its principal components using a singular value decomposition."
try:
u, s, _ = fallback_svd(M, full_matrices=False)
except np.linalg.LinAlgError:
            if failure_mode == 'error':
                raise
            # ensure `s` is defined on the failure path so the variance
            # computations below do not raise a NameError
            s = np.full(M.shape[0], np.nan, dtype=np.float32)
            if components_criterion >= 1:
                u = np.full((M.shape[0], components_criterion),
                            np.nan, dtype=np.float32)
            else:
                continue

variance_explained = (s ** 2) / np.sum(s ** 2)
cumulative_variance_explained = np.cumsum(variance_explained)
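        # Illustrative numbers: if variance_explained were
        # [0.5, 0.25, 0.15, 0.1], the cumulative sums are
        # [0.5, 0.75, 0.9, 1.0], and a components_criterion of 0.7 gives
        # np.searchsorted(cumulative_variance_explained, 0.7) + 1 == 2
        # retained components below.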

num_components = int(components_criterion)
if 0 < components_criterion < 1:
num_components = np.searchsorted(cumulative_variance_explained,
components_criterion) + 1
elif components_criterion == -1:
num_components = len(s)

num_components = int(num_components)
if num_components == 0:
break
if components is None:
components = u[:, :num_components]
metadata = OrderedDict()
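            # an OrderedDict keeps insertion order (on Python < 3.7), so
            # the metadata columns are written to file in a stable order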
metadata['mask'] = [i] * len(s)
metadata['singular_value'] = s
metadata['variance_explained'] = variance_explained
metadata['cumulative_variance_explained'] = (
cumulative_variance_explained)
metadata['retained'] = [i < num_components for i in range(len(s))]
else:
components = np.hstack((components, u[:, :num_components]))
metadata['mask'] = metadata['mask'] + [i] * len(s)
metadata['singular_value'] = (
np.hstack((metadata['singular_value'], s)))
metadata['variance_explained'] = (
np.hstack((metadata['variance_explained'],
variance_explained)))
metadata['cumulative_variance_explained'] = (
np.hstack((metadata['cumulative_variance_explained'],
cumulative_variance_explained)))
metadata['retained'] = (metadata['retained']
+ [i < num_components
for i in range(len(s))])
if components is None:
if failure_mode == 'error':
raise ValueError('No components found')
components = np.full((M.shape[0], num_components), np.NaN)
return components, basis, metadata
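

# A minimal sketch of calling compute_noise_components directly on
# synthetic data (illustrative only; the CompCor interfaces normally call
# it with data loaded from NIfTI files):
#
#     import numpy as np
#     import nibabel as nb
#     data = np.random.RandomState(0).randn(4, 4, 4, 50)
#     mask = np.zeros((4, 4, 4), dtype=np.uint8)
#     mask[1:3, 1:3, 1:3] = 1
#     mask_img = nb.Nifti1Image(mask, np.eye(4))
#     components, basis, metadata = compute_noise_components(
#         data, [mask_img], components_criterion=0.5,
#         filter_type='polynomial', degree=1, mask_names=['noise_roi'])
#     # components: one column per retained component
#     # metadata: describes every component, retained or not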


def _compute_tSTD(M, x, axis=0):