
Commit f762651 (1 parent: 037533d)

clean code; fix show_figs showing figures when set to False in caps2surf; add variance ratio and davies_bouldin; add NaN in the DataFrame for CAPs that don't exist in a group, instead of zero, to show a better distinction; and raise an error in timeseries if no tr is specified but a condition is requested

File tree

12 files changed: +426 -303 lines

CHANGELOG.md

+20
@@ -41,6 +41,26 @@ noted in the changelog (i.e new functions or parameters, changes in parameter de
 - *.patch* : Contains no new features, simply fixes any identified bugs.
 - *.postN* : Consists of only metadata-related changes, such as updates to type hints or doc strings/documentation.

+## [0.12.0] - 2024-06-26
+- Entails some code cleaning and verification to ensure that the code cleaned for clarity purposes produces the same
+  results.
+
+### 🚀 New/Added
+- Davies-Bouldin and Variance Ratio (Calinski-Harabasz) cluster selection methods added.
+
+### ♻ Changed
+- For `CAPs.calculate_metrics()`, if performing an analysis on groups where each group has a different number of CAPs,
+  then for "temporal_fraction", "persistence", and "counts", "nan" values will be seen for CAP numbers that exceed the
+  group's number of CAPs.
+  - For instance, if group "A" has 2 CAPs but group "B" has 4 CAPs, the DataFrame will contain columns for CAP-1,
+    CAP-2, CAP-3, and CAP-4. However, for all members in group "A", CAP-3 and CAP-4 will contain "nan" values to
+    indicate that these CAPs are not applicable to the group. This differentiation helps distinguish between CAPs
+    that are not applicable to the group and CAPs that are applicable but had zero instances for a specific member.
+
+### 🐛 Fixes
+- Raises an error earlier when `tr` is not specified or cannot be retrieved from the BOLD metadata while a condition
+  is specified, instead of allowing the pipeline to produce this error later.
+- Fixed an issue with `show_figs` in `CAP.caps2surf()` showing the figure when set to False.
+
 ## [0.11.3] - 2024-06-24
 ### ♻ Changed
 - With parallel processing, joblib outputs are now returned as a generator as opposed to the default, which is a list,
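The NaN-versus-zero distinction described in the Changed entry can be illustrated with a small, hypothetical DataFrame (the column layout mirrors the description above, but the values and subject IDs are invented for illustration, not actual neurocaps output):

```python
import numpy as np
import pandas as pd

# Hypothetical temporal fraction table: group "A" has 2 CAPs, group "B" has 4.
# NaN marks CAPs that do not exist for a subject's group; 0.0 means the CAP
# exists for the group but had zero instances for that subject.
df = pd.DataFrame(
    {
        "Subject_ID": ["01", "02"],
        "Group": ["A", "B"],
        "CAP-1": [0.6, 0.25],
        "CAP-2": [0.4, 0.25],
        "CAP-3": [np.nan, 0.0],
        "CAP-4": [np.nan, 0.5],
    }
)

# Pandas aggregations skip NaN by default, so group summaries are not
# diluted by columns that never applied to the group.
print(df.loc[df["Group"] == "A", "CAP-3"].isna().all())  # True
print(df["CAP-3"].mean())  # 0.0 (only group "B" contributes)
```

With zeros instead of NaN, group "A" would appear to have visited CAP-3 and CAP-4 zero times, which is a different claim than those CAPs not existing for the group.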

README.md

+5 -4
@@ -1,5 +1,6 @@
 # neurocaps
-[![DOI](https://img.shields.io/badge/DOI-10.5281%2Fzenodo.11642615-blue)](https://doi.org/10.5281/zenodo.12523896)
+[![Latest Version](https://img.shields.io/pypi/v/neurocaps.svg)](https://pypi.python.org/pypi/neurocaps/)
+[![DOI](https://img.shields.io/badge/DOI-10.5281%2Fzenodo.11642615-blue)](https://doi.org/10.5281/zenodo.12555589)
 [![Test Status](https://github.com/donishadsmith/neurocaps/actions/workflows/testing.yaml/badge.svg)](https://github.com/donishadsmith/neurocaps/actions/workflows/testing.yaml)
 [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
@@ -100,9 +101,9 @@ The provided example demonstrates setting up a custom parcellation containing no
 - **Parallel Processing:** Use parallel processing by specifying the number of CPU cores in the `n_cores` parameter in the `get_bold()` method. Testing on an HPC using a loop with `TimeseriesExtractor.get_bold()` to extract session 1 and 2 BOLD timeseries from 105 subjects from resting-state data (single run containing 360 volumes) and two task datasets (three runs containing 200 volumes each and two runs containing 200 volumes) reduced processing time from 5 hours 48 minutes to 1 hour 26 minutes (using 10 cores). *Note:* If you are using an HPC, remember to allocate the appropriate amount of CPU cores with your workload manager. For instance, in Slurm use `#SBATCH --cpus-per-task=10` if you intend to use 10 cores.

 **Main features for `CAP` include:**
-- **Optimal Cluster Size Identification:** Perform the silhouette or elbow method to identify the optimal cluster size, saving the optimal model as an attribute.
-- **Parallel Processing:** Use parallel processing, when using the silhouette or elbow method, by specifying the number of CPU cores in the `n_cores` parameter in the `get_caps()` method. *Note:* If you are using an HPC, remember to allocate the appropriate amount of CPU cores with your workload manager. For instance, in Slurm use `#SBATCH --cpus-per-task=10` if you intend to use 10 cores.
-- **Grouping:** Perform CAPs analysis for entire sample or groups of subject IDs (using the `groups` parameter when initializing the `CAP` class). K-means clustering, silhouette and elbow methods, and plotting are done for each group when specified.
+- **Optimal Cluster Size Identification:** Use the Davies-Bouldin, Silhouette, Elbow, or Variance Ratio (Calinski-Harabasz) criteria to identify the optimal cluster size, saving the optimal model as an attribute.
+- **Parallel Processing:** Use parallel processing with any of these cluster selection criteria by specifying the number of CPU cores in the `n_cores` parameter in the `get_caps()` method. *Note:* If you are using an HPC, remember to allocate the appropriate amount of CPU cores with your workload manager. For instance, in Slurm use `#SBATCH --cpus-per-task=10` if you intend to use 10 cores.
+- **Grouping:** Perform a CAPs analysis for the entire sample or for groups of subject IDs (using the `groups` parameter when initializing the `CAP` class). K-means clustering, all cluster selection methods (Davies-Bouldin, Silhouette, Elbow, or Variance Ratio), and plotting are done for each group when specified.
 - **CAP Visualization:** Visualize the CAPs as outer products or heatmaps, with options to use subplots to reduce the number of individual plots, as well as save. Refer to the [documentation](https://neurocaps.readthedocs.io/en/latest/generated/neurocaps.analysis.CAP.html#neurocaps.analysis.CAP.caps2plot) for the `caps2plot()` method in the `CAP` class for available `**kwargs` arguments and parameters to modify plots.
 - **Save CAPs as NifTIs:** Convert the atlas used for parcellation to a stat map and save it (`caps2niftis`).
 - **Surface Plot Visualization:** Convert the atlas used for parcellation to a stat map projected onto a surface plot, with options to customize and save plots. Refer to the [documentation](https://neurocaps.readthedocs.io/en/latest/generated/neurocaps.analysis.CAP.html#neurocaps.analysis.CAP.caps2surf) for the `caps2surf()` method in the `CAP` class for available `**kwargs` arguments and parameters to modify plots. Also includes the option to save the NifTIs. There is another parameter in `caps2surf`, `fslr_giftis_dict`, which can be used if the CAPs NifTI files were converted to GifTI files using a tool such as Connectome Workbench, which may work better for converting your atlas to fsLR space. This parameter allows plotting without re-running the analysis; only initializing the `CAP` class and using the `caps2surf` method is needed.
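The four cluster selection criteria named above can be sketched with plain scikit-learn (this is standalone illustration against toy data, not the `get_caps()` API; the blob data are invented):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,  # Variance Ratio: higher is better
    davies_bouldin_score,     # lower is better
    silhouette_score,         # higher is better
)

rng = np.random.default_rng(0)
# Toy "concatenated timeseries": two well-separated clusters of frames
X = np.vstack([rng.normal(0, 0.5, (100, 4)), rng.normal(5, 0.5, (100, 4))])

scores = {}
for k in range(2, 5):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    scores[k] = {
        "davies_bouldin": davies_bouldin_score(X, model.labels_),
        "variance_ratio": calinski_harabasz_score(X, model.labels_),
        "silhouette": silhouette_score(X, model.labels_),
        "elbow": model.inertia_,  # inspect for the bend, not the minimum
    }

best_k = max(scores, key=lambda k: scores[k]["silhouette"])
print(best_k)  # 2 for this two-blob toy data
```

Note the criteria disagree in direction: Davies-Bouldin is minimized, while Silhouette and Variance Ratio are maximized, and inertia always decreases with k, which is why the elbow (bend) is inspected instead.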

docs/introduction.rst

+9 -5
@@ -1,7 +1,11 @@
 **neurocaps**
 =============
+.. image:: https://img.shields.io/pypi/v/neurocaps.svg
+   :target: https://pypi.python.org/pypi/neurocaps/
+   :alt: Latest Version
+
 .. image:: https://img.shields.io/badge/DOI-10.5281%2Fzenodo.11642615-blue
-   :target: https://doi.org/10.5281/zenodo.12523896
+   :target: https://doi.org/10.5281/zenodo.12555589
    :alt: DOI

 .. image:: https://github.com/donishadsmith/neurocaps/actions/workflows/testing.yaml/badge.svg
@@ -19,7 +23,7 @@ Citing
 ======
 ::

-   Smith, D. (2024). neurocaps. Zenodo. https://doi.org/10.5281/zenodo.12523896
+   Smith, D. (2024). neurocaps. Zenodo. https://doi.org/10.5281/zenodo.12555589

 Usage
 =====
@@ -91,10 +95,10 @@ Main features for ``TimeseriesExtractor`` includes:
 Main features for ``CAP`` include:
 -----------------------------------

-- **Optimal Cluster Size Identification:** Perform the silhouette or elbow method to identify the optimal cluster size, saving the optimal model as an attribute.
-- **Parallel Processing:** Use parallel processing, when using the silhouette or elbow method, by specifying the number of CPU cores in the ``n_cores`` parameter in the ``get_caps()`` method.
+- **Optimal Cluster Size Identification:** Use the Davies-Bouldin, Silhouette, Elbow, or Variance Ratio (Calinski-Harabasz) criteria to identify the optimal cluster size, saving the optimal model as an attribute.
+- **Parallel Processing:** Use parallel processing with any of these cluster selection criteria by specifying the number of CPU cores in the ``n_cores`` parameter in the ``get_caps()`` method.
   *Note:* If you are using an HPC, remember to allocate the appropriate amount of CPU cores with your workload manager. For instance, in Slurm use ``#SBATCH --cpus-per-task=10`` if you intend to use 10 cores.
-- **Grouping:** Perform CAPs analysis for entire sample or groups of subject IDs (using the ``groups`` parameter when initializing the ``CAP`` class). K-means clustering, silhouette and elbow methods, and plotting are done for each group when specified.
+- **Grouping:** Perform a CAPs analysis for the entire sample or for groups of subject IDs (using the ``groups`` parameter when initializing the ``CAP`` class). K-means clustering, all cluster selection methods (Davies-Bouldin, Silhouette, Elbow, or Variance Ratio), and plotting are done for each group when specified.
 - **CAP Visualization:** Visualize the CAPs as outer products or heatmaps, with options to use subplots to reduce the number of individual plots, as well as save.
   Refer to the `documentation <https://neurocaps.readthedocs.io/en/latest/generated/neurocaps.analysis.CAP.html#neurocaps.analysis.CAP.caps2plot>`_ for the ``caps2plot()`` method in the ``CAP`` class for available ``**kwargs`` arguments and parameters to modify plots.
 - **Save CAPs as NifTIs:** Convert the atlas used for parcellation to a stat map and save it (``caps2niftis``).

neurocaps/__init__.py

+1 -1
@@ -2,4 +2,4 @@

 __all__=["analysis", "extraction"]
 # Version in a single place
-__version__ = "0.11.3"
+__version__ = "0.12.0"

neurocaps/_utils/__init__.py

+1
@@ -2,6 +2,7 @@
 from ._check_parcel_approach import _check_parcel_approach
 from ._pickle_to_dict import _convert_pickle_to_dict
 from ._cap_internals import _cap2statmap
+from ._cap_internals import _create_node_labels
 from ._cap_internals import _CAPGetter
 from ._cap_internals import _run_kmeans
 from ._timeseriesextractor_internals import _check_confound_names
@@ -1,3 +1,4 @@
 from ._capgetter import _CAPGetter
 from ._cap2statmap import _cap2statmap
+from ._create_labels import _create_node_labels
 from ._run_kmeans import _run_kmeans

neurocaps/_utils/_cap_internals/_capgetter.py

+16 -8
@@ -8,14 +8,6 @@ def __init__(self):
         pass

     ### Attributes exist when CAP initialized
-    @property
-    def n_clusters(self):
-        return self._n_clusters
-
-    @property
-    def cluster_selection_method(self):
-        return self._cluster_selection_method
-
     @property
     def groups(self):
         return self._groups
@@ -31,6 +23,14 @@ def parcel_approach(self, parcel_dict):
         self._parcel_approach = _check_parcel_approach(parcel_approach=parcel_dict, call="setter")

     ### Attributes exist when CAP.get_caps() used
+    @property
+    def n_clusters(self):
+        return self._n_clusters if hasattr(self, "_n_clusters") else None
+
+    @property
+    def cluster_selection_method(self):
+        return self._cluster_selection_method if hasattr(self, "_cluster_selection_method") else None
+
     @property
     def n_cores(self):
         return self._n_cores if hasattr(self, "_n_cores") else None
@@ -47,6 +47,10 @@ def caps(self):
     def kmeans(self):
         return self._kmeans if hasattr(self, "_kmeans") else None

+    @property
+    def davies_bouldin(self):
+        return self._davies_bouldin if hasattr(self, "_davies_bouldin") else None
+
     @property
     def silhouette_scores(self):
         return self._silhouette_scores if hasattr(self, "_silhouette_scores") else None
@@ -55,6 +59,10 @@ def silhouette_scores(self):
     def inertia(self):
         return self._inertia if hasattr(self, "_inertia") else None

+    @property
+    def variance_ratio(self):
+        return self._variance_ratio if hasattr(self, "_variance_ratio") else None
+
     @property
     def optimal_n_clusters(self):
         return self._optimal_n_clusters if hasattr(self, "_optimal_n_clusters") else None
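The guarded-property pattern above (attributes populated only by `get_caps()` return `None` until the analysis has run) can be sketched in isolation; the class name and dict shape here are hypothetical:

```python
class Getter:
    """Minimal standalone sketch of the guarded-property pattern in _CAPGetter."""

    @property
    def davies_bouldin(self):
        # None until the analysis step sets the private attribute
        return self._davies_bouldin if hasattr(self, "_davies_bouldin") else None

    @property
    def variance_ratio(self):
        return self._variance_ratio if hasattr(self, "_variance_ratio") else None


g = Getter()
print(g.davies_bouldin)  # None: nothing has populated the attribute yet

# Simulate what the analysis step would do internally (dict shape invented)
g._davies_bouldin = {"All Subjects": {2: 0.8, 3: 1.1}}
print(g.davies_bouldin)
```

This avoids `AttributeError` when a user inspects scores before fitting, at the cost of `None` being ambiguous between "not yet computed" and "computed but empty".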
@@ -0,0 +1,40 @@
+"""Internal function to create labels at the node level for caps2plot"""
+import collections, re
+
+def _create_node_labels(parcellation_name, parcel_approach, columns):
+    # Get frequency of each major hemisphere and region in Schaefer, AAL, or Custom atlas
+    if parcellation_name == "Schaefer":
+        nodes = parcel_approach[parcellation_name]["nodes"]
+        # Retain only the hemisphere and primary Schaefer network
+        nodes = [node.split("_")[:2] for node in nodes]
+        frequency_dict = collections.Counter([" ".join(node) for node in nodes])
+    elif parcellation_name == "AAL":
+        nodes = parcel_approach[parcellation_name]["nodes"]
+        frequency_dict = collections.Counter([node.split("_")[0] for node in nodes])
+    else:
+        frequency_dict = {}
+        for names_id in columns:
+            # For custom, columns come in the form of "Hemisphere Region"
+            hemisphere_id = "LH" if names_id.startswith("LH ") else "RH"
+            region_id = re.split("LH |RH ", names_id)[-1]
+            node_indices = parcel_approach["Custom"]["regions"][region_id][hemisphere_id.lower()]
+            frequency_dict.update({names_id: len(node_indices)})
+
+    # Get the names, which indicate the hemisphere and region
+    # Converting Counter objects to a list retains the original ordering of nodes as of Python 3.7
+    names_list = list(frequency_dict)
+    labels = ["" for _ in range(0, len(parcel_approach[parcellation_name]["nodes"]))]
+
+    starting_value = 0
+
+    # Iterate through names_list and assign the starting indices corresponding to each unique region and hemisphere key
+    for num, name in enumerate(names_list):
+        if num == 0:
+            labels[0] = name
+        else:
+            # Shift by the frequency of the preceding network to obtain the new starting value of
+            # the subsequent region and hemisphere pair
+            starting_value += frequency_dict[names_list[num - 1]]
+            labels[starting_value] = name
+
+    return labels, names_list
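The starting-index logic in the new `_create_node_labels` can be traced with toy Schaefer-style node names (the names below are invented for illustration; real atlases have many more nodes):

```python
import collections

# Hypothetical Schaefer-style nodes: "<Hemisphere>_<Network>_<index>"
nodes = ["LH_Vis_1", "LH_Vis_2", "LH_SomMot_1", "RH_Vis_1", "RH_SomMot_1", "RH_SomMot_2"]

# Keep only hemisphere + primary network, then count each contiguous pair
pairs = [" ".join(node.split("_")[:2]) for node in nodes]
frequency_dict = collections.Counter(pairs)  # insertion-ordered (Python 3.7+)
names_list = list(frequency_dict)

# Place each pair's label only at the index where its run of nodes begins,
# leaving the remaining tick positions blank
labels = ["" for _ in nodes]
starting_value = 0
for num, name in enumerate(names_list):
    if num:
        starting_value += frequency_dict[names_list[num - 1]]
    labels[starting_value] = name

print(labels)  # ['LH Vis', '', 'LH SomMot', 'RH Vis', 'RH SomMot', '']
```

Each label lands on the first node of its hemisphere/network run, which is what lets `caps2plot` draw one tick label per region instead of one per node.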
@@ -1,14 +1,21 @@
 """Internal function for performing silhouette or elbow method with or without multiprocessing"""
 from sklearn.cluster import KMeans
-from sklearn.metrics import silhouette_score
+from sklearn.metrics import davies_bouldin_score, calinski_harabasz_score, silhouette_score

 def _run_kmeans(n_cluster, random_state, init, n_init, max_iter, tol, algorithm, concatenated_timeseries, method):
     model = KMeans(n_clusters=n_cluster, random_state=random_state, init=init, n_init=n_init, max_iter=max_iter,
                    tol=tol, algorithm=algorithm).fit(concatenated_timeseries)
-    if method == "silhouette":
-        cluster_labels = model.labels_
+
+    cluster_labels = model.labels_
+
+    if method == "davies_bouldin":
+        performance = {n_cluster: davies_bouldin_score(concatenated_timeseries, cluster_labels)}
+    elif method == "elbow":
+        performance = {n_cluster: model.inertia_}
+    elif method == "silhouette":
         performance = {n_cluster: silhouette_score(concatenated_timeseries, cluster_labels, metric="euclidean")}
     else:
-        performance = {n_cluster: model.inertia_}
+        # Variance Ratio
+        performance = {n_cluster: calinski_harabasz_score(concatenated_timeseries, cluster_labels)}

     return performance
