VACLab · hyi · Oct 2, 2025 · Oct 1, 2025 · Oct 2, 2025
diff --git a/README.md b/README.md
@@ -106,10 +106,11 @@ that enables users to explore union of concept prevalences over multiple cohorts
 selection bias exploration. An example code snippet is shown below to illustrate how to use this method.
    ```angular2html
    cohort_list = [baseline_cohort_data.cohort_id, study_cohort_data.cohort_id]
-   aggregated_cohort_metrics_dict = bias.get_cohorts_concept_stats(cohort_list)
-   print('Aggregated concept prevalence metrics over the baseline and study cohorts are:')
-   print(aggregated_cohort_metrics_dict)
+   union_cohort_concept_hierarchy_dict = bias.get_cohorts_concept_stats(cohort_list)
+   print('Concept hierarchy with prevalence metrics unionized across the baseline and study cohorts is:')
+   print(union_cohort_concept_hierarchy_dict)
    ```
+  For more details, refer to the corresponding tutorial notebook [BiasAnalyzerMultipleCohortConceptUnionTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerMultipleCohortConceptUnionTutorial.ipynb).
 - There is also an API method that enables users to compare distributions of two cohorts by calling `bias.compare_cohorts(cohort1_id, cohort2_id)` 
 where cohort1_id and cohort2_id are integers and can be obtained from metadata of a cohort object. Currently, 
 only hellinger distances between distributions of two cohorts are computed.
@@ -125,11 +126,12 @@ To help users get started with the `BiasAnalyzer` python package, four Jupyter n
 provided in the [`notebooks/`](https://github.com/VACLab/BiasAnalyzer/tree/main/notebooks) 
 directory. These tutorials walk users through key features and workflows with illustrative examples.
 
-| Tutorial | Description                                                                                                                                                                                                           |
-|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| [BiasAnalyzerCohortsTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerCohortsTutorial.ipynb) | Demonstrates how to create baseline and study cohorts, retrieve cohort statistics, and compare cohort distributions.                                                                                                  |
-| [BiasAnalyzerAsyncCohortsTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerAsyncCohortsTutorial.ipynb) | As a companion to the Cohort tutorial above, demonstrates how to create and analyze cohorts asynchronously for improved performance and responsiveness when working with large datasets or complex cohort definitions. |
-| [BiasAnalyzerCohortConceptTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerCohortConceptTutorial.ipynb) | Demonstrates how to explore clinical concept prevalence within a cohort, helping users analyze clinical concept prevalence and identify potential cohort selection biases.                                            |
-| [BiasAnalyzerConceptBrowsingTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerConceptBrowsingTutorial.ipynb) | Guides users through browsing OMOP concepts, domains, and vocabularies, including how to retrieve and visualize concept hierarchies.                                                                                  |
+| Tutorial | Description                                                                                                                                                                                                                      |
+|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| [BiasAnalyzerCohortsTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerCohortsTutorial.ipynb) | Demonstrates how to create baseline and study cohorts, retrieve cohort statistics, and compare cohort distributions.                                                                                                             |
+| [BiasAnalyzerAsyncCohortsTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerAsyncCohortsTutorial.ipynb) | As a companion to the Cohort tutorial above, demonstrates how to create and analyze cohorts asynchronously for improved performance and responsiveness when working with large datasets or complex cohort definitions.           |
+| [BiasAnalyzerCohortConceptTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerCohortConceptTutorial.ipynb) | Demonstrates how to explore clinical concept prevalence within a cohort, helping users analyze clinical concept prevalence and identify potential cohort selection biases.                                                       |
+| [BiasAnalyzerMultipleCohortConceptUnionTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerMultipleCohortConceptUnionTutorial.ipynb) | Demonstrates how to explore clinical concept prevalence across multiple cohorts, helping users analyze clinical concept prevalence hierarchies unionized across multiple cohorts and identify potential cohort selection biases. |
+| [BiasAnalyzerConceptBrowsingTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerConceptBrowsingTutorial.ipynb) | Guides users through browsing OMOP concepts, domains, and vocabularies, including how to retrieve and visualize concept hierarchies.                                                                                             |
 
 These tutorials are designed to run in a Jupyter environment with access to an OMOP-compatible postgreSQL or DuckDB database. 
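Review note on the README snippet above: a minimal sketch of how a reader might traverse the printed dictionary to spot concepts that are absent from one of the cohorts. It assumes that `bias.get_cohorts_concept_stats(cohort_list)` returns the nested structure produced by `ConceptHierarchy.to_dict` in this PR (a top-level `hierarchy` list whose nodes carry `concept_id`, `source_cohorts`, and `children`); the diff does not confirm that. The helpers `iter_nodes` and `concepts_missing_from` and the `sample` data are hypothetical and not part of the package.

```python
from typing import Dict, Iterator, List


def iter_nodes(hierarchy_dict: dict) -> Iterator[dict]:
    """Yield every node dict depth-first, starting from each root."""
    stack = list(reversed(hierarchy_dict.get("hierarchy", [])))
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(node.get("children", [])))


def concepts_missing_from(hierarchy_dict: dict, cohort_ids: List[int]) -> Dict[int, List[int]]:
    """Map concept_id to the cohorts (from cohort_ids) in which it does not appear."""
    wanted = set(cohort_ids)
    missing: Dict[int, List[int]] = {}
    for node in iter_nodes(hierarchy_dict):
        absent = sorted(wanted - set(node.get("source_cohorts", [])))
        if absent:
            # Keyed by concept_id, so a concept nested under several parents is reported once.
            missing[node["concept_id"]] = absent
    return missing


# Hand-written sample in the assumed output shape (not real cohort data).
sample = {
    "hierarchy": [
        {
            "concept_id": 1,
            "concept_name": "root concept",
            "concept_code": "R",
            "metrics": {"1": {"prevalence": 0.4}, "2": {"prevalence": 0.6}},
            "source_cohorts": [1, 2],
            "parent_ids": [],
            "children": [
                {
                    "concept_id": 2,
                    "concept_name": "child concept",
                    "concept_code": "C",
                    "metrics": {"1": {"prevalence": 0.1}},
                    "source_cohorts": [1],
                    "parent_ids": [1],
                    "children": [],
                }
            ],
        }
    ]
}

print(concepts_missing_from(sample, [1, 2]))  # {2: [2]}
```

Concepts that appear in the baseline cohort but not in the study cohort (or the reverse) are natural starting points when looking for selection bias.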
diff --git a/biasanalyzer/concept.py b/biasanalyzer/concept.py
@@ -24,6 +24,11 @@ def parents(self) -> List["ConceptNode"]:
     def children(self) -> List["ConceptNode"]:
         return [ConceptNode(c, self._ch) for c in self._ch.graph.successors(self.id)]
 
+    def source_cohorts(self) -> List[int]:
+        """Return a sorted list of the integer IDs of the cohorts this node appears in."""
+        metrics = self._ch.graph.nodes[self.id].get("metrics", {})
+        return sorted(int(k) for k in metrics.keys())
+
     def get_metrics(self, cohort_id: Union[int, str]) -> dict:
         metrics = self._ch.graph.nodes[self.id].get("metrics", {})
         return metrics.get(str(cohort_id), {})
@@ -38,22 +43,27 @@ def get_union_metrics(self) -> dict:
             "prevalence": sum(prevalences) / len(prevalences) if prevalences else 0.0,
         }
 
-    def to_dict(self, include_children: bool = True) -> dict:
+    def to_dict(self, include_children: bool = True, include_union_metrics: bool = False) -> dict:
         """
         Serialize this node into a dict. Optionally include nested children.
+        Set include_union_metrics to True to add an aggregated 'union' entry to the metrics.
         """
+        node_metrics = self._ch.graph.nodes[self.id].get("metrics", {}).copy()
+        if include_union_metrics:
+            node_metrics = {"union": self.get_union_metrics(), **node_metrics}
+
         data = {
             "concept_id": self.id,
             "concept_name": self.name,
             "concept_code": self.code,
-            "metrics": {
-                "union": self.get_union_metrics(),
-                "cohorts": self._ch.graph.nodes[self.id].get("metrics", {}),
-            },
+            "metrics": node_metrics,
+            "source_cohorts": self.source_cohorts(),
             "parent_ids": list(self._ch.graph.predecessors(self.id)),
         }
         if include_children:
-            data["children"] = [c.to_dict(include_children=True) for c in self.children]
+            data["children"] = [c.to_dict(include_children=True, include_union_metrics=include_union_metrics)
+                                for c in self.children]
+
         return data
 
 
@@ -190,16 +200,20 @@ def union(self, other: "ConceptHierarchy") -> "ConceptHierarchy":
         ConceptHierarchy._graph_cache[new_ident] = new_hierarchy
         return new_hierarchy
 
-    def to_dict(self, root_id: Optional[int] = None) -> dict:
+    def to_dict(self, root_id: Optional[int] = None, include_union_metrics: bool = False) -> dict:
         """
         Convert the concept hierarchy or a sub-hierarchy to a nested dict structure
         :param root_id: if provided, return the sub-hierarchy rooted at this concept_id;
         if None, return the whole hierarchy with all roots.
-        :return: nested dict representation of the hierarchy or sub-hierarchy
+        :param include_union_metrics: if True, compute and include union aggregates alongside
+        the per-cohort metrics; by default, only per-cohort metrics are included.
+        :return: nested dict representation of the hierarchy or sub-hierarchy
         """
         if root_id is not None:
             if root_id not in self.graph:
                 raise ValueError(f"Input concept id {root_id} not found in the concept hierarchy graph")
-            return {"hierarchy": [ConceptNode(root_id, self).to_dict()]}
+            return {"hierarchy": [ConceptNode(root_id, self).to_dict(include_children=True,
+                                                                     include_union_metrics=include_union_metrics)]}
 
-        return {"hierarchy": [r.to_dict() for r in self.get_root_nodes()]}
+        return {"hierarchy": [r.to_dict(include_children=True, include_union_metrics=include_union_metrics)
+                              for r in self.get_root_nodes()]}
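Review note: the change to `ConceptNode.to_dict` flattens `metrics` from `{"union": ..., "cohorts": {...}}` into a single dict keyed by cohort ID, with `"union"` added only on request. The sketch below reproduces that shape with plain dicts so the effect on callers is easy to see. It is a stand-in, not the package code: `union_metrics` and `serialize_metrics` are hypothetical names, and only the mean-prevalence expression visible in the diff is modeled (any other fields returned by `get_union_metrics` are not shown in this diff and are left out).

```python
from typing import Dict


def union_metrics(metrics: Dict[str, dict]) -> dict:
    """Mean prevalence across cohorts, mirroring the one expression visible in the diff."""
    prevalences = [m["prevalence"] for m in metrics.values() if "prevalence" in m]
    return {"prevalence": sum(prevalences) / len(prevalences) if prevalences else 0.0}


def serialize_metrics(metrics: Dict[str, dict], include_union_metrics: bool = False) -> dict:
    """Build the 'metrics' and 'source_cohorts' entries the way the new to_dict does."""
    node_metrics = dict(metrics)  # shallow copy, like .copy() in the diff
    if include_union_metrics:
        # "union" is placed first, followed by the per-cohort entries.
        node_metrics = {"union": union_metrics(metrics), **node_metrics}
    return {
        "metrics": node_metrics,
        # Derived from the raw per-cohort metrics, so the "union" key is never passed to int().
        "source_cohorts": sorted(int(k) for k in metrics),
    }


raw = {"2": {"prevalence": 0.5}, "1": {"prevalence": 0.25}}

default = serialize_metrics(raw)
with_union = serialize_metrics(raw, include_union_metrics=True)

print(list(default["metrics"]))     # ['2', '1']
print(default["source_cohorts"])    # [1, 2]
print(list(with_union["metrics"]))  # ['union', '2', '1']
print(with_union["metrics"]["union"]["prevalence"])  # 0.375
```

Callers that previously read `node["metrics"]["cohorts"][cohort_id]` would need to read `node["metrics"][cohort_id]` instead, and should pass `include_union_metrics=True` if they still rely on the `"union"` entry.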