20 changes: 11 additions & 9 deletions README.md
@@ -106,10 +106,11 @@ that enables users to explore union of concept prevalences over multiple cohorts
 selection bias exploration. An example code snippet is shown below to illustrate how to use this method.
 ```python
 cohort_list = [baseline_cohort_data.cohort_id, study_cohort_data.cohort_id]
-aggregated_cohort_metrics_dict = bias.get_cohorts_concept_stats(cohort_list)
-print('Aggregated concept prevalence metrics over the baseline and study cohorts are:')
-print(aggregated_cohort_metrics_dict)
+union_cohort_concept_hierarchy_dict = bias.get_cohorts_concept_stats(cohort_list)
+print('Concept hierarchy with prevalence metrics unionized across the baseline and study cohorts is:')
+print(union_cohort_concept_hierarchy_dict)
 ```
+For more details, refer to the corresponding tutorial notebook [BiasAnalyzerMultipleCohortConceptUnionTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerMultipleCohortConceptUnionTutorial.ipynb).
 - There is also an API method that enables users to compare the distributions of two cohorts by calling `bias.compare_cohorts(cohort1_id, cohort2_id)`,
 where `cohort1_id` and `cohort2_id` are integers that can be obtained from a cohort object's metadata. Currently,
 only Hellinger distances between the two cohorts' distributions are computed.
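The README notes that `compare_cohorts` currently computes only Hellinger distances. As a reference for what that measure is, here is a minimal, stdlib-only sketch of the standard Hellinger distance between two discrete distributions, H(P, Q) = (1/√2) · sqrt(Σ_k (√p_k − √q_k)²), which ranges from 0 (identical distributions) to 1 (disjoint support). This is not BiasAnalyzer's implementation: which cohort attributes it compares and how it bins them are not shown in this diff, and the helper names `to_distribution` and `hellinger_distance` as well as the sample values are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Mapping


def to_distribution(values: Iterable[Hashable]) -> dict:
    """Normalize raw categorical observations into a probability distribution."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def hellinger_distance(p: Mapping[Hashable, float], q: Mapping[Hashable, float]) -> float:
    """Hellinger distance between two discrete distributions.

    Categories present in only one distribution are treated as probability 0
    in the other, so the result is 0 for identical inputs and 1 for disjoint support.
    """
    categories = set(p) | set(q)
    squared = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in categories)
    return math.sqrt(squared) / math.sqrt(2)


# Hypothetical categorical values for two cohorts (illustrative only, not real data).
baseline = to_distribution(["F", "F", "M", "M"])
study = to_distribution(["F", "F", "F", "M"])
print(round(hellinger_distance(baseline, study), 4))
```

Taking the union of both category sets keeps the measure symmetric and well defined when one cohort has a category the other lacks.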
@@ -125,11 +126,12 @@ To help users get started with the `BiasAnalyzer` python package, four Jupyter n
 provided in the [`notebooks/`](https://github.com/VACLab/BiasAnalyzer/tree/main/notebooks)
 directory. These tutorials walk users through key features and workflows with illustrative examples.
 
-| Tutorial | Description |
-|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| [BiasAnalyzerCohortsTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerCohortsTutorial.ipynb) | Demonstrates how to create baseline and study cohorts, retrieve cohort statistics, and compare cohort distributions. |
-| [BiasAnalyzerAsyncCohortsTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerAsyncCohortsTutorial.ipynb) | As a companion to the Cohort tutorial above, demonstrates how to create and analyze cohorts asynchronously for improved performance and responsiveness when working with large datasets or complex cohort definitions. |
-| [BiasAnalyzerCohortConceptTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerCohortConceptTutorial.ipynb) | Demonstrates how to explore clinical concept prevalence within a cohort, helping users analyze clinical concept prevalence and identify potential cohort selection biases. |
-| [BiasAnalyzerConceptBrowsingTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerConceptBrowsingTutorial.ipynb) | Guides users through browsing OMOP concepts, domains, and vocabularies, including how to retrieve and visualize concept hierarchies. |
+| Tutorial | Description |
+|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| [BiasAnalyzerCohortsTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerCohortsTutorial.ipynb) | Demonstrates how to create baseline and study cohorts, retrieve cohort statistics, and compare cohort distributions. |
+| [BiasAnalyzerAsyncCohortsTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerAsyncCohortsTutorial.ipynb) | As a companion to the Cohort tutorial above, demonstrates how to create and analyze cohorts asynchronously for improved performance and responsiveness when working with large datasets or complex cohort definitions. |
+| [BiasAnalyzerCohortConceptTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerCohortConceptTutorial.ipynb) | Demonstrates how to explore clinical concept prevalence within a cohort, helping users analyze clinical concept prevalence and identify potential cohort selection biases. |
+| [BiasAnalyzerMultipleCohortConceptUnionTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerMultipleCohortConceptUnionTutorial.ipynb) | Demonstrates how to explore clinical concept prevalence across multiple cohorts, helping users analyze clinical concept prevalence hierarchies unionized across multiple cohorts and identify potential cohort selection biases. |
+| [BiasAnalyzerConceptBrowsingTutorial.ipynb](https://github.com/VACLab/BiasAnalyzer/blob/main/notebooks/BiasAnalyzerConceptBrowsingTutorial.ipynb) | Guides users through browsing OMOP concepts, domains, and vocabularies, including how to retrieve and visualize concept hierarchies. |
 
 These tutorials are designed to run in a Jupyter environment with access to an OMOP-compatible PostgreSQL or DuckDB database.
34 changes: 24 additions & 10 deletions biasanalyzer/concept.py
@@ -24,6 +24,11 @@ def parents(self) -> List["ConceptNode"]:
     def children(self) -> List["ConceptNode"]:
         return [ConceptNode(c, self._ch) for c in self._ch.graph.successors(self.id)]
 
+    def source_cohorts(self) -> List[int]:
+        """Return the sorted list of integer cohort IDs this node appears in."""
+        metrics = self._ch.graph.nodes[self.id].get("metrics", {})
+        return sorted(int(k) for k in metrics.keys())
+
     def get_metrics(self, cohort_id: Union[int, str]) -> dict:
         metrics = self._ch.graph.nodes[self.id].get("metrics", {})
         return metrics.get(str(cohort_id), {})
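The new `source_cohorts` method relies on the same storage convention as `get_metrics`: a node's `metrics` attribute maps each cohort ID, stored as a string, to that cohort's metric dict. The minimal stdlib-only sketch below reproduces that convention on a plain dict standing in for the graph's node attributes; the `"prevalence"` key, the sample values, and the free functions are assumptions for illustration, not BiasAnalyzer APIs.

```python
from typing import Dict, List, Union

# Hypothetical node attributes: per-cohort metric dicts keyed by str(cohort_id).
# The "prevalence" key is an assumed metric name used only for illustration.
node_attrs: Dict[str, Dict[str, dict]] = {
    "metrics": {
        "10": {"prevalence": 0.12},
        "2": {"prevalence": 0.30},
    }
}


def source_cohorts(attrs: dict) -> List[int]:
    """Cohort IDs the node appears in, sorted numerically."""
    metrics = attrs.get("metrics", {})
    # Convert before sorting: sorting the raw string keys would put "10" before "2".
    return sorted(int(k) for k in metrics)


def get_metrics(attrs: dict, cohort_id: Union[int, str]) -> dict:
    """Accept an int or str cohort ID by normalizing it to the stored string key."""
    return attrs.get("metrics", {}).get(str(cohort_id), {})


print(source_cohorts(node_attrs))  # [2, 10]
print(get_metrics(node_attrs, 2))  # {'prevalence': 0.3}
```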
@@ -38,22 +43,27 @@ def get_union_metrics(self) -> dict:
             "prevalence": sum(prevalences) / len(prevalences) if prevalences else 0.0,
         }
 
-    def to_dict(self, include_children: bool = True) -> dict:
+    def to_dict(self, include_children: bool = True, include_union_metrics: bool = False) -> dict:
         """
         Serialize this node into a dict. Optionally include nested children.
+        Set include_union_metrics=True to also compute an aggregated "union" entry alongside the per-cohort metrics.
         """
+        node_metrics = self._ch.graph.nodes[self.id].get("metrics", {}).copy()
+        if include_union_metrics:
+            node_metrics = {"union": self.get_union_metrics(), **node_metrics}
+
         data = {
             "concept_id": self.id,
             "concept_name": self.name,
             "concept_code": self.code,
-            "metrics": {
-                "union": self.get_union_metrics(),
-                "cohorts": self._ch.graph.nodes[self.id].get("metrics", {}),
-            },
+            "metrics": node_metrics,
+            "source_cohorts": self.source_cohorts(),
             "parent_ids": list(self._ch.graph.predecessors(self.id)),
         }
         if include_children:
-            data["children"] = [c.to_dict(include_children=True) for c in self.children]
+            data["children"] = [c.to_dict(include_children=True, include_union_metrics=include_union_metrics)
+                                for c in self.children]
+
         return data


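The reworked `ConceptNode.to_dict` copies the stored per-cohort metrics and, only when `include_union_metrics=True`, places a `"union"` entry in front of them. The sketch below isolates that step on plain dicts. `union_metrics` reproduces only the prevalence averaging visible in `get_union_metrics` (the full method may return more fields), and `serialize_metrics`, `stored`, and the `"prevalence"` key are hypothetical.

```python
from typing import Dict

# Hypothetical stored per-cohort metrics for one concept, keyed by str(cohort_id).
stored: Dict[str, dict] = {"1": {"prevalence": 0.2}, "2": {"prevalence": 0.4}}


def union_metrics(per_cohort: Dict[str, dict]) -> dict:
    """Mean prevalence across cohorts (only the part of get_union_metrics shown in the diff)."""
    prevalences = [m.get("prevalence", 0.0) for m in per_cohort.values()]
    return {"prevalence": sum(prevalences) / len(prevalences) if prevalences else 0.0}


def serialize_metrics(per_cohort: Dict[str, dict], include_union_metrics: bool = False) -> dict:
    # Shallow copy: adding keys to the result never touches the stored dict.
    node_metrics = per_cohort.copy()
    if include_union_metrics:
        # Building a new dict with "union" first makes it the first key in the output.
        node_metrics = {"union": union_metrics(per_cohort), **node_metrics}
    return node_metrics


default = serialize_metrics(stored)
with_union = serialize_metrics(stored, include_union_metrics=True)
print(list(default))     # ['1', '2']
print(list(with_union))  # ['union', '1', '2']

default["union"] = {}    # mutating the returned dict...
print("union" in stored)  # False: ...leaves the stored metrics unchanged
```

Because `.copy()` is shallow, the returned dict is new but the inner per-cohort dicts are still the objects stored on the node, so callers should not mutate those in place.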
@@ -190,16 +200,20 @@ def union(self, other: "ConceptHierarchy") -> "ConceptHierarchy":
         ConceptHierarchy._graph_cache[new_ident] = new_hierarchy
         return new_hierarchy
 
-    def to_dict(self, root_id: Optional[int] = None) -> dict:
+    def to_dict(self, root_id: Optional[int] = None, include_union_metrics: bool = False) -> dict:
         """
         Convert the concept hierarchy or a sub-hierarchy to a nested dict structure
         :param root_id: if provided, return the sub-hierarchy rooted at this concept_id;
             if None, return the whole hierarchy with all roots.
-        :return: nested dict representation of the hierarchy or sub-hierarchy
+        :param include_union_metrics: if True, also compute and include "union" aggregate metrics for every node;
+            by default, only per-cohort metrics are included.
+        :return: nested dict representation of the hierarchy or sub-hierarchy.
         """
         if root_id is not None:
             if root_id not in self.graph:
                 raise ValueError(f"Input concept id {root_id} not found in the concept hierarchy graph")
-            return {"hierarchy": [ConceptNode(root_id, self).to_dict()]}
+            return {"hierarchy": [ConceptNode(root_id, self).to_dict(include_children=True,
+                                                                     include_union_metrics=include_union_metrics)]}
 
-        return {"hierarchy": [r.to_dict() for r in self.get_root_nodes()]}
+        return {"hierarchy": [r.to_dict(include_children=True, include_union_metrics=include_union_metrics)
+                              for r in self.get_root_nodes()]}
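`ConceptHierarchy.to_dict` now forwards `include_union_metrics` to every node it serializes, whether it starts from a single `root_id` or from all root nodes. The sketch below mimics that control flow on a hypothetical in-memory DAG; `CHILDREN`, `METRICS`, `parent_ids`, `node_to_dict`, and `hierarchy_to_dict` are illustrative stand-ins rather than BiasAnalyzer APIs, and the union is again assumed to be a mean prevalence.

```python
from typing import Dict, List, Optional

# Hypothetical toy DAG (parent -> children); concept 400 has two parents.
CHILDREN: Dict[int, List[int]] = {100: [200, 300], 200: [400], 300: [400], 400: []}
# Hypothetical per-cohort metrics keyed by str(cohort_id); "prevalence" is an assumed key.
METRICS: Dict[int, Dict[str, dict]] = {
    100: {"1": {"prevalence": 0.5}, "2": {"prevalence": 0.7}},
    200: {"1": {"prevalence": 0.2}},
    300: {"2": {"prevalence": 0.4}},
    400: {"1": {"prevalence": 0.1}, "2": {"prevalence": 0.3}},
}


def parent_ids(node: int) -> List[int]:
    """Concepts with an edge into `node`."""
    return [p for p, kids in CHILDREN.items() if node in kids]


def node_to_dict(node: int, include_union_metrics: bool = False) -> dict:
    metrics = dict(METRICS.get(node, {}))  # shallow copy of the stored metrics
    if include_union_metrics:
        vals = [m["prevalence"] for m in metrics.values()]
        metrics = {"union": {"prevalence": sum(vals) / len(vals) if vals else 0.0}, **metrics}
    return {
        "concept_id": node,
        "metrics": metrics,
        "parent_ids": parent_ids(node),
        # Forward the flag so every descendant is serialized the same way as its root.
        "children": [node_to_dict(c, include_union_metrics) for c in CHILDREN[node]],
    }


def hierarchy_to_dict(root_id: Optional[int] = None, include_union_metrics: bool = False) -> dict:
    if root_id is not None:
        if root_id not in CHILDREN:
            raise ValueError(f"Input concept id {root_id} not found in the concept hierarchy graph")
        return {"hierarchy": [node_to_dict(root_id, include_union_metrics)]}
    # Roots are concepts without parents.
    roots = [n for n in CHILDREN if not parent_ids(n)]
    return {"hierarchy": [node_to_dict(r, include_union_metrics) for r in roots]}


full = hierarchy_to_dict(include_union_metrics=True)
sub = hierarchy_to_dict(root_id=300)
```

Since the recursion keeps no visited set, concept 400 appears under both 200 and 300 in `full`; its `parent_ids` entry is what records that it has two parents. The recursion in the diff has the same property, so shared descendants are repeated in the nested output.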