
@leemc-data-ed
Contributor

New module on clinical data in Arcus is ready for initial review. Keeping in draft mode for now.

@leemc-data-ed requested a review from @rosemm on June 6, 2025 17:26
Collaborator

@rosemm left a comment


Okay, here's a review on the right PR this time :)


## Clinical data journey to the ADR

![Journey of clinical data at CHOP, beginning in Epic Chronicles, flowing into Epic Clarity, and from there into two branches, the CHOP Data Warehouse or the Arcus Data Repository.](media/chop_clinical_data_overview_updated.png)
Collaborator


We're not referencing the CDW anymore, I thought?

Contributor Author


Oh yep correct, missed this one!

Collaborator


Contributor Author


I did have an updated graphic I just forgot to copy to the media folder... but I think I like yours better for this anyway!


CHOP stores a _vast_ amount of clinical data, and it is incredibly complex! For efficient storage and access, data from Epic Chronicles is extracted into Clarity (a SQL database) every night, and Clarity is where the ADR gets its data. The ADR documentation in the CHOP data catalog contains information about the lineage of the data; [check out this lineage of the contact date field in the encounter table](https://chop.alationcloud.com/attribute/933634/lineage/) as an example.

While Epic does have some data analysis tools, they are not built with research as the primary focus. The ADR _is_ designed for research, and because its contents are curated, it is much easier to find and extract exactly the data you need to answer your research question.
Collaborator


I think you're contrasting Hyperspace (SlicerDicer, etc.) here with ADR, but we might want to make that explicit and also contrast ADR with Clarity and/or Helix, maybe as a separate point.


You can read more about the ADR and look at some of the metadata about the tables in the ADR in the [CHOP Data Catalog](https://chop.alationcloud.com/data/23/).

The next few sections will go through two of the central tables in the ADR, to which many others connect: Patients and Encounters.
Collaborator


I think these next few sections are probably the most valuable piece for folks reading this content. I expect the two big questions folks will have in their mind for this content are "what data are available in the ADR" and "where do I find X in the ADR". To that end, we may want to significantly expand this section, to cover more than just patients and encounters, although doing so would be a big lift and we may not have bandwidth.

Here's the suggestion I had put on the PR in the notebooks repo, which I think applies here as well:

Instead of an ERD, what about a few examples that start with a general research question and then point to where those data would be? For example:

- incidence of asthma dx for pts who have vs. have not received a COVID vaccine (where is dx info stored? where is vaccine history?)
- impact of SDOH (e.g., Child Opportunity Index) on rehospitalization after surgery (where is geospatial info? where are surgery encounters? where are hospitalizations?)
- comparison of over- vs. under-weight BMI in referrals to nutrition specialists (where is BMI? where are referrals?)

For each example, I think the "where are the data" would be just at the level of the table: "BMI for a given encounter is in flowsheets, as are a lot of other common measurements like vital signs. A surgery is an encounter, as is a hospitalization; you can filter by encounter type. And for inpatient encounters, there's lots of additional information in the ADT tables. etc." but no more detailed than that.
And also: I personally don't know with confidence where to find all these data points, so I'm definitely not expecting that you would either! :) But if you like this approach, I'd be happy to work with you on fleshing out some examples like these and getting the ADR team to vet them.


These tables contain additional information related to specific encounters, such as admissions or diagnoses. For more information, check out the [ADR in the CHOP data catalog](https://chop.alationcloud.com/data/23/).
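To make the "connects to Encounters" idea concrete, here is a minimal sketch using an in-memory SQLite database. The table and column names (`patients`, `encounters`, `encounter_diagnoses`, `patient_id`, `encounter_id`, `encounter_type`) and the sample rows are hypothetical stand-ins, not the actual ADR schema; check the ADR in the CHOP Data Catalog for the real tables and keys. It shows the general pattern only: filter encounters (for example, by encounter type), then join an encounter-level detail table on the encounter key and back to the patient.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; the real ADR
# table and column names differ (see the CHOP Data Catalog).
SCHEMA_AND_DATA = """
CREATE TABLE patients (
    patient_id INTEGER PRIMARY KEY,
    birth_date TEXT
);
CREATE TABLE encounters (
    encounter_id   INTEGER PRIMARY KEY,
    patient_id     INTEGER REFERENCES patients(patient_id),
    encounter_type TEXT,
    contact_date   TEXT
);
-- An encounter-level detail table: each row points back to one encounter.
CREATE TABLE encounter_diagnoses (
    encounter_id INTEGER REFERENCES encounters(encounter_id),
    dx_name      TEXT
);
INSERT INTO patients VALUES (1, '2015-03-01'), (2, '2018-07-12');
INSERT INTO encounters VALUES
    (10, 1, 'Hospital Encounter', '2024-01-05'),
    (11, 1, 'Office Visit',       '2024-02-10'),
    (12, 2, 'Office Visit',       '2024-03-15');
INSERT INTO encounter_diagnoses VALUES
    (10, 'Asthma'), (11, 'Asthma'), (12, 'Otitis media');
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database loaded with the toy tables above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


def diagnoses_for_encounter_type(conn, encounter_type):
    """Join patients -> encounters -> diagnoses, keeping one encounter type."""
    query = """
        SELECT p.patient_id, e.contact_date, d.dx_name
        FROM patients AS p
        JOIN encounters AS e ON e.patient_id = p.patient_id
        JOIN encounter_diagnoses AS d ON d.encounter_id = e.encounter_id
        WHERE e.encounter_type = ?
        ORDER BY p.patient_id, e.contact_date
    """
    return conn.execute(query, (encounter_type,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    # Prints [(1, '2024-02-10', 'Asthma'), (2, '2024-03-15', 'Otitis media')]
    print(diagnoses_for_encounter_type(conn, "Office Visit"))
```

The same pattern should carry over to other encounter-level tables: the encounter key is the bridge, and the encounter table is where you narrow down to the kind of visit you care about.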

### Encounter: Entity relationship diagram
Collaborator


I don't know how helpful these diagrams really are... if they were interactive (click on a table and then it centers on that table, etc.) like the diagrams in Caboodle, I think it would be a lot more valuable, and then it would also potentially provide a way for you to sort of "walk through" the full data model, by clicking from one table to the next and seeing how they all can potentially connect.
But it's a little weird to me that the ERD here isn't complete -- they don't show all the tables you can link to from encounter, just a few examples, so it's not providing a bird's-eye view of the ADR's structure. I dunno. What do you think? I think maybe what I would prefer would be for there to be a complete ERD for every single blessed table in the ADR, and for the ERD to be included on the Gene page for that table. But to me, I think showing an incomplete ERD just underscores that it's a relational database -- you can join tables together by keys -- it doesn't necessarily communicate the data model itself.

Contributor Author


I think you're probably right--I actually don't really like them and don't find them helpful, but I do wish we had more visuals that would be useful. We can play around with other options though.

Collaborator


We could include screenshots of the cohort discovery tool and/or clinical data finder, potentially.

@leemc-data-ed
Contributor Author

Okay @rosemm I've made some changes based on your suggestions! I've also added a new section, "So how do I find what I need?", that's the placeholder for what I think you were looking to add. Feel free to change the name or add your content elsewhere!

@rosemm
Collaborator

rosemm commented Nov 5, 2025

@leemc-data-ed lemme know what you think! :)


3 participants