Add eICU LLM synthetic mortality example #629
Open
Who I am:
Mohsin Shah (NetID: mohsins2)
Contribution type:
New example / use case of PyHealth.
High-level description:
This PR adds an example script, examples/eicu_llm_synthetic_mortality.py, that demonstrates how to use PyHealth with small tabular EHR-style data and LLM-generated synthetic cohorts from my CS598 project.
The example uses 10 hand-crafted ICU features and compares three training regimes for ICU mortality prediction:
Real train → Real test
GPT baseline synthetic train → Real test
GPT privacy-aware synthetic train → Real test
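The three regimes share one pattern: fit on a different training cohort each time, then score on the same real test set. A minimal sketch of that loop follows. It is not the code in the example script and does not use PyHealth; the names `compare_regimes`, `fit_threshold`, and `accuracy`, the one-feature threshold model, and the toy rows are all hypothetical stand-ins chosen only so the sketch runs without dependencies.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One row = (feature vector, mortality label). Hypothetical layout for illustration.
Row = Tuple[Sequence[float], int]


def fit_threshold(rows: List[Row]) -> float:
    """Toy 'model': midpoint between the class means of feature 0."""
    neg = [x[0] for x, y in rows if y == 0]
    pos = [x[0] for x, y in rows if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2.0


def accuracy(threshold: float, rows: List[Row]) -> float:
    """Fraction of rows where (feature 0 >= threshold) matches the label."""
    correct = sum(1 for x, y in rows if int(x[0] >= threshold) == y)
    return correct / len(rows)


def compare_regimes(
    train_sets: Dict[str, List[Row]],
    real_test: List[Row],
    fit: Callable[[List[Row]], float],
    evaluate: Callable[[float, List[Row]], float],
) -> Dict[str, float]:
    """Train once per regime; always evaluate on the same real test set."""
    return {name: evaluate(fit(rows), real_test) for name, rows in train_sets.items()}


# Made-up toy rows, NOT data or results from the project.
train_sets = {
    "real": [([1.0], 0), ([2.0], 0), ([8.0], 1), ([9.0], 1)],
    "gpt_baseline": [([0.0], 0), ([4.0], 0), ([6.0], 1), ([10.0], 1)],
    "gpt_privacy": [([3.0], 0), ([5.0], 0), ([9.0], 1), ([11.0], 1)],
}
real_test = [([1.5], 0), ([3.0], 0), ([6.0], 1), ([9.5], 1)]

results = compare_regimes(train_sets, real_test, fit_threshold, accuracy)
for name, score in results.items():
    print(f"{name} train -> real test: accuracy={score:.2f}")
```

Holding the test set fixed is what makes the three numbers comparable: any gap between regimes is attributable to the training cohort rather than to the evaluation data.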
How to run / what files to look at:
Example script: examples/eicu_llm_synthetic_mortality.py
The script directly loads the CSVs from my project repo via raw GitHub URLs:
https://github.com/mohsinposts/CS598-DLH-LLM-eICU
(real_icu_10feat.csv, synthetic_baseline_10feat_clean.csv, synthetic_privacy_10feat_clean.csv)
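For readers unfamiliar with raw GitHub URLs, the sketch below shows how such URLs are typically formed from a repository name, a branch, and a file path. The helper `raw_csv_url` is hypothetical, and both the branch name `main` and the assumption that the CSVs sit at the repository root are guesses, not details confirmed by this PR; check the example script for the URLs it actually uses.

```python
REPO = "mohsinposts/CS598-DLH-LLM-eICU"
FILES = [
    "real_icu_10feat.csv",
    "synthetic_baseline_10feat_clean.csv",
    "synthetic_privacy_10feat_clean.csv",
]


def raw_csv_url(filename: str, branch: str = "main", repo: str = REPO) -> str:
    """Build a raw.githubusercontent.com URL.

    ASSUMPTIONS: branch defaults to "main" and the file lives at the repo root.
    Neither is stated in the PR description.
    """
    return f"https://raw.githubusercontent.com/{repo}/{branch}/{filename}"


urls = {name: raw_csv_url(name) for name in FILES}
for name, url in urls.items():
    print(name, "->", url)
```

A URL of this form can be passed straight to `pandas.read_csv`, which accepts HTTP(S) URLs as well as local paths.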
From the PyHealth repo root, run:
python examples/eicu_llm_synthetic_mortality.py
The script prints ROC-AUC, PR-AUC, accuracy, F1, and loss.
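As a reference for what these five numbers mean, here is a dependency-free sketch that computes them for binary labels and predicted probabilities. It is not the script's implementation. The function name `binary_metrics` is hypothetical, PR-AUC is computed as average precision, and "loss" is assumed to mean binary cross-entropy; the script may define either differently. The inputs are toy values, not project results.

```python
import math
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sketch of five binary-classification metrics.

    Assumes both classes are present and (for PR-AUC) that scores are not tied.
    """
    n = len(y_true)
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0

    # ROC-AUC: chance that a random positive outscores a random negative
    # (ties count as half).
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    roc_auc = wins / (len(pos) * len(neg))

    # PR-AUC as average precision: mean precision at the rank of each positive.
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    hits, precision_sum = 0, 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            precision_sum += hits / rank
    pr_auc = precision_sum / len(pos)

    # Binary cross-entropy, with probabilities clipped away from 0 and 1.
    eps = 1e-12
    loss = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for t, p in zip(y_true, y_prob)
    ) / n

    return {"roc_auc": roc_auc, "pr_auc": pr_auc, "accuracy": accuracy, "f1": f1, "loss": loss}


# Toy inputs for illustration only.
metrics = binary_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In practice scikit-learn provides equivalents (`roc_auc_score`, `average_precision_score`, `accuracy_score`, `f1_score`, `log_loss`), which also handle tied scores. For an imbalanced outcome such as ICU mortality, PR-AUC and F1 are usually more informative than accuracy alone.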