@willhunnius

Synthetic EHR Data Evaluation Example

Description

This PR adds an example that evaluates the quality of synthetic Electronic Health Record (EHR) data with PyHealth along three axes: fidelity, utility, and privacy.

Contribution Type

  • New example/use case

Based On

Lin et al. (2025) "A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs" - JMLR CHIL 2025
https://arxiv.org/abs/2504.14657

Files Added

  • examples/synthetic_ehr_evaluation.py - Main example script

Features

  • Fidelity Metrics: KL divergence between real and synthetic feature distributions
  • Utility Metrics: TSTR (Train on Synthetic, Test on Real) evaluation
  • Privacy Metrics: Membership inference attack evaluation
  • Visualizations: Distribution comparisons, ROC curves, summary charts
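
To make the fidelity metric concrete, here is a minimal sketch of KL divergence between the real and synthetic distributions of a single categorical column. It is not the code in this PR: the function name `kl_divergence`, the `smoothing` parameter, and the toy `gender` values are assumptions made purely for illustration.

```python
import math
from collections import Counter


def kl_divergence(real_values, synthetic_values, smoothing=1e-9):
    """Return KL(real || synthetic) over the categories seen in either sample.

    Hypothetical helper for illustration; not part of PyHealth or this PR.
    """
    categories = sorted(set(real_values) | set(synthetic_values))
    real_counts = Counter(real_values)
    synth_counts = Counter(synthetic_values)

    def to_probs(counts, total):
        # Additive smoothing keeps every probability strictly positive,
        # so the logarithm below is always defined.
        denom = total + smoothing * len(categories)
        return [(counts[c] + smoothing) / denom for c in categories]

    p = to_probs(real_counts, len(real_values))
    q = to_probs(synth_counts, len(synthetic_values))
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Toy data (made up): the synthetic sample over-represents "M".
real_gender = ["F", "M", "F", "F", "M", "F"]
synthetic_gender = ["F", "M", "M", "M", "M", "F"]
kl = kl_divergence(real_gender, synthetic_gender)
print(f"KL(real || synthetic) = {kl:.4f}")
```

A value near zero means the synthetic column closely matches the real one. KL divergence is asymmetric, so the argument order matters when comparing results across runs.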

Usage

evaluator = SyntheticDataEvaluator(target_column="mortality")
results = evaluator.evaluate(real_data, synthetic_data)
print(f"TSTR AUC: {results['utility']['tstr_auc']:.3f}")
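
For readers unfamiliar with the `tstr_auc` value printed above, the sketch below shows the idea behind TSTR in plain Python: fit a classifier on synthetic records only, then score it on real records. It is not the evaluator's implementation. The helpers `make_toy_cohort`, `train_logistic`, `predict_proba`, and `roc_auc`, and the single made-up `age` feature, are all assumptions for illustration.

```python
import math
import random


def predict_proba(w, x):
    """Logistic prediction; the last entry of w is the bias term."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic-regression weights with batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_toy_cohort(n, seed, noise):
    """Made-up cohort: one scaled 'age' feature and a noisy mortality label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        age = rng.uniform(0, 1)
        X.append([age])
        y.append(1 if age + rng.gauss(0, noise) > 0.6 else 0)
    return X, y


# The "synthetic" cohort is noisier than the "real" one, mimicking imperfect generation.
X_real, y_real = make_toy_cohort(200, seed=0, noise=0.15)
X_synth, y_synth = make_toy_cohort(200, seed=1, noise=0.25)

weights = train_logistic(X_synth, y_synth)          # train on synthetic only
real_scores = [predict_proba(weights, x) for x in X_real]
tstr_auc = roc_auc(y_real, real_scores)             # test on real only
print(f"TSTR AUC: {tstr_auc:.3f}")
```

A TSTR AUC close to the AUC of a model trained on real data suggests the synthetic records preserve the signal needed for the downstream task. A value near 0.5 suggests they do not.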

Course Information

  • Course: CS598 DL4H - Deep Learning for Healthcare
  • University: University of Illinois Urbana-Champaign
  • Authors: Will Hunnius, Jiesen Zhang

Checklist

  • Code follows PEP8 style guidelines
  • Added docstrings with Google style
  • Example runs without errors
  • Notebook is well-documented
