lokanathdas1989 commented Dec 2, 2025

This contribution:

  1. Adds a new dataset: MIMIC-CXR Database 2.1.0
  2. Implements a new dataset class compliant with PyHealth’s BaseDataset
  3. Adds a Pydantic-validated YAML config
  4. Automatically extracts the PATIENTID and STUDYID identifiers and the FINDINGS and IMPRESSION report sections
  5. Introduces no breaking changes to PyHealth’s existing datasets
  6. Keeps data access external (MIMIC files must be obtained from PhysioNet through Credentialed Access)
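
As a rough illustration of the section extraction described in item 4, the sketch below splits a free-text report on its upper-case headers and keeps the FINDINGS and IMPRESSION bodies. It is not the code in this PR: the names `HEADER_RE` and `extract_sections` are hypothetical, the sample report is synthetic, and the assumption that every section starts with an upper-case `HEADER:` at the beginning of a line may not hold for all reports.

```python
import re

# Hypothetical helper for illustration only; not the implementation in this PR.
# Assumption: a section starts with an upper-case header followed by a colon
# at the beginning of a line, e.g. "FINDINGS:" or "IMPRESSION:".
HEADER_RE = re.compile(r"^[ \t]*([A-Z][A-Z ]*[A-Z]):", re.MULTILINE)


def extract_sections(report_text, wanted=("FINDINGS", "IMPRESSION")):
    """Return a dict mapping each wanted header to its whitespace-normalized body.

    Headers that do not occur in the report map to an empty string.
    """
    headers = list(HEADER_RE.finditer(report_text))
    sections = {name: "" for name in wanted}
    for i, match in enumerate(headers):
        name = match.group(1)
        if name not in sections:
            continue
        # The body runs from the end of this header to the start of the next one.
        if i + 1 < len(headers):
            end = headers[i + 1].start()
        else:
            end = len(report_text)
        body = report_text[match.end():end]
        # Collapse line breaks and repeated spaces into single spaces.
        sections[name] = " ".join(body.split())
    return sections


# Synthetic report text (not real patient data).
sample_report = """ FINAL REPORT
 EXAMINATION:  CHEST (PA AND LAT)

 INDICATION:  Cough.

 FINDINGS:

 The lungs are clear.
 No pleural effusion.

 IMPRESSION:

 No acute cardiopulmonary process.
"""

sections = extract_sections(sample_report)
print(sections["FINDINGS"])
print(sections["IMPRESSION"])
```

Slicing between consecutive headers, rather than matching each section with its own pattern, keeps the helper tolerant of reports that omit one of the two sections; a missing section simply comes back as an empty string.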

This PR is submitted by the following group of UIUC students:

  1. Lokanath Das (ldas2)
  2. Jared Backofen (jaredb3)
  3. Jacob Ray Fuehne (jfuehne2)

The following files are introduced as part of this PR:

pyhealth/
├── datasets/
│   ├── mimic_cxr_reports.py          # Dataset implementation
│   ├── configs/
│   │   └── mimic_cxr_reports.yaml    # Dataset configuration (Pydantic validated)
│   └── __init__.py                   # Updated the Dataset class relative import
├── tests/
│   └── test_mimic_cxr_reports.py     # Test script for dataset loader
└── docs/
    └── README_mimic_cxr_reports.md   # Documentation
