Chest X-rays

Downloading the Data

For MIMIC-CXR, obtain access to the MIMIC-CXR-JPG Database on PhysioNet and download the dataset. We recommend downloading it from the GCP bucket:

gcloud auth login
mkdir MIMIC-CXR-JPG
gsutil -m rsync -d -r gs://mimic-cxr-jpg-2.0.0.physionet.org MIMIC-CXR-JPG
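After the sync finishes, a quick sanity check is to count the JPEG files that arrived. The following is a minimal sketch, not part of the repository: count_images is a hypothetical helper, and it assumes the image files end in a lowercase .jpg extension.

```python
from pathlib import Path


def count_images(root: str, suffix: str = ".jpg") -> int:
    """Recursively count files under `root` whose names end with `suffix`."""
    root_path = Path(root)
    if not root_path.is_dir():
        # Fail loudly instead of silently reporting zero images.
        raise FileNotFoundError(f"{root} is not a directory")
    return sum(1 for p in root_path.rglob(f"*{suffix}") if p.is_file())
```

For example, print(count_images("MIMIC-CXR-JPG")) from the directory where you ran gsutil. A count far below what you expect usually means the rsync was interrupted and should be re-run.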

For CheXpert, sign up with your email address on the CheXpert website.
Download either the original or the downsampled dataset (we recommend the downsampled version, CheXpert-v1.0-small.zip) and extract it.
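If you prefer to extract the archive from Python rather than with a command-line tool, the standard-library zipfile module is enough. This is a minimal sketch; extract_zip and the destination directory are illustrative assumptions, not part of the repository.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def extract_zip(zip_path: str, dest_dir: str) -> list[str]:
    """Extract every member of `zip_path` into `dest_dir` and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

For example, extract_zip("CheXpert-v1.0-small.zip", ".") unpacks the archive into the current directory.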

For the NIH ChestX-ray8 dataset, download the images folder and Data_Entry_2017_v2020.csv from the NIH website.
Extract all of the archives in the images folder.
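Extracting many archives by hand is tedious, so a short loop helps. This minimal sketch assumes the files in the images folder are .tar.gz archives (adjust the glob pattern if yours differ); extract_archives is a hypothetical helper. It uses tarfile's "data" extraction filter when the running Python provides it, which rejects unsafe member paths.

```python
import tarfile
from pathlib import Path


def extract_archives(images_dir: str, dest_dir: str) -> int:
    """Extract every *.tar.gz archive in `images_dir` into `dest_dir`; return how many were extracted."""
    archives = sorted(Path(images_dir).glob("*.tar.gz"))
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for archive in archives:
        with tarfile.open(archive, "r:gz") as tf:
            if hasattr(tarfile, "data_filter"):
                # Newer Python versions: refuse absolute paths, "..", and special files.
                tf.extractall(dest, filter="data")
            else:
                tf.extractall(dest)
    return len(archives)
```

For example, extract_archives("images", "images") extracts the archives in place.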

In Constants.py, update image_paths to point to each of the three directories that you downloaded.
Run python -m data.preprocess.preprocess_cxr.
(Optional) If you are training many models, it may be faster to first cache all images on disk as binary 224x224 files. To do this, update the cache_dir path in Constants.py and run python -m data.preprocess.cache_data; for speed, you can optionally run it as separate parallel processes with --env_id 0, 1, and 2. To use the cached files, pass --cache_cxr to train.py.
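Launching the three caching jobs by hand means juggling several terminals. This minimal Python sketch starts one process per --env_id value and waits for all of them. It assumes it is run from the repository root (so data.preprocess.cache_data is importable) and that the runs for different env_id values are independent of each other; cache_commands and run_parallel are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys


def cache_commands(env_ids=(0, 1, 2)):
    """Build one `python -m data.preprocess.cache_data --env_id N` command per env_id."""
    return [
        [sys.executable, "-m", "data.preprocess.cache_data", "--env_id", str(i)]
        for i in env_ids
    ]


def run_parallel(commands):
    """Start every command at once, then wait for all of them; return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]
```

For example, codes = run_parallel(cache_commands()) runs all three jobs concurrently; any nonzero entry in codes marks a job that failed and should be re-run on its own.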