Mrinal Kanti Baowaly, Chia-Ching Lin, Chao-Lin Liu, Kuan-Ta Chen
Mrinal Kanti Baowaly, Chia-Ching Lin, Chao-Lin Liu, Kuan-Ta Chen, Synthesizing electronic health records using improved generative adversarial networks, Journal of the American Medical Informatics Association, Volume 26, Issue 3, March 2019, Pages 228–241, https://doi.org/10.1093/jamia/ocy142
The goal of this research is to generate synthetic electronic health records (EHRs) using two improved generative adversarial networks: Wasserstein GAN with gradient penalty (WGAN-GP) and boundary-seeking GAN (BGAN). We call the two resulting models medWGAN and medBGAN, respectively.
The generated EHRs are intended to be more realistic than those produced by existing work (e.g., medGAN), and the synthetic data are intended to be free of legal, security, and privacy concerns.
`model.py` defines the `MEDGAN`, `MEDWGAN`, and `MEDBGAN` classes, which are imported in `train.py` to build the neural networks for GAN training and EHR generation.
Install TensorFlow and download or clone the source code (`model.py` and `train.py`).
Download the MIMIC-III data, aggregate the medical codes (e.g., diagnosis codes, medication codes, or procedure codes) for each patient, and save them as a NumPy data file (`.npy` file).
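The aggregation step can be sketched roughly as follows. This is a minimal illustration, not the authors' preprocessing script: it assumes the training file is a 2D patient-by-code count matrix, which the text above does not state explicitly. The function name `build_patient_code_matrix`, the toy records, and the file name `train_data.npy` are all hypothetical.

```python
import numpy as np


def build_patient_code_matrix(records, binary=False):
    """Aggregate (patient_id, code) pairs into a patient-by-code matrix.

    `records` is a list of (patient_id, code) tuples. Each row of the
    result is one patient and each column is one distinct medical code.
    Assumption: a count (or binary) matrix is the expected training format.
    """
    patients = sorted({p for p, _ in records})
    codes = sorted({c for _, c in records})
    patient_index = {p: i for i, p in enumerate(patients)}
    code_index = {c: j for j, c in enumerate(codes)}

    matrix = np.zeros((len(patients), len(codes)), dtype=np.float32)
    for p, c in records:
        matrix[patient_index[p], code_index[c]] += 1  # count occurrences

    if binary:
        # Keep only presence/absence of each code per patient.
        matrix = (matrix > 0).astype(np.float32)
    return matrix, patients, codes


# Toy records standing in for rows extracted from MIMIC-III (hypothetical values).
records = [
    (1, "D_401"), (1, "D_250"), (1, "D_401"),
    (2, "D_250"),
    (3, "P_3893"), (3, "D_401"),
]

matrix, patients, codes = build_patient_code_matrix(records)
np.save("train_data.npy", matrix)  # hypothetical file name for --data_file
```

The saved file could then be passed to `train.py` through `--data_file`. Whether counts or binary indicators work better is a modeling choice; the `binary` flag is included only to show both options.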
- Usage:
$ python train.py --data_file [path to the training data (npy format)] --n_pretrain_epoch 100 --n_epoch 1000
- During training, a progress bar is shown for each epoch. A folder is also created (`medGAN` by default) with two subfolders, `model` and `output`. The former contains the model checkpoints and the latter contains the synthetic EHR data (called `generated.npy` by default).
- To specify the output folder name, add the parameter `--model [model_name]`, where [model_name] is `medGAN` with any prefix or postfix, such as `medGAN_n_epoch_500`.
- To run an improved GAN, add the parameter `--model medWGAN` or `--model medBGAN`. Again, `medWGAN` and `medBGAN` can also have any prefix or postfix.
- For more parameters, please refer to the source code in `train.py`.
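Once training has finished, the synthetic data can be inspected with a few lines of NumPy. The sketch below assumes the folder layout described above (`[model_name]/output/generated.npy`) and that the file holds a 2D record-by-code array; the helper names `load_generated` and `summarize` are hypothetical and not part of `train.py`.

```python
import os

import numpy as np


def load_generated(model_name="medGAN", base_dir=".", filename="generated.npy"):
    """Load the synthetic EHR array from [base_dir]/[model_name]/output/."""
    path = os.path.join(base_dir, model_name, "output", filename)
    return np.load(path)


def summarize(data):
    """Return basic statistics, assuming a 2D record-by-code array."""
    codes_per_record = (data > 0).sum(axis=1)  # distinct codes present per record
    return {
        "n_records": data.shape[0],
        "n_codes": data.shape[1],
        "mean_codes_per_record": float(codes_per_record.mean()),
    }


if __name__ == "__main__":
    # Only runs if a default training run has already produced output.
    if os.path.exists(os.path.join("medGAN", "output", "generated.npy")):
        print(summarize(load_generated()))
```

For a run started with `--model medWGAN`, calling `load_generated("medWGAN")` would look in the corresponding folder under the same assumption.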