ML-guided-material-synthesis

By Bijun Tang, Yuhao Lu, Jiadong Zhou, Tushar Chouhan, Han Wang, Prafful Golani, Manzhang Xu, Quan Xu, Cuntai Guan, Zheng Liu

Nanyang Technological University.

Introduction

This repository contains the original models described in the paper "Machine learning-guided synthesis of advanced inorganic materials" (https://arxiv.org/abs/1905.03938). The models cover the MoS2 classification task and the CQD regression task.

Citation

If you use these models in your research, please cite:

@article{Tang2019,
	author = {Bijun Tang and Yuhao Lu and Jiadong Zhou and Han Wang and Prafful Golani and Manzhang Xu and Quan Xu and Cuntai Guan and Zheng Liu},
	title = {Machine learning-guided synthesis of advanced inorganic materials},
	journal = {arXiv preprint arXiv:1905.03938},
	year = {2019}
}

Environment Setup

Python environment setup:

python 3.6.6
jupyter==1.0.0
matplotlib==2.2.3
numpy==1.16.0
pandas==0.22.0
scikit-learn==0.20.3
scipy==1.1.0
seaborn==0.9.0
shap==0.24.0
xgboost==0.80

If setup fails on Ubuntu or another Linux-based system, check that the following packages are installed:
```
font-manager
g++
gcc
python3-dev
```
Upgrading pip may also help.
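
To confirm that the environment matches the versions pinned above, a small check like the following may be useful. It is a minimal sketch and not part of this repository: the helper name `check_versions` is an assumption, and the fallback branch assumes setuptools is installed on Python 3.6/3.7.

```python
# Sketch: compare installed package versions against the pins listed above.
try:
    from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
except ImportError:
    # Fallback for Python 3.6/3.7 (assumes setuptools is available).
    import pkg_resources

    class PackageNotFoundError(Exception):
        pass

    def version(name):
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            raise PackageNotFoundError(name)

# Versions copied from the Environment Setup list.
PINNED = {
    "jupyter": "1.0.0",
    "matplotlib": "2.2.3",
    "numpy": "1.16.0",
    "pandas": "0.22.0",
    "scikit-learn": "0.20.3",
    "scipy": "1.1.0",
    "seaborn": "0.9.0",
    "shap": "0.24.0",
    "xgboost": "0.80",
}


def check_versions(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    problems = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    for name, (expected, installed) in sorted(check_versions(PINNED).items()):
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every listed package is installed at the pinned version.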

Data

Before running best_model_interpretation-*.ipynb, call utils.data_handler.fake_input_generator() to generate the input conditions, then move the generated fake_input_*.csv into the data folder.
For a more detailed description of the dataset, see our paper.
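
The signature and output columns of fake_input_generator() are not documented here, so the sketch below does not call it. It only illustrates one plausible reading of "generating input conditions": writing the Cartesian product of candidate parameter values to a CSV file. The function names, the parameter names `param_a` and `param_b`, and the output file name are all hypothetical.

```python
import csv
import itertools


def make_condition_grid(ranges):
    """Return (header, rows) for the Cartesian product of the given value lists."""
    names = list(ranges)
    rows = [list(combo) for combo in itertools.product(*(ranges[n] for n in names))]
    return names, rows


def write_grid(path, ranges):
    """Write the grid to a CSV file and return the number of data rows."""
    names, rows = make_condition_grid(ranges)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(names)  # header row with the parameter names
        writer.writerows(rows)  # one row per combination of conditions
    return len(rows)


if __name__ == "__main__":
    # Hypothetical parameters and values, for illustration only.
    example_ranges = {"param_a": [1, 2], "param_b": [10, 20, 30]}
    n = write_grid("fake_input_example.csv", example_ranges)
    print(f"wrote {n} candidate conditions")
```

For the actual inputs expected by the notebooks, use the repository's own generator.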

Code

Code structure:
- scripts
  - run_ipynb.sh : script to run all *.ipynb. Set your directory in the file.
- results : folder to store all results and generated figures
- utils : supporting functions
- data : download data before running code (see Data)
- PAM_repeat1000times-*.py : to run PAM 1000 times with randomly selected initial training sets. Note that completing all 1000 runs takes considerable computational time; for example, a single PAM run for classification may take around 1 hour.
- PAM_guidedSynthesis-*.ipynb : to run PAM once and plot the figures
- model_selection-*.ipynb : to select the best model with 10 repetitions of 10 x 10 cross-validation, and to interpret the results
- data_overview.ipynb : to plot the feature correlation of the dataset and compute other descriptive statistics
- best_model_interpretation-*.ipynb : to extract feature attribution values and to predict on the generated input

Note: Files whose names end with '-classification' are for the classification (MoS2) dataset, while those ending with '-regression' are for the regression (CQD) dataset.
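
The contents of run_ipynb.sh are not shown here. As a rough Python stand-in for the same idea, the sketch below executes every notebook in a directory with `jupyter nbconvert`. The function names are assumptions; `--inplace` overwrites each notebook with its executed version, and the timeout is disabled because a single PAM run can be long.

```python
import glob
import os
import subprocess


def build_commands(notebooks):
    """Build one `jupyter nbconvert` command per notebook path."""
    return [
        [
            "jupyter", "nbconvert",
            "--to", "notebook",
            "--execute",                         # run all cells
            "--inplace",                         # overwrite the notebook with its outputs
            "--ExecutePreprocessor.timeout=-1",  # disable the per-cell timeout
            nb,
        ]
        for nb in notebooks
    ]


def run_all(directory="."):
    """Execute every *.ipynb file in `directory`, stopping at the first failure."""
    notebooks = sorted(glob.glob(os.path.join(directory, "*.ipynb")))
    for cmd in build_commands(notebooks):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(".")
```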
