An internship project in 2020. Updated in 2023.
- Python 3
- TensorFlow>=2.0
- Other required packages are summarized in `requirements.txt`.
- To enable the GPU when running DL models, you may find these articles helpful:
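If you want a quick way to confirm that TensorFlow can actually see the GPU after the driver setup, a minimal check like the one below works on TensorFlow 2.1+ (on 2.0 the same call lives under `tf.config.experimental`). This snippet is only an illustration and not part of the toolkit:

```python
# Quick sanity check: does TensorFlow detect the GPU on this VM?
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    print(f"Found {len(gpus)} GPU(s): {[gpu.name for gpu in gpus]}")
else:
    print("No GPU detected -- TensorFlow will fall back to the CPU.")
```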
We provide demo data to help users quickly try out this toolkit. The demo data uses a small portion of real Kickstarter.com data collected from Webrobots.io.
Start a Data Science Virtual Machine (DSVM). In this repo, we use Ubuntu 18.04 (Linux VM). For more details about creating the VM, please refer to Create an Ubuntu Data Science Virtual Machine.
Open a port range (for example, 6000-6010) in the security group for TensorBoard. You may find this blog helpful.
ssh into the instance.
mkdir ~/projects
cd ~/projects/
git clone https://github.com/chenchenpan/AutoML-Toolkit.git
cd ~/projects/AutoML-Toolkit/
chmod 777 vm_setup.sh
./vm_setup.sh
cd ~/projects/AutoML-Toolkit/
mkdir resource
mkdir resource/glove
cd ~/projects/AutoML-Toolkit/resource/glove
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip
rm glove.6B.zip
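The unzipped archive contains plain-text embedding files (`glove.6B.50d.txt`, `glove.6B.100d.txt`, `glove.6B.200d.txt`, `glove.6B.300d.txt`), each storing one token followed by its vector per line. If you want to inspect them yourself, a minimal, illustrative loader (not part of the toolkit) looks like this:

```python
# Illustrative only: load one of the unzipped GloVe files into a dict.
import numpy as np

def load_glove(path="glove.6B.100d.txt"):
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

# vectors = load_glove()
# print(vectors["movie"].shape)  # (100,)
```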
screen -S demo
source ~/project/envs/lab/bin/activate
cd ~/projects/AutoML-Toolkit/demo
chmod 777 run_nn_experiment.sh
./run_nn_experiment.sh
screen -S bert
source ~/project/envs/lab/bin/activate
cd ~/projects/AutoML-Toolkit/resource
mkdir bert
cd bert
wget https://storage.googleapis.com/bert_models/2020_02_20/uncased_L-2_H-128_A-2.zip
unzip uncased_L-2_H-128_A-2.zip
rm uncased_L-2_H-128_A-2.zip
cd ~/projects/AutoML-Toolkit/demo
chmod 777 run_bert_experiment.sh
./run_bert_experiment.sh
More BERT models can be found here.
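After unzipping, the directory should contain (for this particular release) a `bert_config.json`, a `vocab.txt`, and the TensorFlow checkpoint files. A quick, illustrative way to confirm the download unpacked correctly:

```python
# Illustrative check that the BERT archive unpacked correctly.
import json
import pathlib

bert_dir = pathlib.Path(".")  # adjust to wherever you unzipped the archive
print(sorted(p.name for p in bert_dir.iterdir()))

# bert_config.json describes the (tiny) architecture of this checkpoint.
config_path = bert_dir / "bert_config.json"
if config_path.exists():
    print(json.loads(config_path.read_text()))
```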
All the models and results from the experiments will be saved in `~/projects/AutoML-Toolkit/demo/outputs`.
screen -S tb
source ~/project/envs/lab/bin/activate
cd ~/projects/AutoML-Toolkit/demo
tensorboard --logdir=outputs
To view TensorBoard, go to [your Azure VM public DNS]:6006 in your browser (make sure you have added the inbound port). Alternatively, you can monitor your models via local ports. On the remote machine, choose a port number, for example 8008, and run:
tensorboard --logdir=outputs --port=8008
From your local machine, set up ssh port forwarding to one of your unused local ports, for example port 8898:
ssh -NfL localhost:8898:localhost:8008 user@remote
Finally, go to `localhost:8898` in your local web browser. The TensorBoard interface should appear.
Following these 3 steps, you can easily use this toolkit to train your own machine learning and deep learning models on any dataset.
To keep the same organized structure, you can simply copy the `demo` folder and rename it to `my_project`.
Navigate to the `my_project/data/raw_data` directory, replace `comb_train.tsv`, `comb_dev.tsv`, and `comb_test.tsv` with your training, validation, and test datasets, and delete all the `.json` files (you will create your own metadata file in step 2).
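Before running anything, it can help to double-check that your replacement files are really tab-separated and look the way you expect. A small, illustrative check (the paths assume you kept the `my_project` layout from above):

```python
# Illustrative sanity check for the replacement TSV files.
import pandas as pd

for split in ["comb_train.tsv", "comb_dev.tsv", "comb_test.tsv"]:
    df = pd.read_csv(f"my_project/data/raw_data/{split}", sep="\t")
    print(split, df.shape)
    print(df.dtypes)  # spot-check the inferred column types
```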
Optionally, you can delete all the files in the `my_project/search_space` directory, since you will define the search space and generate these files in step 2.
As input, you need to provide a `metadata.json` file that describes the dataset structure and data types.
Our toolkit also provides hyperparameter tuning, so it needs a `search_space.json` file that defines the search space for the tuning.
Open `define_metadata_and_search_space.ipynb` and follow it step by step; you will easily generate these two files. We provide some examples and have made sure they are bug-free. You can modify them based on your data and models.
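For orientation only, here is a rough sketch of the kind of content these two files hold. The exact schema comes from `define_metadata_and_search_space.ipynb`, and every field name below is a hypothetical placeholder, so follow the notebook rather than copying this verbatim:

```python
# Hypothetical sketch; the real schema is produced by the notebook.
import json

metadata = {
    "output_label": "label",              # hypothetical target column
    "text_columns": ["name", "blurb"],    # hypothetical free-text features
    "numerical_columns": ["goal"],        # hypothetical numeric features
    "categorical_columns": ["country"],   # hypothetical categorical features
}

search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],  # hypothetical tuning grid
    "batch_size": [32, 64],
    "num_hidden_layers": [1, 2, 3],
}

with open("metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
with open("search_space.json", "w") as f:
    json.dump(search_space, f, indent=2)
```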
After modifying some arguments (such as `DIR`) in the `.sh` files, you can follow step 3 above in Quick Start to run and monitor the experiments!