chenchenpan/AutoML-Toolkit

AutoML-Toolkit

An internship project in 2020. Updated in 2023.

Dependencies

Quick start

We provide demo data to help users quickly try out this toolkit. The demo data is a small portion of real Kickstarter.com data collected by Webrobots.io.

Step 1: Setup Azure instance

Start a Data Science Virtual Machine (DSVM). In this repo, we use Ubuntu 18.04 (Linux VM). For more details about creating the VM, see Create an Ubuntu Data Science Virtual Machine.

Open a port range (for example, 6000-6010) in the security group for TensorBoard. You may find this blog helpful.
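If you prefer the command line to the portal, the port range can also be opened with the Azure CLI. The following is a minimal sketch, assuming `az` is installed and logged in. The resource group and VM names (`my-rg`, `my-dsvm`) and the rule priority are placeholders, not values from this repo. It only prints the command unless `DRY_RUN=0` is set.

```shell
# Sketch: open an inbound port range on an Azure VM for TensorBoard.
# RESOURCE_GROUP and VM_NAME are placeholders (assumptions); use your own.
RESOURCE_GROUP="${RESOURCE_GROUP:-my-rg}"
VM_NAME="${VM_NAME:-my-dsvm}"
PORT_RANGE="${PORT_RANGE:-6000-6010}"
DRY_RUN="${DRY_RUN:-1}"

open_tensorboard_ports() {
  # Store the command in the positional parameters so quoting is preserved.
  set -- az vm open-port \
    --resource-group "$RESOURCE_GROUP" \
    --name "$VM_NAME" \
    --port "$PORT_RANGE" \
    --priority 900
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"   # dry run: show what would be executed
  else
    "$@"        # actually create the inbound rule
  fi
}

open_tensorboard_ports
```

Run it once with the default dry run to review the command, then again with `DRY_RUN=0` to apply it.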

ssh into the instance.

Step 2: Download the data and install the dependencies

mkdir -p ~/projects
cd ~/projects/
git clone https://github.com/chenchenpan/AutoML-Toolkit.git

cd ~/projects/AutoML-Toolkit/
chmod +x vm_setup.sh
./vm_setup.sh

Step 3: Run experiments and monitor them with TensorBoard

Start the NN model with GloVe experiment

cd ~/projects/AutoML-Toolkit/
mkdir resource
mkdir resource/glove
cd ~/projects/AutoML-Toolkit/resource/glove
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip
rm glove.6B.zip

screen -S demo
source ~/project/envs/lab/bin/activate
cd ~/projects/AutoML-Toolkit/demo

chmod +x run_nn_experiment.sh
./run_nn_experiment.sh
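If the experiment cannot find the embeddings, it may help to confirm that the GloVe archive unpacked correctly. The sketch below assumes the directory used in the commands above and the four text files that `glove.6B.zip` contains. Which of these files the demo script actually reads is not stated here, so treat a missing file as a hint rather than a definite error. `check_glove` and `GLOVE_DIR` are names made up for this example.

```shell
# Sketch: confirm the GloVe files exist and report each file's vector size.
GLOVE_DIR="${GLOVE_DIR:-$HOME/projects/AutoML-Toolkit/resource/glove}"

check_glove() {
  dir="$1"
  rc=0
  for dim in 50 100 200 300; do
    f="$dir/glove.6B.${dim}d.txt"
    if [ ! -s "$f" ]; then
      echo "missing: $f"
      rc=1
      continue
    fi
    # Each line is a token followed by its vector, so fields - 1 = dimensions.
    found=$(head -n 1 "$f" | awk '{print NF - 1}')
    echo "ok: $f ($found dimensions)"
  done
  return "$rc"
}

check_glove "$GLOVE_DIR" || echo "GloVe files incomplete in $GLOVE_DIR"
```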

Start BERT-Tiny experiment

screen -S bert
source ~/project/envs/lab/bin/activate
cd ~/projects/AutoML-Toolkit/resource
mkdir bert
wget https://storage.googleapis.com/bert_models/2020_02_20/uncased_L-2_H-128_A-2.zip
unzip uncased_L-2_H-128_A-2.zip
rm uncased_L-2_H-128_A-2.zip

cd ~/projects/AutoML-Toolkit/demo
chmod +x run_bert_experiment.sh
./run_bert_experiment.sh

More BERT models can be found here. All models and experiment results are saved in ~/projects/AutoML-Toolkit/demo/outputs.

Start TensorBoard to monitor experiments

screen -S tb
source ~/project/envs/lab/bin/activate
cd  ~/projects/AutoML-Toolkit/demo
tensorboard --logdir=outputs
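If TensorBoard starts but shows no runs, one quick check is whether any event files have been written yet. TensorBoard reads files named `events.out.tfevents.*` from the log directory and its subdirectories. This sketch lists the directories that contain them. The helper name `list_tb_runs` and the `LOGDIR` variable are made up for this example.

```shell
# Sketch: list directories under a log dir that contain TensorBoard event files.
list_tb_runs() {
  find "$1" -type f -name 'events.out.tfevents.*' -exec dirname {} \; | sort -u
}

# Defaults to the outputs directory used above; errors are hidden if it does not exist yet.
list_tb_runs "${LOGDIR:-outputs}" 2>/dev/null
```

An empty result usually means the experiment has not logged anything yet, or that `--logdir` points at the wrong directory.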

To view TensorBoard, open [your Azure VM public DNS]:6006 in a browser (make sure the inbound port is open in the security group). Alternatively, you can monitor your models through a local port using SSH port forwarding. On the remote machine, choose a port, for example 8008, and run:

tensorboard --logdir=outputs --port=8008

From your local machine, set up ssh port forwarding to one of your unused local ports, for example port 8898:

ssh -NfL localhost:8898:localhost:8008 user@remote

Finally, go to localhost:8898 in your local web browser. The TensorBoard interface should appear.
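For reference, the forwarding command can be wrapped in a small helper so the two ports are easy to change. This is only a sketch: `forward_cmd` and `is_port` are hypothetical names, and the helper prints the command rather than running it.

```shell
# Sketch: build the ssh port-forwarding command used above.
#   -N  do not run a remote command (forwarding only)
#   -f  go to the background after authentication
#   -L  forward a local port to a port on the remote side

is_port() {
  # Accept only whole numbers between 1 and 65535.
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

forward_cmd() {
  local_port="$1"; remote_port="$2"; target="$3"
  if ! is_port "$local_port" || ! is_port "$remote_port"; then
    echo "ports must be numbers between 1 and 65535" >&2
    return 1
  fi
  echo "ssh -NfL localhost:${local_port}:localhost:${remote_port} ${target}"
}

forward_cmd 8898 8008 user@remote
```

Because `-f` sends ssh to the background, the tunnel stays open until the ssh process is stopped.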

Start with your own dataset

By following these three steps, you can use this toolkit to train your own machine learning and deep learning models on your own datasets.

Step 1: Upload prepared datasets

To keep the same directory structure, copy the demo folder and rename the copy to my_project.

Navigate to the my_project/data/raw_data directory, replace comb_train.tsv, comb_dev.tsv and comb_test.tsv with your training, validation and test datasets, and delete all the .json files (you will create your own metadata file in step 2).

Optionally, you can delete all the files in the my_project/search_space directory, since you will define the search space and generate these files in step 2.
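The copy-and-replace steps above can be sketched as a small shell function. It assumes the demo folder layout described in this section (`demo/data/raw_data` with the three `comb_*.tsv` files). The function name `prepare_project` and the example paths are hypothetical.

```shell
# Sketch: copy the demo layout and swap in your own data splits.
# Arguments: toolkit root, then your train, dev and test TSV files.
prepare_project() {
  root="$1"; train="$2"; dev="$3"; test_set="$4"
  proj="$root/my_project"
  raw="$proj/data/raw_data"

  if [ -e "$proj" ]; then
    echo "refusing to overwrite existing $proj" >&2
    return 1
  fi
  cp -r "$root/demo" "$proj" || return 1

  # Keep the file names the demo uses, as described above.
  cp "$train" "$raw/comb_train.tsv" &&
  cp "$dev" "$raw/comb_dev.tsv" &&
  cp "$test_set" "$raw/comb_test.tsv" || return 1

  # Drop the demo's metadata; step 2 regenerates it for your data.
  rm -f "$raw"/*.json
}

# Example (paths to your own files are placeholders):
# prepare_project ~/projects/AutoML-Toolkit ~/data/train.tsv ~/data/dev.tsv ~/data/test.tsv
```

If you have already run the demo, the copy will also include its outputs folder, which you may want to clear before training on your own data.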

Step 2: Generate metadata and search space files

As input, you need to provide a metadata.json file that describes the dataset's structure and data types.

The toolkit also supports hyperparameter tuning, which requires a search_space.json file that defines the search space.

Open define_metadata_and_search_space.ipynb and follow it step by step to generate these two files. The notebook includes tested examples that you can modify to fit your data and models.
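After generating the two files, a quick syntax check can catch a truncated or hand-edited file before a long run. The sketch below uses Python's standard `json.tool` module and assumes `python3` is on the path. It verifies only that each file is valid JSON, not that it contains the fields the toolkit expects. `check_json` is a hypothetical helper name.

```shell
# Sketch: check that each given file exists and parses as JSON.
check_json() {
  rc=0
  for f in "$@"; do
    if python3 -m json.tool "$f" > /dev/null 2>&1; then
      echo "valid JSON: $f"
    else
      echo "missing or invalid JSON: $f"
      rc=1
    fi
  done
  return "$rc"
}

# Example (file locations are placeholders):
# check_json path/to/metadata.json path/to/search_space.json
```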

Step 3: Repeat Step 3 of Quick start

After modifying the relevant arguments (such as DIR) in the .sh files, follow Step 3 of Quick start above to run and monitor the experiments.
