An internship project in 2020. Updated in 2023.
- Python 3
- TensorFlow>=2.0
- Other required packages are summarized in `requirements.txt`.
- To enable the GPU when running DL models, you may find these articles helpful:
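If you want a quick way to confirm that TensorFlow can actually see the GPU after the driver setup, a minimal check like the one below works on TensorFlow 2.1+ (on 2.0 the same call lives under `tf.config.experimental`). This snippet is only an illustration and not part of the toolkit:

```python
# Quick sanity check: does TensorFlow detect the GPU on this VM?
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    print(f"Found {len(gpus)} GPU(s): {[gpu.name for gpu in gpus]}")
else:
    print("No GPU detected -- TensorFlow will fall back to the CPU.")
```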
We provide demo data to help users quickly try out this toolkit. The demo data uses a small portion of real Kickstarter.com data collected from Webrobots.io.
Start a Data Science Virtual Machine (DSVM). In this repo, we use Ubuntu 18.04 (Linux VM). For more details about creating the VM, please refer to Create an Ubuntu Data Science Virtual Machine.
Open a port range (for example, 6000-6010) in the security group for TensorBoard. You may find this blog helpful.
ssh into the instance.
mkdir ~/projects
cd ~/projects/
git clone https://github.com/chenchenpan/AutoML-Toolkit.git
cd ~/projects/AutoML-Toolkit/
chmod 777 vm_setup.sh
./vm_setup.sh
cd ~/projects/AutoML-Toolkit/
mkdir resource
mkdir resource/glove
cd ~/projects/AutoML-Toolkit/resource/glove
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip
rm glove.6B.zip
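The unzipped archive contains plain-text embedding files (`glove.6B.50d.txt`, `glove.6B.100d.txt`, `glove.6B.200d.txt`, `glove.6B.300d.txt`), each storing one token followed by its vector per line. If you want to inspect them yourself, a minimal, illustrative loader (not part of the toolkit) looks like this:

```python
# Illustrative only: load one of the unzipped GloVe files into a dict.
import numpy as np

def load_glove(path="glove.6B.100d.txt"):
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

# vectors = load_glove()
# print(vectors["movie"].shape)  # (100,)
```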
screen -S demo
source ~/project/envs/lab/bin/activate
cd ~/projects/AutoML-Toolkit/demo
chmod 777 run_nn_experiment.sh
./run_nn_experiment.sh
screen -S bert
source ~/project/envs/lab/bin/activate
cd ~/projects/AutoML-Toolkit/resource
mkdir bert
cd bert
wget https://storage.googleapis.com/bert_models/2020_02_20/uncased_L-2_H-128_A-2.zip
unzip uncased_L-2_H-128_A-2.zip
rm uncased_L-2_H-128_A-2.zip
cd ~/projects/AutoML-Toolkit/demo
chmod 777 run_bert_experiment.sh
./run_bert_experiment.sh
More BERT models can be found here.
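After unzipping, the directory should contain (for this particular release) a `bert_config.json`, a `vocab.txt`, and the TensorFlow checkpoint files. A quick, illustrative way to confirm the download unpacked correctly:

```python
# Illustrative check that the BERT archive unpacked correctly.
import json
import pathlib

bert_dir = pathlib.Path(".")  # adjust to wherever you unzipped the archive
print(sorted(p.name for p in bert_dir.iterdir()))

# bert_config.json describes the (tiny) architecture of this checkpoint.
config_path = bert_dir / "bert_config.json"
if config_path.exists():
    print(json.loads(config_path.read_text()))
```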
All the models and results from the experiments will be saved in `~/projects/AutoML-Toolkit/demo/outputs`.
screen -S tb
source ~/project/envs/lab/bin/activate
cd ~/projects/AutoML-Toolkit/demo
tensorboard --logdir=outputs
To view TensorBoard, go to [your Azure VM public DNS]:6006 in your browser (make sure you have added the inbound port). Alternatively, you can monitor your models via local ports. On the remote machine, choose a port number, for example 8008, and run:
tensorboard --logdir=outputs --port=8008
From your local machine, set up ssh port forwarding to one of your unused local ports, for example port 8898:
ssh -NfL localhost:8898:localhost:8008 user@remote
Finally, go to `localhost:8898` in your local web browser. The TensorBoard interface should appear.
Following these 3 steps, you can easily use this toolkit to train your own machine learning and deep learning models on any dataset.
To keep the same organized structure, you can simply copy the `demo` folder and rename it to `my_project`.
Navigate to the `my_project/data/raw_data` directory, replace `comb_train.tsv`, `comb_dev.tsv`, and `comb_test.tsv` with your training, validation, and test datasets, and delete all the `.json` files (you will create your own metadata file in step 2).
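Before running anything, it can help to double-check that your replacement files are really tab-separated and look the way you expect. A small, illustrative check (the paths assume you kept the `my_project` layout from above):

```python
# Illustrative sanity check for the replacement TSV files.
import pandas as pd

for split in ["comb_train.tsv", "comb_dev.tsv", "comb_test.tsv"]:
    df = pd.read_csv(f"my_project/data/raw_data/{split}", sep="\t")
    print(split, df.shape)
    print(df.dtypes)  # spot-check the inferred column types
```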
Optionally, you can delete all the files in the `my_project/search_space` directory, since you will define the search space and generate these files in step 2.
As input, you need to provide a `metadata.json` file that describes the dataset structure and data types.
Our toolkit also provides hyperparameter tuning, so it needs a `search_space.json` file that defines the search space for the tuning.
Open `define_metadata_and_search_space.ipynb` and follow it step by step; you will easily generate these two files. We provide some examples and have made sure they are bug-free. You can modify them based on your data and models.
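For orientation only, here is a rough sketch of the kind of content these two files hold. The exact schema comes from `define_metadata_and_search_space.ipynb`, and every field name below is a hypothetical placeholder, so follow the notebook rather than copying this verbatim:

```python
# Hypothetical sketch; the real schema is produced by the notebook.
import json

metadata = {
    "output_label": "label",              # hypothetical target column
    "text_columns": ["name", "blurb"],    # hypothetical free-text features
    "numerical_columns": ["goal"],        # hypothetical numeric features
    "categorical_columns": ["country"],   # hypothetical categorical features
}

search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],  # hypothetical tuning grid
    "batch_size": [32, 64],
    "num_hidden_layers": [1, 2, 3],
}

with open("metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
with open("search_space.json", "w") as f:
    json.dump(search_space, f, indent=2)
```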
After modifying some arguments (such as `DIR`) in the `.sh` files, you can follow step 3 above in Quick Start to run and monitor the experiments!