Example PyTorch project for training, testing, inference, and monitoring of an ML model

Vladimetr/PyTorchMLProject

EXAMPLE USAGE

  1. Train
python3 -m torchproject.train \
    --train-data data/train_manifest.csv \
    --test-data data/test_manifest.csv \
    --config config.yaml \
    --batch-size 50 \
    --epochs 15 \
    --log-step 5 \
    --comment "my-train"
  2. Train with no saving
python3 -m torchproject.train \
    --train-data data/train_manifest.csv \
    --test-data data/test_manifest.csv \
    --config config.yaml \
    --batch-size 50 \
    --epochs 15 \
    --no-save
  3. Train specific experiment with comment
python3 -m torchproject.train \
    --train-data data/train_manifest.csv \
    --test-data data/test_manifest.csv \
    --config config.yaml \
    --batch-size 50 \
    --epochs 15 \
    --experiment 'my_experiment' \
    --comment 'example-run'
  4. Train experiment with manager

4.1. Train experiment with MLFlow

python3 -m torchproject.train \
    --train-data data/train_manifest.csv \
    --test-data data/test_manifest.csv \
    --config config.yaml \
    --batch-size 50 \
    --epochs 15 \
    --experiment 'my_experiment' \
    --mlflow

NOTE: the experiment must already exist on the MLFlow server.
NOTE: add -v $PWD/dev/mlflow/data:/mlflow/mlruns when starting the container that is used to run train.py and test.py.
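
The experiment can be created from a script as well as from the MLFlow UI. The sketch below builds a request for MLFlow's REST endpoint `experiments/create`. The server address `http://localhost:5000` is an assumption (this README does not say where the server runs), and both helper names are hypothetical, not part of torchproject.

```python
import json
import urllib.request

# Assumption: address of the MLFlow tracking server; adjust to your setup.
MLFLOW_URL = "http://localhost:5000"


def build_create_experiment_request(name, base_url=MLFLOW_URL):
    """Build (but do not send) a POST request that creates an experiment."""
    url = base_url.rstrip("/") + "/api/2.0/mlflow/experiments/create"
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_experiment(name, base_url=MLFLOW_URL):
    """Send the request and return the id of the new experiment."""
    request = build_create_experiment_request(name, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())["experiment_id"]
```

Calling `create_experiment("my_experiment")` once before the first `--mlflow` run should be enough. If the experiment already exists, the server answers with an error status and `urlopen` raises `urllib.error.HTTPError`.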

4.2. Train experiment with ClearML

python3 -m torchproject.train \
    --train-data data/train_manifest.csv \
    --test-data data/test_manifest.csv \
    --config config.yaml \
    --batch-size 50 \
    --epochs 15 \
    --experiment 'my_experiment' \
    --clearml

NOTE: Credentials (access and secret keys) must be created in the ClearML web UI, and /home/<user>/clearml.conf must contain:

api {
    web_server: http://localhost:8080
    api_server: http://localhost:8008
    files_server: http://localhost:8081
    credentials {
        "access_key" = <ACCESS_KEY>
        "secret_key" = <SECRET_KEY>
    }
}
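
A quick sanity check before the first `--clearml` run can catch a missing file or placeholders that were never replaced. The sketch below is a plain text check, not a real parser for the config format, and `check_clearml_conf` is a hypothetical helper that is not part of torchproject.

```python
from pathlib import Path

# Keys taken from the clearml.conf example in this README.
REQUIRED_KEYS = ("web_server", "api_server", "files_server", "access_key", "secret_key")
PLACEHOLDERS = ("<ACCESS_KEY>", "<SECRET_KEY>")


def check_clearml_conf(path=None):
    """Return a list of problems; an empty list means the file looks usable."""
    # Default location: clearml.conf in the current user's home directory.
    path = Path(path) if path is not None else Path.home() / "clearml.conf"
    if not path.is_file():
        return [f"{path} does not exist"]
    text = path.read_text(encoding="utf-8")
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in text]
    problems += [f"placeholder not replaced: {p}" for p in PLACEHOLDERS if p in text]
    return problems
```

For example, `for problem in check_clearml_conf(): print(problem)` prints nothing when the file exists, mentions every key, and contains real credentials.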

4.3. Train experiment with MLFlow and Tensorboard

python3 -m torchproject.train \
    --train-data data/train_manifest.csv \
    --test-data data/test_manifest.csv \
    --config config.yaml \
    --batch-size 50 \
    --epochs 15 \
    --experiment 'my_experiment' \
    --mlflow \
    --tensorboard

4.4. Train experiment with ClearML and Tensorboard

python3 -m torchproject.train \
    --train-data data/train_manifest.csv \
    --test-data data/test_manifest.csv \
    --config config.yaml \
    --batch-size 50 \
    --epochs 15 \
    --experiment 'my_experiment' \
    --clearml \
    --tensorboard
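
The train commands above differ only in a few flags, so they are easy to assemble from a script when several runs are launched in a row. The sketch below uses only flags that appear in this README. `build_train_command` and `KNOWN_MANAGERS` are hypothetical names, and whether a given flag combination is accepted is decided by torchproject.train, not by this helper.

```python
import shlex

# Manager flags shown in this README: --mlflow, --clearml, --tensorboard.
KNOWN_MANAGERS = ("mlflow", "clearml", "tensorboard")


def build_train_command(train_data, test_data, config, batch_size, epochs,
                        experiment=None, comment=None, managers=(), no_save=False):
    """Assemble the argument list for `python3 -m torchproject.train`."""
    cmd = [
        "python3", "-m", "torchproject.train",
        "--train-data", str(train_data),
        "--test-data", str(test_data),
        "--config", str(config),
        "--batch-size", str(batch_size),
        "--epochs", str(epochs),
    ]
    if experiment is not None:
        cmd += ["--experiment", experiment]
    if comment is not None:
        cmd += ["--comment", comment]
    if no_save:
        cmd.append("--no-save")
    for name in managers:
        if name not in KNOWN_MANAGERS:
            raise ValueError(f"unknown manager: {name!r}")
        cmd.append(f"--{name}")
    return cmd


# Print one command per batch size; pass the list to subprocess.run to execute it.
for batch_size in (25, 50):
    command = build_train_command(
        "data/train_manifest.csv", "data/test_manifest.csv", "config.yaml",
        batch_size=batch_size, epochs=15,
        experiment="my_experiment", managers=("mlflow", "tensorboard"),
    )
    print(shlex.join(command))
```

Returning a list instead of a string keeps quoting out of the picture: `subprocess.run(command, check=True)` runs it without going through a shell.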
  5. Test
python3 -m torchproject.test \
    --data data/test_manifest.csv \
    --config config.yaml \
    --batch-size 50 \
    --experiment 'my_experiment' \
    --log-step 1 \
    --comment "my-test"
  6. Test with no saving
python3 -m torchproject.test \
    --data data/test_manifest.csv \
    --config config.yaml \
    --batch-size 50 \
    --no-save
  7. Test with reference to train experiment run
python3 -m torchproject.test \
    --data data/test_manifest.csv \
    --config config.yaml \
    --batch-size 50 \
    --experiment 'my_experiment' \
    --run 3

NOTE: a train run matching 'my_experiment/train/003*' must already exist
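
The note suggests that `--run 3` refers to a run whose directory name starts with the zero-padded number `003` under `my_experiment/train/`. The sketch below checks for such a directory before a test is launched. The three-digit padding is inferred from the note, the `root` argument is an assumption (this README does not say where experiment directories are stored), and `find_train_run` is a hypothetical helper.

```python
from pathlib import Path


def find_train_run(experiment, run, root="."):
    """Return the directories matching <root>/<experiment>/train/<run, 3 digits>*."""
    train_dir = Path(root) / experiment / "train"
    # Path.glob yields nothing when train_dir itself is missing.
    return sorted(p for p in train_dir.glob(f"{run:03d}*") if p.is_dir())
```

An empty result means the referenced train run is missing, and more than one match means the run number alone is ambiguous.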

  8. Test with reference to train experiment run and specific weights
python3 -m torchproject.test \
    --data data/test_manifest.csv \
    --config config.yaml \
    --batch-size 50 \
    --experiment 'my_experiment' \
    --run 3 \
    --weights 5.pt
  9. Test with manager

9.1. Test with MLFlow

python3 -m torchproject.test \
    --data data/test_manifest.csv \
    --config config.yaml \
    --batch-size 50 \
    --experiment 'my_experiment' \
    --run 3 \
    --weights 5.pt \
    --mlflow

9.2. Test with ClearML

python3 -m torchproject.test \
    --data data/test_manifest.csv \
    --config config.yaml \
    --batch-size 50 \
    --experiment 'my_experiment' \
    --run 3 \
    --weights 5.pt \
    --clearml

NOTE: complete the ClearML setup described in 4.2 first

9.3. Test with MLFlow and Tensorboard

python3 -m torchproject.test \
    --data data/test_manifest.csv \
    --config config.yaml \
    --batch-size 50 \
    --experiment 'my_experiment' \
    --run 3 \
    --weights 5.pt \
    --mlflow \
    --tensorboard

9.4. Test with ClearML and Tensorboard

python3 -m torchproject.test \
    --data data/test_manifest.csv \
    --config config.yaml \
    --batch-size 50 \
    --experiment 'my_experiment' \
    --run 3 \
    --weights 5.pt \
    --clearml \
    --tensorboard
  10. Generate Tensorboard logs from metrics stored in CSV
python3 -m torchproject.utils -r 'my_experiment/test/001'

NOTE: a test run matching 'my_experiment/test/001*' must already exist
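
For readers who want to see what such a conversion involves, here is a minimal sketch of turning a metrics CSV into Tensorboard scalars. It is not the code behind `torchproject.utils`. The CSV layout (a `step` column plus one column per metric) is an assumption, since this README does not describe the file format, and both function names are hypothetical.

```python
import csv


def read_metrics(csv_path, step_column="step"):
    """Read a metrics CSV into {metric_name: [(step, value), ...]}."""
    series = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            step = int(row[step_column])
            for name, value in row.items():
                # Skip the step column, surplus fields, and empty cells.
                if name is None or name == step_column or value in ("", None):
                    continue
                series.setdefault(name, []).append((step, float(value)))
    return series


def write_tensorboard(series, log_dir):
    """Write every metric as a scalar curve (needs torch and tensorboard installed)."""
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir=log_dir)
    for name, points in series.items():
        for step, value in points:
            writer.add_scalar(name, value, global_step=step)
    writer.close()
```

The import of `SummaryWriter` sits inside `write_tensorboard` so that the CSV parsing can be used and checked on a machine without PyTorch.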
