This repository implements the following paper:
An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale
- ViT (Vision Transformer) PyTorch training code for CIFAR-10
- GPU : NVIDIA GeForce RTX 2080 Ti
- CPU : AMD Ryzen Threadripper 2950X 16-Core Processor
- Conda or Docker
$ make env
$ conda activate 01-vit-pytorch-train
$ make setup
$ python train.py
Files already downloaded and verified
Files already downloaded and verified
Epoch 1/10
----------
Train Epoch: 0
Train Epoch: 0 [0/50000 (0%)] Loss: 2.533456
Train Epoch: 0 [3200/50000 (6%)] Loss: 1.910644
Train Epoch: 0 [6400/50000 (13%)] Loss: 2.203146
...
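The progress lines above follow a "samples seen / dataset size (percent)" layout. As a reading aid, here is a minimal sketch of how such a line could be formatted. The helper name `format_progress` is hypothetical, and the batch size of 32 with a log interval of 100 batches is an assumption inferred from the 3200-sample steps in the log; the actual values in `train.py` may differ.

```python
def format_progress(epoch: int, seen: int, total: int, loss: float) -> str:
    """Build one progress line in the same layout as the training log."""
    percent = 100.0 * seen / total  # share of the training set processed so far
    return f"Train Epoch: {epoch} [{seen}/{total} ({percent:.0f}%)] Loss: {loss:.6f}"


if __name__ == "__main__":
    batch_size = 32      # assumption, not confirmed by the source
    dataset_size = 50000  # CIFAR-10 training set size, as shown in the log
    # Loss values copied from the log above, keyed by batch index.
    logged = {0: 2.533456, 100: 1.910644, 200: 2.203146}
    for batch_idx, loss in logged.items():
        print(format_progress(0, batch_idx * batch_size, dataset_size, loss))
```

Note that the per-batch lines count epochs from 0, while the `Epoch 1/10` header counts from 1, so `Train Epoch: 0` belongs to the first of the ten epochs.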
$ ls checkpoint
best_ckpt.pth
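To check what was saved, the checkpoint can be opened with `torch.load`. The sketch below is only an assumption about usage: the source does not say whether `best_ckpt.pth` holds a bare `state_dict`, a dictionary with extra fields, or a whole pickled model, so the hypothetical helpers `load_checkpoint` and `summarize` just load the file on the CPU and list what they find.

```python
import os

import torch


def load_checkpoint(path: str = "checkpoint/best_ckpt.pth"):
    """Load a checkpoint onto the CPU so no GPU is needed to inspect it."""
    # If the file stores a whole pickled model rather than tensors, recent
    # PyTorch versions may require passing weights_only=False explicitly.
    return torch.load(path, map_location="cpu")


def summarize(ckpt):
    """Return the sorted top-level keys of a dict checkpoint, else its type name."""
    if isinstance(ckpt, dict):
        return sorted(str(key) for key in ckpt)
    return [type(ckpt).__name__]


if __name__ == "__main__":
    path = "checkpoint/best_ckpt.pth"
    if os.path.exists(path):
        print(summarize(load_checkpoint(path)))
    else:
        print(f"{path} not found; run train.py first")
```

If the file turns out to be a `state_dict`, it can be passed to `load_state_dict` on a model instance built with the same architecture settings used during training.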