This project replicates results from the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (Dosovitskiy et al., 2021). The goal is to validate the performance of the Vision Transformer (ViT) on image classification using CIFAR-10.
Architecture Details - ViT-Base
- Layers: 12
- Hidden size: 768
- MLP size: 3072
- Heads: 12
- Params: 86M (sanity-checked in the sketch below)
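As a quick sanity check, the ~86M figure can be roughly reproduced from the numbers above. The sketch below counts only the dominant weight matrices (patch projection, attention, and MLP blocks) and ignores the class token, position embeddings, and classifier head, which together add well under 1M parameters:

```python
# Approximate ViT-Base parameter count from the table above.
hidden, mlp, layers = 768, 3072, 12
patch, channels = 16, 3

patch_embed = patch * patch * channels * hidden + hidden   # patch projection
attn = 4 * hidden * hidden + 4 * hidden                    # qkv + output projection
mlp_block = 2 * hidden * mlp + hidden + mlp                # two dense layers
norms = 2 * 2 * hidden                                     # two LayerNorms per block
per_block = attn + mlp_block + norms

total = patch_embed + layers * per_block
print(f"{total / 1e6:.1f}M")  # ~85.6M, consistent with the reported 86M
```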
Pre-training Details
- Train Dataset: JFT-300M
- Optimizer: Adam
- Beta1: 0.9
- Beta2: 0.999
- Weight decay: 0.1
- LR Scheduler: Linear warmup and decay (see the sketch after this list)
- Batch size: 4096
- Dropout: Yes (per the paper, applied after every dense layer except the qkv projections, and after adding position embeddings)
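The optimizer and schedule above translate to PyTorch roughly as follows. This is a hedged sketch: the peak learning rate and step counts are illustrative placeholders rather than values from the paper, `model` stands in for the actual network, and note that PyTorch's `Adam` applies classic L2 regularization rather than decoupled weight decay:

```python
import torch

model = torch.nn.Linear(768, 10)  # placeholder for the ViT

# Illustrative values, not taken from the paper.
peak_lr, warmup_steps, total_steps = 3e-3, 10_000, 100_000

optimizer = torch.optim.Adam(
    model.parameters(), lr=peak_lr, betas=(0.9, 0.999), weight_decay=0.1
)

def linear_warmup_decay(step: int) -> float:
    # Linear ramp up to the peak LR, then linear decay to zero.
    if step < warmup_steps:
        return step / warmup_steps
    return max(0.0, (total_steps - step) / (total_steps - warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, linear_warmup_decay)
```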
Fine-tuning Details (Higher Resolution)
- Fine-tune Dataset: CIFAR-10
- Optimizer: SGD + Momentum
- Batch size: 512
- Callbacks: Early stopping
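Fine-tuning at a higher resolution than pre-training changes the number of patches, so the pre-trained position embeddings must be resized; the paper does this by 2D interpolation. Below is a minimal sketch under assumed names (a `pos_embed` of shape `(1, 1 + old_grid², dim)` with a leading class token):

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed: torch.Tensor, old_grid: int, new_grid: int) -> torch.Tensor:
    """2D-interpolate patch position embeddings to a new grid size."""
    cls_tok, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]  # split off class token
    dim = pos_embed.shape[-1]
    # (1, N, dim) -> (1, dim, grid, grid) so F.interpolate treats it spatially
    patch_pos = patch_pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(patch_pos, size=(new_grid, new_grid),
                              mode="bicubic", align_corners=False)
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat([cls_tok, patch_pos], dim=1)

# e.g. 224px -> 384px with 16x16 patches: 14x14 grid -> 24x24 grid
new_embed = resize_pos_embed(torch.randn(1, 1 + 14 * 14, 768), 14, 24)
```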
Accuracy on CIFAR-10 reported in the paper: 99.50 ± 0.06% (ViT-H/14 pre-trained on JFT-300M)
Proposed Benefits
- Matches or exceeds state-of-the-art CNNs while requiring substantially less compute to pre-train
- Hybrids (CNN feature maps fed into a Transformer) outperform pure ViT at smaller model sizes, but the advantage vanishes at larger scales
My Implementation Details
- Framework: PyTorch
- Optimizer: AdamW
- Learning Rate: 3e-4
- Weight Decay: 1e-3
- Batch Size: 1024
- Gradient Norm Clipping: 10.0
- Scheduler: Cosine Annealing LR
- Epochs: 142
- Number of Parameters: 8.45M
- Callbacks: Early stopping, MLflow experiment tracking
- Validation Accuracy on CIFAR-10: 70% (trained from scratch, no pretraining)

Training was performed on a CUDA-enabled GPU. The model architecture and hyperparameters were chosen based on the original paper and adjusted to match its training setup as closely as possible at this smaller (~8.45M-parameter) scale. The training script used gradient norm clipping and a cosine learning-rate schedule to stabilize training; a minimal sketch of this setup follows.
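For concreteness, the loop below wires the listed hyperparameters together. It is a minimal sketch rather than the full training script: the model is a placeholder, the data loader is a single random batch standing in for the real CIFAR-10 loader, and early stopping and MLflow logging are omitted:

```python
import torch

model = torch.nn.Linear(3 * 32 * 32, 10)   # placeholder for the ~8.45M-param ViT
epochs = 142

# Stand-in for a CIFAR-10 DataLoader with batch size 1024.
train_loader = [(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,)))]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images.flatten(1)), labels)
        loss.backward()
        # Clip the global gradient norm at 10.0, as listed above.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
        optimizer.step()
    scheduler.step()  # one cosine-annealing step per epoch
```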