Objective: This project contains my work on building a transformer from scratch for a German-to-English translation task.
This project builds on the pytorch-original-transformer work to understand the inner workings of the transformer and how to build it from scratch. Alongside the implementation, we refer to the original paper to study transformers.
To understand the repo, check the HOWTO.md file.
Please follow these steps to run the project locally:
- Run
git clone https://github.com/m-np/ai-ml-project-template.git
- Open Anaconda console/Terminal and navigate into project directory
cd path_to_repo
- Run
conda create --name <env_name> python==3.9
- Run
conda activate <env_name>
(to run scripts from your console, or set this environment as the interpreter in your IDE)
To add the new conda environment to Jupyter Notebook, follow these additional steps:
- Run
conda install -c anaconda ipykernel
- Run
python -m ipykernel install --user --name=<env_name>
For PyTorch installation:
The PyTorch pip package comes bundled with a version of CUDA/cuDNN, but it is highly recommended that you install a system-wide CUDA toolkit beforehand, mostly because of the GPU drivers. I also recommend using the Miniconda installer to get conda on your system. Follow through points 1 and 2 of this setup and use the most up-to-date versions of Miniconda and CUDA/cuDNN for your system.
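As a quick sanity check (assuming an NVIDIA GPU and that PyTorch has already been installed), you can verify the driver and CUDA setup from a terminal:

```shell
# Show the GPU driver and the highest CUDA version it supports (NVIDIA GPUs only)
nvidia-smi

# Show the version of the system-wide CUDA toolkit, if one is installed
nvcc --version

# After installing PyTorch, confirm that it can see the GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```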
To install the other modules, please follow these steps:
- Open Anaconda console/Terminal and navigate into project directory
cd path_to_repo
- Run
conda activate <env_name>
- Run
pip install -r requirements.txt
The full list of dependencies can be found in 👉requirements.txt
The model is trained on the Kaggle Multi30K dataset, and the notebook used for training can be found here.
The model takes the following arguments, as described in the paper:
- `dk`: key dimension -> 32
- `dv`: value dimension -> 32
- `h`: number of parallel attention heads -> 8
- `src_vocab_size`: source vocabulary size (German) -> 8500
- `target_vocab_size`: target vocabulary size (English) -> 6500
- `src_pad_idx`: source pad index -> 2
- `target_pad_idx`: target pad index -> 2
- `num_encoders`: number of encoder modules -> 3
- `num_decoders`: number of decoder modules -> 3
- `dim_multiplier`: dimension multiplier for the inner dimension of the pointwise FFN (dff = dk * h * dim_multiplier) -> 4
- `pdropout`: dropout probability in the network -> 0.1
- `lr`: learning rate used to train the model -> 0.0003
- `N_EPOCHS`: number of epochs -> 50
- `CLIP`: gradient clipping value -> 1
- `patience`: early-stopping patience -> 5
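For reference, the arguments above can be collected into a plain Python dict (the name `params` is just for illustration), which makes it easy to see how the model dimension and the FFN inner dimension follow from `dk`, `h`, and `dim_multiplier`:

```python
# Hyperparameters from the list above (dict name is illustrative, not from the repo)
params = {
    "dk": 32,                  # key dimension per head
    "dv": 32,                  # value dimension per head
    "h": 8,                    # number of parallel attention heads
    "src_vocab_size": 8500,    # German vocabulary size
    "target_vocab_size": 6500, # English vocabulary size
    "src_pad_idx": 2,
    "target_pad_idx": 2,
    "num_encoders": 3,
    "num_decoders": 3,
    "dim_multiplier": 4,       # dff = dk * h * dim_multiplier
    "pdropout": 0.1,
    "lr": 0.0003,
    "N_EPOCHS": 50,
    "CLIP": 1,
    "patience": 5,
}

# Derived dimensions
d_model = params["dk"] * params["h"]      # 32 * 8 = 256
dff = d_model * params["dim_multiplier"]  # 256 * 4 = 1024
print(d_model, dff)  # 256 1024
```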
We use the Adam optimizer along with CrossEntropyLoss to train the model.
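A minimal sketch of that training setup in PyTorch (the model here is a placeholder, not the repo's Transformer; the pad index and learning rate come from the argument list above):

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder model standing in for the Transformer (illustrative only)
model = nn.Linear(8, 6500)

# Ignore the padding index (target_pad_idx = 2) when computing the loss
criterion = nn.CrossEntropyLoss(ignore_index=2)

# Adam with the learning rate from the argument list (lr = 0.0003)
optimizer = optim.Adam(model.parameters(), lr=0.0003)

# Inside the training loop, gradients would be clipped to CLIP = 1:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1)
```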
We evaluated the model on 1,000 held-out test examples and observed a BLEU score of 30.8.
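BLEU measures n-gram overlap between the model's output and reference translations. A hedged sketch using NLTK's `corpus_bleu` (not necessarily the scorer used in the training notebook):

```python
from nltk.translate.bleu_score import corpus_bleu

# One list of references per hypothesis; tokens are pre-split words
references = [[["a", "black", "dog", "runs", "fast"]]]
hypotheses = [["a", "black", "dog", "runs", "fast"]]

# A perfect match scores 1.0; multiply by 100 for the usual 0-100 scale
score = corpus_bleu(references, hypotheses)
print(round(score * 100, 1))  # 100.0
```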
The trained model can also be found in the Hugging Face repo.
The following code is derived from the pytorch-original-transformer:
@misc{Gordić2020PyTorchOriginalTransformer,
author = {Gordić, Aleksa},
title = {pytorch-original-transformer},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/gordicaleksa/pytorch-original-transformer}},
}
and from the following blog.