Learn LLM fundamentals by building a small (~90M parameter) Transformer from scratch that translates Dutch to English.

Encoder/Decoder Tutorial

Learn the fundamentals of the Encoder/Decoder Transformer architecture (the building block of LLMs like ChatGPT) with a working PyTorch example that translates Dutch (NL) to English (EN). This example closely follows the Transformer architecture of the "Attention Is All You Need" paper.

  • Get started with the interactive Google Colab notebook here.

(Model architecture diagram: see the architecture image in the repo.)

Training Data

  • Credit goes to Tatoeba.org for the Dutch <-> English sentence pairs.

Model Architecture

  • C = 512 (aka d_model)
  • T = 30 (max context length; informed by sentence length)
  • Number of layers = 6
  • Number of heads = 8
  • Head size = 64 (C / number of heads; see the sketch below)
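
The hyperparameters above map onto an encoder/decoder model roughly as follows. This is a minimal sketch built on torch.nn.Transformer for brevity (the repo implements the blocks from scratch), and VOCAB_SIZE is a placeholder assumption, not the real tokenizer's vocabulary size:

```python
import torch
from torch import nn

# Hyperparameters from the list above; VOCAB_SIZE is an assumption for illustration.
C, T, N_LAYERS, N_HEADS, DROPOUT = 512, 30, 6, 8, 0.1
VOCAB_SIZE = 32_000

class TranslationModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_embed = nn.Embedding(VOCAB_SIZE, C)   # Dutch token embeddings
        self.tgt_embed = nn.Embedding(VOCAB_SIZE, C)   # English token embeddings
        self.pos_embed = nn.Embedding(T, C)            # positional embeddings (learned here for simplicity)
        self.transformer = nn.Transformer(
            d_model=C,
            nhead=N_HEADS,                 # head size = C / N_HEADS = 64
            num_encoder_layers=N_LAYERS,
            num_decoder_layers=N_LAYERS,
            dropout=DROPOUT,
            batch_first=True,
        )
        self.lm_head = nn.Linear(C, VOCAB_SIZE)        # logits over the target vocab

    def forward(self, src_ids, tgt_ids):
        src_pos = torch.arange(src_ids.size(1), device=src_ids.device)
        tgt_pos = torch.arange(tgt_ids.size(1), device=tgt_ids.device)
        src = self.src_embed(src_ids) + self.pos_embed(src_pos)
        tgt = self.tgt_embed(tgt_ids) + self.pos_embed(tgt_pos)
        # Causal mask so the decoder only attends to earlier target tokens.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        out = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.lm_head(out)                       # (batch, T, VOCAB_SIZE)

model = TranslationModel()
print(sum(p.numel() for p in model.parameters()))      # rough parameter count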

Training Params

  • Trained via Google Colab (V100 machine)
  • Epochs: 20
  • Batch size: 8
  • One-cycle learning rate schedule (init=1e-7; max=1e-5; final=1e-6); see the sketch below
  • Warmup steps: 5000
  • Dropout: 10%
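
As a rough sketch of the schedule above, here is how the same numbers could be wired into PyTorch's built-in OneCycleLR (the repo may implement its own schedule; the optimizer, dataset size, and step counts here are assumptions):

```python
import torch
from torch import nn, optim

model = nn.Linear(512, 512)                  # stand-in for the Transformer
optimizer = optim.AdamW(model.parameters(), lr=1e-7)

EPOCHS, BATCH_SIZE = 20, 8
steps_per_epoch = 140_000 // BATCH_SIZE      # ~140K sentence pairs (assumed)
total_steps = EPOCHS * steps_per_epoch

scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1e-5,
    total_steps=total_steps,
    pct_start=5_000 / total_steps,           # ~5000 warmup steps
    div_factor=100,                          # initial lr = max_lr / 100 = 1e-7
    final_div_factor=0.1,                    # final lr = initial lr / 0.1 = 1e-6
)

for step in range(total_steps):
    # ... forward pass, loss.backward(), optimizer.step(), optimizer.zero_grad() ...
    scheduler.step()                         # advance the learning rate each step
```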

Limitations

The primary purpose of this repo is educational, and as such the model has the following limitations:

  • The model itself is trained on a very small dataset (~140K sentence pairs), whereas modern LLMs are trained on trillions of tokens. The performance of the model reflects this.
  • Training data sentence pairs are fairly short (mean of ~30 characters each, with a long right-skewed tail), which likely limits the model's ability to translate long sentences.
  • Training was limited to 20 epochs, but additional training could be performed (see model_object to continue training on your own).
    • main.py can be used for remote-host training (via services such as Lambda Labs or Paperspace) to avoid the common timeout errors with Google Colab. Here are the steps I followed to train the model on a Windows 10 remote host (via Paperspace).

Python Dependencies

  • Conda is used to build the environment (see environment.yml for dependencies)
  • Note: to enable GPU usage, PyTorch with CUDA 11.8 was installed via pip install torch==2.1.0+cu118 --index-url https://download.pytorch.org/whl/cu118 and not through the typical conda install process (a quick GPU check follows below).
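
A quick sanity check (not part of the repo) that the CUDA-enabled build installed correctly:

```python
import torch

print(torch.__version__)          # expect a +cu118 build, e.g. '2.1.0+cu118'
print(torch.cuda.is_available())  # True if the GPU is visible to PyTorch
```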
