Learn LLM fundamentals by building a small (~90M parameter) Transformer from scratch that translates Dutch to English.

Encoder/Decoder Tutorial

Learn the fundamentals of the Encoder/Decoder Transformer architecture (the building block of LLMs like ChatGPT) with a working PyTorch example that translates Dutch (NL) to English (EN). This example closely follows the Transformer architecture of the "Attention Is All You Need" paper.

  • Get started with the interactive Google Colab notebook here.

(Model architecture diagram: see the architecture image in the repo.)

Training Data

  • Credit goes to Tatoeba.org for the Dutch <-> English sentence pairs.

Model Architecture

  • C = 512 (aka d_model)
  • T = 30 (max context length; informed by sentence length)
  • Number of layers = 6
  • Number of heads = 8
  • Head size = 64 (C / number of heads; see the sketch below)
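
The hyperparameters above map onto an encoder/decoder model roughly as follows. This is a minimal sketch built on torch.nn.Transformer for brevity (the repo implements the blocks from scratch), and VOCAB_SIZE is a placeholder assumption, not the real tokenizer's vocabulary size:

```python
import torch
from torch import nn

# Hyperparameters from the list above; VOCAB_SIZE is an assumption for illustration.
C, T, N_LAYERS, N_HEADS, DROPOUT = 512, 30, 6, 8, 0.1
VOCAB_SIZE = 32_000

class TranslationModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_embed = nn.Embedding(VOCAB_SIZE, C)   # Dutch token embeddings
        self.tgt_embed = nn.Embedding(VOCAB_SIZE, C)   # English token embeddings
        self.pos_embed = nn.Embedding(T, C)            # positional embeddings (learned here for simplicity)
        self.transformer = nn.Transformer(
            d_model=C,
            nhead=N_HEADS,                 # head size = C / N_HEADS = 64
            num_encoder_layers=N_LAYERS,
            num_decoder_layers=N_LAYERS,
            dropout=DROPOUT,
            batch_first=True,
        )
        self.lm_head = nn.Linear(C, VOCAB_SIZE)        # logits over the target vocab

    def forward(self, src_ids, tgt_ids):
        src_pos = torch.arange(src_ids.size(1), device=src_ids.device)
        tgt_pos = torch.arange(tgt_ids.size(1), device=tgt_ids.device)
        src = self.src_embed(src_ids) + self.pos_embed(src_pos)
        tgt = self.tgt_embed(tgt_ids) + self.pos_embed(tgt_pos)
        # Causal mask so the decoder only attends to earlier target tokens.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        out = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.lm_head(out)                       # (batch, T, VOCAB_SIZE)

model = TranslationModel()
print(sum(p.numel() for p in model.parameters()))      # rough parameter count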

Training Params

  • Trained via Google Colab (V100 machine)
  • Epochs: 20
  • Batch size: 8
  • One-cycle learning rate schedule (init=1e-7; max=1e-5; final=1e-6); see the sketch below
  • Warmup steps: 5000
  • Dropout: 10%
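
As a rough sketch of the schedule above, here is how the same numbers could be wired into PyTorch's built-in OneCycleLR (the repo may implement its own schedule; the optimizer, dataset size, and step counts here are assumptions):

```python
import torch
from torch import nn, optim

model = nn.Linear(512, 512)                  # stand-in for the Transformer
optimizer = optim.AdamW(model.parameters(), lr=1e-7)

EPOCHS, BATCH_SIZE = 20, 8
steps_per_epoch = 140_000 // BATCH_SIZE      # ~140K sentence pairs (assumed)
total_steps = EPOCHS * steps_per_epoch

scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1e-5,
    total_steps=total_steps,
    pct_start=5_000 / total_steps,           # ~5000 warmup steps
    div_factor=100,                          # initial lr = max_lr / 100 = 1e-7
    final_div_factor=0.1,                    # final lr = initial lr / 0.1 = 1e-6
)

for step in range(total_steps):
    # ... forward pass, loss.backward(), optimizer.step(), optimizer.zero_grad() ...
    scheduler.step()                         # advance the learning rate each step
```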

Limitations

The primary purpose of this repo is educational, and as such the model has the following limitations:

  • The model itself is trained on a very small dataset (~140K sentence pairs), whereas modern LLMs are trained on trillions of tokens. The performance of the model reflects this.
  • Training data sentence pairs are fairly short (mean of ~30 characters each, with a long right-skewed tail), which likely limits the model's ability to translate long sentences.
  • Training was limited to 20 epochs, but additional training could be performed (see model_object to continue training on your own).
    • main.py can be used for remote-host training (via services such as Lambda Labs or Paperspace) to avoid the common timeout errors with Google Colab. Here are the steps I followed to train the model on a Windows 10 remote host (via Paperspace).

Python Dependencies

  • Conda is used to build the environment (see environment.yml for dependencies)
  • Note: to enable GPU usage, PyTorch with CUDA 11.8 was installed via pip install torch==2.1.0+cu118 --index-url https://download.pytorch.org/whl/cu118 and not through the typical conda install process (a quick GPU check follows below).
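
A quick sanity check (not part of the repo) that the CUDA-enabled build installed correctly:

```python
import torch

print(torch.__version__)          # expect a +cu118 build, e.g. '2.1.0+cu118'
print(torch.cuda.is_available())  # True if the GPU is visible to PyTorch
```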
