This repository is dedicated to building and understanding Large Language Models (LLMs) from the ground up using PyTorch. It provides clear and concise implementations of popular transformer-based architectures, including the original Transformer, BERT, GPT, ALBERT, and RoBERTa. The project is intended for educational purposes: to help developers and researchers understand the inner workings of these complex models.
- Pure PyTorch Implementation: The models are built using only PyTorch, making the code easy to follow and modify.
- Classic Architectures: Includes implementations of several well-known transformer architectures.
- Modular Design: The code is organized into modules for easy understanding and extension.
- Training and Inference Scripts: Comes with scripts to train your own models and run inference with them.
- Minimal Dependencies: The project has minimal dependencies, making it easy to set up and run.
Transformer
├── Encoder (stack of 6 EncoderBlocks)
│   ├── Multi-Head Self-Attention
│   ├── Feed-Forward Network
│   ├── Residual Connections
│   └── Layer Normalization
│
├── Decoder (stack of 6 DecoderBlocks)
│   ├── Masked Multi-Head Self-Attention
│   ├── Multi-Head Cross-Attention
│   ├── Feed-Forward Network
│   ├── Residual Connections
│   └── Layer Normalization
│
├── Input Embedding + Positional Encoding
└── Output Linear Layer
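Each encoder block combines these pieces in the standard way: a multi-head self-attention sublayer and a feed-forward sublayer, each wrapped in a residual connection followed by layer normalization. A minimal PyTorch sketch of one such block (module and argument names here are illustrative, not necessarily those used in `archs/transformer.py`):

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One encoder block: self-attention and a feed-forward network,
    each wrapped in a residual connection followed by layer norm."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # Multi-head self-attention, then residual connection + layer norm
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Feed-forward network, then residual connection + layer norm
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x
```

The full encoder stacks six of these blocks, matching the configuration in "Attention Is All You Need"; the decoder blocks additionally mask the self-attention and add cross-attention over the encoder output.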
The repository is organized as follows:
llms-from-scratch/
├── archs/
│   ├── __init__.py
│   ├── albert.py
│   ├── bert.py
│   ├── gpt.py
│   ├── roberta.py
│   └── transformer.py
├── src/
│   ├── __init__.py
│   ├── eval.py
│   ├── predict.py
│   └── train.py
├── tests/
│   ├── __init__.py
│   └── test_transformer.py
├── experiments.ipynb
├── README.md
└── requirements.txt
- `archs/`: Contains the implementations of the different transformer architectures.
- `src/`: Includes scripts for training (`train.py`), evaluation (`eval.py`), and prediction (`predict.py`).
- `tests/`: Contains unit tests for the implemented models.
- `experiments.ipynb`: A Jupyter notebook for experimenting with the models.
- `requirements.txt`: A list of the project's dependencies.
- Python 3.7+
- PyTorch
- Clone the repository:

  git clone https://github.com/thedatamonk/llms-from-scratch.git
  cd llms-from-scratch

- Create a virtual environment (recommended):

  python -m venv venv
  source venv/bin/activate

- Install the dependencies:

  pip install -r requirements.txt

- Install this repo as a package:

  pip install -e .
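After installation, a quick smoke test (assuming the editable install exposes `archs` as a top-level package, as the directory layout above suggests):

```python
# Verify that PyTorch and the local package import cleanly.
import torch
import archs  # top-level package name assumed from the repo layout

print("PyTorch version:", torch.__version__)
```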
To train a model, use the `train.py` script. You will need to provide a dataset and specify the model architecture you want to use.
python src/train.py --model <model_name> --dataset <path_to_dataset>
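For example, to train the GPT implementation on a local text file (the `--model` value and dataset path here are illustrative; check `src/train.py` for the exact arguments it accepts):

python src/train.py --model gpt --dataset data/corpus.txt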
To evaluate a trained model, use the `eval.py` script.
python src/eval.py --model_path <path_to_model> --dataset <path_to_dataset>
To make predictions with a trained model, use the `predict.py` script.
python src/predict.py --model_path <path_to_model> --text "Your input text here"
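If you prefer to call a model from Python rather than through the CLI, the flow looks roughly like this (the class name, checkpoint path, and token IDs are assumptions for illustration; see `src/predict.py` for the actual loading and tokenization logic):

```python
import torch
from archs.gpt import GPT  # class name is an assumption; check archs/gpt.py

# Load a trained checkpoint (path and checkpoint format are illustrative).
model = GPT()
model.load_state_dict(torch.load("checkpoints/gpt.pt", map_location="cpu"))
model.eval()

# Run a forward pass on already-tokenized input (stand-in token IDs).
with torch.no_grad():
    token_ids = torch.tensor([[101, 2054, 2003]])
    logits = model(token_ids)

print(logits.shape)  # expected: (batch, seq_len, vocab_size) for an LM head
```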
This project includes the following transformer-based architectures:
- Transformer (Base): The original transformer model from "Attention Is All You Need".
- BERT: Bidirectional Encoder Representations from Transformers.
- GPT: Generative Pre-trained Transformer.
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.
- RoBERTa: A Robustly Optimized BERT Pretraining Approach.
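A key practical difference among these architectures is the attention mask: BERT, ALBERT, and RoBERTa are bidirectional encoders, while GPT (and the original Transformer's decoder) attends causally, so each position only sees earlier positions. A minimal sketch of a causal mask in PyTorch (the helper name is illustrative):

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    """Boolean mask where True marks positions a token may NOT attend to.

    Position i may attend to positions 0..i (itself and the past), which
    is what makes GPT-style decoders autoregressive; BERT-style encoders
    skip this mask and attend in both directions.
    """
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

print(causal_mask(4))
# tensor([[False,  True,  True,  True],
#         [False, False,  True,  True],
#         [False, False, False,  True],
#         [False, False, False, False]])
```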