A character-level GPT transformer trained on Shakespeare, implemented from scratch in Rust (inference) and Python (training).
The training script (train/train.py) downloads the Tiny Shakespeare dataset (~1MB of text) and trains a small GPT model on it using PyTorch.
- Tokenizer: character-level — each unique character is a token (65 total)
- Model: 5-layer transformer, 4 attention heads, 64-dimensional embeddings, 256-token context window
- Size: just under 1MB of weights (staying below 1MB is a design constraint)
- Output: exports `weights/shakespeare.bin` (custom binary format) and `weights/vocab.json`
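The character-level tokenizer described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `train/train.py`: the helper names (`build_vocab`, `encode`, `decode`) are hypothetical, and the assumption that `vocab.json` stores a character-to-id mapping is ours, since the file's layout is not documented here.

```python
import json

def build_vocab(text):
    """Assign an integer id to each unique character, in sorted order."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}   # character -> id
    itos = {i: ch for ch, i in stoi.items()}       # id -> character
    return stoi, itos

def encode(text, stoi):
    """Turn a string into a list of token ids (one per character)."""
    return [stoi[ch] for ch in text]

def decode(ids, itos):
    """Turn a list of token ids back into a string."""
    return "".join(itos[i] for i in ids)

# Tiny stand-in corpus; the real corpus yields 65 unique characters.
sample = "To be, or not to be"
stoi, itos = build_vocab(sample)
ids = encode("to be", stoi)
assert decode(ids, itos) == "to be"

# Assumed export shape: a JSON object mapping each character to its id.
vocab_json = json.dumps(stoi)
```

Sorting the characters keeps the id assignment deterministic, so the exported vocabulary and the trained embedding rows stay aligned between runs.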
The Rust binary loads the pre-trained weights and generates text token by token.
- Reads the `.bin` file directly into `f32` slices (zero-copy weight loading)
- Runs a full transformer forward pass: token + positional embeddings → N layers of (LayerNorm → Attention → LayerNorm → FFN) → LM head
- Samples the next character using temperature scaling + top-k filtering
- Streams output to stdout character by character
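The sampling step (temperature scaling followed by top-k filtering) can be illustrated with a short Python sketch. The actual implementation lives in Rust in `src/main.rs`; the function name `sample_next` and its signature below are hypothetical, and the sketch assumes the logits arrive as a plain list of floats.

```python
import math
import random

def sample_next(logits, temperature=0.8, top_k=40, rng=random):
    """Pick a token id from raw logits using temperature scaling and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature scaling: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]

    # Top-k filtering: keep only the k highest-scoring ids (0 disables the filter).
    ids = list(range(len(scaled)))
    if 0 < top_k < len(scaled):
        ids = sorted(ids, key=lambda i: scaled[i], reverse=True)[:top_k]

    # Softmax weights over the surviving ids; subtracting the max avoids overflow.
    peak = max(scaled[i] for i in ids)
    weights = [math.exp(scaled[i] - peak) for i in ids]

    # random.choices normalizes the weights internally.
    return rng.choices(ids, weights=weights, k=1)[0]

# Example: with top_k=1 the sampler reduces to greedy decoding.
rng = random.Random(0)
assert sample_next([0.0, 5.0, 1.0], top_k=1, rng=rng) == 1
```

Passing a seeded `random.Random` instance makes generation reproducible, which is handy when comparing outputs across temperature or top-k settings.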
train/
  train.py          # PyTorch training script
  input.txt         # auto-downloaded Shakespeare corpus
weights/
  shakespeare.bin   # exported model weights (binary)
  vocab.json        # character vocabulary
src/
  main.rs           # CLI entry point + sampling logic
  model.rs          # transformer forward pass (attention, FFN, layer norm)
  tokenizer.rs      # character-level tokenizer
  weights.rs        # binary weight file loader
generate.sh         # convenience wrapper around cargo run
cd train
pip install -r requirements.txt
python train.py

# using the convenience script
./generate.sh "HAMLET:" 200 0.8
# or directly
cargo run --release -- weights/shakespeare.bin weights/vocab.json "HAMLET:" --tokens 200 --temp 0.8 --topk 40

| Flag | Default | Description |
|---|---|---|
| `--tokens N` | 200 | Number of characters to generate |
| `--temp F` | 0.8 | Sampling temperature (higher = more random) |
| `--topk N` | 40 | Top-k filtering (0 = disabled) |
- Inference: Rust (stable)
- Training: Python 3.8+, PyTorch, NumPy