A minimal deep learning framework built from scratch
"Immi" (Tamil: เฎเฎฎเฏเฎฎเฎฟ) โ the smallest primitive measure (1/2,150,400)
Big Picture • Tiers • Milestones • Learning Paths
I'm building a stripped-down, primitive implementation of PyTorch from scratch for educational purposes. No magic, no black boxes, just pure understanding of how deep learning frameworks actually work.
Following the TinyTorch curriculum from the ML Systems Book by Prof. Vijay Janapa Reddi (Harvard University).
"What I cannot create, I do not understand." โ Richard Feynman
20 modules. Three tiers. One complete ML system.
```
┌──────────────────────────────────────────────────────────────────────────┐
│  OPTIMIZATION (15-20)                                                  🟠 │
│  ┌───────────┐ ┌───────┐ ┌──────────┐ ┌──────┐ ┌───────┐ ┌───────────┐   │
│  │ Profiling │ │ Quant │ │ Compress │ │ Memo │ │ Accel │ │ Benchmark │   │
│  └───────────┘ └───────┘ └──────────┘ └──────┘ └───────┘ └───────────┘   │
├──────────────────────────────────────────────────────────────────────────┤
│  ARCHITECTURE (09-14)                                                  🟣 │
│  ┌────────────┐ ┌──────┐                                                  │
│  │ DataLoader │ │ CNNs │   Vision Track                                   │
│  └────────────┘ └──────┘                                                  │
│  ┌──────────┐ ┌───────┐ ┌───────────┐ ┌─────────────┐                    │
│  │ Tokenize │ │ Embed │ │ Attention │ │ Transformer │   Language Track   │
│  └──────────┘ └───────┘ └───────────┘ └─────────────┘                    │
├──────────────────────────────────────────────────────────────────────────┤
│  FOUNDATION (01-08)                                                    🔵 │
│  ┌────────┐ ┌─────────────┐ ┌────────┐ ┌────────┐                        │
│  │ Tensor │→│ Activations │→│ Layers │→│ Losses │                        │
│  └────────┘ └─────────────┘ └────────┘ └────────┘                        │
│  ┌──────────┐ ┌────────────┐ ┌──────────┐                                │
│  │ Autograd │→│ Optimizers │→│ Training │                                │
│  └──────────┘ └────────────┘ └──────────┘                                │
└──────────────────────────────────────────────────────────────────────────┘
```
Foundation (Modules 01-08): build the core machinery
| # | Module | What it does | Status |
|---|---|---|---|
| 01 | Tensor | Data structure - holds all your numbers | ✅ pushed Jan 5 |
| 02 | Activations | Non-linearity - ReLU, Sigmoid, Tanh | ✅ pushed Jan 29 |
| 03 | Layers | Parameterized transformations | ⏳ |
| 04 | Losses | Measure prediction error | ⏳ |
| 05 | DataLoader | Efficient data batching | ⏳ |
| 06 | Autograd | Automatic gradient computation | ⏳ |
| 07 | Optimizers | SGD, Adam, RMSprop | ⏳ |
| 08 | Training | Complete training loop | ⏳ |
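To make the foundation tier concrete, here is a minimal sketch of what modules 01, 06, and 07 build toward: a scalar value that records the operations applied to it and backpropagates gradients via the chain rule. The `Value` class, its methods, and the update at the end are illustrative assumptions, not the actual Immi-Torch API.

```python
# Illustrative sketch only -- not the Immi-Torch API.
# A scalar "tensor" that records its computation graph (modules 01 + 06)
# and supports an SGD-style update (module 07).
class Value:
    def __init__(self, data, parents=()):
        self.data = data               # the raw number this node holds
        self.grad = 0.0                # d(output)/d(this node), filled in by backward()
        self._parents = parents        # nodes this value was computed from
        self._backward = lambda: None  # how to push gradients to the parents

    def __add__(self, other):
        out = Value(self.data + other.data, parents=(self, other))
        def backward():
            self.grad += out.grad            # d(a+b)/da = 1
            other.grad += out.grad           # d(a+b)/db = 1
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, parents=(self, other))
        def backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = backward
        return out

    def backward(self):
        # topological sort, then apply the chain rule from the output backwards
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# One "training step" for y = w*x + b
w, b, x = Value(2.0), Value(1.0), Value(3.0)
y = w * x + b              # forward pass: y.data == 7.0
y.backward()               # backward pass: w.grad == 3.0 (dy/dw = x)
w.data -= 0.01 * w.grad    # SGD-style parameter update
```

The real modules generalize this from scalars to multidimensional tensors, but the recorded-graph-plus-chain-rule idea is the same.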
Architecture (Modules 09-14): apply the foundation to real problems
| # | Module | What it does | Track |
|---|---|---|---|
| 09 | DataLoader+ | Advanced data pipelines | Both |
| 10 | CNNs | Convolutions for images | 👁️ Vision |
| 11 | Tokenization | Text → tokens | Language |
| 12 | Embeddings | Tokens → vectors | Language |
| 13 | Attention | Self-attention mechanism | Language |
| 14 | Transformers | GPT architecture | Language |
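To give the language track a concrete target, the heart of module 13 is only a few matrix operations. The NumPy sketch below is an illustration under assumed names and shapes (`self_attention`, `w_q`/`w_k`/`w_v`, a single head), not the Immi-Torch interface.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence (illustrative sketch).

    x: (seq_len, d_model) token embeddings; w_q, w_k, w_v: (d_model, d_head)
    projection matrices. Single head, no masking, for clarity.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v                 # project tokens into queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])             # how strongly each token attends to each other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the key dimension
    return weights @ v                                   # weighted mix of value vectors

# Tiny example: 4 tokens, 8-dim embeddings, 8-dim head
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = self_attention(x, *(rng.normal(size=(8, 8)) for _ in range(3)))
print(out.shape)  # (4, 8)
```

Stacking this block with embeddings, feed-forward layers, and residual connections is what module 14 turns into a GPT-style architecture.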
Optimization (Modules 15-20): make it production-ready
| # | Module | What it does |
|---|---|---|
| 15 | Profiling | Find bottlenecks |
| 16 | Quantization | Reduce precision |
| 17 | Compression | Smaller models |
| 18 | Memoization | Cache computations |
| 19 | Acceleration | Hardware optimization |
| 20 | Benchmarking | MLPerf-style metrics |
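As a flavor of what this tier is about, the core idea behind module 16 fits in a few lines: map float32 values onto int8 with a scale and zero point, then map them back and check the error. The function names and the asymmetric scheme below are illustrative assumptions, not the Immi-Torch implementation.

```python
import numpy as np

def quantize_int8(w):
    """Affine (asymmetric) quantization of a float32 array to int8 (illustrative sketch).

    Returns the int8 tensor plus the scale and zero point needed to dequantize.
    """
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 or 1.0                      # avoid divide-by-zero for constant tensors
    zero_point = round(-128 - lo / scale)                 # shifts lo to -128 and hi to +127
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, s, z = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s, z)).max())  # roughly scale / 2
```

The trade-off is the whole story of this tier: 4x smaller weights and faster integer math, paid for with a small, measurable loss in precision.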
Historical achievements I'll unlock by recreating 70 years of ML evolution:
| Milestone | Year | Achievement | Modules Required |
|---|---|---|---|
| 🧠 Perceptron | 1957 | First learning algorithm (Rosenblatt) | 01-04 |
| ⚡ XOR | 1969 | The non-linear problem that stalled perceptrons, solved by an MLP | 01-08 |
| ✍️ MLP | 1986 | Handwritten digit recognition | 01-08 |
| 👁️ CNN | 1998 | LeNet-5 image classification | 01-10 |
| 🤖 Transformer | 2017 | "Attention Is All You Need" | 01-14 |
| 🏆 MLPerf | 2018 | Production-speed benchmarks | 01-20 |
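The earliest milestone is small enough to sketch in full. Below is an illustrative NumPy version of Rosenblatt's perceptron learning rule on separable toy data; it is not the `01_perceptron.py` milestone script, and the function name and data are made up for the example.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Rosenblatt's rule: nudge the weights toward every misclassified point.

    X: (n_samples, n_features), y: labels in {-1, +1}. Illustrative sketch only.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified (or on the boundary)
                w += lr * yi * xi        # move the decision plane toward the point
                b += lr * yi
    return w, b

# Linearly separable toy data: the class is the sign of the first feature
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] > 0, 1, -1)
w, b = train_perceptron(X, y)
print("training accuracy:", np.mean(np.sign(X @ w + b) == y))  # ~1.0 on separable data
```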

| Modules | Outcome | Historical Context |
|---|---|---|
| 01-04 | Working Perceptron classifier | Rosenblatt 1957 |
| 01-08 | MLP solving XOR + complete training pipeline | AI Winter breakthrough, 1969–1986 |
| 01-10 | CNN with convolutions and pooling | LeNet-5 (1998) |
| 01-14 | GPT model with autoregressive generation | "Attention Is All You Need" (2017) |
| 01-19 | Optimized, quantized, accelerated system | Production ML today |
| 01-20 | MLPerf-style benchmarking submission | Torch Olympics |
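For the 01-08 path, the payoff is an MLP that learns XOR, something a single-layer perceptron provably cannot do. The sketch below trains a tiny two-layer network with hand-written backprop; the layer sizes, learning rate, and variable names are assumptions for illustration, not the milestone code.

```python
import numpy as np

# Tiny 2-4-1 MLP trained on XOR with hand-written backprop (illustrative sketch).
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer (4 tanh units)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer (sigmoid)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # forward pass
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # backward pass (binary cross-entropy + sigmoid gives this simple error term)
    dp = (p - y) / len(X)
    dW2, db2 = h.T @ dp, dp.sum(axis=0)
    dh = (dp @ W2.T) * (1 - h ** 2)             # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ dh, dh.sum(axis=0)
    # SGD update
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(np.round(p.ravel(), 2))  # typically converges to approximately [0, 1, 1, 0]
```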
```
Immi-Torch/
├── immi_torch/
│   ├── __init__.py              # Main package exports
│   │
│   ├── tier1_foundation/        # 🔵 Core ML machinery (01-08)
│   │   ├── tensor.py            # 01: Multidimensional arrays
│   │   ├── activations.py       # 02: ReLU, Sigmoid, Tanh
│   │   ├── layers.py            # 03: Linear, Module base
│   │   ├── losses.py            # 04: MSE, CrossEntropy
│   │   ├── data.py              # 05: DataLoader, Dataset
│   │   ├── autograd.py          # 06: Automatic differentiation
│   │   ├── optim.py             # 07: SGD, Adam, RMSprop
│   │   └── train.py             # 08: Training loop
│   │
│   ├── tier2_architecture/      # 🟣 Vision & Language (09-14)
│   │   ├── cnn.py               # 10: Conv2d, Pooling
│   │   ├── tokenizer.py         # 11: Text tokenization
│   │   ├── embeddings.py        # 12: Token embeddings
│   │   ├── attention.py         # 13: Self-attention
│   │   └── transformer.py       # 14: GPT architecture
│   │
│   └── tier3_optimization/      # 🟠 Production-ready (15-20)
│       ├── profiling.py         # 15: Find bottlenecks
│       ├── quantization.py      # 16: Reduce precision
│       ├── compression.py       # 17: Pruning, distillation
│       ├── memoization.py       # 18: Cache computations
│       ├── acceleration.py      # 19: JIT, op fusion
│       └── benchmarking.py      # 20: MLPerf metrics
│
├── tests/                       # Test suite
├── milestones/                  # Historical achievements (70 years of ML)
│   ├── 01_perceptron.py         # 🧠 1957 - First neural network
│   ├── 02_xor.py                # ⚡ 1969 - Non-linear learning
│   ├── 03_mnist_mlp.py          # ✍️ 1986 - Handwritten digits
│   ├── 04_cnn_lenet.py          # 👁️ 1998 - LeNet-5 vision
│   ├── 05_transformer.py        # 🤖 2017 - Attention mechanism
│   └── 06_mlperf.py             # 🏆 2018 - Production benchmarks
├── examples/                    # Usage examples
├── docs/                        # Documentation
└── tier1_plans.md               # Detailed Tier 1 roadmap
```
```bash
# Clone the repository
git clone https://github.com/ashwin-r11/Immi-Torch.git
cd Immi-Torch

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest tests/
```

```python
from immi_torch import Tensor, Linear, ReLU, MSELoss, SGD
# Create a simple model
model = Linear(10, 1)
loss_fn = MSELoss()
optimizer = SGD(model.parameters(), lr=0.01)
# Training step
x = Tensor.randn(32, 10)
y = Tensor.randn(32, 1)
pred = ReLU()(model(x))
loss = loss_fn(pred, y)
loss.backward()
optimizer.step()
```

Getting stuck is not a bug, it's a feature.
TinyTorch uses productive struggle as a teaching tool. The frustration you feel is your brain rewiring to understand ML systems at a deeper level.
When stuck:
- Run tests early and often
- Explain the problem to a rubber duck
- Ask for help after 30+ minutes on a single bug
- Curriculum: TinyTorch - ML Systems Book
- Theory: Deep Learning Book by Goodfellow, Bengio & Courville
- Big Picture: Module Overview
- Getting Started: Quick Start Guide
- Reference: PyTorch Documentation
MIT License - see LICENSE for details.
The North Star
By module 14, I'll have a complete GPT model generating text, built from raw Python.
By module 20, I'll benchmark my entire framework with MLPerf-style submissions.
Every tensor operation. Every gradient calculation. Every optimization trick.
I wrote it.
