A list of papers that I believe are either important or useful for understanding deep learning.
The most important landmark papers in deep learning. Read these papers in order.
Paper | Description |
---|---|
Backpropagation | The classic paper describing the backpropagation algorithm, the central algorithm behind how modern deep learning models are trained. |
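AlexNet | Often considered the paper that kickstarted the modern era of deep learning, this paper showed that a deep convolutional neural network trained on GPUs could win the ImageNet image classification challenge by a wide margin, demonstrating that deeper, larger networks can dramatically improve performance. |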
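The Adam Optimizer | A key landmark in optimization algorithms for deep learning models, this paper proposed Adam, an optimizer that gives each weight its own adaptive step size, computed from running averages of the gradients (and of their squares) produced by backpropagation. |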
Long Short-Term Memory | A huge breakthrough in sequence modeling for deep learning models, this paper introduced a gated memory cell that lets a recurrent network store information over many time steps and use it for future predictions. |
Attention is All You Need | Arguably one of the two most important papers in modern deep learning, along with AlexNet, this paper proposed the Transformer, the building block of large language models and a huge milestone in language understanding for deep learning models. |
Deep Reinforcement Learning | A key breakthrough in reinforcement learning, this paper trained a deep neural network to choose actions by learning from the rewards it received while interacting with an environment, rather than from labeled examples. |
Denoising Diffusion Models | This paper proposed a model that generates images by learning to reverse a gradual noising process, turning pure noise into an image step by step. Its highly realistic results made it a key landmark in image generation. |
Language Models are Few-Shot Learners | Released in 2020, well before ChatGPT, this paper introduced GPT-3 and showed that a sufficiently large language model can perform new tasks from just a few examples given in its prompt, without any additional training. |
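To make the Backpropagation entry concrete, here is a minimal sketch of backpropagation on a hypothetical two-weight network, `y = w2 * tanh(w1 * x)`, with a squared-error loss. The forward pass stores intermediate values, and the backward pass applies the chain rule from the loss back to each weight. The results are then checked against finite differences. The function `forward_backward` and all numbers are illustrative only.

```python
import math

def forward_backward(w1, w2, x, target):
    """Hypothetical tiny network y = w2 * tanh(w1 * x) with squared-error loss.
    Returns the loss and its gradients with respect to w1 and w2."""
    # Forward pass: keep intermediate values for the backward pass
    z = w1 * x
    h = math.tanh(z)
    y = w2 * h
    loss = (y - target) ** 2
    # Backward pass: chain rule from the loss back to each weight
    dloss_dy = 2 * (y - target)
    dloss_dw2 = dloss_dy * h
    dloss_dh = dloss_dy * w2
    dloss_dz = dloss_dh * (1 - h * h)  # d/dz tanh(z) = 1 - tanh(z)^2
    dloss_dw1 = dloss_dz * x
    return loss, dloss_dw1, dloss_dw2

# Compare the backprop gradients with central finite differences
w1, w2, x, target = 0.5, -1.5, 2.0, 1.0
loss, g1, g2 = forward_backward(w1, w2, x, target)
step = 1e-6
num_g1 = (forward_backward(w1 + step, w2, x, target)[0]
          - forward_backward(w1 - step, w2, x, target)[0]) / (2 * step)
num_g2 = (forward_backward(w1, w2 + step, x, target)[0]
          - forward_backward(w1, w2 - step, x, target)[0]) / (2 * step)
print(g1, num_g1)
print(g2, num_g2)
```

Real frameworks automate the backward pass over much larger computation graphs, but each step is the same local chain-rule product shown here.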
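The Adam entry describes per-weight adaptive updates. Below is a minimal sketch of one Adam step for a list of scalar parameters. It assumes the default hyperparameters suggested in the paper (step size 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8). The helper `adam_step` is hypothetical, and in practice the gradients would come from backpropagation.

```python
import math

def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Hypothetical helper: one Adam update, modifying params, m and v in place.
    t is the 1-based step count, needed for bias correction."""
    for i, g in enumerate(grads):
        # Exponential moving averages of the gradient and the squared gradient
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # Bias correction: m and v start at zero, so early averages are too small
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        # Per-parameter step, scaled by the running gradient magnitude
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)

# Usage: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
x, m, v = [0.0], [0.0], [0.0]
for t in range(1, 5001):
    adam_step(x, [2 * (x[0] - 3)], m, v, t, lr=0.01)
print(round(x[0], 2))
```

The usage example raises the step size to 0.01 so that this toy problem converges quickly. Because the step is divided by the square root of `v_hat`, each weight moves by roughly `lr` per step regardless of how large or small its raw gradient is.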
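As a rough illustration of the Long Short-Term Memory entry, the sketch below runs a single LSTM cell with a hidden size of 1. It assumes the now-standard variant with a forget gate, which was added in follow-up work rather than in the original 1997 paper. The helper `lstm_step` and the hand-picked weights are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """Hypothetical scalar LSTM cell (hidden size 1), including a forget gate.
    `w` maps each gate name to (weight_on_x, weight_on_h, bias)."""
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b
    i = sigmoid(pre("input"))   # how much new information to write
    f = sigmoid(pre("forget"))  # how much of the old cell state to keep
    o = sigmoid(pre("output"))  # how much of the cell state to expose
    g = math.tanh(pre("cell"))  # candidate value to write
    c = f * c_prev + i * g      # additive update carries memory across steps
    h = o * math.tanh(c)        # hidden state used for this step's prediction
    return h, c

# Hand-picked weights: write only when the input is 1, and almost never forget
weights = {
    "input": (10.0, 0.0, -5.0),
    "forget": (0.0, 0.0, 10.0),
    "output": (0.0, 0.0, 10.0),
    "cell": (1.0, 0.0, 0.0),
}
h, c = 0.0, 0.0
for x in [1.0] + [0.0] * 20:
    h, c = lstm_step(x, h, c, weights)
print(round(c, 3))
```

With these weights, the cell writes the first input into memory and the forget gate keeps it there. As a result, `c` stays near 0.76 even after 20 steps of zero input.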
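The core operation of the Transformer from Attention is All You Need is scaled dot-product attention, `softmax(Q K^T / sqrt(d_k)) V`. The sketch below implements it for a single head using plain Python lists. It omits the learned projections, masking, and multiple heads of the full architecture, and the toy `Q`, `K`, and `V` values are made up.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Single-head scaled dot-product attention on lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# The query points at the first key, so the output is close to the first value row
Q = [[10.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
result = attention(Q, K, V)
print(result)
```

The division by `sqrt(d_k)` keeps dot products from growing with the vector dimension. Without it, large scores would push the softmax toward a one-hot distribution with vanishing gradients.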
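Assuming the Deep Reinforcement Learning entry refers to the DQN work on Atari games, its learning rule is Q-learning. The rule moves the estimate `Q(s, a)` toward the target `r + γ max_a' Q(s', a')`. The sketch below swaps the neural network for a plain lookup table on a hypothetical four-state chain and omits DQN's experience replay. The environment and all constants are illustrative.

```python
import random

N_STATES, GOAL = 4, 3       # hypothetical chain: states 0..3, reaching state 3 ends the episode
ACTIONS = (0, 1)            # 0 = move left, 1 = move right
GAMMA, ALPHA = 0.9, 0.5     # discount factor and learning rate (illustrative values)

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
rng = random.Random(0)
for episode in range(500):
    s = 0
    for _ in range(100):
        a = rng.choice(ACTIONS)  # explore with random actions (Q-learning is off-policy)
        s2, r, done = step(s, a)
        # TD target: reward plus discounted value of the best next action, no bootstrap at the goal
        target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        # Move the table entry toward the target; DQN instead takes a gradient step on (target - Q)^2
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
        if done:
            break

# Greedy policy learned from rewards alone: move right in every non-goal state
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(policy)
```

No state is ever labeled with the correct action. The agent discovers that moving right is best purely from the reward at the goal, which propagates backward through the targets.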
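The noising half of the Denoising Diffusion Models entry can be written in closed form: `x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε`, where `ᾱ_t` is the running product of `(1 - β_s)`. The minimal sketch below assumes the linear β schedule from 1e-4 to 0.02 over T = 1000 steps used in the DDPM paper. A network would then be trained to predict `ε` from `x_t` and `t`, and sampling runs the process in reverse; neither part is shown. The helper `noisy_sample` and the toy pixel values are hypothetical.

```python
import math
import random

T = 1000
# Linear noise schedule from 1e-4 to 0.02 (indices are 0-based here, 1-based in the paper)
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bars.append(prod)  # alpha_bar_t = product of (1 - beta_s) for s <= t

def noisy_sample(x0, t, rng):
    """Hypothetical helper: sample x_t ~ q(x_t | x_0) in one shot.
    Returns (x_t, eps); eps is the regression target for the denoising network."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

rng = random.Random(0)
x0 = [0.5, -0.5, 1.0]  # toy "image" of three pixel values
xt_early, _ = noisy_sample(x0, 10, rng)     # still close to x0
xt_late, _ = noisy_sample(x0, T - 1, rng)   # almost pure Gaussian noise
print(round(alpha_bars[10], 4), alpha_bars[T - 1])
```

Because `ᾱ_t` shrinks toward zero as `t` grows, early samples keep most of the image while the final step is nearly pure noise. That end point is why generation can start from random noise.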
Important works in computer vision.
Paper |
---|
Residual Networks |
Papers about pre-transformer NLP breakthroughs.
Paper |
---|
word2vec |
Nucleus Sampling |
Papers about transformers and their applications.
Paper |
---|
BERT |
Vision Transformers |
Papers about large language models and related work.
Paper |
---|
Chain of Thought Reasoning |
Instruction Tuning |
Speculative Decoding |
Historically famous papers that are still used today, but not essential reading.
Paper |
---|
ReLU |
UNet |
XGBoost |
Batch Normalization |
Papers about deep generative models.
Paper |
---|
Variational Autoencoders |
GANs |
Papers about reinforcement learning.
Paper |
---|
Proximal Policy Optimization |
Papers about research on deep learning.
Paper |
---|
Contrastive Language-Image Pre-training (CLIP) |
The Lottery Ticket Hypothesis |
Papers using deep learning to solve major real-world problems.
Paper |
---|
AlphaFold |
Papers exploring the math behind deep learning advances.
Paper |
---|
Dropout |
Low Rank Adaptation |
GANs as a Nash Equilibrium |