Theoretical introduction to language processing terminology (such as embedding, encoder/decoder, attention, and transformer) with comprehensive examples of Python code from scratch (Sep 2022)

chicolinux/nlp-tutorials

 
 

Natural Language Processing (Neural Methods) Tutorials

This repository provides comprehensive examples for learning the fundamentals of natural language processing (NLP) from the ground up.
Each notebook contains an end-to-end implementation of its task, written from scratch in Python (PyTorch), and describes the fundamental ideas and background behind each architecture.

  1. Tokenization and Primitive Embeddings (Sparse Vector)
  2. Tokenization and Custom Embedding (Dense Vector)
  3. Word2Vec algorithm (Negative Sampling)
  4. N-Gram detection with 1D Convolution
  5. Language Model - Basic FFN
  6. Language Model - RNN (Recurrent Neural Network)
  7. Encoder-Decoder (Seq2Seq)
  8. Attention
  9. Transformer

I recommend running these examples on a machine with a GPU.
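As a minimal sketch of how a notebook might pick up a GPU when one is present, the snippet below selects a device with standard PyTorch calls and falls back to the CPU otherwise. The helper name `pick_device` is hypothetical and is not taken from the notebooks themselves.

```python
import torch


def pick_device() -> torch.device:
    """Return the CUDA device if PyTorch can see one, otherwise the CPU.

    Hypothetical helper for illustration; the notebooks may select
    their device differently.
    """
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


device = pick_device()
print(f"Running on: {device}")

# Tensors (and models) have to be moved to the chosen device explicitly.
x = torch.randn(2, 3).to(device)
```

The examples can still run on the CPU with this fallback, though training is likely to be noticeably slower.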

The tutorials follow the history of neural methods in NLP.
From tutorial 5 onward, I focus on language models, improving them step by step until we reach the widely used Transformer architecture and see how and why it matters. (By running actual tasks, you will see how it was developed and improved.)

NLP (natural language processing) has a long history in artificial intelligence, and generative models were also developed with traditional statistical methods in the 1950s, for example by applying Hidden Markov Models (HMMs) or Gaussian Mixture Models (GMMs).
This repository, however, focuses on the recent neural methods used in today's NLP.

[Feb 2023] All examples were ported from TensorFlow to PyTorch.
[Feb 2025] Removed the torchtext dependency, because it is deprecated.

Tsuyoshi Matsuzaki @ Microsoft
