pytorch-lifestream: Deep Learning for Event Sequence Analysis

Learn deep learning for event sequence analysis with pytorch-lifestream. This collection covers topics related to processing event sequences, and most of them are backed by demo code that uses the ptls library. We recommend following the topics in order, though experienced users can jump straight to specific sections. Explore the demos, test the code, and start applying these techniques to your own data.


Table of Contents

  1. Prerequisites
    • Essential tools for working with deep learning and data processing.
  2. Event Sequence Analysis
    • Problem definitions and classical approaches for global and local event sequence problems.
  3. Supervised Neural Networks
    • Sequence analysis with various network types and problem-solving techniques.
  4. Unsupervised Learning
    • Self-supervised training for embeddings and representation learning.
  5. Contrastive and Non-Contrastive Learning
    • Methods for creating effective latent representations through contrastive learning.
  6. Using Pretrained Models
    • Techniques for leveraging pretrained models in downstream tasks and fine-tuning.
  7. Data Preprocessing
    • Efficient data loading and preprocessing with demos.
  8. Feature Engineering
    • Special feature types, including text encoding and multimodal sources.
  9. Transaction Encoding
    • Options for encoding and quantizing transaction data.
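
To make the terms in items 2 and 3 concrete, the sketch below groups a flat event log into per-client sequences, then derives a global target (one label per whole sequence) and a local target (the next event at each position). It uses only the Python standard library. The field names (`client_id`, `event_time`, `mcc_code`, `amount`), the helper names, and the labeling rule are hypothetical and are not part of the ptls API.

```python
from collections import defaultdict

# Hypothetical flat event log: one row per event, in arbitrary order.
events = [
    {"client_id": "a", "event_time": 3, "mcc_code": 5411, "amount": 12.5},
    {"client_id": "b", "event_time": 1, "mcc_code": 4111, "amount": 2.0},
    {"client_id": "a", "event_time": 1, "mcc_code": 5812, "amount": 30.0},
    {"client_id": "a", "event_time": 2, "mcc_code": 5411, "amount": 7.25},
    {"client_id": "b", "event_time": 2, "mcc_code": 5411, "amount": 40.0},
]


def to_sequences(rows, key="client_id", time_col="event_time"):
    """Group rows by entity and sort each group by time.

    Returns {entity: {feature_name: [values in time order]}}.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)

    sequences = {}
    for entity, entity_rows in grouped.items():
        entity_rows.sort(key=lambda r: r[time_col])
        feature_names = [name for name in entity_rows[0] if name != key]
        sequences[entity] = {
            name: [r[name] for r in entity_rows] for name in feature_names
        }
    return sequences


def next_event_pairs(values):
    """Local problem: pair each event with the event that follows it."""
    return list(zip(values[:-1], values[1:]))


sequences = to_sequences(events)

# Global problem: one target per whole sequence (the threshold is made up).
global_target = {cid: sum(seq["amount"]) > 45.0 for cid, seq in sequences.items()}

# Local problem: one target per position inside a sequence.
local_pairs = next_event_pairs(sequences["a"]["mcc_code"])
```

In this layout every client owns one dictionary of equally long feature lists, so a global model consumes the whole dictionary at once while a local model is trained on position-level pairs such as `local_pairs`.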

| ix | Topic | Description | Tutorial |
|---|---|---|---|
| 1. | Prerequisites | | |
| 1.1. | PyTorch | Deep Learning framework | https://www.youtube.com/watch?v=Z_ikDlimN6A |
| 1.2. | PyTorch-Lightning | NN training framework | https://www.youtube.com/playlist?list=PLhhyoLH6IjfyL740PTuXef4TstxAK6nGP |
| 1.3. | (optional) Hydra | Configuration framework | https://hydra.cc/ and tutorials/notebooks/Hydra CoLES Training.ipynb |
| 1.4. | pandas | Data preprocessing | https://pandas.pydata.org/ |
| 1.5. | (optional) PySpark | Big Data preprocessing | https://spark.apache.org/ |
| 2. | Event sequences | Problem statement and classical methods | |
| 2.1. | Event sequence for global problems | e.g. event sequence classification | TBD |
| 2.2. | Event sequence for local problems | e.g. next event prediction | TBD |
| 3. | Supervised neural networks | Supervised learning for event sequence classification | notebooks/supervised-sequence-to-target.ipynb |
| 3.1. | Network types | Different networks for sequences | |
| 3.1.1. | Recurrent neural networks | | TBD based on supervised-sequence-to-target.ipynb |
| 3.1.2. | (optional) Convolutional neural networks | | TBD based on supervised-sequence-to-target.ipynb |
| 3.1.3. | Transformers | | notebooks/supervised-sequence-to-target-transformer.ipynb |
| 3.2. | Problem types | Different problem types for sequences | |
| 3.2.1. | Global problems | Binary, multilabel, regression, ... | TBD based on notebooks/multilabel-classification.ipynb |
| 3.2.2. | Local problems | Next event prediction | notebooks/event-sequence-local-embeddings.ipynb |
| 4. | Unsupervised learning | Pretrain a self-supervised model with some proxy task | TBD based on notebooks/coles-emb.ipynb |
| 4.1. | (optional) Word2vec | Context-based methods | |
| 4.2. | MLM, RTD, GPT | Event-based methods | Self-supervised training and embeddings for clients' transactions notebook |
| 4.3. | NSP, SOP | Sequence-based methods | notebooks/nsp-sop-emb.ipynb |
| 5. | Contrastive and non-contrastive learning | Latent representation-based losses | TBD based on notebooks/coles-emb.ipynb |
| 5.1. | CoLES | | notebooks/coles-emb.ipynb |
| 5.2. | VICReg | | TBD based on notebooks/coles-emb.ipynb |
| 5.3. | CPC | | TBD based on notebooks/coles-emb.ipynb |
| 5.4. | MLM, TabFormer and others | Self-supervised TrxEncoder-only training with Masked Language Model | tutorials/notebooks/mlm-emb.ipynb, notebooks/tabformer-emb.ipynb |
| 6. | Pretrained model usage | | |
| 6.1. | Downstream model on frozen embeddings | | TBD based on notebooks/coles-emb.ipynb |
| 6.2. | CatBoost embeddings features | | notebooks/coles-catboost.ipynb |
| 6.3. | Model finetuning | | notebooks/coles-finetune.ipynb |
| 7. | Preprocessing options | Data preparation demos | notebooks/preprocessing-demo.ipynb |
| 7.1. | ptls-format parquet data loading | PySpark and Parquet for data preprocessing | notebooks/pyspark-parquet.ipynb |
| 7.2. | Fast inference for big dataset | | notebooks/extended_inference.ipynb |
| 8. | Features special types | | |
| 8.1. | Using a pretrained encoder for text features | | notebooks/coles-pretrained-embeddings.ipynb |
| 8.2. | Multi source models | | notebooks/CoLES-demo-multimodal-unsupervised.ipynb |
| 9. | Trx Encoding options | | |
| 9.1. | Basic options | | TBD |
| 9.2. | Transaction Quantization | | TBD |
| 9.3. | Transaction BPE | | TBD |
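
Topic 5.1 refers to CoLES, a contrastive method that treats random contiguous slices of the same entity's sequence as positive pairs and slices from different entities as negatives. The standard-library sketch below illustrates only that slice-sampling step, under simplifying assumptions; it is not the ptls implementation, and the function name and parameters (`sample_slices`, `n_slices`, `min_len`, `max_len`) are hypothetical.

```python
import random


def sample_slices(seq, n_slices, min_len, max_len, rng):
    """Draw n_slices contiguous sub-sequences with random length and start."""
    slices = []
    for _ in range(n_slices):
        # Clamp the length bounds so short sequences still yield a slice.
        upper = min(max_len, len(seq))
        lower = min(min_len, upper)
        length = rng.randint(lower, upper)         # bounds are inclusive
        start = rng.randint(0, len(seq) - length)  # slice must fit in seq
        slices.append(seq[start:start + length])
    return slices


rng = random.Random(0)

# Hypothetical event codes for two entities.
seq_a = list(range(100, 120))
seq_b = list(range(200, 215))

# Build a batch of (entity_id, slice) items, several slices per entity.
batch = []
for entity_id, seq in [("a", seq_a), ("b", seq_b)]:
    for piece in sample_slices(seq, n_slices=3, min_len=4, max_len=8, rng=rng):
        batch.append((entity_id, piece))

# Index pairs whose slices come from the same entity are the positives;
# all other pairs in the batch act as negatives.
positive_pairs = [
    (i, j)
    for i in range(len(batch))
    for j in range(i + 1, len(batch))
    if batch[i][0] == batch[j][0]
]
```

A contrastive loss would then pull the embeddings of each positive pair together and push the remaining pairs apart; the encoder and the loss are left out here and are covered by the CoLES notebook listed for topic 5.1.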