Learn deep learning analysis of event sequences with Pytorch-Lifestream. We have collected a set of topics related to event sequence processing. Most topics come with demo code that uses the ptls library. We recommend following the topics in order, though experienced users can jump straight to specific sections. Explore the demos, test the code, and start applying these techniques to your own data.
- Prerequisites
  - Essential tools for working with deep learning and data processing.
- Event Sequence Analysis
  - Problem definitions and classic approaches to global and local event sequence issues.
- Supervised Neural Networks
  - Sequence analysis with various network types and problem-solving techniques.
- Unsupervised Learning
  - Self-supervised training for embeddings and representation learning.
- Contrastive and Non-Contrastive Learning
  - Methods for creating effective latent representations through contrastive learning.
- Using Pretrained Models
  - Techniques for leveraging pretrained models in downstream tasks and fine-tuning.
- Data Preprocessing
  - Efficient data loading and preprocessing with demos.
- Feature Engineering
  - Special feature types, including text encoding and multimodal sources.
- Transaction Encoding
  - Options for encoding and quantizing transaction data.
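
To make the term "event sequence" concrete, here is a minimal sketch of how a flat event log might be grouped into one time-ordered record per client. It uses only the Python standard library and is not the ptls preprocessing API; the column names (`client_id`, `event_time`, `mcc_code`, `amount`) and the helper `to_sequences` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical flat event log: one row per transaction (illustrative data).
events = [
    {"client_id": 1, "event_time": 3, "mcc_code": "5411", "amount": 12.5},
    {"client_id": 2, "event_time": 1, "mcc_code": "4111", "amount": 2.0},
    {"client_id": 1, "event_time": 1, "mcc_code": "5812", "amount": 30.0},
    {"client_id": 1, "event_time": 2, "mcc_code": "5411", "amount": 7.25},
]


def to_sequences(rows, id_col="client_id", time_col="event_time"):
    """Group flat event rows into one time-ordered record per entity."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[id_col]].append(row)

    sequences = []
    for entity_id in sorted(grouped):
        # Order each entity's events by time before collecting features.
        ordered = sorted(grouped[entity_id], key=lambda r: r[time_col])
        record = {id_col: entity_id}
        for name in ordered[0]:
            if name != id_col:
                # Each feature becomes a list aligned with the event order.
                record[name] = [r[name] for r in ordered]
        sequences.append(record)
    return sequences


sequences = to_sequences(events)
for record in sequences:
    print(record)
```

Each resulting record holds one entity id plus parallel feature lists, which is roughly the shape that sequence models consume. In practice the preprocessing notebooks listed in the table are the place to look for the actual data preparation steps.
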

| ix | Topic | Description | Tutorial |
|---|---|---|---|
| 1. | Prerequisites | | |
| 1.1. | PyTorch | Deep Learning framework | https://www.youtube.com/watch?v=Z_ikDlimN6A |
| 1.2. | PyTorch-Lightning | NN training framework | https://www.youtube.com/playlist?list=PLhhyoLH6IjfyL740PTuXef4TstxAK6nGP |
| 1.3. | (optional) Hydra | Configuration framework | https://hydra.cc/ and [tutorials/notebooks/Hydra CoLES Training.ipynb](<tutorials/notebooks/Hydra CoLES Training.ipynb>) |
| 1.4. | pandas | Data preprocessing | https://pandas.pydata.org/ |
| 1.5. | (optional) PySpark | Big Data preprocessing | https://spark.apache.org/ |
| 2. | Event sequences | Problem statement and classical methods | |
| 2.1. | Event sequence for global problems | e.g. event sequence classification | TBD |
| 2.2. | Event sequence for local problems | e.g. next event prediction | TBD |
| 3. | Supervised neural networks | Supervised learning for event sequence classification | notebooks/supervised-sequence-to-target.ipynb |
| 3.1. | Network types | Different networks for sequences | |
| 3.1.1. | Recurrent neural networks | | TBD based on supervised-sequence-to-target.ipynb |
| 3.1.2. | (optional) Convolutional neural networks | | TBD based on supervised-sequence-to-target.ipynb |
| 3.1.3. | Transformers | | notebooks/supervised-sequence-to-target-transformer.ipynb |
| 3.2. | Problem types | Different problem types for sequences | |
| 3.2.1. | Global problems | Binary, multilabel, regression, ... | TBD based on notebooks/multilabel-classification.ipynb |
| 3.2.2. | Local problems | Next event prediction | notebooks/event-sequence-local-embeddings.ipynb |
| 4. | Unsupervised learning | Pretrain self-supervised model with some proxy task | TBD based on notebooks/coles-emb.ipynb |
| 4.1. | (optional) Word2vec | Context-based methods | |
| 4.2. | MLM, RTD, GPT | Event-based methods | Self-supervised training and embeddings for clients' transactions notebook |
| 4.3. | NSP, SOP | Sequence-based methods | notebooks/nsp-sop-emb.ipynb |
| 5. | Contrastive and non-contrastive learning | Latent representation-based losses | TBD based on notebooks/coles-emb.ipynb |
| 5.1. | CoLES | | notebooks/coles-emb.ipynb |
| 5.2. | VICReg | | TBD based on notebooks/coles-emb.ipynb |
| 5.3. | CPC | | TBD based on notebooks/coles-emb.ipynb |
| 5.4. | MLM, TabFormer and others | Self-supervised TrxEncoder-only training with a Masked Language Model | tutorials/notebooks/mlm-emb.ipynb, notebooks/tabformer-emb.ipynb |
| 6. | Pretrained model usage | | |
| 6.1. | Downstream model on frozen embeddings | | TBD based on notebooks/coles-emb.ipynb |
| 6.2. | CatBoost embeddings features | | notebooks/coles-catboost.ipynb |
| 6.3. | Model finetuning | | notebooks/coles-finetune.ipynb |
| 7. | Preprocessing options | Data preparation demos | notebooks/preprocessing-demo.ipynb |
| 7.1. | ptls-format parquet data loading | PySpark and Parquet for data preprocessing | notebooks/pyspark-parquet.ipynb |
| 7.2. | Fast inference for big datasets | | notebooks/extended_inference.ipynb |
| 8. | Special feature types | | |
| 8.1. | Using a pretrained encoder for text features | | notebooks/coles-pretrained-embeddings.ipynb |
| 8.2. | Multi-source models | | notebooks/CoLES-demo-multimodal-unsupervised.ipynb |
| 9. | Trx encoding options | | |
| 9.1. | Basic options | | TBD |
| 9.2. | Transaction Quantization | | TBD |
| 9.3. | Transaction BPE | | TBD |
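
The transaction quantization topic is still marked TBD, so the following is only a rough illustration of the general idea, not the ptls implementation: a continuous feature such as a transaction amount can be mapped to a small set of discrete bin indices that an embedding layer could then consume. The helper names `fit_quantile_edges` and `quantize` and the sample amounts are assumptions made for this sketch.

```python
from bisect import bisect_right


def fit_quantile_edges(values, n_bins):
    """Pick n_bins - 1 cut points so each bin holds a similar share of values."""
    ordered = sorted(values)
    edges = []
    for i in range(1, n_bins):
        # Index of the i-th quantile boundary in the sorted sample.
        idx = (i * len(ordered)) // n_bins
        edges.append(ordered[idx])
    return edges


def quantize(value, edges):
    """Map a continuous value to a bin index in the range 0..len(edges)."""
    return bisect_right(edges, value)


# Hypothetical transaction amounts used to fit the bin edges.
amounts = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
edges = fit_quantile_edges(amounts, n_bins=4)
tokens = [quantize(a, edges) for a in amounts]

print(edges)   # cut points between bins
print(tokens)  # one discrete bin index per amount
```

Quantile-based edges are one possible design choice: they keep bins roughly balanced even when amounts are heavily skewed. Heavily repeated values can produce duplicate edges, which a real pipeline would need to handle.
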