
Argo, the ship that carried Jason and the Argonauts on their quest for the Golden Fleece
This is a playground for re-implementing model architectures from industry and academic papers in PyTorch. The primary goal is educational: the target audience is anyone who would like to start their journey in machine learning and ML infrastructure. The implementation is optimized for readability and extensibility rather than peak performance.
- data: functions for dataset management, such as downloading public datasets, cache management, etc.
- feature: functions for feature engineering; currently reads data from benchmarks and uses Pandas for certain feature engineering
- model: model implementations
- trainer: a simple wrapper around the train/val/eval loops
- server: a simple inference stack for recommendation systems, including a retrieval engine, feature server, model manager, and inference engine
- scripts: scripts used to set up the system, such as DB ingestion
- get-started: useful notebooks to help you get familiar with common techniques & concepts in machine learning and recommendation systems
- embedding: scripts used for generating embeddings
- run `python movie_len_embedding.py` to generate the embeddings (only the collaborative embedding is supported)
- run `python movie_len_index.py` to generate the FAISS index
- run `python scripts/vector_db.py` to ingest the embeddings into DuckDB
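The steps above amount to: embed items, index the embeddings, then serve nearest-neighbor lookups. A minimal NumPy sketch of the search that the FAISS index provides (exact inner-product top-k, which is what a flat index such as `faiss.IndexFlatIP` computes over normalized vectors); the shapes and data here are illustrative, not the repo's:

```python
import numpy as np

# Toy item embeddings standing in for the output of the embedding script;
# 1000 items x 64 dims, values are random for illustration only.
rng = np.random.default_rng(0)
item_emb = rng.standard_normal((1000, 64)).astype("float32")

# Normalize rows so inner product equals cosine similarity.
item_emb /= np.linalg.norm(item_emb, axis=1, keepdims=True)

def search(query: np.ndarray, k: int = 10) -> np.ndarray:
    """Exact top-k by inner product -- the brute-force equivalent of IndexFlatIP."""
    query = query / np.linalg.norm(query)
    scores = item_emb @ query            # one dot product per item
    return np.argsort(-scores)[:k]       # indices of the k highest scores

# Querying with item 42's own embedding should rank item 42 first.
top = search(item_emb[42])
```

FAISS replaces the brute-force matrix product with compressed or graph-based index structures, but the contract (vector in, top-k item ids out) is the same.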
- install the dependencies: `pip install -r requirements.txt`, then `pip install -e .`
- run `python main.py` to train the model with the current env config
- run `python server/ebr_server.py` to start the gRPC server for embedding-based retrieval; it listens on port 50051 by default (if you use DuckDB, this step can be skipped)
- run `python server/inference_engine.py` to start the inference server; it listens on port 8000
- run `bash scripts/server_request.sh` to send a dummy request (there is one for DIN and one for TransAct as of now; the request will be parameterized in the future)
- ✅ Deep Interest Network for Click-Through Rate Prediction
- ✅ TransAct: Transformer-based Realtime User Action Model for Recommendation at Pinterest
- Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
- Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations
- ✅ DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems
- LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders
- MTGR: Industrial-Scale Generative Recommendation Framework in Meituan
- TransAct V2: Lifelong User Action Sequence Modeling on Pinterest Recommendation
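DIN, the first checked paper above, pools the user's behavior sequence with attention weights conditioned on the candidate item, so different candidates activate different parts of the history. A minimal sketch of that attention unit; layer sizes and names are my own choices for illustration, not the repo's:

```python
import torch
import torch.nn as nn

class DinAttention(nn.Module):
    """Minimal DIN-style attention pooling: score each behavior embedding
    against the candidate item, then take a weighted sum of behaviors."""

    def __init__(self, dim: int, hidden: int = 36):
        super().__init__()
        # MLP over [behavior, candidate, behavior - candidate, behavior * candidate]
        self.mlp = nn.Sequential(
            nn.Linear(4 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, behaviors: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        # behaviors: (B, T, D), candidate: (B, D)
        cand = candidate.unsqueeze(1).expand_as(behaviors)          # (B, T, D)
        feats = torch.cat(
            [behaviors, cand, behaviors - cand, behaviors * cand], dim=-1
        )
        weights = self.mlp(feats).squeeze(-1)                        # (B, T)
        # The DIN paper deliberately relaxes softmax normalization,
        # so raw weights are used here.
        return torch.bmm(weights.unsqueeze(1), behaviors).squeeze(1)  # (B, D)

pooled = DinAttention(dim=16)(torch.randn(2, 5, 16), torch.randn(2, 16))
```

The pooled vector is then concatenated with the other features and fed to the ranking MLP.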
## Modeling
- ✅ Deep Interest Network E2E training & inference example, MovieLens Small
- ✅ TransAct training & inference example, MovieLens Large
- ✅ MovieLens item embedding generation: collaborative filtering, two-tower, LLM (Qwen3-Embedding is out)
- HSTU training & inference example, MovieLens Small
- RQ-VAE
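On the RQ-VAE item above: the core of RQ-VAE (and of the Semantic IDs paper listed earlier) is residual quantization, where each codebook level quantizes whatever the previous level left over, and the sequence of codeword indices becomes the item's discrete ID. A tiny sketch with hand-picked codebooks so the result is easy to verify; real codebooks are learned jointly with an encoder/decoder:

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Greedy residual quantization: at each level pick the nearest codeword,
    subtract it, and quantize the remainder with the next codebook."""
    residual = x.astype("float64").copy()
    ids = []
    for cb in codebooks:                                  # cb: (K, D) codewords
        idx = int(np.argmin(((residual - cb) ** 2).sum(axis=-1)))
        ids.append(idx)
        residual -= cb[idx]
    return ids, residual                                  # ids = the "semantic ID"

# Hand-picked two-level codebooks: a coarse level and a fine level.
books = [
    np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]),     # coarse
    np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),       # fine
]
x = np.array([10.0, 1.0])
ids, res = residual_quantize(x, books)
# x decomposes exactly: coarse codeword 1 ([10, 0]) + fine codeword 2 ([0, 1]).
```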
## Data & Feature Engineering
- Kuaishou Dataset: https://kuairand.com/
- Ray integration (DPP reader + trainer arch)
- Daft, Polars exploration
## Infra
- ✅ Embedding Based Retrieval (EBR): DuckDB, FAISS
- Nearline item embedding update
- Feature store integration: FEAST
- Feature logging & training data generation pipeline
## GPU
- GPU training & inference enablement
- Integrate profiling, benchmarking, tuning, and monitoring for accelerator optimization
- Optimize representative models with auto-tuning, kernel fusion, quantization, dynamic batching, etc
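As one concrete instance of the quantization item above, PyTorch's dynamic quantization converts the weights of `nn.Linear` layers to int8 in a single call, with activations quantized on the fly at inference time. The model here is a stand-in MLP, not one of the repo's:

```python
import torch
import torch.nn as nn

# Stand-in ranking head; in practice this would be a trained model.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

# Quantize the weights of all nn.Linear modules to int8.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model keeps the same call signature and output shape.
out = qmodel(torch.randn(4, 64))
```

Dynamic quantization is the lowest-effort starting point; static quantization, kernel fusion, and dynamic batching each need more setup (calibration data, graph capture, a serving layer) and would come later on this roadmap.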