
Argo, the ship that carried Jason and the Argonauts on their quest for the Golden Fleece
This is a playground for re-implementing model architectures from industry and academic papers in PyTorch. The primary goal is educational: the target audience is anyone who would like to start their journey in machine learning and ML infrastructure. The implementation is optimized for readability and extensibility rather than peak performance.
- data: functions for dataset management, such as downloading public datasets, cache management, etc.
- feature: functions for feature engineering; currently reads data from benchmarks and uses Pandas for certain feature engineering
- model: model implementations
- trainer: a simple wrapper around the train/val/eval loops
- server: a simple inference stack for recommendation systems, including a retrieval engine, feature server, model manager, and inference engine
- scripts: scripts used to set up the system, such as DB ingestion
- get-started: useful notebooks to help you get familiar with common techniques & concepts in machine learning and recommendation systems
- embedding: scripts used for generating embeddings
- run `python movie_len_embedding.py` to generate the embeddings (only the collaborative embedding is supported)
- run `python movie_len_index.py` to generate the FAISS index
- run `python scripts/vector_db.py` to ingest the embeddings into DuckDB
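The steps above amount to: embed items, index the embeddings, then serve nearest-neighbor lookups. A minimal NumPy sketch of the search that the FAISS index provides (exact inner-product top-k, which is what a flat index such as `faiss.IndexFlatIP` computes over normalized vectors); the shapes and data here are illustrative, not the repo's:

```python
import numpy as np

# Toy item embeddings standing in for the output of the embedding script;
# 1000 items x 64 dims, values are random for illustration only.
rng = np.random.default_rng(0)
item_emb = rng.standard_normal((1000, 64)).astype("float32")

# Normalize rows so inner product equals cosine similarity.
item_emb /= np.linalg.norm(item_emb, axis=1, keepdims=True)

def search(query: np.ndarray, k: int = 10) -> np.ndarray:
    """Exact top-k by inner product -- the brute-force equivalent of IndexFlatIP."""
    query = query / np.linalg.norm(query)
    scores = item_emb @ query            # one dot product per item
    return np.argsort(-scores)[:k]       # indices of the k highest scores

# Querying with item 42's own embedding should rank item 42 first.
top = search(item_emb[42])
```

FAISS replaces the brute-force matrix product with compressed or graph-based index structures, but the contract (vector in, top-k item ids out) is the same.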
- install the dependencies: `pip install -r requirements.txt`, then `pip install -e .`
- run `python main.py` to train the model with the current env config
- run `python server/ebr_server.py` to start the gRPC server for embedding-based retrieval; it listens on port 50051 by default (if you use DuckDB, this step can be skipped)
- run `python server/inference_engine.py` to start the inference server; it listens on port 8000
- run `bash scripts/server_request.sh` to send a dummy request (there is one for DIN and one for TransAct as of now; the request will be parameterized in the future)
- ✅ Deep Interest Network for Click-Through Rate Prediction
- ✅ TransAct: Transformer-based Realtime User Action Model for Recommendation at Pinterest
- Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
- Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations
- ✅ DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems
- LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders
- MTGR: Industrial-Scale Generative Recommendation Framework in Meituan
- TransAct V2: Lifelong User Action Sequence Modeling on Pinterest Recommendation
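DIN, the first checked paper above, pools the user's behavior sequence with attention weights conditioned on the candidate item, so different candidates activate different parts of the history. A minimal sketch of that attention unit; layer sizes and names are my own choices for illustration, not the repo's:

```python
import torch
import torch.nn as nn

class DinAttention(nn.Module):
    """Minimal DIN-style attention pooling: score each behavior embedding
    against the candidate item, then take a weighted sum of behaviors."""

    def __init__(self, dim: int, hidden: int = 36):
        super().__init__()
        # MLP over [behavior, candidate, behavior - candidate, behavior * candidate]
        self.mlp = nn.Sequential(
            nn.Linear(4 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, behaviors: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        # behaviors: (B, T, D), candidate: (B, D)
        cand = candidate.unsqueeze(1).expand_as(behaviors)          # (B, T, D)
        feats = torch.cat(
            [behaviors, cand, behaviors - cand, behaviors * cand], dim=-1
        )
        weights = self.mlp(feats).squeeze(-1)                        # (B, T)
        # The DIN paper deliberately relaxes softmax normalization,
        # so raw weights are used here.
        return torch.bmm(weights.unsqueeze(1), behaviors).squeeze(1)  # (B, D)

pooled = DinAttention(dim=16)(torch.randn(2, 5, 16), torch.randn(2, 16))
```

The pooled vector is then concatenated with the other features and fed to the ranking MLP.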
## Modeling
- ✅ Deep Interest Network E2E training & inference example, MovieLens Small
- ✅ TransAct training & inference example, MovieLens Large
- ✅ MovieLens item embedding generation: collaborative filtering, two-tower, LLM (Qwen3-Embedding is out)
- HSTU training & inference example, MovieLens Small
- RQ-VAE
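On the RQ-VAE item above: the core of RQ-VAE (and of the Semantic IDs paper listed earlier) is residual quantization, where each codebook level quantizes whatever the previous level left over, and the sequence of codeword indices becomes the item's discrete ID. A tiny sketch with hand-picked codebooks so the result is easy to verify; real codebooks are learned jointly with an encoder/decoder:

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Greedy residual quantization: at each level pick the nearest codeword,
    subtract it, and quantize the remainder with the next codebook."""
    residual = x.astype("float64").copy()
    ids = []
    for cb in codebooks:                                  # cb: (K, D) codewords
        idx = int(np.argmin(((residual - cb) ** 2).sum(axis=-1)))
        ids.append(idx)
        residual -= cb[idx]
    return ids, residual                                  # ids = the "semantic ID"

# Hand-picked two-level codebooks: a coarse level and a fine level.
books = [
    np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]),     # coarse
    np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),       # fine
]
x = np.array([10.0, 1.0])
ids, res = residual_quantize(x, books)
# x decomposes exactly: coarse codeword 1 ([10, 0]) + fine codeword 2 ([0, 1]).
```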
## Data & Feature Engineering
- Kuaishou Dataset: https://kuairand.com/
- Ray integration (DPP reader + trainer arch)
- Daft, Polars exploration
## Infra
- ✅ Embedding Based Retrieval (EBR): DuckDB, FAISS
- Nearline item embedding update
- Feature store integration: FEAST
- Feature logging & training data generation pipeline
## GPU
- GPU training & inference enablement
- Integrate profiling, benchmarking, tuning, and monitoring for accelerator optimization
- Optimize representative models with auto-tuning, kernel fusion, quantization, dynamic batching, etc
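As one concrete instance of the quantization item above, PyTorch's dynamic quantization converts the weights of `nn.Linear` layers to int8 in a single call, with activations quantized on the fly at inference time. The model here is a stand-in MLP, not one of the repo's:

```python
import torch
import torch.nn as nn

# Stand-in ranking head; in practice this would be a trained model.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

# Quantize the weights of all nn.Linear modules to int8.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model keeps the same call signature and output shape.
out = qmodel(torch.randn(4, 64))
```

Dynamic quantization is the lowest-effort starting point; static quantization, kernel fusion, and dynamic batching each need more setup (calibration data, graph capture, a serving layer) and would come later on this roadmap.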