Generative Recommendation with Semantic IDs (GRID)


GRID (Generative Recommendation with Semantic IDs) is a state-of-the-art framework for generative recommendation with semantic IDs, developed by a group of scientists and engineers at Snap Research. The project implements approaches for learning semantic IDs from text embeddings and for generating recommendations with transformer-based generative models.

🚀 Overview

GRID facilitates generative recommendation in three overarching steps:

  • Embedding Generation with LLMs: Converting item text into embeddings using any LLM available on Hugging Face.
  • Semantic ID Learning: Converting item embeddings into hierarchical semantic IDs using residual quantization techniques such as RQ-KMeans, RQ-VAE, and RVQ.
  • Generative Recommendations: Using transformer architectures to generate recommendation sequences as semantic ID tokens.

📦 Installation

Prerequisites

  • Python 3.10+
  • CUDA-compatible GPU (recommended)

Setup Environment

# Clone the repository
git clone https://github.com/snap-research/GRID.git
cd GRID

# Install dependencies
pip install -r requirements.txt

🎯 Quick Start

1. Data Preparation

Prepare your dataset in the expected format:

data/
├── train/       # training sequence of user history 
├── validation/  # validation sequence of user history 
├── test/        # testing sequence of user history 
└── items/       # text of all items in the dataset
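
Before running the later steps, it can help to confirm that a dataset directory matches the layout above. The following is a minimal sketch, not part of GRID: the helper name `missing_subdirs` is hypothetical, and it only assumes the four subdirectory names shown in the tree.

```python
from pathlib import Path

# Subdirectory names taken from the layout shown above.
EXPECTED_SUBDIRS = ("train", "validation", "test", "items")


def missing_subdirs(data_dir):
    """Return the expected subdirectories that are absent under data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


# Hypothetical usage: an empty list means every expected folder is present.
print(missing_subdirs("data/amazon_data/beauty"))
```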

We provide the pre-processed Amazon data explored in the P5 paper [4]. The data can be downloaded from this Google Drive link.

2. Embedding Generation with LLMs

Generate item embeddings with an LLM; these are later transformed into semantic IDs.

python -m src.inference experiment=sem_embeds_inference_flat data_dir=data/amazon_data/beauty # available datasets: 'beauty', 'sports', and 'toys'

3. Train and Generate Semantic IDs

Learn semantic ID centroids for embeddings generated in step 2:

# embedding_path: found in the log directory from step 2
# embedding_dim: model dimension of the LLM used in step 2 (2048 for flan-t5-xl, as in this example)
# num_hierarchies: number of codebooks to train (3 here)
# codebook_width: number of centroid rows in each codebook (256 here)
python -m src.train experiment=rkmeans_train_flat \
    data_dir=data/amazon_data/beauty \
    embedding_path=<output_path_from_step_2>/merged_predictions_tensor.pt \
    embedding_dim=2048 \
    num_hierarchies=3 \
    codebook_width=256
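
To make `num_hierarchies` and `codebook_width` concrete, here is a minimal NumPy sketch of residual k-means: each level fits a codebook on whatever the previous levels failed to explain. It is an illustration under simplifying assumptions (plain Lloyd's k-means, random toy embeddings, a small codebook), not GRID's implementation; the function names are hypothetical.

```python
import numpy as np


def assign(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the nearest centroid (squared Euclidean) for every row of x."""
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns a (k, dim) array of centroids."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids from k distinct data points.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        codes = assign(x, centroids)
        for j in range(k):
            members = x[codes == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def residual_kmeans(x: np.ndarray, num_hierarchies: int, codebook_width: int):
    """Fit one codebook per level on the residual left by the previous levels."""
    residual = x.astype(np.float64)
    codebooks, codes = [], []
    for level in range(num_hierarchies):
        centroids = kmeans(residual, codebook_width, seed=level)
        level_codes = assign(residual, centroids)
        codebooks.append(centroids)
        codes.append(level_codes)
        # Subtract the chosen centroid; the next level quantizes what remains.
        residual = residual - centroids[level_codes]
    return codebooks, np.stack(codes, axis=1)


# Toy stand-in for the step 2 embeddings: 200 items, 16 dimensions.
embeddings = np.random.default_rng(0).normal(size=(200, 16))
codebooks, semantic_ids = residual_kmeans(embeddings, num_hierarchies=3, codebook_width=8)
print(semantic_ids.shape)  # (200, 3): one code per hierarchy level for each item
```

With the settings in the command above, the same procedure would produce three codebooks of 256 centroids each, and every item would receive a three-digit semantic ID.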

Generate SIDs:

# ckpt_path: found in the log directory of the SID training run above
python -m src.inference experiment=rkmeans_inference_flat \
    data_dir=data/amazon_data/beauty \
    embedding_path=<output_path_from_step_2>/merged_predictions_tensor.pt \
    embedding_dim=2048 \
    num_hierarchies=3 \
    codebook_width=256 \
    ckpt_path=<the_checkpoint_you_just_get_above>

4. Train Generative Recommendation Model with Semantic IDs

Train the recommendation model using the learned semantic IDs:

# num_hierarchies is 4, not 3: step 3 appends one extra digit to de-duplicate the generated semantic IDs.
python -m src.train experiment=tiger_train_flat \
    data_dir=data/amazon_data/beauty \
    semantic_id_path=<output_path_from_step_3>/pickle/merged_predictions_tensor.pt \
    num_hierarchies=4
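
The extra digit exists because two items can be quantized to the same three codes. The sketch below shows one plausible de-duplication scheme, a per-collision counter appended as a fourth digit. The exact scheme GRID uses is not described here, so treat both the approach and the name `append_dedup_digit` as assumptions.

```python
from collections import defaultdict


def append_dedup_digit(semantic_ids):
    """Append a counter so items sharing the same codes get distinct IDs."""
    seen = defaultdict(int)  # how many times each code tuple has appeared so far
    deduped = []
    for sid in semantic_ids:
        key = tuple(sid)
        deduped.append(key + (seen[key],))
        seen[key] += 1
    return deduped


sids = [(12, 7, 3), (12, 7, 3), (5, 0, 9)]
print(append_dedup_digit(sids))
# [(12, 7, 3, 0), (12, 7, 3, 1), (5, 0, 9, 0)]
```

Each ID now has four digits, which is why the recommendation model is configured with `num_hierarchies=4`.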

5. Generate Recommendations

Run inference to generate recommendations:

# ckpt_path: found in the log directory of the GR model training run above
# num_hierarchies is 4, not 3: step 3 appends one extra digit to de-duplicate the generated semantic IDs.
python -m src.inference experiment=tiger_inference_flat \
    data_dir=data/amazon_data/beauty \
    semantic_id_path=<output_path_from_step_3>/pickle/merged_predictions_tensor.pt \
    ckpt_path=<the_checkpoint_you_just_get_above> \
    num_hierarchies=4

Supported Models:

Semantic ID:

  1. Residual K-means, as proposed in OneRec [2]
  2. Residual Vector Quantization
  3. Residual Quantization with Variational Autoencoder [3]

Generative Recommendation:

  1. TIGER [1]

📚 Citation

If you use GRID in your research, please cite:

@inproceedings{grid,
  title     = {Generative Recommendation with Semantic IDs: A Practitioner's Handbook},
  author    = {Ju, Clark Mingxuan and Collins, Liam and Neves, Leonardo and Kumar, Bhuvesh and Wang, Louis Yufeng and Zhao, Tong and Shah, Neil},
  booktitle = {Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM)},
  year      = {2025}
}

🤝 Acknowledgments

📞 Contact

For questions and support:

Bibliography

[1] Rajput, Shashank, et al. "Recommender systems with generative retrieval." Advances in Neural Information Processing Systems 36 (2023): 10299-10315.

[2] Deng, Jiaxin, et al. "OneRec: Unifying retrieve and rank with generative recommender and iterative preference alignment." arXiv preprint arXiv:2502.18965 (2025).

[3] Lee, Doyup, et al. "Autoregressive image generation using residual quantization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

[4] Geng, Shijie, et al. "Recommendation as language processing (RLP): A unified pretrain, personalized prompt & predict paradigm (P5)." Proceedings of the 16th ACM Conference on Recommender Systems. 2022.
