TusKANNy/awesome-multivector-retrieval

Awesome Multivector Retrieval

An extensive and commented list of resources on late-interaction multivector retrieval.

Contents

Models

Foundational Models

  • ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
    Omar Khattab, Matei Zaharia
    SIGIR, 2020
    📄 paper | 🛠️ code

  • COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List
    Luyu Gao, Zhuyun Dai, Jamie Callan
    NAACL, 2021
    📄 paper

  • ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
    Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, Matei Zaharia
    NAACL, 2022
    📄 paper | 🛠️ code

  • Multi-Vector Embeddings are Provably More Expressive than Single Vector Embeddings
    Rajesh Jayaram
    arXiv, 2026
    📄 paper

General Models & Training

  • PyLate: Flexible Training and Retrieval for Late Interaction Models
    Antoine Chaffin, Raphaël Sourty
    CIKM, 2025
    📄 paper | 🛠️ code

  • ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models
    Antoine Chaffin, Luca Arnaboldi, Amélie Chatelain, Florent Krzakala
    arXiv, 2026
    📄 paper

  • A Replicability Study of XTR
    Rohan Jha, Reno Kriz, Benjamin Van Durme
    arXiv, 2026
    📄 paper

  • Your Embedding Model is SMARTer Than You Think
    Jianrui Zhang, Hyun Jung Lee, Sukanta Ganguly, Tae-Eui Kam, Donghyun Kim, Yong Jae Lee
    arXiv, 2026
    📄 paper | 🛠️ code

  • Party is over: regularizing ColBERT models to fix efficient ANN methods
    LightOn AI
    Blog, 2026
    📝 blog

Compression & Token Pruning

  • Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction
    Sebastian Hofstatter, Omar Khattab, Sophia Althammer, Mete Sertkan, Allan Hanbury
    CIKM, 2022
    📄 paper | 🛠️ code

  • Joint Optimization of Multi-Vector Representation with Product Quantization
    Yufan Fang, Jing Zhan, Yiqun Liu, Jiafeng Mao, Min Zhang, Shaoping Ma
    NLPCC, 2022
    📄 paper

  • CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval
    Minghan Li, Sean C. Lin, Barlas Oguz, Arnab Ghoshal, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen
    ACL, 2023
    📄 paper

  • SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes
    Minghan Li, Sheng-Chieh Lin, Xueguang Ma, Jimmy Lin
    SIGIR, 2023
    📄 paper

  • Rethinking the Role of Token Retrieval in Multi-Vector Retrieval
    Jinhyuk Lee, Zhuyun Dai, Sai Meher Karthik Duddu, Tao Lei, Iftekhar Naim, Ming-Wei Chang, Vincent Y. Zhao
    NeurIPS, 2023
    📄 paper

  • SPLATE: Sparse Late Interaction Retrieval
    Thibault Formal, Stephane Clinchant, Herve Dejean, Carlos Lassance
    SIGIR, 2024
    📄 paper

  • Muvera: Multi-Vector Retrieval via Fixed Dimensional Encodings
    Laxman Dhulipala, Majid Hadian, Rajesh Jayaram, Jason Lee, Vahab Mirrokni
    NeurIPS, 2024
    📄 paper

  • Enhancing ColBERT: A Method for Reducing Space Complexity and Accelerating Retrieval Speed
    Hai Nguyen T., Huong Le T.
    PACLIC, 2024
    📄 paper

  • Token Pruning Optimization for Efficient Multi-vector Dense Retrieval
    Shanxiu He, Mutasem Al-Darabsah, Suraj Nair, Jonathan May, Tarun Agarwal, Tao Yang, Choon Hui Teo
    ECIR, 2025
    📄 paper

  • CRISP: Clustering Multi-Vector Representations for Denoising and Pruning
    João Veneroso, Rajesh Jayaram, Jinmeng Rao, Gustavo Hernández Ábrego, Majid Hadian, Daniel Cer
    arXiv, 2025
    📄 paper

  • Towards Lossless Token Pruning in Late-Interaction Retrieval Models
    Yuxuan Zong, Benjamin Piwowarski
    SIGIR, 2025
    📄 paper

  • Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework
    Yibo Yan, Mingdong Ou, Yi Cao, Xin Zou, Jiahao Huo, Shuliang Liu, James Kwok, Xuming Hu
    arXiv, 2026
    📄 paper

  • Multi-Vector Index Compression in Any Modality
    Hanxiang Qin, Alexander Martin, Rohan Jha, Chunsheng Zuo, Reno Kriz, Benjamin Van Durme
    arXiv, 2026
    📄 paper | 🛠️ code

  • A Brief Comparison of Training-Free Multi-Vector Sequence Compression Methods
    Rohan Jha, Chunsheng Zuo, Reno Kriz, Benjamin Van Durme
    ECIR (LIR Workshop), 2026
    📄 paper

Multimodal & Vision

  • ColPali: Efficient Document Retrieval with Vision Language Models
    Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani, Gautier Viaud, Celine Hudelot, Pierre Colombo
    ICLR, 2025
    📄 paper

  • Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
    Arun V. Reddy, Alexander Martin, Eugene Yang, Andrew Yates, Kate Sanders, Kenton Murray, Reno Kriz, Celso M. de Melo, Benjamin Van Durme, Rama Chellappa
    CVPR, 2025
    📄 paper

  • ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval
    Ahmed Masry, Megh Thakkar, Patrice Bechard, Sathwik Tejaswi Madhusudhan, Rabiul Awal, Shambhavi Mishra, Akshay Kalkunte Suresh, Srivatsava Daruru, Enamul Hoque, Spandana Gella, Torsten Scholak, Sai Rajeswar
    EMNLP, 2025
    📄 paper

Retrieval

Indexing & Search Algorithms

  • Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval
    Omar Khattab, Christopher Potts, Matei Zaharia
    NeurIPS, 2021
    📄 paper

  • PLAID: An Efficient Engine for Late Interaction Retrieval
    Keshav Santhanam, Omar Khattab, Christopher Potts, Matei Zaharia
    CIKM, 2022
    📄 paper | 🛠️ code

  • DESSERT: An Efficient Algorithm for Vector Set Search with Vector Set Queries
    Joshua Engels, Benjamin Coleman, Vihan Lakshman, Anshumali Shrivastava
    NeurIPS, 2023
    📄 paper

  • Efficient Multi-Vector Dense Retrieval with Bit Vectors
    Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
    ECIR, 2024
    📄 paper | 🛠️ code

  • A Reproducibility Study of PLAID
    Sean MacAvaney, Nicola Tonellotto
    SIGIR, 2024
    📄 paper

  • Efficient Constant-Space Multi-vector Retrieval
    Sean MacAvaney, Antonio Mallia, Nicola Tonellotto
    ECIR, 2025
    📄 paper

  • IGP: Efficient Multi-Vector Retrieval via Proximity Graph Index
    Ziyang Bian, Man Lung Yiu, Buzhou Tang
    SIGIR, 2025
    📄 paper | 🛠️ code

  • WARP: An Efficient Engine for Multi-Vector Retrieval
    Joel L. Scheerer, Matei Zaharia, Christopher Potts, Gustavo Alonso, Omar Khattab
    SIGIR, 2025
    📄 paper | 🛠️ code

  • Multivector Reranking in the Era of Strong First-Stage Retrievers
    Silvio Martinico, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
    ECIR, 2026
    📄 paper | 🛠️ code | 🛠️ code

  • SMVE: Sparse Multi-Vector Retrieval
    Martin Spisak, Marek Galovic
    ECIR, 2026
    📄 blog

  • No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval
    Lixuan Guo, Yifei Wang, Tiansheng Wen, Aosong Feng, Stefanie Jegelka, Chenyu You
    ICML, 2026
    📄 paper

  • LEMUR: Learned Multi-Vector Retrieval
    Elias Jääsaari, Ville Hyvönen, Teemu Roos
    ICML, 2026
    📄 paper | 🛠️ code

  • Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing
    Silvio Martinico, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
    SIGIR, 2026
    📄 paper | 🛠️ code

  • ColBERTSaR: Sparsified ColBERT Index via Product Quantization
    Eugene Yang, Andrew Yates, Dawn Lawrie, James Mayfield, Saron Samuel, Rohan Jha
    SIGIR, 2026
    📄 paper | 🛠️ code

Scoring Kernels

  • FLASH-MAXSIM: IO-Aware Fused Kernels for Late-Interaction Scoring
    Roi Pony, Adi Raz Goldfarb, Idan Friedman, Daniel Ezer, Udi Barzelay
    arXiv, 2026
    📄 paper | 🛠️ code

  • TileMaxSim: IO-Aware GPU MaxSim Scoring with Dimension Tiling and Fused Product Quantization
    Ashutosh Sharma
    arXiv, 2026
    📄 paper | 🛠️ code

Software Libraries

Training & Inference Frameworks

  • ColBERT Python
    Reference implementation for ColBERT and ColBERTv2, including PLAID support for efficient late-interaction retrieval.

  • RAGatouille Python
    Python toolkit to train and serve ColBERT-based late-interaction retrievers.

  • PyLate Python
    Python library for training, fine-tuning, inference, and retrieval with ColBERT-style late-interaction models on single and multi-GPU setups.

  • PyLate-rs Rust Python
    High-performance Rust inference engine for PyLate models, with Python bindings and optimized integration with FastPlaid for retrieval pipelines.

Retrieval Engines & Indexes

  • FastPlaid Python
    GPU-optimized engine for ColBERT/PLAID-style late-interaction retrieval.

  • kANNolo Rust Python
    ANN library for dense, sparse, and multivector retrieval.

  • Vectorium Rust
    Rust library for compact storage/access of dense, sparse, and multivector embeddings.

  • Firn Rust
    Rust search engine for single-vector and late-interaction multivector namespaces, backed by LanceDB on object storage with RAM/NVMe result caching.

  • NextPlaid Rust Python
    CPU-oriented local-first multivector retrieval engine with memory-mapped storage.

  • EMVB C++
    Reference implementation for Efficient Multi-Vector Dense Retrieval with Bit Vectors.

  • IGP C++
    Official C++ implementation for IGP: proximity-graph indexing for multi-vector retrieval (with Python scripts for experiments).

  • WARP Python
    Official implementation for WARP, an efficient multi-vector retrieval engine.

  • ColGrep Rust Python
    High-performance code search CLI tool powered by LateOn-Code and NextPlaid, enabling semantic + hybrid (regex + semantic) code retrieval locally with incremental indexing.

  • TACHIOM Rust Python
    Fast and scalable multivector retrieval system with Token-Aware Clustering (TAC) and hierarchical Product Quantization for efficient late-interaction search.

  • TopK Rust Python
    Managed retrieval engine with support for late-interaction search over billions of documents, online index updates, filtering, and more.

Scoring Kernels

  • Flash-MaxSim Python
    IO-aware Triton kernel for MaxSim scoring in ColBERT/ColPali pipelines: tile-by-tile on-chip computation with zero intermediate memory and INT8 quantization support.

  • maxsim Python
    Ahead-of-time compiled MaxSim kernel with CUDA and Metal backends (NVIDIA + Apple Silicon), distributed as a HuggingFace kernels package.

  • late-interaction-kernels Python
    Fused Triton kernels for MaxSim scoring with CUDA, Metal, and CPU backends, native PyLate/colpali-engine integration, and PLAID-style compressed-index support.

  • maxsim-cpu Rust Python
    CPU-only MaxSim kernel written in Rust (libxsmm on x86, Apple Accelerate on ARM) with Python bindings.

  • TileMaxSim Python
    IO-aware Triton kernel for MaxSim scoring with dimension tiling for embeddings wider than 128 dims and fused product quantization, achieving 80%+ peak HBM bandwidth.
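For readers new to the operation these kernels accelerate, here is a minimal pure-Python sketch of MaxSim scoring: each query token vector is matched with its most similar document token vector, and the per-token maxima are summed. The helper names (`dot`, `maxsim`, `rank`) and the toy vectors are illustrative assumptions, not the API of any library listed above, and the sketch ignores batching, normalization, and quantization.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for every query token, take the best-matching
    document token, then sum those maxima over the query tokens."""
    if not doc_vecs:
        return 0.0
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank(query_vecs: Sequence[Vector],
         docs: Dict[str, Sequence[Vector]]) -> List[Tuple[str, float]]:
    """Score every document exhaustively and sort by descending MaxSim."""
    scores = {doc_id: maxsim(query_vecs, vecs) for doc_id, vecs in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy 2-dimensional example (hypothetical data).
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "d1": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    "d2": [[0.6, 0.8]],
}
ranking = rank(query, docs)
print(ranking)  # "d1" ranks first: each query token finds an exact match
```

The nested loop costs one inner product per (query token, document token) pair, which is the work that fused GPU and CPU kernels batch and tile.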

Model Checkpoints

General-Purpose

  • colbert-ir/colbertv2.0
    Official ColBERTv2 checkpoint (MS MARCO-trained) from the ColBERT authors, widely used as the canonical baseline model.

  • lightonai/LateOn
    State-of-the-art ColBERT model (149M parameters, ModernBERT-based) achieving 57.22 NDCG@10 on BEIR with fully open training data and strong generalization under decontamination.

  • lightonai/LateOn-regularized
    LateOn variant trained with STE-based regularization to fix compatibility with projection-based retrieval methods (MUVERA, SMVE).

  • ColBERT-Zero
    Large-scale fully pre-trained ColBERT checkpoint trained on public data and released with the ColBERT-Zero paper.

  • GTE-ModernColBERT-v1
    PyLate late-interaction checkpoint based on ModernBERT with 128-dimensional token embeddings and strong long-context retrieval behavior.

  • Iso-ModernColBERT
    Isotropically corrected version of GTE-ModernColBERT-v1 built for efficient inference and scalable retrieval.

  • colberter-128-32-msmarco / uni-colberter-128-1-msmarco
    ColBERTer checkpoints trained on MS MARCO (128-dim, with 32 and 1 unique whole-word vectors per document respectively).

Specialized / Domain

  • lightonai/LateOn-Code
    Specialized ColBERT model (149M parameters) fine-tuned for code retrieval, achieving SOTA on MTEB Code benchmark.

  • lightonai/LateOn-Code-edge
    Lightweight code retrieval model (17M parameters) for edge devices, matching larger models while running efficiently on CPU.

  • Reason-ModernColBERT
    Reasoning-focused late-interaction checkpoint fine-tuned on reasonir-hq, with strong BRIGHT benchmark performance for reasoning-intensive retrieval.

Datasets and Encodings

MS MARCO v1

  • Documents: 8,841,823
  • Queries [dev.small]: 6,980
  • Reference Metric: MRR@10
| Encoding | Link | Vector dim | Avg vectors per doc | Avg vectors per query | MRR@10 |
|---|---|---|---|---|---|
| colbertv2 | link | 128 | 67 | 32 | 0.397 |
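To give a feel for the scale these numbers imply, the following back-of-the-envelope sketch estimates the size of the raw, uncompressed MS MARCO v1 token vectors at a few value widths. The function name `raw_index_bytes` is a hypothetical helper, the average of 67 vectors per document is the rounded figure reported above, and the estimate ignores index metadata and any compression scheme, so treat the results as rough upper bounds rather than the size of the linked encoding.

```python
def raw_index_bytes(num_docs: int, avg_vectors_per_doc: int,
                    dim: int, bytes_per_value: int) -> int:
    """Bytes needed to store every token vector without compression."""
    return num_docs * avg_vectors_per_doc * dim * bytes_per_value


MSMARCO_DOCS = 8_841_823   # documents in MS MARCO v1 (from the list above)
AVG_VECTORS = 67           # rounded average vectors per document
DIM = 128                  # vector dimension

# Value widths are assumptions for illustration: 4, 2, and 1 byte per component.
sizes = {
    label: raw_index_bytes(MSMARCO_DOCS, AVG_VECTORS, DIM, width)
    for label, width in [("fp32", 4), ("fp16", 2), ("int8", 1)]
}

for label, size in sizes.items():
    print(f"{label}: {size / 1e9:.1f} GB")
```

Under these assumptions the raw vectors come to roughly 303 GB, 152 GB, and 76 GB respectively, which is why compression and token pruning receive so much attention in this area.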

LoTTE-pooled

  • Documents: 2,428,854
  • Queries [dev/search]: 2,931
  • Reference Metric: Success@5
| Encoding | Link | Vector dim | Avg vectors per doc | Avg vectors per query | Success@5 |
|---|---|---|---|---|---|
| colbertv2 | link | 128 | 109 | 32 | N/A |
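The two reference metrics named above can be computed as in the sketch below, using their standard definitions: MRR@10 averages the reciprocal rank of the first relevant document within the top 10 (0 if none appears), and Success@5 is the fraction of queries with at least one relevant document in the top 5. The function names and the toy `runs` and `qrels` dictionaries are hypothetical; official evaluation scripts may differ in details such as tie handling or the treatment of queries without judgments.

```python
from typing import Dict, List, Set


def reciprocal_rank_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """1 / rank of the first relevant document in the top k, else 0."""
    for position, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def mrr_at_k(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]],
             k: int = 10) -> float:
    """Mean reciprocal rank over all queries in `runs`."""
    total = sum(reciprocal_rank_at_k(ranked, qrels.get(qid, set()), k)
                for qid, ranked in runs.items())
    return total / len(runs)


def success_at_k(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]],
                 k: int = 5) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(1 for qid, ranked in runs.items()
               if any(doc_id in qrels.get(qid, set()) for doc_id in ranked[:k]))
    return hits / len(runs)


# Toy data (hypothetical): q1 finds its relevant document at rank 2, q2 misses.
runs = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"]}
qrels = {"q1": {"b"}, "q2": {"w"}}

print(mrr_at_k(runs, qrels, k=10))      # 0.25
print(success_at_k(runs, qrels, k=5))   # 0.5
```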

Multimedia Resources

  • Omar Khattab on Late Interaction in 2030. Link
  • Multi-Vector Search with Amélie Chatelain and Antoine Chaffin - Weaviate Podcast #134. Link
