An extensive and commented list of resources on late-interaction multivector retrieval.
- Awesome Multivector Retrieval
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
  Omar Khattab, Matei Zaharia
  SIGIR, 2020
  paper | code
- COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List
  Luyu Gao, Zhuyun Dai, Jamie Callan
  NAACL, 2021
  paper
- ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
  Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, Matei Zaharia
  NAACL, 2022
  paper | code
- Multi-Vector Embeddings are Provably More Expressive than Single Vector Embeddings
  Rajesh Jayaram
  arXiv, 2026
  paper

- PyLate: Flexible Training and Retrieval for Late Interaction Models
  Antoine Chaffin, Raphaël Sourty
  CIKM, 2025
  paper | code
- ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models
  Antoine Chaffin, Luca Arnaboldi, Amélie Chatelain, Florent Krzakala
  arXiv, 2026
  paper
- A Replicability Study of XTR
  Rohan Jha, Reno Kriz, Benjamin Van Durme
  arXiv, 2026
  paper
- Your Embedding Model is SMARTer Than You Think
  Jianrui Zhang, Hyun Jung Lee, Sukanta Ganguly, Tae-Eui Kam, Donghyun Kim, Yong Jae Lee
  arXiv, 2026
  paper | code
- Party is over: regularizing ColBERT models to fix efficient ANN methods
  LightOn AI
  Blog, 2026
  blog

- Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction
  Sebastian Hofstatter, Omar Khattab, Sophia Althammer, Mete Sertkan, Allan Hanbury
  CIKM, 2022
  paper | code
- Joint Optimization of Multi-Vector Representation with Product Quantization
  Yufan Fang, Jing Zhan, Yiqun Liu, Jiafeng Mao, Min Zhang, Shaoping Ma
  NLPCC, 2022
  paper
- CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval
  Minghan Li, Sean C. Lin, Barlas Oguz, Arnab Ghoshal, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen
  ACL, 2023
  paper
- SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes
  Minghan Li, Sheng-Chieh Lin, Xueguang Ma, Jimmy Lin
  SIGIR, 2023
  paper
- Rethinking the Role of Token Retrieval in Multi-Vector Retrieval
  Jinhyuk Lee, Zhuyun Dai, Sai Meher Karthik Duddu, Tao Lei, Iftekhar Naim, Ming-Wei Chang, Vincent Y. Zhao
  NeurIPS, 2023
  paper
- SPLATE: Sparse Late Interaction Retrieval
  Thibault Formal, Stephane Clinchant, Herve Dejean, Carlos Lassance
  SIGIR, 2024
  paper
- Muvera: Multi-Vector Retrieval via Fixed Dimensional Encodings
  Laxman Dhulipala, Majid Hadian, Rajesh Jayaram, Jason Lee, Vahab Mirrokni
  NeurIPS, 2024
  paper
- Enhancing ColBERT: A Method for Reducing Space Complexity and Accelerating Retrieval Speed
  Hai Nguyen T., Huong Le T.
  PACLIC, 2024
  paper
- Token Pruning Optimization for Efficient Multi-vector Dense Retrieval
  Shanxiu He, Mutasem Al-Darabsah, Suraj Nair, Jonathan May, Tarun Agarwal, Tao Yang, Choon Hui Teo
  ECIR, 2025
  paper
- CRISP: Clustering Multi-Vector Representations for Denoising and Pruning
  João Veneroso, Rajesh Jayaram, Jinmeng Rao, Gustavo Hernández Ábrego, Majid Hadian, Daniel Cer
  arXiv, 2025
  paper
- Towards Lossless Token Pruning in Late-Interaction Retrieval Models
  Yuxuan Zong, Benjamin Piwowarski
  SIGIR, 2025
  paper
- Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework
  Yibo Yan, Mingdong Ou, Yi Cao, Xin Zou, Jiahao Huo, Shuliang Liu, James Kwok, Xuming Hu
  arXiv, 2026
  paper
- Multi-Vector Index Compression in Any Modality
  Hanxiang Qin, Alexander Martin, Rohan Jha, Chunsheng Zuo, Reno Kriz, Benjamin Van Durme
  arXiv, 2026
  paper | code
- A Brief Comparison of Training-Free Multi-Vector Sequence Compression Methods
  Rohan Jha, Chunsheng Zuo, Reno Kriz, Benjamin Van Durme
  ECIR (LIR Workshop), 2026
  paper

- ColPali: Efficient Document Retrieval with Vision Language Models
  Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani, Gautier Viaud, Celine Hudelot, Pierre Colombo
  ICLR, 2025
  paper
- Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
  Arun V. Reddy, Alexander Martin, Eugene Yang, Andrew Yates, Kate Sanders, Kenton Murray, Reno Kriz, Celso M. de Melo, Benjamin Van Durme, Rama Chellappa
  CVPR, 2025
  paper
- ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval
  Ahmed Masry, Megh Thakkar, Patrice Bechard, Sathwik Tejaswi Madhusudhan, Rabiul Awal, Shambhavi Mishra, Akshay Kalkunte Suresh, Srivatsava Daruru, Enamul Hoque, Spandana Gella, Torsten Scholak, Sai Rajeswar
  EMNLP, 2025
  paper

- Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval
  Omar Khattab, Christopher Potts, Matei Zaharia
  NeurIPS, 2021
  paper
- PLAID: An Efficient Engine for Late Interaction Retrieval
  Keshav Santhanam, Omar Khattab, Christopher Potts, Matei Zaharia
  CIKM, 2022
  paper | code
- DESSERT: An Efficient Algorithm for Vector Set Search with Vector Set Queries
  Joshua Engels, Benjamin Coleman, Vihan Lakshman, Anshumali Shrivastava
  NeurIPS, 2023
  paper
- Efficient Multi-Vector Dense Retrieval with Bit Vectors
  Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
  ECIR, 2024
  paper | code
- A Reproducibility Study of PLAID
  Sean MacAvaney, Nicola Tonellotto
  SIGIR, 2024
  paper
- Efficient Constant-Space Multi-vector Retrieval
  Sean MacAvaney, Antonio Mallia, Nicola Tonellotto
  ECIR, 2025
  paper
- IGP: Efficient Multi-Vector Retrieval via Proximity Graph Index
  Ziyang Bian, Man Lung Yiu, Buzhou Tang
  SIGIR, 2025
  paper | code
- WARP: An Efficient Engine for Multi-Vector Retrieval
  Joel L. Scheerer, Matei Zaharia, Christopher Potts, Gustavo Alonso, Omar Khattab
  SIGIR, 2025
  paper | code
- Multivector Reranking in the Era of Strong First-Stage Retrievers
  Silvio Martinico, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
  ECIR, 2026
  paper | code | code
- SMVE: Sparse Multi-Vector Retrieval
  Martin Spisak, Marek Galovic
  ECIR, 2026
  blog
- No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval
  Lixuan Guo, Yifei Wang, Tiansheng Wen, Aosong Feng, Stefanie Jegelka, Chenyu You
  ICML, 2026
  paper
- LEMUR: Learned Multi-Vector Retrieval
  Elias Jääsaari, Ville Hyvönen, Teemu Roos
  ICML, 2026
  paper | code
- Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing
  Silvio Martinico, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
  SIGIR, 2026
  paper | code
- ColBERTSaR: Sparsified ColBERT Index via Product Quantization
  Eugene Yang, Andrew Yates, Dawn Lawrie, James Mayfield, Saron Samuel, Rohan Jha
  SIGIR, 2026
  paper | code

- FLASH-MAXSIM: IO-Aware Fused Kernels for Late-Interaction Scoring
  Roi Pony, Adi Raz Goldfarb, Idan Friedman, Daniel Ezer, Udi Barzelay
  arXiv, 2026
  paper | code
- TileMaxSim: IO-Aware GPU MaxSim Scoring with Dimension Tiling and Fused Product Quantization
  Ashutosh Sharma
  arXiv, 2026
  paper | code

- ColBERT
  Reference implementation of ColBERT and ColBERTv2, including PLAID support for efficient late-interaction retrieval.
- RAGatouille
  Python toolkit to train and serve ColBERT-based late-interaction retrievers.
- PyLate
  Python library for training, fine-tuning, inference, and retrieval with ColBERT-style late-interaction models on single and multi-GPU setups.
- PyLate-rs
  High-performance Rust inference engine for PyLate models, with Python bindings and optimized integration with FastPlaid for retrieval pipelines.

- FastPlaid
  GPU-optimized engine for ColBERT/PLAID-style late-interaction retrieval.
- kANNolo
  ANN library for dense, sparse, and multivector retrieval.
- Vectorium
  Rust library for compact storage/access of dense, sparse, and multivector embeddings.
- Firn
  Rust search engine for single-vector and late-interaction multivector namespaces, backed by LanceDB on object storage with RAM/NVMe result caching.
- NextPlaid
  CPU-oriented local-first multivector retrieval engine with memory-mapped storage.
- EMVB
  Reference implementation for Efficient Multi-Vector Dense Retrieval with Bit Vectors.
- IGP
  Official C++ implementation for IGP: proximity-graph indexing for multi-vector retrieval (with Python scripts for experiments).
- WARP
  Official implementation for WARP, an efficient multi-vector retrieval engine.
- ColGrep
  High-performance code search CLI tool powered by LateOn-Code and NextPlaid, enabling semantic + hybrid (regex + semantic) code retrieval locally with incremental indexing.
- TACHIOM
  Fast and scalable multivector retrieval system with Token-Aware Clustering (TAC) and hierarchical Product Quantization for efficient late-interaction search.
- TopK
  Managed retrieval engine with support for late-interaction search over billions of documents, online index updates, filtering, and more.

- Flash-MaxSim
  IO-aware Triton kernel for MaxSim scoring in ColBERT/ColPali pipelines: tile-by-tile on-chip computation with zero intermediate memory and INT8 quantization support.
- maxsim
  Ahead-of-time compiled MaxSim kernel with CUDA and Metal backends (NVIDIA + Apple Silicon), distributed as a HuggingFace kernels package.
- late-interaction-kernels
  Fused Triton kernels for MaxSim scoring with CUDA, Metal, and CPU backends, native PyLate/colpali-engine integration, and PLAID-style compressed-index support.
- maxsim-cpu
  CPU-only MaxSim kernel written in Rust (libxsmm on x86, Apple Accelerate on ARM) with Python bindings.
- TileMaxSim
  IO-aware Triton kernel for MaxSim scoring with dimension tiling for embeddings wider than 128 dims and fused product quantization, achieving 80%+ peak HBM bandwidth.
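
For readers new to the operator that the kernels above accelerate, MaxSim can be sketched in a few lines of plain Python. This is an illustrative reference only, not the implementation of any library listed here: the helper names `dot` and `maxsim` and the toy vectors are made up for the example, and the sketch assumes dot-product similarity summed over query vectors, as in ColBERT-style late interaction.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query vector, take the best-matching
    document vector, then sum those maxima over the query."""
    if not doc_vecs:
        raise ValueError("document must contain at least one vector")
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-dimensional example (hypothetical data, chosen for easy arithmetic).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # one matching vector per query vector
doc_b = [[0.5, 0.5]]              # a single averaged vector

scores = {"doc_a": maxsim(query, doc_a), "doc_b": maxsim(query, doc_b)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

The nested loop costs one dot product per (query vector, document vector) pair, which is the work that fused and tiled kernels reorganize to avoid materializing the full similarity matrix.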

- colbert-ir/colbertv2.0
  Official ColBERTv2 checkpoint (MS MARCO-trained) from the ColBERT authors, widely used as the canonical baseline model.
- lightonai/LateOn
  State-of-the-art ColBERT model (149M, ModernBERT-based) achieving 57.22 NDCG@10 on BEIR with fully open training data and strong generalization under decontamination.
- lightonai/LateOn-regularized
  LateOn variant trained with STE-based regularization to fix compatibility with projection-based retrieval methods (MUVERA, SMVE).
- ColBERT-Zero
  Large-scale fully pre-trained ColBERT checkpoint trained on public data and released with the ColBERT-Zero paper.
- GTE-ModernColBERT-v1
  PyLate late-interaction checkpoint based on ModernBERT with 128-dimensional token embeddings and strong long-context retrieval behavior.
- Iso-ModernColBERT
  Isotropically corrected version of GTE-ModernColBERT-v1 built for efficient inference and scalable retrieval.
- colberter-128-32-msmarco / uni-colberter-128-1-msmarco
  ColBERTer checkpoints trained on MS MARCO (128-dim, with 32 and 1 unique whole-word vectors per document respectively).

- lightonai/LateOn-Code
  Specialized ColBERT model (149M parameters) fine-tuned for code retrieval, achieving SOTA on the MTEB Code benchmark.
- lightonai/LateOn-Code-edge
  Lightweight code retrieval model (17M parameters) for edge devices, matching larger models while running efficiently on CPU.
- Reason-ModernColBERT
  Reasoning-focused late-interaction checkpoint fine-tuned on reasonir-hq, with strong BRIGHT benchmark performance for reasoning-intensive retrieval.

- Documents: 8,841,823
- Queries [dev.small]: 6,980
- Reference Metric: MRR@10

| Encoding | Link | Vector dim | Avg vectors per doc | Avg vectors per query | MRR@10 |
|---|---|---|---|---|---|
| colbertv2 | link | 128 | 67 | 32 | 0.397 |

- Documents: 2,428,854
- Queries [dev/search]: 2,931
- Reference Metric: Success@5

| Encoding | Link | Vector dim | Avg vectors per doc | Avg vectors per query | Success@5 |
|---|---|---|---|---|---|
| colbertv2 | link | 128 | 109 | 32 | N/A |
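
The two reference metrics used in the tables above, MRR@10 and Success@5, can be sketched as follows. This is a minimal illustration under stated assumptions, not the official evaluation script of either benchmark: the function names, the `runs` and `qrels` dictionaries, and the document ids are all hypothetical, and relevance is treated as binary.

```python
from typing import Dict, List, Set


def reciprocal_rank_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """1/rank of the first relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def success_at_k(ranked: List[str], relevant: Set[str], k: int = 5) -> float:
    """1 if any relevant document appears within the top k, else 0."""
    return 1.0 if any(doc_id in relevant for doc_id in ranked[:k]) else 0.0


def mean_over_queries(metric, runs: Dict[str, List[str]],
                      qrels: Dict[str, Set[str]], k: int) -> float:
    """Average a per-query metric over all judged queries."""
    values = [metric(runs.get(qid, []), rel, k) for qid, rel in qrels.items()]
    return sum(values) / len(values)


# Hypothetical toy run: q1 finds its relevant document at rank 2, q2 misses.
runs = {"q1": ["d3", "d1", "d7"], "q2": ["d9", "d4"]}
qrels = {"q1": {"d1"}, "q2": {"d5"}}

mrr_at_10 = mean_over_queries(reciprocal_rank_at_k, runs, qrels, k=10)
success_at_5 = mean_over_queries(success_at_k, runs, qrels, k=5)
print(mrr_at_10, success_at_5)
```

Queries with no retrieved results count as zero for both metrics, so the averages are taken over every judged query rather than only those that returned documents.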