Awesome Multivector Retrieval

An extensive, annotated list of resources on late-interaction multivector retrieval.

Models

Foundational Models

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Omar Khattab, Matei Zaharia
SIGIR, 2020
📄 paper | 🛠️ code
COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List
Luyu Gao, Zhuyun Dai, Jamie Callan
NAACL, 2021
📄 paper
ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, Matei Zaharia
NAACL, 2022
📄 paper | 🛠️ code
Multi-Vector Embeddings are Provably More Expressive than Single Vector Embeddings
Rajesh Jayaram
arXiv, 2026
📄 paper
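ColBERT and ColBERTv2 score a query against a document with the late-interaction MaxSim operator. Each query token embedding is matched to its most similar document token embedding, and those maxima are summed. The following is a minimal pure-Python sketch. It assumes the token embeddings are already computed, and the helper names and toy vectors are illustrative. Real ColBERT-style models L2-normalize embeddings so that the dot product is a cosine similarity; the toy vectors skip that step.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query: Sequence[Vector], doc: Sequence[Vector]) -> float:
    """Late-interaction score: for every query token, keep its best dot product
    against any document token, then sum those maxima over the query tokens."""
    return sum(max(dot(q, d) for d in doc) for q in query)


# Toy 2-dimensional token embeddings (illustrative values, not from a real model).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.75, 0.25], [0.25, 0.5]]  # each query token has a strong match
doc_b = [[0.5, 0.5]]                 # a single, less specific token

print(maxsim(query, doc_a))  # 0.75 + 0.5 = 1.25
print(maxsim(query, doc_b))  # 0.5 + 0.5 = 1.0
```

Document token vectors do not depend on the query, so they can be precomputed and indexed offline.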

General Models & Training

PyLate: Flexible Training and Retrieval for Late Interaction Models
Antoine Chaffin, Raphaël Sourty
CIKM, 2025
📄 paper | 🛠️ code
ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models
Antoine Chaffin, Luca Arnaboldi, Amélie Chatelain, Florent Krzakala
arXiv, 2026
📄 paper
A Replicability Study of XTR
Rohan Jha, Reno Kriz, Benjamin Van Durme
arXiv, 2026
📄 paper
Your Embedding Model is SMARTer Than You Think
Jianrui Zhang, Hyun Jung Lee, Sukanta Ganguly, Tae-Eui Kam, Donghyun Kim, Yong Jae Lee
arXiv, 2026
📄 paper | 🛠️ code
Party is over: regularizing ColBERT models to fix efficient ANN methods
LightOn AI
Blog, 2026
📝 blog

Compression & Token Pruning

Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction
Sebastian Hofstätter, Omar Khattab, Sophia Althammer, Mete Sertkan, Allan Hanbury
CIKM, 2022
📄 paper | 🛠️ code
Joint Optimization of Multi-Vector Representation with Product Quantization
Yan Fang, Jingtao Zhan, Yiqun Liu, Jiaxin Mao, Min Zhang, Shaoping Ma
NLPCC, 2022
📄 paper
CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval
Minghan Li, Sheng-Chieh Lin, Barlas Oguz, Asish Ghoshal, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen
ACL, 2023
📄 paper
SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes
Minghan Li, Sheng-Chieh Lin, Xueguang Ma, Jimmy Lin
SIGIR, 2023
📄 paper
Rethinking the Role of Token Retrieval in Multi-Vector Retrieval
Jinhyuk Lee, Zhuyun Dai, Sai Meher Karthik Duddu, Tao Lei, Iftekhar Naim, Ming-Wei Chang, Vincent Y. Zhao
NeurIPS, 2023
📄 paper
SPLATE: Sparse Late Interaction Retrieval
Thibault Formal, Stéphane Clinchant, Hervé Déjean, Carlos Lassance
SIGIR, 2024
📄 paper
MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
Laxman Dhulipala, Majid Hadian, Rajesh Jayaram, Jason Lee, Vahab Mirrokni
NeurIPS, 2024
📄 paper
Enhancing ColBERT: A Method for Reducing Space Complexity and Accelerating Retrieval Speed
Hai Nguyen T., Huong Le T.
PACLIC, 2024
📄 paper
Token Pruning Optimization for Efficient Multi-vector Dense Retrieval
Shanxiu He, Mutasem Al-Darabsah, Suraj Nair, Jonathan May, Tarun Agarwal, Tao Yang, Choon Hui Teo
ECIR, 2025
📄 paper
CRISP: Clustering Multi-Vector Representations for Denoising and Pruning
João Veneroso, Rajesh Jayaram, Jinmeng Rao, Gustavo Hernández Ábrego, Majid Hadian, Daniel Cer
arXiv, 2025
📄 paper
Towards Lossless Token Pruning in Late-Interaction Retrieval Models
Yuxuan Zong, Benjamin Piwowarski
SIGIR, 2025
📄 paper
Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework
Yibo Yan, Mingdong Ou, Yi Cao, Xin Zou, Jiahao Huo, Shuliang Liu, James Kwok, Xuming Hu
arXiv, 2026
📄 paper
Multi-Vector Index Compression in Any Modality
Hanxiang Qin, Alexander Martin, Rohan Jha, Chunsheng Zuo, Reno Kriz, Benjamin Van Durme
arXiv, 2026
📄 paper | 🛠️ code
A Brief Comparison of Training-Free Multi-Vector Sequence Compression Methods
Rohan Jha, Chunsheng Zuo, Reno Kriz, Benjamin Van Durme
ECIR (LIR Workshop), 2026
📄 paper
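Several entries above shrink the index by reducing the number of vectors stored per document. The sketch below is a deliberately simple illustration of a training-free reduction and is not the method of any specific paper listed here. It mean-pools fixed windows of adjacent token vectors. The function name and the `window` parameter are hypothetical. Published methods typically choose which tokens to merge or drop more carefully, for example by clustering or learned pruning.

```python
from typing import List, Sequence


def mean_pool_windows(tokens: Sequence[Sequence[float]], window: int) -> List[List[float]]:
    """Replace each run of `window` consecutive token vectors with their mean.

    A document with n token vectors shrinks to ceil(n / window) vectors.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pooled: List[List[float]] = []
    for start in range(0, len(tokens), window):
        chunk = tokens[start:start + window]  # the last chunk may be shorter
        dim = len(chunk[0])
        pooled.append([sum(vec[i] for vec in chunk) / len(chunk) for i in range(dim)])
    return pooled


doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [2.0, 0.0]]
print(mean_pool_windows(doc, window=2))  # [[0.5, 0.5], [0.75, 0.75], [2.0, 0.0]]
```

With `window=2`, a 5-token document drops to 3 vectors, and MaxSim is then computed over the pooled vectors. This trades some accuracy for a smaller index. The sketch leaves out whether to re-normalize pooled vectors, which is a separate design choice.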

Multimodal & Vision

ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo
ICLR, 2025
📄 paper
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Arun V. Reddy, Alexander Martin, Eugene Yang, Andrew Yates, Kate Sanders, Kenton Murray, Reno Kriz, Celso M. de Melo, Benjamin Van Durme, Rama Chellappa
CVPR, 2025
📄 paper
ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval
Ahmed Masry, Megh Thakkar, Patrice Bechard, Sathwik Tejaswi Madhusudhan, Rabiul Awal, Shambhavi Mishra, Akshay Kalkunte Suresh, Srivatsava Daruru, Enamul Hoque, Spandana Gella, Torsten Scholak, Sai Rajeswar
EMNLP, 2025
📄 paper

Retrieval

Indexing & Search Algorithms

Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval
Omar Khattab, Christopher Potts, Matei Zaharia
NeurIPS, 2021
📄 paper
PLAID: An Efficient Engine for Late Interaction Retrieval
Keshav Santhanam, Omar Khattab, Christopher Potts, Matei Zaharia
CIKM, 2022
📄 paper | 🛠️ code
DESSERT: An Efficient Algorithm for Vector Set Search with Vector Set Queries
Joshua Engels, Benjamin Coleman, Vihan Lakshman, Anshumali Shrivastava
NeurIPS, 2023
📄 paper
Efficient Multi-Vector Dense Retrieval with Bit Vectors
Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
ECIR, 2024
📄 paper | 🛠️ code
A Reproducibility Study of PLAID
Sean MacAvaney, Nicola Tonellotto
SIGIR, 2024
📄 paper
Efficient Constant-Space Multi-vector Retrieval
Sean MacAvaney, Antonio Mallia, Nicola Tonellotto
ECIR, 2025
📄 paper
IGP: Efficient Multi-Vector Retrieval via Proximity Graph Index
Ziyang Bian, Man Lung Yiu, Buzhou Tang
SIGIR, 2025
📄 paper | 🛠️ code
WARP: An Efficient Engine for Multi-Vector Retrieval
Jan Luca Scheerer, Matei Zaharia, Christopher Potts, Gustavo Alonso, Omar Khattab
SIGIR, 2025
📄 paper | 🛠️ code
Multivector Reranking in the Era of Strong First-Stage Retrievers
Silvio Martinico, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
ECIR, 2026
📄 paper | 🛠️ code | 🛠️ code
SMVE: Sparse Multi-Vector Retrieval
Martin Spisak, Marek Galovic
ECIR, 2026
📝 blog
No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval
Lixuan Guo, Yifei Wang, Tiansheng Wen, Aosong Feng, Stefanie Jegelka, Chenyu You
ICML, 2026
📄 paper
LEMUR: Learned Multi-Vector Retrieval
Elias Jääsaari, Ville Hyvönen, Teemu Roos
ICML, 2026
📄 paper | 🛠️ code
Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing
Silvio Martinico, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
SIGIR, 2026
📄 paper | 🛠️ code
ColBERTSaR: Sparsified ColBERT Index via Product Quantization
Eugene Yang, Andrew Yates, Dawn Lawrie, James Mayfield, Saron Samuel, Rohan Jha
SIGIR, 2026
📄 paper | 🛠️ code
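Several engines above, with PLAID as a prominent example, follow a two-stage search pattern. First, an inverted index keyed by token-embedding centroids generates candidates cheaply. Then the surviving candidates are scored with exact MaxSim. The sketch below shows only that skeleton and makes several simplifying assumptions. The centroids are hand-picked rather than learned with k-means, each query token probes a single centroid, and there is no residual compression or centroid pruning. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def nearest_centroid(vec: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid with the highest dot product with `vec`."""
    return max(range(len(centroids)), key=lambda c: dot(vec, centroids[c]))


def maxsim(query: Sequence[Vector], doc: Sequence[Vector]) -> float:
    return sum(max(dot(q, d) for d in doc) for q in query)


def build_inverted_index(
    docs: Dict[str, List[Vector]], centroids: Sequence[Vector]
) -> Dict[int, Set[str]]:
    """Map each centroid id to the ids of documents with a token assigned to it."""
    index: Dict[int, Set[str]] = defaultdict(set)
    for doc_id, tokens in docs.items():
        for tok in tokens:
            index[nearest_centroid(tok, centroids)].add(doc_id)
    return index


def search(
    query: Sequence[Vector],
    docs: Dict[str, List[Vector]],
    centroids: Sequence[Vector],
    index: Dict[int, Set[str]],
    k: int,
) -> List[Tuple[str, float]]:
    # Stage 1: gather candidates from the nearest centroid of each query token.
    candidates: Set[str] = set()
    for q in query:
        candidates |= index.get(nearest_centroid(q, centroids), set())
    # Stage 2: exact MaxSim over the candidates' full token embeddings.
    scored = [(doc_id, maxsim(query, docs[doc_id])) for doc_id in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


centroids = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "a": [[0.75, 0.25], [0.25, 0.5]],
    "b": [[0.25, 0.75]],
    "c": [[1.0, 0.0]],
}
index = build_inverted_index(docs, centroids)
print(search([[0.0, 1.0]], docs, centroids, index, k=2))  # [('b', 0.75), ('a', 0.5)]
```

Document "c" is never scored because none of its tokens share a centroid with the query token. Skipping such documents is where candidate generation gets its speedup, and it is also where the recall risk comes from.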

Scoring Kernels

FLASH-MAXSIM: IO-Aware Fused Kernels for Late-Interaction Scoring
Roi Pony, Adi Raz Goldfarb, Idan Friedman, Daniel Ezer, Udi Barzelay
arXiv, 2026
📄 paper | 🛠️ code
TileMaxSim: IO-Aware GPU MaxSim Scoring with Dimension Tiling and Fused Product Quantization
Ashutosh Sharma
arXiv, 2026
📄 paper | 🛠️ code
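The IO-aware kernels above avoid writing the full query-by-document similarity matrix to GPU memory. They do this by processing document tokens tile by tile and keeping only a running maximum per query token. The pure-Python sketch below illustrates that streaming idea; it is not either kernel's actual implementation. `tile_size` is a hypothetical parameter standing in for the on-chip tile. A plain full-matrix MaxSim is included only to check that both approaches give the same score.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim_full(query: Sequence[Vector], doc: Sequence[Vector]) -> float:
    """Reference: materialize the whole similarity matrix, then reduce."""
    sims = [[dot(q, d) for d in doc] for q in query]
    return sum(max(row) for row in sims)


def maxsim_tiled(query: Sequence[Vector], doc: Sequence[Vector], tile_size: int) -> float:
    """Stream document tokens in tiles, keeping one running max per query token.

    At any moment only len(query) running maxima and one tile of document
    vectors are held; the len(query) x len(doc) matrix is never stored.
    """
    running_max: List[float] = [-math.inf] * len(query)
    for start in range(0, len(doc), tile_size):
        tile = doc[start:start + tile_size]
        for i, q in enumerate(query):
            running_max[i] = max(running_max[i], max(dot(q, d) for d in tile))
    return sum(running_max)


query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.75, 0.25], [0.25, 0.5], [0.5, 0.5], [0.0, 0.25]]
for tile_size in (1, 3, 8):
    print(tile_size, maxsim_tiled(query, doc, tile_size))  # 1.25 each time
print(maxsim_full(query, doc))  # 1.25
```

On a GPU, the benefit comes from keeping each tile and the running maxima on-chip. The Python version shows only the arithmetic.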

Software Libraries

Training & Inference Frameworks

ColBERT
Reference implementation of ColBERT and ColBERTv2, including PLAID support for efficient late-interaction retrieval.
RAGatouille
Python toolkit to train and serve ColBERT-based late-interaction retrievers.
PyLate
Python library for training, fine-tuning, inference, and retrieval with ColBERT-style late-interaction models on single and multi-GPU setups.
PyLate-rs
High-performance Rust inference engine for PyLate models, with Python bindings and optimized integration with FastPlaid for retrieval pipelines.

Retrieval Engines & Indexes

FastPlaid
GPU-optimized engine for ColBERT/PLAID-style late-interaction retrieval.
kANNolo
ANN library for dense, sparse, and multivector retrieval.
Vectorium
Rust library for compact storage/access of dense, sparse, and multivector embeddings.
Firn
Rust search engine for single-vector and late-interaction multivector namespaces, backed by LanceDB on object storage with RAM/NVMe result caching.
NextPlaid
CPU-oriented local-first multivector retrieval engine with memory-mapped storage.
EMVB
Reference implementation for Efficient Multi-Vector Dense Retrieval with Bit Vectors.
IGP
Official C++ implementation for IGP: proximity-graph indexing for multi-vector retrieval (with Python scripts for experiments).
WARP
Official implementation for WARP, an efficient multi-vector retrieval engine.
ColGrep
High-performance code search CLI tool powered by LateOn-Code and NextPlaid, supporting local semantic and hybrid (regex + semantic) code search with incremental indexing.
TACHIOM
Fast and scalable multivector retrieval system with Token-Aware Clustering (TAC) and hierarchical Product Quantization for efficient late-interaction search.
TopK
Managed retrieval engine with support for late-interaction search over billions of documents, online index updates, filtering, and more.

Scoring Kernels

Flash-MaxSim
IO-aware Triton kernel for MaxSim scoring in ColBERT/ColPali pipelines: tile-by-tile on-chip computation with zero intermediate memory and INT8 quantization support.
maxsim
Ahead-of-time compiled MaxSim kernel with CUDA and Metal backends (NVIDIA + Apple Silicon), distributed as a HuggingFace kernels package.
late-interaction-kernels
Fused Triton kernels for MaxSim scoring with CUDA, Metal, and CPU backends, native PyLate/colpali-engine integration, and PLAID-style compressed-index support.
maxsim-cpu
CPU-only MaxSim kernel written in Rust (libxsmm on x86, Apple Accelerate on ARM) with Python bindings.
TileMaxSim
IO-aware Triton kernel for MaxSim scoring with dimension tiling for embeddings wider than 128 dimensions and fused product quantization, reaching over 80% of peak HBM bandwidth.
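Flash-MaxSim lists INT8 quantization support. One common way to quantize embeddings is symmetric per-vector INT8. This is shown here as an assumption, not as what any listed kernel does. Each vector is stored as integer codes in [-127, 127] plus one float scale. Dot products accumulate in integers, and the result is rescaled once. The following minimal sketch uses hypothetical function names.

```python
from typing import List, Sequence, Tuple

Quantized = Tuple[List[int], float]  # (integer codes in [-127, 127], float scale)


def quantize_int8(vec: Sequence[float]) -> Quantized:
    """Symmetric per-vector quantization: vec is approximately scale * codes."""
    amax = max(abs(x) for x in vec)
    scale = amax / 127.0 if amax > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dot_int8(a: Quantized, b: Quantized) -> float:
    """Integer multiply-accumulate, then a single floating-point rescale."""
    (codes_a, scale_a), (codes_b, scale_b) = a, b
    acc = sum(x * y for x, y in zip(codes_a, codes_b))
    return acc * scale_a * scale_b


def maxsim_int8(query: Sequence[Sequence[float]], doc: Sequence[Sequence[float]]) -> float:
    """Approximate MaxSim computed on quantized query and document vectors."""
    q_codes = [quantize_int8(q) for q in query]
    d_codes = [quantize_int8(d) for d in doc]
    return sum(max(dot_int8(q, d) for d in d_codes) for q in q_codes)


query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.75, 0.25], [0.25, 0.5]]
print(maxsim_int8(query, doc))  # close to the exact float MaxSim of 1.25
```

Before clamping, each coordinate's rounding error is at most half a quantization step (`scale / 2`). Scores therefore shift slightly relative to full-precision MaxSim.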

Model Checkpoints

General-Purpose

colbert-ir/colbertv2.0
Official ColBERTv2 checkpoint (MS MARCO-trained) from the ColBERT authors, widely used as the canonical baseline model.
lightonai/LateOn
State-of-the-art ColBERT model (149M parameters, ModernBERT-based) achieving 57.22 NDCG@10 on BEIR, with fully open training data and strong generalization under decontamination.
lightonai/LateOn-regularized
LateOn variant trained with STE-based regularization to fix compatibility with projection-based retrieval methods (MUVERA, SMVE).
ColBERT-Zero
Large-scale fully pre-trained ColBERT checkpoint trained on public data and released with the ColBERT-Zero paper.
GTE-ModernColBERT-v1
PyLate late-interaction checkpoint based on ModernBERT with 128-dimensional token embeddings and strong long-context retrieval behavior.
Iso-ModernColBERT
Isotropically corrected version of GTE-ModernColBERT-v1 built for efficient inference and scalable retrieval.
colberter-128-32-msmarco / uni-colberter-128-1-msmarco
ColBERTer checkpoints trained on MS MARCO (128-dim, with 32 and 1 unique whole-word vectors per document respectively).

Specialized / Domain

lightonai/LateOn-Code
Specialized ColBERT model (149M parameters) fine-tuned for code retrieval, achieving state-of-the-art results on the MTEB Code benchmark.
lightonai/LateOn-Code-edge
Lightweight code retrieval model (17M parameters) for edge devices, matching larger models while running efficiently on CPU.
Reason-ModernColBERT
Reasoning-focused late-interaction checkpoint fine-tuned on reasonir-hq, with strong BRIGHT benchmark performance for reasoning-intensive retrieval.

Datasets and Encodings

`MS MARCO v1`

Documents: 8,841,823
Queries [dev.small]: 6,980
Reference Metric: MRR@10

| Encoding | Link | Vector dim | Avg vectors per doc | Avg vectors per query | MRR@10 |
|---|---|---|---|---|---|
| `colbertv2` | link | 128 | 67 | 32 | 0.397 |
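MRR@10, the reference metric for MS MARCO, is an average over queries. For each query it takes the reciprocal rank of the first relevant passage within the top 10 results, counting 0 when no relevant passage appears. The following is a minimal sketch. It assumes runs and relevance judgments are plain dictionaries keyed by query id; these data structures are illustrative, not the official evaluation script's format. The sketch averages over every judged query, so a query missing from the run scores 0.

```python
from typing import Dict, List, Set


def mrr_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant document in each query's top k."""
    total = 0.0
    for qid, relevant in qrels.items():
        for rank, doc_id in enumerate(run.get(qid, [])[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(qrels)


qrels = {"q1": {"d3"}, "q2": {"d1"}, "q3": {"d9"}}
run = {"q1": ["d5", "d3", "d1"], "q2": ["d1", "d2"], "q3": ["d4"]}
print(mrr_at_k(run, qrels))  # (1/2 + 1/1 + 0) / 3 = 0.5
```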

`LoTTE-pooled`

Documents: 2,428,854
Queries [dev/search]: 2,931
Reference Metric: Success@5

| Encoding | Link | Vector dim | Avg vectors per doc | Avg vectors per query | Success@5 |
|---|---|---|---|---|---|
| `colbertv2` | link | 128 | 109 | 32 | `N/A` |
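LoTTE reports Success@5: the fraction of queries for which at least one relevant passage appears in the top 5 results. Unlike MRR@10, the metric ignores where the relevant passage sits within the top 5. The following is a minimal sketch. It assumes dictionary inputs that map query ids to ranked document ids and to relevant document ids; these formats are hypothetical.

```python
from typing import Dict, List, Set


def success_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 5) -> float:
    """Fraction of judged queries with at least one relevant document in the top k."""
    hits = sum(
        1
        for qid, relevant in qrels.items()
        if any(doc_id in relevant for doc_id in run.get(qid, [])[:k])
    )
    return hits / len(qrels)


qrels = {"q1": {"d2"}, "q2": {"d7"}}
run = {
    "q1": ["d1", "d2", "d3"],                    # relevant at rank 2: success
    "q2": ["d1", "d2", "d3", "d4", "d5", "d7"],  # relevant at rank 6: miss
}
print(success_at_k(run, qrels))  # 0.5
```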

Multimedia Resources

Omar Khattab on Late Interaction in 2030. Link
Multi-Vector Search with Amélie Chatelain and Antoine Chaffin - Weaviate Podcast #134. Link
