Product Quantisation
- Suppose you have millions of $d$-dimensional vectors (say $d = 128$), and you want to store them compactly while still supporting approximate nearest-neighbour search.
- You could use k-means quantisation directly: find a large codebook of $k$ centroids and store each vector by the index of its closest centroid.
- But that would require an enormous codebook; e.g., for 128-D data, even $k = 10^6$ might not be enough resolution.
Product Quantisation (PQ) solves this by factorising the space into smaller parts.
Instead of quantising the whole vector at once, PQ splits each vector into sub-vectors, and quantises each sub-vector separately.
- For a vector $x \in \mathbb{R}^d$:
  - Split $x$ into $m$ equal parts: $x = [x_1, x_2, ..., x_m]$, where each $x_i \in \mathbb{R}^{d/m}$.
- For each subspace $i$, learn a small codebook using k-means:
  - $C_i = \{ c_{i1}, c_{i2}, ..., c_{iK} \}$, where $K$ is typically 256 (so each subvector can be stored in 1 byte).
- Quantise each subvector by storing the index of its closest centroid: $q_i(x_i) = \operatorname{argmin}_j \| x_i - c_{ij} \|^2$
- The overall quantised vector is then represented by $q(x) = [ q_1(x_1), q_2(x_2), ..., q_m(x_m) ]$, i.e., a sequence of $m$ small indices.
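As a concrete sketch of these three steps, here is a minimal NumPy/scikit-learn version; the parameter values, variable names (`d`, `m`, `K`, `codebooks`, `codes`) and toy data are illustrative assumptions, not a reference implementation:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

d, m, K = 128, 8, 256           # vector dimension, number of subspaces, centroids per subspace
ds = d // m                     # dimension of each subvector (d must be divisible by m)

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, d)).astype(np.float32)   # toy database vectors

# 1. + 2. Split into m subspaces and learn one small codebook per subspace with k-means.
codebooks = np.empty((m, K, ds), dtype=np.float32)
for i in range(m):
    sub = X[:, i * ds:(i + 1) * ds]                        # the i-th subvector of every vector
    km = KMeans(n_clusters=K, n_init=1, random_state=0).fit(sub)
    codebooks[i] = km.cluster_centers_

# 3. Quantise: each vector becomes m one-byte centroid indices.
codes = np.empty((len(X), m), dtype=np.uint8)
for i in range(m):
    sub = X[:, i * ds:(i + 1) * ds]
    dists = cdist(sub, codebooks[i], metric="sqeuclidean")  # ||x_i - c_ij||^2 for every centroid j
    codes[:, i] = dists.argmin(axis=1)

print(codes.shape, codes.dtype)   # (10000, 8) uint8  ->  8 bytes per vector instead of 512
```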

This is why it's called Product Quantisation: you're effectively forming a Cartesian product of the smaller codebooks.
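To see the effect, take $K = 256$ as above and, say, $m = 8$ subspaces for $d = 128$ (an illustrative choice): the implicit product codebook has $K^m = 256^8 \approx 1.8 \times 10^{19}$ reproduction values, yet only $m \times K = 2048$ sub-centroids are ever stored. For one million float32 vectors the storage is roughly:

$$\underbrace{10^6 \times 128 \times 4 \text{ B}}_{\text{raw vectors} \;\approx\; 512 \text{ MB}} \;\longrightarrow\; \underbrace{10^6 \times 8 \text{ B}}_{\text{PQ codes} \;=\; 8 \text{ MB}} \;+\; \underbrace{8 \times 256 \times 16 \times 4 \text{ B}}_{\text{codebooks} \;\approx\; 131 \text{ kB}}$$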

ADC (Asymmetric Distance Computation): when querying, you don't decompress all vectors; that would defeat the purpose. Instead:
- For a query vector $q$, split it into the same $m$ parts: $q = [q_1, ..., q_m]$.
- Precompute a distance table for each subspace: $$D_i[j] = \| q_i - c_{ij} \|^2$$
- For a database vector with PQ code $[k_1, k_2, ..., k_m]$, estimate the distance as: $$\| q - x \|^2 \approx \sum_{i=1}^{m} D_i[k_i]$$

This avoids ever reconstructing the full vector and gives fast approximate distances.
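Continuing the hypothetical sketch above (reusing its `codebooks`, `codes`, `m`, `ds`, `K` and `rng` names), ADC for a single query costs one small table build plus $m$ look-ups per database vector:

```python
# Asymmetric Distance Computation for one query vector of shape (d,).
query = rng.standard_normal(d).astype(np.float32)

# Precompute D_i[j] = ||q_i - c_ij||^2 for every subspace i and centroid j.
tables = np.empty((m, K), dtype=np.float32)
for i in range(m):
    q_i = query[i * ds:(i + 1) * ds]
    tables[i] = ((codebooks[i] - q_i) ** 2).sum(axis=1)

# Approximate squared distance to every encoded vector: sum of m table look-ups.
approx_dists = tables[np.arange(m), codes].sum(axis=1)    # shape (n_vectors,)
top10 = np.argsort(approx_dists)[:10]                     # approximate nearest neighbours
```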
Note the contrast with dimensionality reduction: dimensionality reduction produces new vectors of lower dimensionality while keeping the same scope $S$ of possible values per component (e.g., 32-bit floats), whereas quantisation reduces the scope $S$ while maintaining the dimensionality $D$.


PQ alone gives compact representations, but you'd still have to search the entire dataset. By combining it with IVF (the Inverted File Index):
- IVF clusters the space into coarse cells.
- Within each cell, PQ encodes only the residuals (fine detail) relative to the coarse centroid: $$r = x - c_j$$ where $c_j$ is the coarse centroid of the cell that $x$ falls in.
- At query time, you probe a few cells (`nprobe`), then use PQ to compute approximate distances.
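In practice you would rarely hand-roll the combination; a minimal sketch using the faiss library (assuming `faiss-cpu` is installed; the values of `nlist`, `m`, `nbits` and `nprobe` are illustrative) looks like this:

```python
import numpy as np
import faiss

d, nlist, m, nbits = 128, 1024, 8, 8                 # nbits=8 -> K=256 centroids per subspace
xb = np.random.rand(100_000, d).astype("float32")    # toy database vectors
xq = np.random.rand(5, d).astype("float32")          # toy query vectors

coarse = faiss.IndexFlatL2(d)                        # coarse quantiser defining the IVF cells
index = faiss.IndexIVFPQ(coarse, d, nlist, m, nbits)

index.train(xb)     # learns coarse centroids and PQ codebooks (on residuals by default)
index.add(xb)       # stores each vector as an inverted-list entry plus an m-byte PQ code

index.nprobe = 16                                    # number of coarse cells probed per query
D, I = index.search(xq, 10)                          # approximate distances and neighbour ids
```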