
Commit 62d0e5a

alibeklfc authored and facebook-github-bot committed
RaBitQ Fast Scan (#4595)
Summary:

**Introduction**

This diff adds a new index, IndexRaBitQFastScan. It is based on the existing IndexRaBitQ but achieves higher speed by processing batches of 32 data vectors concurrently. It leverages the established IndexFastScan architecture to enable efficient batch processing and parallelism.

**Implementation**

* **New Source and Header Files**: Added implementations for IndexRaBitQFastScan, following a similar interface to IndexRaBitQ.
* **Batched Processing**: The search operation processes 32 data vectors in a single batch, taking advantage of low-level parallelism to improve throughput.
* **Specialized Post-processing Handler**: A dedicated handler was added for IndexRaBitQFastScan to perform the necessary post-processing during search, because the LUT accumulates only partial distances. Unlike AQ Fast Scan's simple scalar post-processing, RaBitQ requires complex distance adjustments that depend on both query and database vector factors.
* **LUT**: IndexRaBitQFastScan produces slightly different results than IndexRaBitQ due to an extra quantization step in the IndexFastScan architecture. Specifically:
  * The LUT computes a float value as c1 * inner_product + c2 * popcount, which is then quantized. This quantization can cause the results to differ slightly from those of IndexRaBitQ.
  * It is possible to avoid this by storing only the inner_product in the LUT, but doing so would require calculating all data vector popcounts during search, introducing a tradeoff between speed and accuracy.
  * With the idea proposed in diff D80904214, the algorithm can be modified in the future to eliminate the popcount calculation step, potentially improving both efficiency and accuracy.
* **Query Offset Parameter**: RaBitQ uses query factors in distance calculations. These should be computed in the `compute_float_LUT` method (the most efficient place, since `rotated_qq` is calculated there anyway) and then used for the final distance calculations in the handlers. However, the previous version of `compute_quantized_LUT`, which calls `compute_float_LUT`, did not know the query_offset, so query factors could not be stored at their global indices. To solve this, I added the extra parameter `query_offset` to both the `compute_quantized_LUT` and `compute_float_LUT` methods. After this change, computed query factors can be accessed by the correct global query index during distance calculations, avoiding expensive recalculation.

**Testing**

* Conducted comprehensive tests in the test_rabitq suite covering accuracy comparisons with IndexRaBitQ for L2 and Inner Product metrics, encoding/decoding consistency, query quantization bit settings, small dataset functionality, performance against PQFastScan, serialization, memory management, error handling, and thread safety.
* All tests passed successfully, validating the correctness and robustness of IndexRaBitQFastScan.

**Results**

[Image: results_rabitq]

* **Performance Dependency**: Performance measurements confirm that IndexRaBitQFastScan is notably faster than IndexRaBitQ when the qb value is high. While the original IndexRaBitQ experiences increased runtime with higher qb values, the fast scan variant maintains consistent runtime regardless of qb.
* **Parallelized Training Loop**: The training loop is parallelized, greatly reducing training time. This parallelism should also be added to the original IndexRaBitQ.
* **Consistency Across Metrics**: The performance advantages of IndexRaBitQFastScan hold for both L2 and Inner Product metrics, demonstrating robustness across different distance measures.
* One of the next steps is to benchmark IndexRaBitQFastScan against other algorithms to evaluate its performance in a broader context.

Differential Revision: D81787307
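
The LUT bullets in the summary say that each table entry has the form c1 * inner_product + c2 * popcount and is then quantized, which is why results can differ slightly from IndexRaBitQ. The following is a minimal sketch of that effect, assuming a plain min/max quantizer to 8 bits. That quantizer is an assumption for illustration; the scheme actually used by IndexFastScan is not shown in this diff. All names (`QuantizedLUT`, `quantize_lut`, `dequantize`, `lut_entry`) are hypothetical and not part of Faiss.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container, not a Faiss type: a float lookup table stored as
// 8-bit codes with one shared offset and step.
struct QuantizedLUT {
    std::vector<uint8_t> values;
    float min_value; // offset subtracted before scaling
    float step;      // width of one quantization bucket
};

// One LUT entry in the form the summary describes.
float lut_entry(float c1, float inner_product, float c2, int popcount) {
    return c1 * inner_product + c2 * static_cast<float>(popcount);
}

// Assumed scheme: map [min, max] of the table linearly onto 0..255.
QuantizedLUT quantize_lut(const std::vector<float>& lut) {
    assert(!lut.empty());
    auto mm = std::minmax_element(lut.begin(), lut.end());
    const float lo = *mm.first;
    const float hi = *mm.second;

    QuantizedLUT q;
    q.min_value = lo;
    // Avoid a zero step when every entry is identical.
    q.step = (hi > lo) ? (hi - lo) / 255.0f : 1.0f;
    q.values.reserve(lut.size());
    for (float v : lut) {
        long code = std::lround((v - q.min_value) / q.step);
        code = std::max(0L, std::min(255L, code));
        q.values.push_back(static_cast<uint8_t>(code));
    }
    return q;
}

// Reconstruct the approximate float value of entry i. The difference from
// the original entry is the rounding error the summary refers to.
float dequantize(const QuantizedLUT& q, std::size_t i) {
    return q.min_value + q.step * static_cast<float>(q.values[i]);
}
```

Under this assumed scheme, each reconstructed entry is off by at most about half a step, and those small errors accumulate over the sub-quantizers that contribute to one distance.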
1 parent 752832c commit 62d0e5a

18 files changed: +1772 -219 lines changed

faiss/CMakeLists.txt

Lines changed: 5 additions & 0 deletions
@@ -41,6 +41,7 @@ set(FAISS_SRC
   IndexPQFastScan.cpp
   IndexPreTransform.cpp
   IndexRaBitQ.cpp
+  IndexRaBitQFastScan.cpp
   IndexRefine.cpp
   IndexReplicas.cpp
   IndexRowwiseMinMax.cpp
@@ -63,6 +64,7 @@ set(FAISS_SRC
   impl/ProductQuantizer.cpp
   impl/AdditiveQuantizer.cpp
   impl/RaBitQuantizer.cpp
+  impl/RaBitQUtils.cpp
   impl/ResidualQuantizer.cpp
   impl/LocalSearchQuantizer.cpp
   impl/ProductAdditiveQuantizer.cpp
@@ -141,6 +143,7 @@ set(FAISS_HEADERS
   IndexRefine.h
   IndexReplicas.h
   IndexRaBitQ.h
+  IndexRaBitQFastScan.h
   IndexRowwiseMinMax.h
   IndexScalarQuantizer.h
   IndexShards.h
@@ -163,6 +166,7 @@ set(FAISS_HEADERS
   impl/LocalSearchQuantizer.h
   impl/ProductAdditiveQuantizer.h
   impl/LookupTableScaler.h
+  impl/FastScanDistancePostProcessing.h
   impl/maybe_owned_vector.h
   impl/NNDescent.h
   impl/NSG.h
@@ -171,6 +175,7 @@ set(FAISS_HEADERS
   impl/ProductQuantizer.h
   impl/Quantizer.h
   impl/RaBitQuantizer.h
+  impl/RaBitQUtils.h
   impl/ResidualQuantizer.h
   impl/ResultHandler.h
   impl/ScalarQuantizer.h

faiss/IndexAdditiveQuantizerFastScan.cpp

Lines changed: 9 additions & 4 deletions
@@ -11,6 +11,7 @@
 #include <memory>

 #include <faiss/impl/FaissAssert.h>
+#include <faiss/impl/FastScanDistancePostProcessing.h>
 #include <faiss/impl/LocalSearchQuantizer.h>
 #include <faiss/impl/LookupTableScaler.h>
 #include <faiss/impl/ResidualQuantizer.h>
@@ -123,7 +124,8 @@ void IndexAdditiveQuantizerFastScan::estimate_norm_scale(
     }

     std::vector<float> dis_tables(n * M * ksub);
-    compute_float_LUT(dis_tables.data(), n, x);
+    FastScanDistancePostProcessing empty_context;
+    compute_float_LUT(dis_tables.data(), n, x, empty_context);

     // here we compute the mean of scales for each query
     // TODO: try max of scales
@@ -153,7 +155,8 @@ void IndexAdditiveQuantizerFastScan::compute_codes(
 void IndexAdditiveQuantizerFastScan::compute_float_LUT(
         float* lut,
         idx_t n,
-        const float* x) const {
+        const float* x,
+        const FastScanDistancePostProcessing&) const {
     if (metric_type == METRIC_INNER_PRODUCT) {
         aq->compute_LUT(n, x, lut, 1.0f);
     } else {
@@ -200,10 +203,12 @@ void IndexAdditiveQuantizerFastScan::search(
     }

     NormTableScaler scaler(norm_scale);
+    FastScanDistancePostProcessing context;
+    context.norm_scaler = &scaler;
     if (metric_type == METRIC_L2) {
-        search_dispatch_implem<true>(n, x, k, distances, labels, &scaler);
+        search_dispatch_implem<true>(n, x, k, distances, labels, context);
     } else {
-        search_dispatch_implem<false>(n, x, k, distances, labels, &scaler);
+        search_dispatch_implem<false>(n, x, k, distances, labels, context);
     }
 }
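
The hunks above replace a raw `&scaler` argument with a `FastScanDistancePostProcessing` object that is passed by const reference, and the additive-quantizer `compute_float_LUT` accepts it as an unnamed parameter because it does not use it. The following is a minimal sketch of that context-object pattern. The real struct lives in impl/FastScanDistancePostProcessing.h, which is not shown here, so every type and member below (`Scaler`, `PostProcessingContext`, `BaseIndex`, `PlainIndex`, `ScaledIndex`) is a hypothetical stand-in; only the idea of a `norm_scaler` pointer that defaults to "unset" is taken from the diff.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a norm scaler.
struct Scaler {
    float scale;
};

// Hypothetical context: optional per-search state bundled in one object.
// Members default to "not used", so a default-constructed context works
// as an empty context.
struct PostProcessingContext {
    const Scaler* norm_scaler = nullptr;
};

struct BaseIndex {
    virtual ~BaseIndex() = default;
    // Every implementation receives the same context, needed or not.
    virtual void compute_lut(
            float* lut,
            std::size_t n,
            const float* x,
            const PostProcessingContext& context) const = 0;
};

// Ignores the context, so the parameter is left unnamed to avoid
// unused-parameter warnings.
struct PlainIndex : BaseIndex {
    void compute_lut(
            float* lut,
            std::size_t n,
            const float* x,
            const PostProcessingContext&) const override {
        assert(lut != nullptr && x != nullptr);
        for (std::size_t i = 0; i < n; i++) {
            lut[i] = x[i];
        }
    }
};

// Reads the optional scaler and falls back to 1.0 when it is unset.
struct ScaledIndex : BaseIndex {
    void compute_lut(
            float* lut,
            std::size_t n,
            const float* x,
            const PostProcessingContext& context) const override {
        assert(lut != nullptr && x != nullptr);
        const float s = context.norm_scaler ? context.norm_scaler->scale : 1.0f;
        for (std::size_t i = 0; i < n; i++) {
            lut[i] = s * x[i];
        }
    }
};
```

One likely benefit of this design is that later additions to the context do not require touching every override's signature again.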

faiss/IndexAdditiveQuantizerFastScan.h

Lines changed: 5 additions & 1 deletion
@@ -62,7 +62,11 @@ struct IndexAdditiveQuantizerFastScan : IndexFastScan {

     void compute_codes(uint8_t* codes, idx_t n, const float* x) const override;

-    void compute_float_LUT(float* lut, idx_t n, const float* x) const override;
+    void compute_float_LUT(
+            float* lut,
+            idx_t n,
+            const float* x,
+            const FastScanDistancePostProcessing& context) const override;

     void search(
             idx_t n,
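
The commit summary explains that query factors computed while building the LUT must be stored at their global query index, which requires the LUT routine to know the offset of the batch it is processing. The following is a minimal sketch of that bookkeeping, assuming the offset travels in a context object like the `context` parameter in the signature above. The actual fields of `FastScanDistancePostProcessing` and the real query factors are not shown in this diff, so `QueryFactors`, `BatchContext`, `compute_lut_batch`, and `compute_all` are hypothetical, and the squared norm merely stands in for whatever factors RaBitQ stores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-query factor; a squared norm is used as a placeholder.
struct QueryFactors {
    float norm_sq = 0.0f;
};

// Hypothetical context: where the current batch starts in the full query
// set, and where factors for all queries are stored.
struct BatchContext {
    std::size_t query_offset = 0;
    std::vector<QueryFactors>* query_factors = nullptr;
};

// Fills the LUT for n queries of dimension d (a plain copy here) and stores
// each query's factor at its global index: query_offset + local index.
void compute_lut_batch(
        float* lut,
        std::size_t n,
        std::size_t d,
        const float* x,
        const BatchContext& context) {
    for (std::size_t i = 0; i < n; i++) {
        float norm_sq = 0.0f;
        for (std::size_t j = 0; j < d; j++) {
            const float v = x[i * d + j];
            lut[i * d + j] = v;
            norm_sq += v * v;
        }
        if (context.query_factors != nullptr) {
            const std::size_t global = context.query_offset + i;
            assert(global < context.query_factors->size());
            (*context.query_factors)[global].norm_sq = norm_sq;
        }
    }
}

// Processes all queries in batches and passes each batch its offset.
void compute_all(
        const std::vector<float>& x,
        std::size_t d,
        std::size_t batch_size,
        std::vector<float>& lut,
        std::vector<QueryFactors>& factors) {
    assert(d > 0 && batch_size > 0 && x.size() % d == 0);
    const std::size_t nq = x.size() / d;
    lut.assign(x.size(), 0.0f);
    factors.assign(nq, QueryFactors());
    for (std::size_t start = 0; start < nq; start += batch_size) {
        const std::size_t n = std::min(batch_size, nq - start);
        BatchContext context;
        context.query_offset = start;
        context.query_factors = &factors;
        compute_lut_batch(
                lut.data() + start * d, n, d, x.data() + start * d, context);
    }
}
```

Without the offset, every batch would write its factors starting at index 0 and overwrite those of earlier batches, which is the problem the summary describes.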
