-
Notifications
You must be signed in to change notification settings - Fork 0
Machine Learning & Data Science Supervised Learning Locality Sensitive Hashing
Locality Sensitive Hashing (LSH) is a method used for approximate K-Nearest-Neighbours search, which allows for the efficient retrieval of similar items from large datasets. LSH works particularly well in high-dimensional spaces where traditional methods can be computationally prohibitive.
The fundamental idea behind LSH is to hash input items in such a way that similar items are more likely to be hashed to the same bucket, while dissimilar items are more likely to be hashed to different buckets.
One way to do this is to create a set of planes which divide up your dataset. Each plane is described by it's normal vector
$$ \begin{aligned} P_{1}\cdot v = 3, h_{1}=1 \
P_{2}\cdot v = -3, h_{1}=0 \
P_{3}\cdot v = 3, h_{1}=1 \
\end{aligned}
$$

These signs (0 or 1) are then multiplied by exponentially increasing numbers and summed to produce a hash value. $$ hash = 2^{0}\times h_{1}+2^{1}\times h_{2}+ 2^{2}\times h_{3} $$
Formally: $$ hash = \sum\limits_{i}^{H}2^{i}\times h_{i} $$