Notes Search

The plan is to use Sentence Transformers to create embeddings of the notes, then query those embeddings with Annoy's approximate nearest-neighbor search to find relevant files. Caching the embeddings, possibly in SQLite, may be needed later to speed up startup.
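The caching idea is not implemented yet, so the following is only a minimal sketch of one way it could work, assuming the cache is keyed on a hash of each note's content. The table layout and the names open_cache, content_key, and get_or_compute are hypothetical, and embed stands in for whatever function produces the embedding (for example a Sentence Transformers model's encode call).

```python
import hashlib
import sqlite3
from array import array


def open_cache(path=":memory:"):
    """Open (or create) the cache database and its single table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings ("
        "content_hash TEXT PRIMARY KEY, vector BLOB NOT NULL)"
    )
    return conn


def content_key(text):
    """Key on the note's content so that edited notes get re-embedded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def get_or_compute(conn, text, embed):
    """Return the cached vector for text, calling embed(text) only on a miss."""
    key = content_key(text)
    row = conn.execute(
        "SELECT vector FROM embeddings WHERE content_hash = ?", (key,)
    ).fetchone()
    if row is not None:
        # Vectors are stored as packed 32-bit floats.
        cached = array("f")
        cached.frombytes(row[0])
        return cached.tolist()
    vector = [float(x) for x in embed(text)]
    conn.execute(
        "INSERT INTO embeddings (content_hash, vector) VALUES (?, ?)",
        (key, array("f", vector).tobytes()),
    )
    conn.commit()
    return vector
```

Storing 32-bit floats halves the size of the cache compared with Python's native 64-bit floats, at the cost of a small rounding difference between a freshly computed vector and the same vector read back from the cache.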

Offline usage

Optionally, create a models directory and git clone the embedding model and the cross-encoder model into it.

The folder structure (output of tree -d) should look like this:

    models/
    ├── msmarco-distilbert-base-tas-b
    │   └── 1_Pooling
    └── ms-marco-TinyBERT-L-2

Remember to run git lfs pull after cloning to fetch the model files. main.py automatically checks these two folders before trying to load the models from the internet.
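The local-first lookup can be sketched roughly as below. This is not taken from main.py: the function name resolve_model and the default directory name are assumptions for illustration, and the returned string is simply whatever would be handed to the model loader.

```python
from pathlib import Path


def resolve_model(name, models_dir="models"):
    """Prefer a locally cloned model folder; otherwise fall back to the model name.

    Hypothetical helper: the real check in main.py may differ.
    """
    local = Path(models_dir) / name
    if local.is_dir():
        # A local clone exists, so no network access is needed.
        return str(local)
    # No local copy: return the name so the loader can fetch it online.
    return name


embedding_model = resolve_model("msmarco-distilbert-base-tas-b")
cross_encoder_model = resolve_model("ms-marco-TinyBERT-L-2")
```

If git lfs pull was skipped, the folder exists but contains only small pointer files, so a check like this would still pick the local path and loading would fail later.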

Usage

Activate the virtual environment first:

    source venv/bin/activate

Next, create the embeddings for the notes and the Annoy index:

    python3 main.py build (notes-dir) (data directory name)

For example:

    python3 main.py build ~/Notes notes

Then run a semantic search over the notes:

    python3 main.py search "(query string)" (data directory name)

For example:

    python3 main.py search "ssh" notes
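For reference, the two subcommands map naturally onto an argparse layout like the one below. This is a sketch of how such a command line could be parsed, not the actual code in main.py; the argument names notes_dir, query, and data_dir are assumptions.

```python
import argparse


def build_parser():
    """Parser with the two subcommands described above (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    build = sub.add_parser("build", help="embed the notes and build the Annoy index")
    build.add_argument("notes_dir", help="directory containing the notes")
    build.add_argument("data_dir", help="name of the directory that stores the index data")

    search = sub.add_parser("search", help="semantic search over a built index")
    search.add_argument("query", help="query string; quote it if it contains spaces")
    search.add_argument("data_dir", help="name of the data directory used at build time")
    return parser


# Equivalent to: python3 main.py search "ssh" notes
args = build_parser().parse_args(["search", "ssh", "notes"])
print(args.command, args.query, args.data_dir)
```

The query and the data directory name are separate positional arguments, which is why only the query string goes inside the quotes.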

Useful references

Annoy usage

Sentence transformer usage

ReRanking