Search through your file system using natural language. The search extends to your file contents, allowing for more precise and accurate results. Best of all, everything runs locally on your machine, so none of your data leaves your device!
- Upon startup, the application runs a background process to index all the files in your chosen directory into a local ChromaDB vector database. (Currently, the directory being used is `/test`. This includes a variety of different ...)
- The files are split into tables based on their metadata and contents, and each of these is embedded using the ChromaDB embedding API. A hash of the file is used as the key for each entry.
- There is a separate observer thread running in the background that uses the Watchdog library to observe the file system and detect when any files are created, updated, moved, or deleted. This will then invoke the indexer class to either index, reindex, or delete the file entry.
- When a user enters a query (e.g., "stories about space"), it is embedded using the same ChromaDB embedding API and then run through a cosine similarity search over both the metadata and content tables.
- The results are then scored using a weighted average and normalization algorithm that is optimized to maximize matches.
- The cosine search returns a distance (1 - cosine similarity) in the range [0, 2], where 0 = perfectly similar, 1 = orthogonal, 2 = perfectly dissimilar. An inverted sigmoid function, scaled and shifted to match the general range of the data, converts this distance into a match score so that close matches are weighted more heavily.
- The results are ranked and sorted in descending order based on the % match.
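
The scoring and ranking steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the constants `MIDPOINT`, `STEEPNESS`, `CONTENT_WEIGHT`, and `METADATA_WEIGHT`, the function names, and the example file paths are all hypothetical, chosen only to show how an inverted sigmoid over cosine distance can be combined with a weighted average.

```python
import math

# Hypothetical tuning constants; the project's real values are not documented here.
MIDPOINT = 1.0         # distance that maps to a 50% score
STEEPNESS = 6.0        # how sharply the score drops around the midpoint
CONTENT_WEIGHT = 0.7   # assumed relative weight of the content table
METADATA_WEIGHT = 0.3  # assumed relative weight of the metadata table


def distance_to_score(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a score in (0, 1).

    Uses an inverted sigmoid: small distances score near 1,
    large distances score near 0.
    """
    return 1.0 / (1.0 + math.exp(STEEPNESS * (distance - MIDPOINT)))


def combined_score(content_distance: float, metadata_distance: float) -> float:
    """Weighted average of the per-table scores, expressed as a percentage."""
    content = distance_to_score(content_distance)
    metadata = distance_to_score(metadata_distance)
    total_weight = CONTENT_WEIGHT + METADATA_WEIGHT
    weighted = CONTENT_WEIGHT * content + METADATA_WEIGHT * metadata
    return 100.0 * weighted / total_weight


def rank(results: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (path, % match) pairs sorted in descending order of % match.

    `results` maps a file path to (content_distance, metadata_distance).
    """
    scored = [(path, combined_score(c, m)) for path, (c, m) in results.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up distances for illustration only.
    example = {
        "notes/mars_mission.txt": (0.35, 0.80),
        "recipes/pasta.md": (1.40, 1.10),
    }
    for path, score in rank(example):
        print(f"{score:5.1f}%  {path}")
```

Shifting `MIDPOINT` toward the typical distance of a good match, and raising `STEEPNESS`, would spread scores further apart, which is presumably the purpose of the scaling and shifting described above.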
- Create a venv using `python3.12 -m venv venv`
- Activate the venv. Windows: `venv\Scripts\activate`, Mac: `source venv/bin/activate`
- Install all the dependencies using `pip install -r requirements.txt`
- If you want, you can change the root directory to index in `main.py`
- Run the application with `python3 main.py`
- Once all the files have been indexed, you can type to search through your files using natural language, and it will return the top `k` results.
- Enjoy!
The project includes a comprehensive test suite located in the tests/ directory.
To run the tests:

- `python tests/test_search_simple.py`
- `python tests/test_search_generation.py`
- `python run_all_tests.py`

Test files:

- `tests/test_search_simple.py` - Simple synchronous tests for debugging
- `tests/test_search_generation.py` - Comprehensive async test suite with statistics
- `tests/test_api.py` - Basic API endpoint tests
- `tests/run_tests.py` - Enhanced test runner with colored output

See tests/README.md for detailed testing documentation.
