Add two new "Seeded" Knn queries for seeded vector search #14084

Merged
merged 47 commits into from
Jan 15, 2025

Conversation

benwtrent
Member

This is a continuation of #13635

Description

This PR addresses #13634.

The main changes are in:

  • A new "seeded" focused knn collector and collector manager
  • Two new basic knn queries that use these specialized collectors to search from seeded entry points
  • HnswGraphSearcher, which bypasses the findBestEntryPoint step if seeds are provided.
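
To make the last bullet concrete, here is a minimal, self-contained sketch of the idea on a single graph layer. It is not Lucene's `HnswGraphSearcher`: the class name `SeededGraphSearchSketch`, the `search` signature, and the `defaultEntry` parameter (a stand-in for whatever the hierarchical descent would have returned) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

/** Toy illustration only; all names here are hypothetical, not Lucene APIs. */
final class SeededGraphSearchSketch {

    /** Squared Euclidean distance between two vectors of equal length. */
    static double dist(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Best-first search over one graph layer. If seeds are provided they are
     * used directly as entry points; otherwise the search starts from
     * defaultEntry, which stands in for the result of a hierarchical descent.
     */
    static List<Integer> search(float[][] vectors, int[][] neighbors, float[] query,
                                int k, int defaultEntry, int[] seeds) {
        int[] entryPoints = (seeds != null && seeds.length > 0) ? seeds : new int[] {defaultEntry};
        Comparator<Integer> byDist = Comparator.comparingDouble(n -> dist(vectors[n], query));
        PriorityQueue<Integer> candidates = new PriorityQueue<>(byDist);         // closest first
        PriorityQueue<Integer> results = new PriorityQueue<>(byDist.reversed()); // farthest first
        Set<Integer> visited = new HashSet<>();

        for (int ep : entryPoints) {
            if (visited.add(ep)) {
                candidates.add(ep);
                results.add(ep);
                if (results.size() > k) results.poll(); // keep only the k closest
            }
        }
        while (!candidates.isEmpty()) {
            int current = candidates.poll();
            // Stop once the closest open candidate is worse than the worst kept result.
            if (results.size() >= k
                    && dist(vectors[current], query) > dist(vectors[results.peek()], query)) {
                break;
            }
            for (int nb : neighbors[current]) {
                if (!visited.add(nb)) continue; // already seen
                if (results.size() < k
                        || dist(vectors[nb], query) < dist(vectors[results.peek()], query)) {
                    candidates.add(nb);
                    results.add(nb);
                    if (results.size() > k) results.poll();
                }
            }
        }
        List<Integer> out = new ArrayList<>(results);
        out.sort(byDist); // nearest first
        return out;
    }
}
```

In this sketch a good seed does not change the answer, only where the traversal starts, so fewer nodes are visited when the seeds are already close to the query.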

//cc @seanmacavaney

Sean MacAvaney and others added 30 commits August 6, 2024 11:26
@benwtrent benwtrent changed the title Add AbstractKnnVectorQuery.seed for seeded HNSW Add two new "Seeded" Knn queries for seeded vector search Dec 20, 2024
Contributor

@cpoerschke cpoerschke left a comment

Very nice continuation, thanks for pursuing this!

@benwtrent
Member Author

benwtrent commented Jan 7, 2025

@seanmacavaney @cpoerschke I am gonna merge this in the next couple of days. I flagged the queries and such as experimental if we want to change the interface. But I think it reached a really nice place and is ready for folks to start kicking the tires in more real situations :)

@seanmacavaney
Contributor

Looks great, thanks @benwtrent! I'm keen to benchmark it a bit, but no need to hold up merging it over that.

@benwtrent benwtrent added this to the 10.2.0 milestone Jan 15, 2025
@benwtrent benwtrent merged commit 34f0453 into apache:main Jan 15, 2025
5 checks passed
@benwtrent benwtrent deleted the seeds branch January 15, 2025 14:08
benwtrent added a commit that referenced this pull request Jan 15, 2025
### Description

In some vector search cases, users may already know some documents that are likely related to a query. Let's support seeding HNSW's scoring stage with these documents, rather than using HNSW's hierarchical stage.

An example use case is hybrid search, where both a traditional search and a vector search are performed. The top results from the traditional search are likely reasonable seeds for the vector search. Even when not performing hybrid search, traditional matching can often be faster than traversing the hierarchy, so using it to pick seeds can speed up the vector search process (up to 2x faster for the same effectiveness), as demonstrated in [this article](https://arxiv.org/abs/2307.16779) (full disclosure: seanmacavaney is an author of the article).
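
As a rough sketch of the hybrid case, seed selection can be as simple as taking the top few hits of the traditional search by score. The `HybridSeedSketch` class, the `Hit` type, and the `topSeeds` helper below are hypothetical names used only for illustration; they are not part of this change or of Lucene's API.

```java
import java.util.Comparator;
import java.util.List;

/** Toy illustration only; all names here are hypothetical, not Lucene APIs. */
final class HybridSeedSketch {

    /** A hit from the traditional (non-vector) search: document id plus its score. */
    static final class Hit {
        final int docId;
        final double score;

        Hit(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Returns the doc ids of the maxSeeds highest-scoring hits, best first. */
    static int[] topSeeds(List<Hit> hits, int maxSeeds) {
        return hits.stream()
                .sorted(Comparator.comparingDouble((Hit h) -> h.score).reversed())
                .limit(maxSeeds)
                .mapToInt(h -> h.docId)
                .toArray();
    }
}
```

The resulting doc ids would then be handed to the vector search as its entry points. How many seeds to use is a tuning choice that this sketch leaves open.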

The main changes are:
 - A new "seeded" focused knn collector and collector manager
 - Two new basic knn queries that use these specialized collectors to search from seeded entry points
 - `HnswGraphSearcher` now bypasses the `findBestEntryPoint` step if seeds are provided.


//cc @seanmacavaney

Co-authored-by: Sean MacAvaney <[email protected]>
Co-authored-by: Christine Poerschke <[email protected]>
benwtrent added a commit to benwtrent/lucene that referenced this pull request Jan 21, 2025