Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle plain SearchParameters in HNSW searches #4167

Closed
wants to merge 3 commits into from

Conversation

kaivalnp
Copy link

@kaivalnp kaivalnp commented Feb 4, 2025

Summary: Add ability to search HNSW indexes using a plain SearchParameters object (i.e. only an IDSelector)

Issue: Currently if a plain SearchParameters is used to query an HNSW index, an error is thrown -- when the user's intent was only to filter some documents, and rely on index settings for remaining parameters (like efSearch, check_relative_distance, search_bounded_queue)

Motivation: Faiss provides an amazing index factory and parameter setter to abstract away internals of the index type and settings used, like:

Index* index = index_factory(256, "HNSW32");
ParameterSpace().set_index_parameters(index, "efConstruction=200,efSearch=150");

Now if a user wants to perform a filtered search on this opaque index using:

SearchParameters parameters;
parameters.sel = new IDSelectorRange(10, 20);
index->search(nq, xq, k, d, id, &parameters);

they are met with an error:

faiss/IndexHNSW.cpp:251: Error: '!(params)' failed: params type invalid

An easy way to reproduce this issue is to replace Flat -> HNSW here and run example_c like:

make -C build example_c
./build/c_api/example_c

This PR allows passing a plain SearchParameters to HNSW indexes, and use index settings as a fallback

@mnorris11
Copy link

Hi @kaivalnp , I am still browsing through it, and reading the corresponding PR on Lucene side apache/lucene#14178

Meanwhile, could you update the format as suggested by https://github.com/facebookresearch/faiss/actions/runs/13143467156/job/36742197199?pr=4167 ? (I believe that is the only issue so far; the Linux x86_64 (conda) error is transient, and AMD folks are looking into the rocm runner)

@mnorris11
Copy link

If the caller knows they are creating an IndexHNSW through this factory string, I'm confused why they can't pass a SearchParametersHNSW? (Apologies, I didn't catch the context of how Faiss gets invoked in the Lucene PR very well.)

BTW, pre-filtering is of interest to us too. I'll keep reading your updates in apache/lucene#14178.

@kaivalnp
Copy link
Author

kaivalnp commented Feb 6, 2025

could you update the format

Done!

If the caller knows they are creating an IndexHNSW through this factory string
context of how Faiss gets invoked in the Lucene PR

The API currently exposed in the PR asks for two strings from the end-user: description and indexParams

It internally uses index_factory to create an index based on description and ParameterSpace().set_index_parameters on indexParams to set relevant indexing and search parameters

Only the end-user knows what string (and subsequently, index type) is going to be used -- so we'd have to parse the description inside Lucene (or the resultant Index inside Faiss) to try and "infer" the index type, and create specific parameters (SearchParametersIVF, SearchParametersPQ, SearchParametersHNSW etc) based on this type, which may be problematic

Further, the user has already specified some search parameters in indexParams, and we'd have to read these parameters (like efSearch, check_relative_distance, etc) from the IndexHNSW to instantiate a SearchParametersHNSW object (similarly other relevant parameters for other index types)

I'm also not sure about which SearchParameters variant to instantiate in case of complex indexes having vector transforms, quantization, refined indexes, etc where multiple types are applicable

All this may require non-trivial additional code in Lucene and / or Faiss. Please let me know if you had another approach in mind to tackle this issue (of applying only an IDSelector on an opaque Index)..

FYI we're trying to create a standalone Lucene codec that internally uses Faiss for vector indexing and search, without any additional "glue" code -- and it would be awesome if Faiss supports required functionality out-of-the-box! (as long as these changes are not too invasive)

@facebook-github-bot
Copy link
Contributor

@mnorris11 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Feb 6, 2025
Summary:
Add ability to search HNSW indexes using a plain [`SearchParameters`](https://github.com/facebookresearch/faiss/blob/6c046992a71e672df504a0101cddf6f2f2e90601/faiss/Index.h#L64-L69) object (i.e. only an [`IDSelector`](https://github.com/facebookresearch/faiss/blob/6c046992a71e672df504a0101cddf6f2f2e90601/faiss/Index.h#L66))

Issue: Currently if a plain `SearchParameters` is used to query an HNSW index, [an error is thrown](https://github.com/facebookresearch/faiss/blob/6c046992a71e672df504a0101cddf6f2f2e90601/faiss/IndexHNSW.cpp#L251) -- when the user's intent was only to filter some documents, and rely on index settings for remaining parameters (like `efSearch`, `check_relative_distance`, `search_bounded_queue`)

Motivation: Faiss provides an amazing [index factory](https://github.com/facebookresearch/faiss/wiki/The-index-factory) and [parameter setter](https://github.com/facebookresearch/faiss/wiki/Index-IO,-cloning-and-hyper-parameter-tuning) to abstract away internals of the index type and settings used, like:
```cpp
Index* index = index_factory(256, "HNSW32");
ParameterSpace().set_index_parameters(index, "efConstruction=200,efSearch=150");
```

Now if a user wants to perform a filtered search on this _opaque_ index using:
```cpp
SearchParameters parameters;
parameters.sel = new IDSelectorRange(10, 20);
index->search(nq, xq, k, d, id, &parameters);
```

they are met with an error:
```
faiss/IndexHNSW.cpp:251: Error: '!(params)' failed: params type invalid
```

An easy way to reproduce this issue is to replace `Flat` -> `HNSW` [here](https://github.com/facebookresearch/faiss/blob/6c046992a71e672df504a0101cddf6f2f2e90601/c_api/example_c.c#L60) and run `example_c` like:
```
make -C build example_c
./build/c_api/example_c
```

This PR allows passing a plain `SearchParameters` to HNSW indexes, and use index settings as a fallback


Reviewed By: asadoughi

Differential Revision: D69266072

Pulled By: mnorris11
@kaivalnp
Copy link
Author

kaivalnp commented Feb 7, 2025

Hi @mnorris11, I see you created a new PR (#4174) to validate that these changes pass the internal Meta CI after updating a failing test -- but that PR is closed..
If that test is passing now (and these changes are acceptable), can this PR be merged?

@mnorris11
Copy link

@kaivalnp We will merge this one to Faiss. I am trying to merge the internal CI fix separately currently.

@kaivalnp
Copy link
Author

kaivalnp commented Feb 7, 2025

@mnorris11 Got it, thanks for the update!

@facebook-github-bot
Copy link
Contributor

@mnorris11 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mnorris11 merged this pull request in 1d8f393.

@kaivalnp kaivalnp deleted the handle-sp branch February 7, 2025 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants