[Remote Vector Index Build] Introduce RemoteNativeIndexBuildStrategy skeleton #2525
Thanks @navneet1v, I also agree that a new writer isn't completely necessary, as the underlying formats are not changing. Moreover, the remote build should be format-agnostic anyway, so I've refactored
Looks good overall, @jed326.
@shatejas I've reworked this skeleton in the form of
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    });
    final Long expectedTimesGetVectorValuesIsCalled = vectorsPerField.stream().filter(Predicate.not(Map::isEmpty)).count();
    knnVectorValuesFactoryMockedStatic.verify(
        () -> KNNVectorValuesFactory.getVectorValues(any(VectorDataType.class), any(DocsWithFieldSet.class), any()),
        times(Math.toIntExact(expectedTimesGetVectorValuesIsCalled) * 2)
In the quantization case, `getVectorValues` is called twice. However, we now retrieve the supplier itself via `getVectorValuesSupplier`, which we only need to do once; we then pass the supplier through the rest of the flow.
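As a minimal sketch of why a single supplier can serve the whole flow: each call to `get()` hands back a fresh, independent reader, so a quantization-style pass and the build pass (or, later, several parallel upload streams) never share iteration state. `ForwardOnlyVectors` and `SupplierFlowSketch` are hypothetical stand-ins invented for this illustration; they are not the plugin's `KNNVectorValues` or its factory.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical forward-only reader; a stand-in for illustration, not the plugin's KNNVectorValues. */
final class ForwardOnlyVectors {
    private final Iterator<float[]> iterator;

    ForwardOnlyVectors(List<float[]> vectors) {
        this.iterator = vectors.iterator();
    }

    /** Returns the next vector, or null once the reader is exhausted. */
    float[] nextVector() {
        return iterator.hasNext() ? iterator.next() : null;
    }
}

final class SupplierFlowSketch {
    /** First pass (for example, gathering statistics for quantization): counts the vectors. */
    static int countVectors(Supplier<ForwardOnlyVectors> supplier) {
        ForwardOnlyVectors values = supplier.get(); // fresh reader for this pass
        int count = 0;
        while (values.nextVector() != null) {
            count++;
        }
        return count;
    }

    /** Second pass (for example, the build itself): sums the first component of every vector. */
    static float sumFirstComponents(Supplier<ForwardOnlyVectors> supplier) {
        ForwardOnlyVectors values = supplier.get(); // independent of the first pass
        float sum = 0f;
        float[] vector;
        while ((vector = values.nextVector()) != null) {
            sum += vector[0];
        }
        return sum;
    }

    public static void main(String[] args) {
        List<float[]> data = List.of(new float[] {1f, 2f}, new float[] {3f, 4f}, new float[] {5f, 6f});
        // The supplier is created once and handed to every stage of the flow.
        Supplier<ForwardOnlyVectors> supplier = () -> new ForwardOnlyVectors(data);
        System.out.println(countVectors(supplier));
        System.out.println(sumFirstComponents(supplier));
    }
}
```

Under this assumption, a test that previously expected two `getVectorValues` calls per field would instead expect one supplier lookup, with each consumer calling `get()` as often as it needs.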
@shatejas had already made some good suggestions on the PR, and I see those being addressed. I am good with the approach and the class hierarchy.
Can we hide the repositories service dependency inside some RemoteIndexBuilder abstraction instead of taking a direct dependency on it in the codec classes?
Also, I think this should go into a feature branch, especially now that main is a release branch. I think the overall structure is good, but it's still WIP and doesn't have tests.
@@ -54,15 +55,29 @@ public class NativeEngines990KnnVectorsWriter extends KnnVectorsWriter {
    private final List<NativeEngineFieldVectorsWriter<?>> fields = new ArrayList<>();
    private boolean finished;
    private final Integer approximateThreshold;
    private final Supplier<RepositoriesService> repositoriesServiceSupplier;
Instead of taking a hard dependency on the RepositoriesService, can we build an abstraction, like RemoteIndexBuilder, that hides these details from the IndexWriter class?
One approach I previously considered was to do the feature checks in `NativeEngines990KnnVectorsFormat` itself and then create an instance of `RemoteIndexBuildStrategy` to pass to the `NativeEngines990KnnVectorsWriter`. However, it seemed better to me not to instantiate any of the remote index build classes unless they were actually needed, so I went with this approach of passing the `repositoriesServiceSupplier` through to the index writer.
Introduced a `NativeIndexBuildStrategyFactory` class in the latest revision.
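To make the idea in this thread concrete, here is a rough sketch of a factory that keeps the repository dependency out of the writer: the writer only asks for a strategy, and the remote strategy is instantiated only when the feature is enabled and a repository supplier is available. Every name below (`SketchBuildStrategy`, `SketchLocalStrategy`, `SketchRemoteStrategy`, `SketchStrategyFactory`) is hypothetical, and a `Supplier<String>` stands in for `Supplier<RepositoriesService>`; the actual `NativeIndexBuildStrategyFactory` in the PR may be shaped differently.

```java
import java.util.function.Supplier;

/** Hypothetical strategy interface; names are illustrative, not the plugin's API. */
interface SketchBuildStrategy {
    String buildAndWriteIndex(String fieldName);
}

/** Stand-in for the existing on-node build path. */
final class SketchLocalStrategy implements SketchBuildStrategy {
    @Override
    public String buildAndWriteIndex(String fieldName) {
        return "local:" + fieldName;
    }
}

/** Stand-in for a remote build path; the repository is resolved lazily through the supplier. */
final class SketchRemoteStrategy implements SketchBuildStrategy {
    private final Supplier<String> repositorySupplier; // assumption: replaces Supplier<RepositoriesService>

    SketchRemoteStrategy(Supplier<String> repositorySupplier) {
        this.repositorySupplier = repositorySupplier;
    }

    @Override
    public String buildAndWriteIndex(String fieldName) {
        // The repository is only looked up when a build actually runs.
        return "remote:" + fieldName + "@" + repositorySupplier.get();
    }
}

/** The writer depends on this factory only, never on the repository type itself. */
final class SketchStrategyFactory {
    private final boolean remoteBuildEnabled;
    private final Supplier<String> repositorySupplier;

    SketchStrategyFactory(boolean remoteBuildEnabled, Supplier<String> repositorySupplier) {
        this.remoteBuildEnabled = remoteBuildEnabled;
        this.repositorySupplier = repositorySupplier;
    }

    /** Returns the remote strategy only when it is enabled and usable; otherwise the local one. */
    SketchBuildStrategy create() {
        if (remoteBuildEnabled && repositorySupplier != null) {
            return new SketchRemoteStrategy(repositorySupplier);
        }
        return new SketchLocalStrategy();
    }

    public static void main(String[] args) {
        System.out.println(new SketchStrategyFactory(false, () -> "repo").create().buildAndWriteIndex("field"));
        System.out.println(new SketchStrategyFactory(true, () -> "repo").create().buildAndWriteIndex("field"));
    }
}
```

The design choice this illustrates is that no remote-build class is constructed unless the check passes, which matches the concern raised above about not instantiating remote classes unnecessarily.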
Thanks @jmazanec15
I do have a feature branch in my fork that I have been developing on (https://github.com/jed326/k-NN/commits/remote-vector-staging-2.19/), but I strongly believe we need to start merging these changes into […] In terms of testing, I will add some coverage to ensure the fallback mechanism is working. I was thinking these tests weren't needed for now, since all the functional testing would come in subsequent PRs that include the functionality itself.
@navneet1v @shatejas @Vikasht34 What do you think about main vs. a feature branch?
Feature branch.
If tests are there and the code is buildable, then we should go with the main branch. There is no point in a feature branch.
I would strongly prefer not to use a feature branch. This PR itself is a prime example of why it's important to get a wide range of opinions on changes that include a decent amount of refactoring, and I don't think it would have gotten the same amount of feedback if it were directed at a feature branch. For testing, in the latest revision I have added some randomization to the base test class that randomly enables or disables the new settings and feature flag, to ensure the fallback mechanisms are working correctly.
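A rough illustration of what such randomized coverage might look like, under stated assumptions: the flag is drawn from a seeded `java.util.Random`, the remote path is treated as an unimplemented skeleton that throws, and the check is that the result always matches the local build. `FallbackRandomizationSketch` and its methods are hypothetical and do not reflect the plugin's base test class or its randomization utilities.

```java
import java.util.Random;

final class FallbackRandomizationSketch {
    /** Hypothetical entry point: tries the remote path when enabled, otherwise (or on failure) builds locally. */
    static String buildIndex(boolean remoteFlagEnabled, String fieldName) {
        if (remoteFlagEnabled) {
            try {
                return buildRemotely(fieldName);
            } catch (UnsupportedOperationException e) {
                // Remote path unavailable: fall through to the local build.
            }
        }
        return "local:" + fieldName;
    }

    /** Assumption for this sketch: the remote path is a skeleton and always throws. */
    static String buildRemotely(String fieldName) {
        throw new UnsupportedOperationException("remote build not implemented in this sketch");
    }

    /** Toggles the flag from a seeded RNG and reports whether every run produced the local result. */
    static boolean fallbackHoldsForSeed(long seed, int iterations) {
        Random random = new Random(seed); // seeded so a failing run can be reproduced
        for (int i = 0; i < iterations; i++) {
            boolean flag = random.nextBoolean();
            if (!"local:field".equals(buildIndex(flag, "field"))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(fallbackHoldsForSeed(42L, 100));
    }
}
```

Seeding the generator keeps a randomized run reproducible, which matters when a fallback failure only appears for one flag combination.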
Okay, let's add some tests and we can develop on main, assuming a proper feature flag.
… to accept vector value supplier Signed-off-by: Jay Deng <[email protected]>
Added a new
Description
First PR for #2465
In order to review changes incrementally, this PR is scoped down to only the following:
I am keeping the vector upload changes in a separate follow-up PR, as they will deserve their own in-depth discussion.
The key part of this PR is that we need to pass a `Supplier` for `KNNVectorValues` through to the `NativeIndexBuildStrategy`, as we need to open multiple `InputStream`s on multiple `KNNVectorValues` in order to support uploading blobs in parallel. The actual implementation of this parallel upload will come in the next PR; however, I am laying the groundwork for it in this skeleton implementation.
Related Issues
Relates: #2465
Check List
- [ ] New functionality has been documented.
- [ ] API changes companion pull request created.
`--signoff`.
- [ ] Public documentation issue/PR created.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.