Bring back the fused graph index #561
base: main
… format version to 6 because of new ordering of fused features.
… FusedADC to FusedPQ for clarity. Improve function signature of OnDiskGraphIndex.View.getPackedNeighbors
… additional copy of neighbors array between OnDiskGraphIndex.View and FusedADCPQDecoder.
# Conflicts:
# jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphIndexBuilder.java
# jvector-base/src/main/java/io/github/jbellis/jvector/graph/ImmutableGraphIndex.java
# jvector-base/src/main/java/io/github/jbellis/jvector/graph/OnHeapGraphIndex.java
# jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/AbstractGraphIndexWriter.java
# jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/OnDiskGraphIndexWriter.java
# jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/OnDiskSequentialGraphIndexWriter.java
# jvector-examples/src/main/java/io/github/jbellis/jvector/example/Grid.java
# jvector-tests/src/test/java/io/github/jbellis/jvector/TestUtil.java
# jvector-tests/src/test/java/io/github/jbellis/jvector/quantization/TestADCGraphIndex.java
…s in testRecallOnGraphWithRandomVectors
…verything works when using the native backend on machines with AVX512.
…12 so that everything works when using the native backend on machines with AVX512.
…a its own binary selector now.
michaeljmarshall
left a comment
I am posting a partial review with some relatively minor suggestions. I'll revisit later today or tomorrow.
jvector-base/src/main/java/io/github/jbellis/jvector/graph/ImmutableGraphIndex.java
jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/AbstractGraphIndexWriter.java
jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/AbstractGraphIndexWriter.java
jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/feature/FeatureId.java
jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/AbstractGraphIndexWriter.java
jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/OnDiskGraphIndex.java
jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/OnDiskGraphIndex.java
jvector-examples/src/main/java/io/github/jbellis/jvector/example/Bench.java
…o that ramBytesUsed can be computed.
jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/AbstractGraphIndexWriter.java
jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/Header.java
jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/OnDiskGraphIndex.java
jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/OnDiskGraphIndex.java
michaeljmarshall
left a comment
Approving as a maintainer of a consuming application. I didn't analyze every line of the PR, but I did perform downstream tests using fused adc in CC and things appear to work as expected.
jshook
left a comment
@marianotepper It would be good to see the level of coverage of new/changed code here. Many of the changes are absolutely dependent on numerical and functional unit tests. It's non-trivial to see this overlap here.
tlwillke
left a comment
LGTM. One comment added about a possible omission in README.md of NVQ as a second-pass option. It would also be great to see some empirical data on the memory savings. The other performance data is thorough enough.
Thanks for this major contribution!
@tlwillke I did some measurements for ada002-100k, using 192 PQ segments. The dataset contains 99562 vectors. According to the ramBytesUsed estimate in PQVectors, they should take 19.74 MB. Roughly speaking, this matches the count 192 * 99562 / (1024 * 1024) = 18.23 MB.

I measured the memory used before and after loading the PQVectors in memory, which the fused graph avoids. With this method, the memory used is 672.02 MB before loading the PQVectors and 697.73 MB after. Thus, the PQVectors occupy 25.71 MB. This is larger than 19.74 MB, so either the estimate is a bit optimistic or the garbage collector did not actually collect everything. When running the fused graph index, memory consumption before and after loading is flat, as there is no loading.

Happy to incorporate these changes in Grid. They have performance upsides and downsides in the specific setting of Grid, where we run a matrix of configurations, so their efficiency may differ from that of a single configuration.
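A minimal sketch of one way to take a before/after heap measurement like this in plain Java. It is not the snippet used for the numbers above: `HeapProbe`, `measureLoad`, and the `loadCodes` supplier are hypothetical names, and `loadCodes` merely stands in for loading the PQVectors. Readings from `Runtime` are approximate, because `System.gc()` is only a hint that the JVM may ignore.

```java
import java.util.function.Supplier;

final class HeapProbe {
    // Approximate bytes currently used on the heap (total minus free).
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Request a GC a few times so the reading is less noisy; the JVM may ignore the request.
    static void settle() {
        for (int i = 0; i < 3; i++) {
            System.gc();
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                return;
            }
        }
    }

    // Returns the approximate heap growth caused by loader.get().
    // The loaded object is stored in keepAlive[0] so it stays reachable during the second reading.
    static <T> long measureLoad(Supplier<T> loader, Object[] keepAlive) {
        settle();
        long before = usedHeapBytes();
        keepAlive[0] = loader.get();
        settle();
        long after = usedHeapBytes();
        return after - before;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for loading PQ codes: 192 bytes for each of 99562 vectors.
        Supplier<byte[][]> loadCodes = () -> new byte[99562][192];
        Object[] keepAlive = new Object[1];
        long delta = measureLoad(loadCodes, keepAlive);
        System.out.printf("approx. heap growth: %.2f MB%n", delta / (1024.0 * 1024.0));
    }
}
```

Note that `new byte[99562][192]` allocates one array object per vector, each with its own object header, so even this toy loader occupies more heap than the raw 192 × 99562 bytes. Per-object overhead of this kind is one general reason why measured heap growth can exceed a raw byte count.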
This PR does extensive work to bring back the Fused Graph Index (FGI). In a non-fused graph, the PQ code of each vector in the index is stored in memory, so the memory complexity is linear in the number of vectors. FGI significantly reduces the amount of heap memory used during search by offloading the PQ codes to storage. These PQ codes are packed and stored in-line with the graph to avoid the runtime overhead that this offload would otherwise cause.
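To make the linear term concrete: assuming each vector's PQ code takes one byte per subspace (256 centroids per subspace), an in-memory set of codes needs roughly N × M bytes for N vectors and M subspaces, before any per-object overhead. The helper below is a hypothetical illustration, not a jvector API. With the ada002-100k numbers from the discussion (99562 vectors, 192 segments) it reproduces the 18.23 figure, and doubling N doubles the result.

```java
final class PqFootprint {
    // Bytes needed for PQ codes when every vector stores one byte per subspace.
    static long codeBytes(long numVectors, int subspaces) {
        return numVectors * subspaces;
    }

    // Converts bytes using the 1024 * 1024 divisor from the ada002-100k estimate.
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long bytes = codeBytes(99_562, 192);
        System.out.printf("%d bytes = %.2f MiB%n", bytes, toMiB(bytes));
    }
}
```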
The memory complexity has two cases now:
These savings come with a moderate slowdown (lower throughput and higher latency) of about 15%. See the results below for an example.
In this version (and in past versions), FGI only works with PQ through the FUSED_PQ feature. This feature used to be called FUSED_ADC, but to highlight the link with PQ, it has been renamed.
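For readers unfamiliar with how PQ codes are scored, here is a minimal sketch of asymmetric distance computation (ADC) over one node's packed neighbor codes. The layout (subspace-major: all neighbors' codes for subspace 0, then for subspace 1, and so on), the dot-product similarity, and the names `AdcSketch`, `buildTable`, and `scoreNeighbors` are assumptions for illustration; the actual FUSED_PQ on-disk format and SIMD kernels in jvector may differ.

```java
final class AdcSketch {
    // table[m][c] = dot(query subvector m, centroid c of subspace m).
    // Assumes query.length is a multiple of codebooks.length and each centroid has subDim entries.
    static float[][] buildTable(float[] query, float[][][] codebooks) {
        int subspaces = codebooks.length;
        int subDim = query.length / subspaces;
        float[][] table = new float[subspaces][];
        for (int m = 0; m < subspaces; m++) {
            float[][] centroids = codebooks[m];
            table[m] = new float[centroids.length];
            for (int c = 0; c < centroids.length; c++) {
                float dot = 0f;
                for (int d = 0; d < subDim; d++) {
                    dot += query[m * subDim + d] * centroids[c][d];
                }
                table[m][c] = dot;
            }
        }
        return table;
    }

    // packed[m * degree + j] is the code of neighbor j in subspace m (subspace-major layout).
    // The score of a neighbor is the sum of table lookups over all subspaces.
    static float[] scoreNeighbors(float[][] table, byte[] packed, int degree) {
        float[] scores = new float[degree];
        for (int m = 0; m < table.length; m++) {
            for (int j = 0; j < degree; j++) {
                int code = Byte.toUnsignedInt(packed[m * degree + j]); // codes range over 0..255
                scores[j] += table[m][code];
            }
        }
        return scores;
    }
}
```

Once the per-query table is built, scoring a neighbor needs only its code bytes. Keeping those bytes next to the node's adjacency list is what lets a fused layout avoid holding all PQ codes on the heap.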
The routine for expanding a node (gathering its out-neighbors and computing their similarities to the query) has been pushed down to the GraphIndex views. This lets each graph layout use a slightly different algorithm, which can be a bit more efficient than a single one abstracted away in the GraphSearcher.
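A minimal sketch of that design, using hypothetical names (`GraphView`, `NeighborConsumer`, `expandNode`, `InMemoryView`, `GreedySearcher`) rather than jvector's actual interfaces. The searcher only asks a view to expand a node, and each view decides how neighbors are fetched and scored. A fused view could, for example, score from codes stored next to the adjacency list instead of calling a separate score function. To keep it short, the searcher is a single greedy walk rather than a beam search.

```java
import java.util.function.IntToDoubleFunction;

// Receives one (neighbor, score) pair at a time from a view.
interface NeighborConsumer {
    void accept(int neighbor, float score);
}

// A view owns the layout-specific logic for expanding a node.
interface GraphView {
    void expandNode(int node, NeighborConsumer consumer);
}

// Non-fused style: plain adjacency lists plus a separate scoring function.
final class InMemoryView implements GraphView {
    private final int[][] adjacency;
    private final IntToDoubleFunction score; // similarity of a node id to the current query

    InMemoryView(int[][] adjacency, IntToDoubleFunction score) {
        this.adjacency = adjacency;
        this.score = score;
    }

    @Override
    public void expandNode(int node, NeighborConsumer consumer) {
        for (int neighbor : adjacency[node]) {
            consumer.accept(neighbor, (float) score.applyAsDouble(neighbor));
        }
    }
}

final class GreedySearcher {
    // Layout-agnostic walk: move to the best-scoring neighbor until none improves on the current node.
    static int search(GraphView view, int entry, float entryScore) {
        int current = entry;
        float currentScore = entryScore;
        while (true) {
            int[] best = {current};
            float[] bestScore = {currentScore};
            view.expandNode(current, (neighbor, s) -> {
                if (s > bestScore[0]) {
                    bestScore[0] = s;
                    best[0] = neighbor;
                }
            });
            if (best[0] == current) {
                return current;
            }
            current = best[0];
            currentScore = bestScore[0];
        }
    }
}
```

Because `GreedySearcher` never touches adjacency lists or vectors directly, swapping in a different `GraphView` changes how expansion works without changing the search loop.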
This PR refactors the use of SIMD instructions by FUSED_PQ:
These SIMD changes open the possibility of deprecating the native vector util backend. This PR does not carry out that deprecation because there may be other reasons to keep the backend around.
Edits:
Experimental results:
Dataset: ada002-100k
Configuration:
M : 32
usePruning : true
neighborOverflow : 1.2
addHierarchy : true
efConstruction : 100
Results with topK=10
With a non-fused graph:
With a fused graph:
With the fused graph, the number of queries per second (QPS) drops by less than 15% (14% on average), and latency increases by less than 17% (15% on average).
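These percentages are presumably relative to the non-fused baseline. Assuming that reading, the QPS slowdown and the latency increase would be computed as:

```math
\Delta_{\text{QPS}} = \frac{\text{QPS}_{\text{non-fused}} - \text{QPS}_{\text{fused}}}{\text{QPS}_{\text{non-fused}}},
\qquad
\Delta_{\text{latency}} = \frac{\text{latency}_{\text{fused}} - \text{latency}_{\text{non-fused}}}{\text{latency}_{\text{non-fused}}}
```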
Results with topK=100
With a non-fused graph:
With a fused graph:
With the fused graph, the number of queries per second (QPS) drops by 19% and 16% (overquery=1 and 2, respectively), and latency increases by 13% and 8%, respectively.
Experimental results on larger datasets
In the plots below, QPS, latency, and recall are stable (there is run-to-run variability that is intrinsic to the benchmark). Index construction time increases slightly because fusing the graph on disk involves multiple random memory accesses per node and writes more data to disk.
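To illustrate where the extra construction cost comes from, here is a minimal sketch of a fusing pass. It assumes a hypothetical subspace-major record per node (all neighbors' codes for subspace 0, then for subspace 1, and so on); `FuseSketch` and `packNeighborCodes` are illustrative names, not jvector methods, and real on-disk records may pad every node to a fixed maximum degree. For every node, the pass reads the PQ code of each neighbor. Since neighbor ids are scattered, these reads are essentially random accesses, and each node record grows by degree × subspaces bytes that must then be written to disk.

```java
final class FuseSketch {
    // codes[v][m] is the PQ code of vector v in subspace m.
    // Returns, for each node, its neighbors' codes packed subspace-major:
    // packed[node][m * degree + j] = codes[adjacency[node][j]][m].
    static byte[][] packNeighborCodes(int[][] adjacency, byte[][] codes, int subspaces) {
        byte[][] packed = new byte[adjacency.length][];
        for (int node = 0; node < adjacency.length; node++) {
            int[] neighbors = adjacency[node];
            int degree = neighbors.length;
            byte[] record = new byte[subspaces * degree];
            for (int j = 0; j < degree; j++) {
                byte[] neighborCode = codes[neighbors[j]]; // random access: neighbor ids are scattered
                for (int m = 0; m < subspaces; m++) {
                    record[m * degree + j] = neighborCode[m];
                }
            }
            packed[node] = record;
        }
        return packed;
    }
}
```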