RFC-104 - Add native Vector Search Index type for Hudi Vector Type #18500
chrevanthreddy started this conversation in New Feature Ideas
Replies: 2 comments 1 reply
@chrevanthreddy Thanks for starting this, will take a look soon!
@chrevanthreddy Can you please open a DRAFT PR in the Hudi repo from the branch and reference it here? It'll be easier to get more community eyes on it directly.
This RFC proposes native vector similarity search support in Apache Hudi, enabling approximate nearest neighbor (ANN) queries on embedding columns stored in Hudi tables. The design extends the new VECTOR type in the main data table. A lightweight cluster-routing index in the Hudi metadata table (MDT) provides file-group pruning, while hidden columns in the main Parquet files may, in the future, store RaBitQ binary codes and scalars for fast within-file ANN scanning. Cluster assignment itself stays in the MDT.
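To make the file-group pruning idea concrete, here is a minimal Python sketch of a cluster-routing index. It is an illustration only, not the RFC's implementation: the function names (`build_routing_index`, `prune_file_groups`, `search`), the `nprobe` parameter, the in-memory dict layout, and the use of exact L2 distance in place of RaBitQ codes are all assumptions made for the example.

```python
import math
from collections import defaultdict


def nearest_centroid(vec, centroids):
    """Return the index of the centroid closest to vec (L2 distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))


def build_routing_index(file_groups, centroids):
    """Map cluster id -> set of file-group ids holding at least one vector of that cluster.

    Hypothetical stand-in for the routing information the RFC keeps in the MDT.
    """
    routing = defaultdict(set)
    for fg_id, vectors in file_groups.items():
        for vec in vectors:
            routing[nearest_centroid(vec, centroids)].add(fg_id)
    return routing


def prune_file_groups(query, centroids, routing, nprobe=1):
    """Keep only file groups that contain vectors from the nprobe clusters nearest the query."""
    ranked = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    selected = set()
    for c in ranked[:nprobe]:
        selected |= routing.get(c, set())
    return selected


def search(query, file_groups, centroids, routing, k=2, nprobe=1):
    """Scan only the surviving file groups and return the k nearest (distance, fg_id, vector)."""
    candidates = []
    for fg_id in prune_file_groups(query, centroids, routing, nprobe):
        for vec in file_groups[fg_id]:
            # Exact distance here; the RFC envisions RaBitQ codes for this within-file scan.
            candidates.append((math.dist(query, vec), fg_id, vec))
    candidates.sort(key=lambda t: t[0])
    return candidates[:k]


# Toy data: two clusters, three file groups (fg-3 holds vectors from both clusters).
centroids = [(0.0, 0.0), (10.0, 10.0)]
file_groups = {
    "fg-1": [(0.1, 0.2), (0.5, -0.3)],
    "fg-2": [(9.8, 10.1), (10.4, 9.7)],
    "fg-3": [(0.3, 0.1), (9.9, 9.9)],
}
routing = build_routing_index(file_groups, centroids)
```

With this toy data, a query near the origin with `nprobe=1` skips `fg-2` entirely, which is the pruning effect the routing index is meant to provide. Raising `nprobe` trades more scanned file groups for higher recall.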
https://github.com/chrevanthreddy/hudi/blob/f67ef0972f9a8a2928a92cf8b61ad2a81ab3cd72/rfc/rfc-104/rfc-104.md
https://github.com/chrevanthreddy/hudi/pull/1/changes#diff-e8be007ae70221ad49cd11b305f4dbe6d1d09b344549a877555a7eb376d16f2d
Starting off the discussion for the Vector Search Index. Numbers from initial testing can be added here to verify the effectiveness of the index, along with the cost of building and maintaining it versus brute-force vector search.
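One common way to report such numbers is recall@k against a brute-force baseline. The sketch below shows how that comparison could be computed; the helper names and the toy vectors are assumptions for illustration, not results from the RFC.

```python
import math


def brute_force_top_k(query, vectors, k):
    """Exact baseline: ids of the k vectors nearest to query by L2 distance.

    vectors is a dict mapping id -> vector.
    """
    return sorted(vectors, key=lambda i: math.dist(query, vectors[i]))[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)


# Toy example: the approximate result found one of the two true neighbors.
vectors = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 5.0), "d": (0.0, 2.0)}
exact = brute_force_top_k((0.0, 0.0), vectors, k=2)
recall = recall_at_k(["a", "d"], exact)
```

Reporting recall@k alongside query latency and index build and maintenance time would cover both the effectiveness and the cost sides of the comparison.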