Merged
59 commits
36bd05d
feat(milvus2): add milvus2 indexer and retriever components
kaijchen Dec 24, 2025
d2dbb66
add deprecated warning in old milvus
kaijchen Dec 25, 2025
0a46d62
fix spell check
kaijchen Dec 25, 2025
7b4ecab
support BM25 function
kaijchen Dec 29, 2025
d5409e3
Revert "add deprecated warning in old milvus"
kaijchen Dec 29, 2025
f665fe5
update 2.4 readme
kaijchen Dec 29, 2025
a69c2b8
Merge branch 'main' into milvus2
kaijchen Jan 5, 2026
4b61f32
Merge branch 'main' into milvus2
hi-pender Jan 9, 2026
94a08a7
update indexer README
kaijchen Jan 12, 2026
19a1e42
document analyzer options
kaijchen Jan 12, 2026
0986f30
add hybrid zh examples & rename bm25 example to hybrid
kaijchen Jan 12, 2026
a47b042
fix old milvus readme
kaijchen Jan 12, 2026
9228dac
cleanup the index error checking logic
kaijchen Jan 12, 2026
1ca3d23
separate sparse config and support BYOSV
kaijchen Jan 13, 2026
01bd1b7
refactor function structure
kaijchen Jan 13, 2026
fb08b74
refactor sparse method naming and doc converter
kaijchen Jan 13, 2026
9946d4c
refactor sparse indexbuilder
kaijchen Jan 13, 2026
7c0018f
update README
kaijchen Jan 13, 2026
b46c0b7
update README
kaijchen Jan 13, 2026
f7f2120
fix example
kaijchen Jan 13, 2026
88b93c8
fix ConsistencyLevel
kaijchen Jan 13, 2026
38f5b07
update ConsistencyLevel defaults
kaijchen Jan 13, 2026
f3e9c0e
add sparse search mode
kaijchen Jan 13, 2026
78b2275
add SparseVectorField in retriever
kaijchen Jan 13, 2026
93c3ad9
set vector fields explicitly in examples
kaijchen Jan 13, 2026
d54058e
update examples to be more explicit
kaijchen Jan 13, 2026
0d83790
refactor: split dense vector configs
kaijchen Jan 13, 2026
29a5471
add sparse example
kaijchen Jan 13, 2026
ce2bc70
update comments
kaijchen Jan 13, 2026
493f698
update readme
kaijchen Jan 13, 2026
57e6d30
simplify store options
kaijchen Jan 13, 2026
6a48f1a
fix readme
kaijchen Jan 13, 2026
e69b8c5
standardize score retrieval in examples
kaijchen Jan 13, 2026
7e997ad
remove synchronous flush in indexer
kaijchen Jan 13, 2026
f081614
don't store score in metadata
kaijchen Jan 13, 2026
522a83f
remove filtering in retriever
kaijchen Jan 14, 2026
cda70c5
hybrid search defaults to global TopK
kaijchen Jan 14, 2026
692f98b
set output field default
kaijchen Jan 14, 2026
515ee5e
use upsert for store
kaijchen Jan 14, 2026
d80db2c
add GPUIVFPQIndexBuilder
kaijchen Jan 14, 2026
5b1cf59
document sparse method defaults
kaijchen Jan 14, 2026
5f849e2
check index exist before creation
kaijchen Jan 14, 2026
29b2486
refactor search mode for polymorphism
kaijchen Jan 14, 2026
f80a7e2
update comments in examples
kaijchen Jan 14, 2026
9cc7cf8
fix example
kaijchen Jan 14, 2026
70d57f5
use DocumentConverter in scalar search mode
kaijchen Jan 14, 2026
4c882f9
fix metric type in range search
kaijchen Jan 14, 2026
cdc9ef1
improve iterator EOF check
kaijchen Jan 14, 2026
c938329
cleanup
kaijchen Jan 14, 2026
831063f
cleanup
kaijchen Jan 14, 2026
e929417
validate hybrid search option
kaijchen Jan 14, 2026
fdd2f71
update README
kaijchen Jan 14, 2026
e3f4ab9
combine and fix hybrid test
kaijchen Jan 14, 2026
7ecb7aa
improve coverage
kaijchen Jan 14, 2026
a75ad61
Merge branch 'main' into milvus2
kaijchen Jan 14, 2026
293f730
move EmbedQuery to search_mode/utils
kaijchen Jan 14, 2026
a356579
update README
kaijchen Jan 14, 2026
cb409a5
update indexer README
kaijchen Jan 15, 2026
6644f5a
Merge branch 'main' into milvus2
hi-pender Jan 15, 2026
1 change: 1 addition & 0 deletions .github/workflows/typos.toml
@@ -3,6 +3,7 @@ ot = "ot"
OT = "OT"
typ = "typ"
Typ = "Typ"
Rabit = "Rabit"

[files]
extend-exclude = ["**/*.test.txt"]
2 changes: 2 additions & 0 deletions components/indexer/milvus/README.md
@@ -6,6 +6,8 @@ An Milvus 2.x indexer implementation for [Eino](https://github.com/cloudwego/ein
interface. This enables seamless integration
with Eino's vector storage and retrieval system for enhanced semantic search capabilities.

> **Note**: This package supports **Milvus 2.4.x**. For Milvus 2.5+ features (BM25, server-side functions, hybrid search), use the [`milvus2`](../milvus2) package instead.

## Quick Start

### Installation
2 changes: 2 additions & 0 deletions components/indexer/milvus/README_zh.md
@@ -5,6 +5,8 @@
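A Milvus 2.x-based vector storage implementation that provides an `Indexer`-compliant storage solution for [Eino](https://github.com/cloudwego/eino). This component integrates seamlessly
with Eino's vector storage and retrieval system to enhance semantic search capabilities.

> **Note**: This package supports **Milvus 2.4.x**. To use the newer Milvus 2.5+ features (BM25, server-side functions, hybrid search), use the [`milvus2`](../milvus2) package instead.

## Quick Start

### Installation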
361 changes: 361 additions & 0 deletions components/indexer/milvus2/README.md
@@ -0,0 +1,361 @@
# Milvus 2.x Indexer

English | [中文](./README_zh.md)

This package provides a Milvus 2.x (V2 SDK) indexer implementation for the EINO framework. It enables document storage and vector indexing in Milvus.

> **Note**: This package requires **Milvus 2.5+** for server-side function support (e.g., BM25).

## Features

- **Milvus V2 SDK**: Uses the latest `milvus-io/milvus/client/v2` SDK
- **Flexible Index Types**: Supports comprehensive index types including Auto, HNSW, IVF variants, SCANN, DiskANN, GPU indexes, and RaBitQ (Milvus 2.6+)
- **Hybrid Search Ready**: Native support for Sparse Vectors (BM25/SPLADE) alongside Dense Vectors
- **Server-side Vector Generation**: Automatically generate sparse vectors using Milvus Functions (BM25)
- **Auto Management**: Handles collection schema creation, index building, and loading automatically
- **Field Analysis**: Configurable text analyzers (English, Chinese, Standard, etc.)
- **Custom Document Conversion**: Flexible mapping from Eino documents to Milvus columns

## Installation

```bash
go get github.com/cloudwego/eino-ext/components/indexer/milvus2
```

## Quick Start

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/cloudwego/eino-ext/components/embedding/ark"
	"github.com/cloudwego/eino/schema"
	"github.com/milvus-io/milvus/client/v2/milvusclient"

	milvus2 "github.com/cloudwego/eino-ext/components/indexer/milvus2"
)

func main() {
	// Get the environment variables
	addr := os.Getenv("MILVUS_ADDR")
	username := os.Getenv("MILVUS_USERNAME")
	password := os.Getenv("MILVUS_PASSWORD")
	arkApiKey := os.Getenv("ARK_API_KEY")
	arkModel := os.Getenv("ARK_MODEL")

	ctx := context.Background()

	// Create an embedding model
	emb, err := ark.NewEmbedder(ctx, &ark.EmbeddingConfig{
		APIKey: arkApiKey,
		Model:  arkModel,
	})
	if err != nil {
		// log.Fatalf exits the process, so no return is needed
		log.Fatalf("Failed to create embedding: %v", err)
	}

	// Create an indexer
	indexer, err := milvus2.NewIndexer(ctx, &milvus2.IndexerConfig{
		ClientConfig: &milvusclient.ClientConfig{
			Address:  addr,
			Username: username,
			Password: password,
		},
		Collection: "my_collection",

		Vector: &milvus2.VectorConfig{
			Dimension:    1024, // Match your embedding model dimension
			MetricType:   milvus2.COSINE,
			IndexBuilder: milvus2.NewHNSWIndexBuilder().WithM(16).WithEfConstruction(200),
		},
		Embedding: emb,
	})
	if err != nil {
		log.Fatalf("Failed to create indexer: %v", err)
	}
	log.Printf("Indexer created successfully")

	// Store documents
	docs := []*schema.Document{
		{
			ID:      "doc1",
			Content: "Milvus is an open-source vector database",
			MetaData: map[string]any{
				"category": "database",
				"year":     2021,
			},
		},
		{
			ID:      "doc2",
			Content: "EINO is a framework for building AI applications",
		},
	}
	ids, err := indexer.Store(ctx, docs)
	if err != nil {
		log.Fatalf("Failed to store: %v", err)
	}
	log.Printf("Store success, ids: %v", ids)
}
```
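
The Quick Start reads its settings from environment variables without checking them. A minimal, standard-library-only sketch of a pre-flight check is shown below. The helper `missingVars` is hypothetical (it is not part of this package), and the choice of which variables count as required is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// missingVars returns the names for which lookup reports no value or an
// empty one. lookup has the same shape as os.LookupEnv so it can be faked.
// This helper is illustrative only and not part of the milvus2 package.
func missingVars(lookup func(string) (string, bool), names ...string) []string {
	var missing []string
	for _, name := range names {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Assumption: these three are treated as required; username and
	// password are left optional for unauthenticated deployments.
	required := []string{"MILVUS_ADDR", "ARK_API_KEY", "ARK_MODEL"}
	if missing := missingVars(os.LookupEnv, required...); len(missing) > 0 {
		fmt.Println("missing environment variables:", missing)
		return
	}
	fmt.Println("environment looks complete")
}
```

Failing early with a list of missing names tends to be easier to diagnose than a connection or authentication error raised later by the client.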

## Configuration

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `Client` | `*milvusclient.Client` | - | Pre-configured Milvus client (optional) |
| `ClientConfig` | `*milvusclient.ClientConfig` | - | Client configuration (required if Client is nil) |
| `Collection` | `string` | `"eino_collection"` | Collection name |
| `Vector` | `*VectorConfig` | - | Dense vector configuration (Dimension, MetricType, IndexBuilder) |
| `Sparse` | `*SparseVectorConfig` | - | Sparse vector configuration (VectorField, MetricType, Method, IndexBuilder) |
| `Embedding` | `embedding.Embedder` | - | Embedder for vectorization (optional). If nil, documents must have vectors (BYOV). |
| `DocumentConverter` | `func` | default converter | Custom document to Milvus column converter |
| `ConsistencyLevel` | `ConsistencyLevel` | `ConsistencyLevelDefault` | Consistency level. `ConsistencyLevelDefault` leaves the collection-level setting in place (Milvus default: Bounded) unless a level is set explicitly |
| `PartitionName` | `string` | - | Default partition for insertion |
| `EnableDynamicSchema` | `bool` | `false` | Enable dynamic field support |
| `Functions` | `[]*entity.Function` | - | Schema functions (e.g., BM25) for server-side processing |
| `FieldParams` | `map[string]map[string]string` | - | Parameters for fields (e.g., enable_analyzer) |

### Vector Configuration (`VectorConfig`)

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `Dimension` | `int64` | - | Vector dimension (Required) |
| `MetricType` | `MetricType` | `L2` | Similarity metric (L2, IP, COSINE, etc.) |
| `IndexBuilder` | `IndexBuilder` | `AutoIndexBuilder` | Index type builder (HNSW, IVF, etc.) |
| `VectorField` | `string` | `"vector"` | Field name for dense vector |

### Sparse Vector Configuration (`SparseVectorConfig`)

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `VectorField` | `string` | `"sparse_vector"` | Field name for sparse vector |
| `MetricType` | `MetricType` | `BM25` | Similarity metric |
| `Method` | `SparseMethod` | `SparseMethodAuto` | Generation method (`SparseMethodAuto` or `SparseMethodPrecomputed`) |
| `IndexBuilder` | `SparseIndexBuilder` | `SparseInvertedIndex` | Index builder (`NewSparseInvertedIndexBuilder` or `NewSparseWANDIndexBuilder`) |

> **Note**: `Method` defaults to `Auto` only if `MetricType` is `BM25`. `Auto` implies using Milvus server-side functions (remote function). For other metrics (e.g., `IP`), it defaults to `Precomputed`.
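
The defaulting rule in the note above can be sketched as a small function. The types and constants below are hypothetical stand-ins defined locally for illustration; they are not the package's real identifiers, and the assumption that an explicitly set method always wins is ours.

```go
package main

import "fmt"

// Hypothetical stand-ins for the package's types; illustrative names only.
type MetricType string
type SparseMethod string

const (
	BM25 MetricType = "BM25"
	IP   MetricType = "IP"

	MethodUnset       SparseMethod = ""
	MethodAuto        SparseMethod = "Auto"
	MethodPrecomputed SparseMethod = "Precomputed"
)

// resolveSparseMethod applies the rule from the note: an explicit method is
// kept (assumption); otherwise BM25 selects Auto and any other metric
// selects Precomputed.
func resolveSparseMethod(metric MetricType, method SparseMethod) SparseMethod {
	if method != MethodUnset {
		return method
	}
	if metric == BM25 {
		return MethodAuto
	}
	return MethodPrecomputed
}

func main() {
	fmt.Println(resolveSparseMethod(BM25, MethodUnset))      // Auto
	fmt.Println(resolveSparseMethod(IP, MethodUnset))        // Precomputed
	fmt.Println(resolveSparseMethod(IP, MethodPrecomputed))  // Precomputed
}
```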

## Index Builders

### Dense Index Builders

| Builder | Description | Key Parameters |
|---------|-------------|----------------|
| `NewAutoIndexBuilder()` | Milvus auto-selects optimal index | - |
| `NewHNSWIndexBuilder()` | Graph-based with excellent performance | `M`, `EfConstruction` |
| `NewIVFFlatIndexBuilder()` | Cluster-based search | `NList` |
| `NewIVFPQIndexBuilder()` | Memory-efficient with product quantization | `NList`, `M`, `NBits` |
| `NewIVFSQ8IndexBuilder()` | Scalar quantization | `NList` |
| `NewIVFRabitQIndexBuilder()` | IVF + RaBitQ binary quantization (Milvus 2.6+) | `NList` |
| `NewFlatIndexBuilder()` | Brute-force exact search | - |
| `NewDiskANNIndexBuilder()` | Disk-based for large datasets | - |
| `NewSCANNIndexBuilder()` | Fast with high recall | `NList`, `RawDataEnabled` |
| `NewBinFlatIndexBuilder()` | Brute-force for binary vectors | - |
| `NewBinIVFFlatIndexBuilder()` | Cluster-based for binary vectors | `NList` |
| `NewGPUBruteForceIndexBuilder()` | GPU-accelerated brute-force | - |
| `NewGPUIVFFlatIndexBuilder()` | GPU-accelerated IVF_FLAT | - |
| `NewGPUIVFPQIndexBuilder()` | GPU-accelerated IVF_PQ | - |
| `NewGPUCagraIndexBuilder()` | GPU-accelerated graph-based (CAGRA) | `IntermediateGraphDegree`, `GraphDegree` |

### Sparse Index Builders

| Builder | Description | Key Parameters |
|---------|-------------|----------------|
| `NewSparseInvertedIndexBuilder()` | Inverted index for sparse vectors | `DropRatioBuild` |
| `NewSparseWANDIndexBuilder()` | WAND algorithm for sparse vectors | `DropRatioBuild` |

### Example: HNSW Index

```go
indexBuilder := milvus2.NewHNSWIndexBuilder().
	WithM(16).              // Max connections per node (4-64)
	WithEfConstruction(200) // Index build search width (8-512)
```

### Example: IVF_FLAT Index

```go
indexBuilder := milvus2.NewIVFFlatIndexBuilder().
	WithNList(256) // Number of cluster units (1-65536)
```

### Example: IVF_PQ Index (Memory-efficient)

```go
indexBuilder := milvus2.NewIVFPQIndexBuilder().
	WithNList(256). // Number of cluster units
	WithM(16).      // Number of subquantizers
	WithNBits(8)    // Bits per subquantizer (1-16)
```
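
To see why IVF_PQ is described as memory-efficient, the per-vector code size can be worked out from `M` and `NBits`. The sketch below uses only the standard library; `pqCodeBytes` is a hypothetical helper, and the figure excludes codebooks, IVF structures, and any other per-index overhead.

```go
package main

import (
	"errors"
	"fmt"
)

// pqCodeBytes returns the size in bytes of one PQ-encoded vector:
// m sub-quantizers, each stored in nbits bits. The vector dimension must be
// divisible by m so it can be split into equal sub-vectors.
func pqCodeBytes(dim, m, nbits int) (int, error) {
	if dim <= 0 || m <= 0 || nbits <= 0 {
		return 0, errors.New("dim, m and nbits must be positive")
	}
	if dim%m != 0 {
		return 0, fmt.Errorf("dim %d is not divisible by m %d", dim, m)
	}
	return (m*nbits + 7) / 8, nil // round up to whole bytes
}

func main() {
	const dim, m, nbits = 1024, 16, 8
	code, err := pqCodeBytes(dim, m, nbits)
	if err != nil {
		fmt.Println("invalid parameters:", err)
		return
	}
	raw := dim * 4 // float32 storage for the uncompressed vector
	fmt.Printf("raw=%dB pq=%dB ratio=%dx\n", raw, code, raw/code)
	// prints: raw=4096B pq=16B ratio=256x
}
```

With a 1024-dimensional vector, `M = 16` and `NBits = 8`, each encoded vector takes 16 bytes instead of 4096, at the cost of quantization error in the distances.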

### Example: SCANN Index (Fast with high recall)

```go
indexBuilder := milvus2.NewSCANNIndexBuilder().
	WithNList(256).          // Number of cluster units
	WithRawDataEnabled(true) // Enable raw data for reranking
```

### Example: DiskANN Index (Large datasets)

```go
indexBuilder := milvus2.NewDiskANNIndexBuilder() // Disk-based, no extra params
```

### Example: Sparse Inverted Index

```go
indexBuilder := milvus2.NewSparseInvertedIndexBuilder().
	WithDropRatioBuild(0.2) // Drop ratio for small values (0.0-1.0)
```
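
The idea behind a drop ratio can be illustrated on a single sparse vector: a fraction of the smallest-magnitude weights is discarded before indexing. The sketch below is a simplified, standard-library-only illustration; `dropSmallest` is a hypothetical helper, and how Milvus actually computes its threshold (for example, over a whole segment rather than per vector) is not shown here and may differ.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// dropSmallest returns a copy of sv without the ratio fraction of entries
// that have the smallest absolute weight. ratio is clamped to [0, 1].
// Illustrative only; not the algorithm used by Milvus.
func dropSmallest(sv map[int]float64, ratio float64) map[int]float64 {
	ratio = math.Max(0, math.Min(1, ratio))

	type entry struct {
		idx int
		w   float64
	}
	entries := make([]entry, 0, len(sv))
	for i, w := range sv {
		entries = append(entries, entry{i, w})
	}
	// Sort ascending by magnitude; break ties by index for determinism.
	sort.Slice(entries, func(a, b int) bool {
		wa, wb := math.Abs(entries[a].w), math.Abs(entries[b].w)
		if wa != wb {
			return wa < wb
		}
		return entries[a].idx < entries[b].idx
	})

	drop := int(ratio * float64(len(entries)))
	out := make(map[int]float64, len(entries)-drop)
	for _, e := range entries[drop:] {
		out[e.idx] = e.w
	}
	return out
}

func main() {
	sv := map[int]float64{1: 0.9, 2: 0.05, 3: 0.4, 4: 0.7, 5: 0.2}
	kept := dropSmallest(sv, 0.2)
	fmt.Println(len(kept), "of", len(sv), "entries kept")
}
```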

### Dense Vector Metrics
| Metric | Description |
|--------|-------------|
| `L2` | Euclidean distance |
| `IP` | Inner Product |
| `COSINE` | Cosine similarity |
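
For intuition, the three dense metrics can be computed by hand on small vectors. This is a standard-library-only sketch with hypothetical helper names; it follows the textbook definitions in the table and makes no claim about how Milvus scales or reports the values internally.

```go
package main

import (
	"fmt"
	"math"
)

// dot returns the inner product (IP) of a and b; larger means more similar.
// It panics if the dimensions differ.
func dot(a, b []float64) float64 {
	if len(a) != len(b) {
		panic("vectors must have the same dimension")
	}
	var sum float64
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// l2 returns the Euclidean distance between a and b; smaller means closer.
func l2(a, b []float64) float64 {
	if len(a) != len(b) {
		panic("vectors must have the same dimension")
	}
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// cosine returns the cosine similarity of a and b; larger means more similar.
// It returns 0 when either vector has zero norm.
func cosine(a, b []float64) float64 {
	na, nb := math.Sqrt(dot(a, a)), math.Sqrt(dot(b, b))
	if na == 0 || nb == 0 {
		return 0
	}
	return dot(a, b) / (na * nb)
}

func main() {
	a := []float64{1, 2, 2}
	b := []float64{2, 4, 4} // same direction as a, twice the length
	fmt.Printf("L2=%.3f IP=%.3f COSINE=%.3f\n", l2(a, b), dot(a, b), cosine(a, b))
	// prints: L2=3.000 IP=18.000 COSINE=1.000
}
```

The two vectors point in the same direction, so `COSINE` treats them as identical while `L2` still reports a non-zero distance. This is the practical difference to keep in mind when matching the metric to an embedding model.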

### Sparse Vector Metrics
| Metric | Description |
|--------|-------------|
| `BM25` | Okapi BM25 (Required for `SparseMethodAuto`) |
| `IP` | Inner Product (Suitable for precomputed sparse vectors) |
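
For precomputed sparse vectors, `IP` only accumulates products over indices present in both vectors. A minimal sketch using the same `map[int]float64` shape as the examples in this README (`sparseIP` is a hypothetical helper, not part of the package):

```go
package main

import "fmt"

// sparseIP returns the inner product of two sparse vectors stored as
// index -> weight maps. Only indices present in both maps contribute.
func sparseIP(a, b map[int]float64) float64 {
	// Iterate over the smaller map to minimize lookups.
	if len(b) < len(a) {
		a, b = b, a
	}
	var sum float64
	for idx, w := range a {
		if v, ok := b[idx]; ok {
			sum += w * v
		}
	}
	return sum
}

func main() {
	doc := map[int]float64{1024: 0.5, 2048: 0.3}
	query := map[int]float64{1024: 2.0, 4096: 1.0}
	fmt.Println(sparseIP(doc, query)) // only index 1024 overlaps: 0.5 * 2.0 = 1
}
```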

### Binary Vector Metrics
| Metric | Description |
|--------|-------------|
| `HAMMING` | Hamming distance |
| `JACCARD` | Jaccard distance |
| `TANIMOTO` | Tanimoto distance |
| `SUBSTRUCTURE` | Substructure search |
| `SUPERSTRUCTURE` | Superstructure search |
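
As an illustration of the first two binary metrics, the sketch below computes Hamming and Jaccard distance on bit-packed byte slices using only the standard library. The helper names are hypothetical, and the bit-packing layout is an assumption for the example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hamming counts the bit positions in which a and b differ.
// It panics if the slices have different lengths.
func hamming(a, b []byte) int {
	if len(a) != len(b) {
		panic("binary vectors must have the same length")
	}
	n := 0
	for i := range a {
		n += bits.OnesCount8(a[i] ^ b[i])
	}
	return n
}

// jaccardDistance returns 1 - |A and B| / |A or B| over the set bits.
// Two all-zero vectors are treated as identical (distance 0).
func jaccardDistance(a, b []byte) float64 {
	if len(a) != len(b) {
		panic("binary vectors must have the same length")
	}
	inter, union := 0, 0
	for i := range a {
		inter += bits.OnesCount8(a[i] & b[i])
		union += bits.OnesCount8(a[i] | b[i])
	}
	if union == 0 {
		return 0
	}
	return 1 - float64(inter)/float64(union)
}

func main() {
	a := []byte{0xB0} // 1011 0000
	b := []byte{0x98} // 1001 1000
	fmt.Println(hamming(a, b), jaccardDistance(a, b)) // prints: 2 0.5
}
```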

## Sparse Vector Support

The indexer supports two modes for sparse vectors: **Auto-Generation** and **Precomputed**.

### 1. Auto-Generation (BM25)

Uses Milvus server-side functions to automatically generate sparse vectors from the content field.

- **Requirement**: Milvus 2.5+
- **Configuration**: Set `MetricType: milvus2.BM25`.

```go
indexer, err := milvus2.NewIndexer(ctx, &milvus2.IndexerConfig{
	// ... basic config ...
	Collection: "hybrid_collection",

	Sparse: &milvus2.SparseVectorConfig{
		VectorField: "sparse_vector",
		MetricType:  milvus2.BM25,
		// Method defaults to SparseMethodAuto for BM25
	},

	// Analyzer configuration for BM25
	FieldParams: map[string]map[string]string{
		"content": {
			"enable_analyzer": "true",
			"analyzer_params": `{"type": "standard"}`,
		},
	},
})
```
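
Because `analyzer_params` is a JSON string nested inside a Go map, hand-written values are easy to get wrong. One option is to build the string with `encoding/json`, as in the sketch below. `analyzerFieldParams` is a hypothetical helper, not part of this package, and the `"chinese"` analyzer type is assumed to be available on the target Milvus version.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// analyzerFieldParams builds a FieldParams-shaped map that enables an
// analyzer of the given type on one field. The JSON is produced by
// json.Marshal so quoting and escaping are always valid.
func analyzerFieldParams(field, analyzerType string) (map[string]map[string]string, error) {
	raw, err := json.Marshal(map[string]string{"type": analyzerType})
	if err != nil {
		return nil, fmt.Errorf("marshal analyzer params: %w", err)
	}
	return map[string]map[string]string{
		field: {
			"enable_analyzer": "true",
			"analyzer_params": string(raw),
		},
	}, nil
}

func main() {
	params, err := analyzerFieldParams("content", "chinese")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(params["content"]["analyzer_params"]) // prints: {"type":"chinese"}
}
```

The resulting map has the same shape as the `FieldParams` value in the example above and could be assigned to it directly.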

### 2. Precomputed (SPLADE, BGE-M3, etc.)

Allows storing sparse vectors generated by external models (e.g., SPLADE, BGE-M3) or custom logic.

- **Configuration**: Set `MetricType` (usually `IP`) and `Method: milvus2.SparseMethodPrecomputed`.
- **Usage**: Provide sparse vectors via `doc.WithSparseVector()`.

```go
indexer, err := milvus2.NewIndexer(ctx, &milvus2.IndexerConfig{
	// ... basic config ...
	Collection: "sparse_collection",

	Sparse: &milvus2.SparseVectorConfig{
		VectorField: "sparse_vector",
		MetricType:  milvus2.IP,
		Method:      milvus2.SparseMethodPrecomputed,
	},
})
if err != nil {
	log.Fatalf("Failed to create indexer: %v", err)
}

// Store documents with sparse vectors
doc := &schema.Document{ID: "1", Content: "..."}
doc.WithSparseVector(map[int]float64{
	1024: 0.5,
	2048: 0.3,
})
if _, err := indexer.Store(ctx, []*schema.Document{doc}); err != nil {
	log.Fatalf("Failed to store: %v", err)
}
```

## Bring Your Own Vectors (BYOV)

You can use the indexer without an embedder if your documents already have vectors.

```go
// Create indexer without embedding
indexer, err := milvus2.NewIndexer(ctx, &milvus2.IndexerConfig{
	ClientConfig: &milvusclient.ClientConfig{
		Address: "localhost:19530",
	},
	Collection: "my_collection",
	Vector: &milvus2.VectorConfig{
		Dimension:  128,
		MetricType: milvus2.L2,
	},
	// Embedding: nil, // Leave nil
})
if err != nil {
	log.Fatalf("Failed to create indexer: %v", err)
}

// Store documents with pre-computed vectors
docs := []*schema.Document{
	{
		ID:      "doc1",
		Content: "Document with existing vector",
	},
}

// Attach dense vector to document
// Vector dimension must match the collection dimension
vector := []float64{0.1, 0.2 /* ... remaining values, 128 in total */}
docs[0].WithDenseVector(vector)

// Attach sparse vector (optional, if Sparse is configured)
// Sparse vectors are maps of index -> weight
sparseVector := map[int]float64{
	10: 0.5,
	25: 0.8,
}
docs[0].WithSparseVector(sparseVector)

ids, err := indexer.Store(ctx, docs)
if err != nil {
	log.Fatalf("Failed to store: %v", err)
}
log.Printf("Store success, ids: %v", ids)
```

For sparse vectors in BYOV mode, configure the sparse vector as **Precomputed** (see above).
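
Since BYOV vectors bypass the embedder, a dimension mismatch only surfaces when the insert reaches Milvus. A small client-side check can catch it earlier. The sketch below works on plain slices and uses only the standard library; `checkDims` is a hypothetical helper, and in practice the slices would be the dense vectors attached to each document.

```go
package main

import "fmt"

// checkDims returns an error naming the first vector whose length differs
// from dim, or nil when every vector matches.
func checkDims(vectors [][]float64, dim int) error {
	for i, v := range vectors {
		if len(v) != dim {
			return fmt.Errorf("vector %d has dimension %d, want %d", i, len(v), dim)
		}
	}
	return nil
}

func main() {
	vectors := [][]float64{
		make([]float64, 128),
		make([]float64, 64), // wrong size on purpose
	}
	if err := checkDims(vectors, 128); err != nil {
		fmt.Println("refusing to store:", err)
		return
	}
	fmt.Println("all vectors match the collection dimension")
}
```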

## Examples

See the [examples](./examples) directory for complete working examples:

- [demo](./examples/demo) - Basic collection setup with HNSW index
- [hnsw](./examples/hnsw) - HNSW index example
- [ivf_flat](./examples/ivf_flat) - IVF_FLAT index example
- [rabitq](./examples/rabitq) - IVF_RABITQ index example (Milvus 2.6+)
- [auto](./examples/auto) - AutoIndex example
- [diskann](./examples/diskann) - DISKANN index example
- [hybrid](./examples/hybrid) - Hybrid search setup (Dense + BM25 sparse) (Milvus 2.5+)
- [hybrid_chinese](./examples/hybrid_chinese) - Hybrid search with Chinese analyzer (Milvus 2.5+)
- [sparse](./examples/sparse) - Sparse-only index example (BM25)
- [byov](./examples/byov) - Bring Your Own Vectors example

## License

Apache License 2.0