
Commit 364fe76

authored
docs: add image search docs (#168)
* docs: add image search docs
* add desc
1 parent fa09edb commit 364fe76


6 files changed, +113 -5 lines changed


.vscode/settings.json

Lines changed: 3 additions & 2 deletions
```diff
@@ -1,13 +1,14 @@
 {
     "cSpell.words": [
         "FULLTEXT",
-        "Pydantic"
+        "Pydantic",
         "getenv",
         "jina",
         "jinaai",
         "Rerank",
         "reranker",
         "reranking",
-        "tablename"
+        "tablename",
+        "multimodal"
     ]
 }
```

mkdocs.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -96,6 +96,7 @@ nav:
 - Vector Search: ai/guides/vector-search.md
 - Fulltext Search: ai/guides/fulltext-search.md
 - Hybrid Search: ai/guides/hybrid-search.md
+- Image Search: ai/guides/image-search.md
 - Auto Embedding: ai/guides/auto-embedding.md
 - Reranking: ai/guides/reranking.md
 - Filtering: ai/guides/filtering.md
@@ -126,6 +127,7 @@ nav:
 - Vector Search: ai/guides/vector-search.md
 - Fulltext Search: ai/guides/fulltext-search.md
 - Hybrid Search: ai/guides/hybrid-search.md
+- Image Search: ai/guides/image-search.md
 - Auto Embedding: ai/guides/auto-embedding.md
 - Reranking: ai/guides/reranking.md
 - Filtering: ai/guides/filtering.md
```

src/ai/guides/fulltext-search.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -15,7 +15,7 @@ TiDB provides full-text search capabilities for **massive datasets** with high p
 
 !!! tip
 
-    For complete example code, see the [full-text search example](https://github.com/pingcap/pytidb/blob/main/examples/fulltext_search).
+    For a complete example of full-text search, see the [E-commerce product search demo](../examples/fulltext-search-with-pytidb.md).
 
 ## Basic Usage
 
```

src/ai/guides/hybrid-search.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -10,7 +10,7 @@ TiDB supports both semantic search (also known as vector search) and keyword-bas
 
 !!! tip
 
-    For a complete example of hybrid search, refer to the [hybrid-search example](https://github.com/pingcap/pytidb/tree/main/examples/hybrid_search).
+    For a complete example of hybrid search, refer to the [hybrid-search example](../examples/hybrid-search-with-pytidb.md).
 
 
 ## Basic Usage
```

src/ai/guides/image-search.md

Lines changed: 105 additions & 0 deletions
# Image search

**Image search** helps you find similar images by comparing their visual content, not just text or metadata. This feature is useful for e-commerce, content moderation, digital asset management, and any scenario where you need to search for or deduplicate images based on appearance.

TiDB enables image search through **vector search**. With automatic embedding, a multimodal embedding model generates embeddings from image URLs, PIL images, or keyword text, and TiDB then efficiently searches for similar vectors at scale.
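
As a point of reference, "similar" here means close in embedding space. Assuming the cosine distance metric (one of the distance functions TiDB vector search supports; your table may be configured with another), the distance between a query embedding $\mathbf{q}$ and a stored image embedding $\mathbf{v}$ is:

$$
d_{\cos}(\mathbf{q}, \mathbf{v}) = 1 - \frac{\mathbf{q} \cdot \mathbf{v}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{v} \rVert}
$$

A distance of 0 means the two embeddings point in the same direction and 1 means they are orthogonal, so smaller values indicate more similar images.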

!!! tip

    For a complete example of image search, see the [Pet image search demo](../examples/image-search-with-pytidb.md).

## Basic usage

### Step 1. Define an embedding function

To generate image embeddings, you need an embedding model that supports image input.

For demonstration, you can use Jina AI's multimodal embedding model to generate image embeddings.

Go to [Jina AI](https://jina.ai/embeddings) to create an API key, then initialize the embedding function as follows:

```python
from pytidb.embeddings import EmbeddingFunction

image_embed = EmbeddingFunction(
    # Or another provider/model that supports multimodal input
    model_name="jina_ai/jina-embeddings-v4",
    api_key="{your-jina-api-key}",
)
```
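
To keep the API key out of source code, you could read it from an environment variable instead of pasting it into the call above. This is a minimal sketch; the variable name `JINA_AI_API_KEY` and the helper `load_api_key` are hypothetical and not part of pytidb:

```python
import os


def load_api_key(var_name: str = "JINA_AI_API_KEY") -> str:
    """Read an API key from the environment instead of hard-coding it (hypothetical helper)."""
    key = os.getenv(var_name)
    if not key:
        # Fail early with a clear message rather than at the first embedding request.
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key
```

You would then pass `api_key=load_api_key()` to `EmbeddingFunction`.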

### Step 2. Create a table and vector field

Use `image_embed.VectorField()` to define a vector field for storing image embeddings. Set the `source_field` parameter to the name of the field that stores image URLs.

```python
from pytidb.schema import TableModel, Field


class ImageItem(TableModel):
    __tablename__ = "image_items"
    id: int = Field(primary_key=True)
    image_uri: str = Field()
    # The embedding is generated automatically from the `image_uri` field.
    image_vec: list[float] = image_embed.VectorField(
        source_field="image_uri"
    )


# Assumes `client` is an already-connected pytidb client.
table = client.create_table(schema=ImageItem, mode="overwrite")
```

### Step 3. Insert image data

When you insert data, the `image_vec` field is automatically populated with the embedding generated from the `image_uri`.

```python
table.bulk_insert([
    ImageItem(image_uri="https://example.com/image1.jpg"),
    ImageItem(image_uri="https://example.com/image2.jpg"),
    ImageItem(image_uri="https://example.com/image3.jpg"),
])
```
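
Conceptually, automatic embedding behaves like a hook that fills the vector field from its source field at insert time. The following is a simplified, in-memory sketch of that idea, not pytidb's implementation; `Row`, `AutoEmbedTable`, and the toy `fake_embed` function are hypothetical stand-ins for the real table model, table, and embedding model:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Row:
    image_uri: str
    image_vec: Optional[list[float]] = None


@dataclass
class AutoEmbedTable:
    embed: Callable[[str], list[float]]
    rows: list[Row] = field(default_factory=list)

    def bulk_insert(self, items: list[Row]) -> None:
        for item in items:
            # Derive the vector field from the source field before storing the row.
            item.image_vec = self.embed(item.image_uri)
            self.rows.append(item)


def fake_embed(uri: str) -> list[float]:
    # Toy stand-in for a multimodal model: a 3-dimensional vector derived from the URI text.
    return [float(len(uri)), float(uri.count("/")), float(uri.endswith(".jpg"))]


demo = AutoEmbedTable(embed=fake_embed)
demo.bulk_insert([Row(image_uri="https://example.com/image1.jpg")])
print(demo.rows[0].image_vec)  # [30.0, 3.0, 1.0]
```

The caller never sets `image_vec` directly, which mirrors how the `ImageItem` rows above are inserted with only `image_uri`.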

### Step 4. Perform image search

Image search is a type of vector search. With automatic embedding, you can pass an image URL, a PIL image, or keyword text directly to `table.search()`. Each input is converted to a vector embedding for similarity matching.
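
One way to picture how a single `search()` call accepts three kinds of input is a small routing step that decides how each query should be embedded. This is a hypothetical sketch: `classify_query` is not a pytidb function, and treating any object with a `save()` method as a PIL image is an assumption that keeps the example free of dependencies:

```python
from urllib.parse import urlparse


def classify_query(query: object) -> str:
    """Return the kind of search input: 'image_url', 'pil_image', or 'text'."""
    if isinstance(query, str):
        parsed = urlparse(query)
        # Strings with an http(s) scheme and a host are treated as image URLs.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return "image_url"
        # Any other string is embedded as keyword text.
        return "text"
    # Duck typing: PIL images expose a callable save() method.
    if callable(getattr(query, "save", None)):
        return "pil_image"
    raise TypeError(f"Unsupported query type: {type(query).__name__}")


print(classify_query("https://example.com/query.jpg"))  # image_url
print(classify_query("orange tabby cat"))  # text
```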

#### Option 1: Search by image URL

Search for similar images by providing an image URL:

```python
results = table.search("https://example.com/query.jpg").limit(3).to_list()
```

The client converts the input image URL into a vector. TiDB then finds and returns the most similar images by comparing their vectors.
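
The comparison step can be pictured as ranking stored embeddings by their distance to the query embedding. The sketch below does an exact, brute-force ranking with cosine distance in plain Python purely to illustrate the idea; TiDB performs this search inside the database and may use a vector index for approximate search. The toy 2-dimensional vectors and the `top_k` helper are hypothetical:

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query_vec: list[float], items: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k items closest to query_vec as (key, distance) pairs, nearest first."""
    scored = [(key, cosine_distance(query_vec, vec)) for key, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


images = {
    "image1.jpg": [1.0, 0.0],
    "image2.jpg": [0.7, 0.7],
    "image3.jpg": [0.0, 1.0],
}
print([name for name, _ in top_k([0.9, 0.1], images, k=2)])  # ['image1.jpg', 'image2.jpg']
```

Here `k` plays the same role as `.limit(3)` in the search call above.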

#### Option 2: Search by PIL image

You can also search for similar images by providing a PIL image object, for example one loaded from a local file:

```python
from PIL import Image

image = Image.open("/path/to/query.jpg")

results = table.search(image).limit(3).to_list()
```

The client converts the PIL image object into a Base64 string before sending it to the embedding model.
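
For illustration, here is one way to turn an in-memory image into a Base64 string using only the standard library. It relies on a PIL image's `save()` method accepting a file-like object and a `format` argument, and it assumes PNG as the serialization format; the format and encoding pytidb actually uses are not specified here, and `image_to_base64` is a hypothetical helper:

```python
import base64
import io


def image_to_base64(image, fmt: str = "PNG") -> str:
    """Serialize an image object (e.g. a PIL Image) and return it as a Base64 string."""
    buffer = io.BytesIO()
    # PIL's Image.save() writes to any file-like object when given an explicit format.
    image.save(buffer, format=fmt)
    return base64.b64encode(buffer.getvalue()).decode("ascii")
```

With Pillow installed, `image_to_base64(Image.open("/path/to/query.jpg"))` would return the encoded string.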

#### Option 3: Search by keyword text

You can also search for similar images by providing keyword text.

For example, if you are working on a pet image dataset, you can search for similar images by keywords like "orange tabby cat" or "golden retriever puppy".

```python
results = table.search("orange tabby cat").limit(3).to_list()
```

The multimodal embedding model converts the keyword text into a vector embedding that captures its semantic meaning. TiDB then performs a vector search to find the images whose embeddings are most similar to the keyword embedding.

## See also

- [Automatic embedding guide](./auto-embedding.md)
- [Vector search guide](../concepts/vector-search.md)
- [Pet image search demo](../examples/image-search-with-pytidb.md)

src/ai/guides/vector-search.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -4,7 +4,7 @@ Vector search uses semantic similarity to help you find the most relevant record
 
 !!! tip
 
-    For a complete example of vector search, see the [vector-search example](https://github.com/pingcap/pytidb/tree/main/examples/vector_search).
+    For a complete example of vector search, see the [vector-search example](../examples/vector-search-with-pytidb.md).
 
 
 ## Basic Usage
```
