| 1 | +# Image search |
| 2 | + |
| 3 | +**Image search** helps you find similar images by comparing their visual content, not just text or metadata. This feature is useful for e-commerce, content moderation, digital asset management, and any scenario where you need to search for or deduplicate images based on appearance. |
| 4 | + |
| 5 | +TiDB enables image search using **vector search**. With automatic embedding, you can generate embeddings from image URLs, PIL images, or keyword text using a multimodal embedding model. TiDB then efficiently searches for similar vectors at scale. |
| 6 | + |
| 7 | +!!! tip |
| 8 | + |
| 9 | + For a complete example of image search, see the [Pet image search demo](../examples/image-search-with-pytidb.md). |
| 10 | + |
| 11 | +## Basic usage |
| 12 | + |
| 13 | +### Step 1. Define an embedding function |
| 14 | + |
| 15 | +To generate image embeddings, you need an embedding model that supports image input. |
| 16 | + |
| 17 | +For demonstration, you can use Jina AI's multimodal embedding model to generate image embeddings. |
| 18 | + |
| 19 | +Go to [Jina AI](https://jina.ai/embeddings) to create an API key, then initialize the embedding function as follows: |
| 20 | + |
| 21 | +```python |
| 22 | +from pytidb.embeddings import EmbeddingFunction |
| 23 | + |
| 24 | +image_embed = EmbeddingFunction( |
| 25 | +    # Or another provider/model that supports multimodal input |
| 26 | +    model_name="jina_ai/jina-embeddings-v4", |
| 27 | +    api_key="{your-jina-api-key}", |
| 28 | +) |
| 29 | +``` |
| 30 | + |
| 31 | +### Step 2. Create a table and vector field |
| 32 | + |
| 33 | +Use the embedding function's `VectorField()` method to define a vector field for storing image embeddings. Set the `source_field` parameter to the name of the field that stores image URLs. |
| 34 | + |
| 35 | +```python |
| 36 | +from pytidb.schema import TableModel, Field |
| 37 | + |
| 38 | +class ImageItem(TableModel): |
| 39 | +    __tablename__ = "image_items" |
| 40 | +    id: int = Field(primary_key=True) |
| 41 | +    image_uri: str = Field() |
| 42 | +    image_vec: list[float] = image_embed.VectorField( |
| 43 | +        source_field="image_uri",  # embed the image referenced by this field |
| 44 | +    ) |
| 45 | + |
| 46 | +table = client.create_table(schema=ImageItem, mode="overwrite")  # assumes `client` is a connected TiDBClient |
| 47 | +``` |
| 48 | + |
| 49 | +### Step 3. Insert image data |
| 50 | + |
| 51 | +When you insert data, the `image_vec` field is automatically populated with an embedding generated from the image at `image_uri`. |
| 52 | + |
| 53 | +```python |
| 54 | +table.bulk_insert([ |
| 55 | +    ImageItem(image_uri="https://example.com/image1.jpg"), |
| 56 | +    ImageItem(image_uri="https://example.com/image2.jpg"), |
| 57 | +    ImageItem(image_uri="https://example.com/image3.jpg"), |
| 58 | +]) |
| 59 | +``` |
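Conceptually, the embedding step runs at insert time. For each row, the value of the source field is turned into an embedding, and the result is stored in the vector field. The toy in-memory sketch below illustrates that flow. It is not pytidb's implementation. `fake_embed`, `Row`, and `AutoEmbedTable` are hypothetical names, and `fake_embed` hashes the URI instead of calling a real multimodal model, so its vectors carry no visual meaning.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def fake_embed(uri: str, dim: int = 4) -> list[float]:
    """Hypothetical stand-in for a multimodal model: a deterministic vector derived from the URI."""
    digest = hashlib.sha256(uri.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


@dataclass
class Row:
    image_uri: str
    image_vec: list[float] | None = None


@dataclass
class AutoEmbedTable:
    """Toy table that fills `target_field` from `source_field` when rows are inserted."""
    source_field: str = "image_uri"
    target_field: str = "image_vec"
    rows: list[Row] = field(default_factory=list)

    def bulk_insert(self, new_rows: list[Row]) -> list[Row]:
        for row in new_rows:
            # Fill the vector only if the caller left it empty (a choice made for this toy).
            if getattr(row, self.target_field) is None:
                source_value = getattr(row, self.source_field)
                setattr(row, self.target_field, fake_embed(source_value))
            self.rows.append(row)
        return new_rows
```

Calling `AutoEmbedTable().bulk_insert([Row("https://example.com/image1.jpg")])` stores the row with `image_vec` already populated. This mirrors how the `ImageItem` rows above are inserted with only `image_uri` set.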
| 60 | + |
| 61 | +### Step 4. Perform image search |
| 62 | + |
| 63 | +Image search is a type of vector search. Automatic embedding lets you input an image URL, PIL image, or keyword text directly. All these inputs are converted to vector embeddings for similarity matching. |
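The three options below differ only in how the query is turned into an embedding. The sketch below shows one way a client *might* route each input type. This is an assumption for illustration: pytidb's actual detection logic may differ, and `classify_query` is a hypothetical helper. It treats objects with a callable `save()` method (as PIL images have) as in-memory images, `http(s)` strings as image URLs, and any other string as keyword text.

```python
from urllib.parse import urlparse


def classify_query(query: object) -> str:
    """Hypothetical routing rule: decide how a search query should be embedded."""
    # In-memory images such as PIL Image objects expose a callable save() method.
    if callable(getattr(query, "save", None)):
        return "image"
    if isinstance(query, str):
        parsed = urlparse(query)
        # Strings with an http(s) scheme and a host are treated as image URLs.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return "image_url"
        # Any other string is treated as keyword text.
        return "text"
    raise TypeError(f"unsupported query type: {type(query).__name__}")
```

For example, `classify_query("orange tabby cat")` returns `"text"`, while `classify_query("https://example.com/query.jpg")` returns `"image_url"`.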
| 64 | + |
| 65 | +#### Option 1: Search by image URL |
| 66 | + |
| 67 | +Search for similar images by providing an image URL: |
| 68 | + |
| 69 | +```python |
| 70 | +results = table.search("https://example.com/query.jpg").limit(3).to_list() |
| 71 | +``` |
| 72 | + |
| 73 | +The client converts the input image URL into a vector. TiDB then finds and returns the most similar images by comparing their vectors. |
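The brute-force sketch below makes "comparing their vectors" concrete: it performs top-k retrieval in plain Python. It assumes cosine distance as the metric and uses made-up 3-dimensional vectors. TiDB performs this comparison inside the database, and may use a vector index rather than scanning every row, so treat this only as a model of what `.limit(3)` returns.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity, so smaller values mean more similar vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec: list[float], rows: list[tuple[str, list[float]]], k: int = 3) -> list[tuple[str, float]]:
    """Brute-force nearest neighbours: score every stored vector and keep the k closest."""
    scored = sorted((cosine_distance(query_vec, vec), uri) for uri, vec in rows)
    return [(uri, dist) for dist, uri in scored[:k]]


# Made-up embeddings standing in for real model output.
rows = [
    ("https://example.com/image1.jpg", [0.9, 0.1, 0.0]),
    ("https://example.com/image2.jpg", [0.1, 0.9, 0.0]),
    ("https://example.com/image3.jpg", [0.7, 0.3, 0.1]),
]
results = top_k([1.0, 0.0, 0.0], rows, k=2)
```

Here `results` ranks `image1.jpg` first and `image3.jpg` second. `image2.jpg` points in a different direction from the query vector, so `k=2` leaves it out.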
| 74 | + |
| 75 | +#### Option 2: Search by PIL image |
| 76 | + |
| 77 | +You can also search for similar images by providing a PIL `Image` object, for example one loaded from a local file: |
| 78 | + |
| 79 | +```python |
| 80 | +from PIL import Image |
| 81 | + |
| 82 | +image = Image.open("/path/to/query.jpg") |
| 83 | + |
| 84 | +results = table.search(image).limit(3).to_list() |
| 85 | +``` |
| 86 | + |
| 87 | +The client converts the PIL image object into a Base64 string before sending it to the embedding model. |
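The Base64 step can be sketched with the standard library alone. `image_to_base64` is a hypothetical helper, not a pytidb function. It relies only on the `save(fp, format=...)` method that PIL `Image` objects provide. The PNG default is an assumption; the format the client actually uses may differ.

```python
import base64
import io


def image_to_base64(image, fmt: str = "PNG") -> str:
    """Serialize an image object (e.g. a PIL Image) into a Base64 string.

    Hypothetical helper: it only assumes the object has a PIL-style
    save(fp, format=...) method that writes encoded bytes to a file object.
    """
    buffer = io.BytesIO()
    image.save(buffer, format=fmt)  # write the encoded image bytes into memory
    return base64.b64encode(buffer.getvalue()).decode("ascii")
```

With Pillow installed, `image_to_base64(Image.open("/path/to/query.jpg"))` returns a plain ASCII string that can be embedded in a JSON request body.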
| 88 | + |
| 89 | +#### Option 3: Search by keyword text |
| 90 | + |
| 91 | +You can also search for similar images by providing keyword text. |
| 92 | + |
| 93 | +For example, if you are working on a pet image dataset, you can search for similar images by keywords like "orange tabby cat" or "golden retriever puppy". |
| 94 | + |
| 95 | +```python |
| 96 | +results = table.search("orange tabby cat").limit(3).to_list() |
| 97 | +``` |
| 98 | + |
| 99 | +The multimodal embedding model converts the keyword text into a vector embedding that captures its semantic meaning. TiDB then performs a vector search to find the images whose embeddings are most similar to the keyword embedding. |
| 100 | + |
| 101 | +## See also |
| 102 | + |
| 103 | +- [Automatic embedding guide](./auto-embedding.md) |
| 104 | +- [Vector search guide](../concepts/vector-search.md) |
| 105 | +- [Pet image search demo](../examples/image-search-with-pytidb.md) |