pingcap
diff --git a/mkdocs.yml b/mkdocs.yml
Lines changed: 27 additions & 5 deletions
diff --git a/src/ai/guides/auto-embedding.md b/src/ai/guides/auto-embedding.md
Lines changed: 3 additions & 61 deletions
diff --git a/src/ai/guides/image-search.md b/src/ai/guides/image-search.md
Lines changed: 2 additions & 1 deletion
diff --git a/src/ai/integrations/embedding-cohere.md b/src/ai/integrations/embedding-cohere.md
Lines changed: 178 additions & 0 deletions
@@ -117,8 +117,20 @@ nav:
       - IDE & Tool Integration:
         - Cursor: ai/integrations/tidb-mcp-cursor.md
         - Claude Desktop: ai/integrations/tidb-mcp-claude-desktop.md
-      - LlamaIndex: ai/integrations/llamaindex.md
-      - LangChain: ai/integrations/langchain.md
+      - AI Frameworks:
+        - LlamaIndex: ai/integrations/llamaindex.md
+        - LangChain: ai/integrations/langchain.md
+      - Embeddings:
+        - Overview: ai/integrations/embedding-overview.md
+        - TiDB Cloud Hosted: ai/integrations/embedding-tidb-cloud-hosted.md
+        - OpenAI: ai/integrations/embedding-openai.md
+        - OpenAI Compatible: ai/integrations/embedding-openai-compatible.md
+        - Cohere: ai/integrations/embedding-cohere.md
+        - Jina AI: ai/integrations/embedding-jinaai.md
+        - Google Gemini: ai/integrations/embedding-gemini.md
+        - Hugging Face: ai/integrations/embedding-huggingface.md
+        - NVIDIA NIM: ai/integrations/embedding-nvidia-nim.md
+
   - Concepts:
     - Vector Search: ai/concepts/vector-search.md
   - Guides:
@@ -151,9 +163,19 @@ nav:
     - IDE & Tool Integration:
       - Cursor: ai/integrations/tidb-mcp-cursor.md
       - Claude Desktop: ai/integrations/tidb-mcp-claude-desktop.md
-    - LlamaIndex: ai/integrations/llamaindex.md
-    - LangChain: ai/integrations/langchain.md
-
+    - AI Frameworks:
+      - LlamaIndex: ai/integrations/llamaindex.md
+      - LangChain: ai/integrations/langchain.md
+    - Embeddings:
+      - Overview: ai/integrations/embedding-overview.md
+      - TiDB Cloud Hosted: ai/integrations/embedding-tidb-cloud-hosted.md
+      - OpenAI: ai/integrations/embedding-openai.md
+      - OpenAI Compatible: ai/integrations/embedding-openai-compatible.md
+      - Cohere: ai/integrations/embedding-cohere.md
+      - Jina AI: ai/integrations/embedding-jinaai.md
+      - Google Gemini: ai/integrations/embedding-gemini.md
+      - Hugging Face: ai/integrations/embedding-huggingface.md
+      - NVIDIA NIM: ai/integrations/embedding-nvidia-nim.md
 
 extra:
   social:
 
@@ -8,20 +8,19 @@ Auto embedding is a feature that allows you to automatically generate vector emb
 
 ## Basic Usage
 
+This example uses a TiDB Cloud hosted embedding model for demonstration. For other providers, see the [Supported Providers](../integrations/embedding-overview.md#supported-providers) list.
+
 ### Step 1. Define an embedding function
 
 === "Python"
 
     Define an embedding function to generate vector embeddings for text data.
-    
-    In this example, we use OpenAI as the embedding provider for demonstration, for other providers, please check the [Supported Providers](#supported-providers) list.
 
     ```python
     from pytidb.embeddings import EmbeddingFunction
 
     embed_func = EmbeddingFunction(
-        model_name="openai/{model_name}",       # openai/text-embedding-3-small
-        api_key="{your-openai-api-key}",
+        model_name="tidbcloud_free/amazon/titan-embed-text-v2",
     )
     ```
 
@@ -74,60 +73,3 @@ Auto embedding is a feature that allows you to automatically generate vector emb
     ```python
     table.search("HTAP database").limit(3).to_list()
     ```
-
-## Embedding Function
-
-`EmbeddingFunction` provides a unified interface in `pytidb` for accessing external embedding model services.
-
-#### Constructor Parameters
-
-- `model_name` *(required)*:  
-  Specifies the embedding model to use, in the format `{provider_name}/{model_name}`.
-
-- `dimensions` *(optional)*:  
-  The dimensionality of the output vector embeddings. If not provided and the selected model does not include a default dimension, a test string will be embedded during initialization to automatically determine the actual dimension.
-
-- `api_key` *(optional)*:  
-  The API key used to access the embedding service. If not explicitly set, the key will be retrieved from the default environment variable associated with the provider.
-
-- `api_base` *(optional)*:  
-  The base URL of the embedding API service.
-
-### Supported Providers
-
-Below is a list of supported embedding model providers. You can follow the corresponding example to create an EmbeddingFunction instance for the provider you are using.
-
-#### OpenAI
-
-For OpenAI users, you can go to [OpenAI API Platform](https://platform.openai.com/api-keys) to create your own API key.
-
-```python
-embed_func = EmbeddingFunction(
-    model_name="openai/{model_name}",       # openai/text-embedding-3-small
-    api_key="{your-openai-api-key}",
-)
-```
-
-#### OpenAI Like
-
-If you're using a platform or tool that is compatible with the OpenAI API format, you can indicate this by adding the `openai/` prefix to the `model_name` parameter. Then, use the `api_base` parameter to specify the base URL of the API provided by your platform or tool.
-
-```python
-embed_func = EmbeddingFunction(
-    model_name="openai/{model_name}",        # text-embedding-3-small 
-    api_key="{your-server-api-key}",
-    api_base="{your-api-server-base-url}"    # http://localhost:11434/
-)
-```
-
-#### Jina AI
-
-For Jina AI users, you can go to [Jina AI website](https://jina.ai/embeddings/) to create your own API key.
-
-```python
-embed_func = EmbeddingFunction(
-    model_name="jina_ai/{model_name}",  # jina_ai/jina-embeddings-v3
-    api_key="{your-jina-api-key}"
-)
-```
-
@@ -18,13 +18,14 @@ For demonstration, you can use Jina AI's multimodal embedding model to generate
 
 Go to [Jina AI](https://jina.ai/embeddings) to create an API key, then initialize the embedding function as follows:
 
-```python
+```python hl_lines="7"
 from pytidb.embeddings import EmbeddingFunction
 
 image_embed = EmbeddingFunction(
     # Or another provider/model that supports multimodal input
     model_name="jina_ai/jina-embeddings-v4",
     api_key="{your-jina-api-key}",
+    multimodal=True,
 )
 ```
 
 
@@ -0,0 +1,178 @@
+---
+title: "Integrate TiDB Vector Search with Cohere Embeddings API"
+description: "Learn how to integrate TiDB Vector Search with Cohere Embeddings API to store embeddings and perform semantic search."
+keywords: "TiDB, Cohere, Vector search, text embeddings, multilingual embeddings"
+---
+
+# Integrate TiDB Vector Search with Cohere Embeddings API
+
+This tutorial demonstrates how to use [Cohere](https://cohere.com/embed) to generate text embeddings, store them in TiDB vector storage, and perform semantic search.
+
+!!! info
+
+    Currently, only the following product and regions support native SQL functions for integrating the Cohere Embeddings API:
+
+    - [TiDB Cloud Starter](https://tidbcloud.com/?utm_source=github&utm_medium=referral&utm_campaign=pytidb_readme) on AWS: `Frankfurt (eu-central-1)` and `Singapore (ap-southeast-1)`
+
+## Cohere Embeddings
+
+Cohere offers multilingual embedding models for search, RAG, and classification. The latest `embed-v4.0` model supports text, images, and mixed content. You can use the Cohere Embeddings API with TiDB through the AI SDK or native SQL functions for automatic embedding generation.
+
+### Supported Models
+
+| Model Name                       | Dimensions | Max Input Tokens | Description |
+|----------------------------------|------------|------------------|-------------|
+| `cohere/embed-v4.0`             | 256, 512, 1024, 1536 (default) | 128k | Latest multimodal model supporting text, images, and mixed content (PDFs) |
+| `cohere/embed-english-v3.0`     | 1024       | 512              | High-performance English embedding model optimized for search and classification |
+| `cohere/embed-multilingual-v3.0`| 1024       | 512              | Multilingual model supporting 100+ languages |
+| `cohere/embed-english-light-v3.0` | 384     | 512              | Lightweight English model: faster than `cohere/embed-english-v3.0` with similar quality |
+| `cohere/embed-multilingual-light-v3.0` | 384 | 512          | Lightweight multilingual model: faster than `cohere/embed-multilingual-v3.0` with similar quality |
+
+For a complete list of supported models and detailed specifications, see the [Cohere Embeddings Documentation](https://docs.cohere.com/docs/cohere-embed).
+
+## Usage example
+
+This example demonstrates creating a vector table, inserting documents, and performing similarity search using Cohere embedding models.
+
+### Step 1: Connect to the database
+
+=== "Python"
+
+    ```python
+    from pytidb import TiDBClient
+
+    tidb_client = TiDBClient.connect(
+        host="{gateway-region}.prod.aws.tidbcloud.com",
+        port=4000,
+        username="{prefix}.root",
+        password="{password}",
+        database="{database}",
+        ensure_db=True,
+    )
+    ```
+
+=== "SQL"
+
+    ```bash
+    mysql -h {gateway-region}.prod.aws.tidbcloud.com \
+        -P 4000 \
+        -u {prefix}.root \
+        -p{password} \
+        -D {database}
+    ```
+
+### Step 2: Configure the API key
+
+Create an API key in the [Cohere Dashboard](https://dashboard.cohere.com/api-keys). The embedding service follows a bring-your-own-key (BYOK) model, so you supply this key to TiDB before generating embeddings.
+
+=== "Python"
+
+    Configure the API key for the Cohere embedding provider using the TiDB Client:
+
+    ```python
+    tidb_client.configure_embedding_provider(
+        provider="cohere",
+        api_key="{your-cohere-api-key}",
+    )
+    ```
+
+=== "SQL"
+
+    Set the API key for the Cohere embedding provider using SQL:
+
+    ```sql
+    SET @@GLOBAL.TIDB_EXP_EMBED_COHERE_API_KEY = "{your-cohere-api-key}";
+    ```
+
+### Step 3: Create a vector table
+
+Create a table with a vector field that uses the `cohere/embed-v4.0` model to generate 1536-dimensional vectors (default dimension):
+
+=== "Python"
+
+    ```python
+    from pytidb.schema import TableModel, Field
+    from pytidb.embeddings import EmbeddingFunction
+    from pytidb.datatype import TEXT
+
+    class Document(TableModel):
+        __tablename__ = "sample_documents"
+        id: int = Field(primary_key=True)
+        content: str = Field(sa_type=TEXT)
+        embedding: list[float] = EmbeddingFunction(
+            model_name="cohere/embed-v4.0"
+        ).VectorField(source_field="content")
+
+    table = tidb_client.create_table(schema=Document, if_exists="overwrite")
+    ```
+
+=== "SQL"
+
+    ```sql
+    CREATE TABLE sample_documents (
+        `id`        INT PRIMARY KEY,
+        `content`   TEXT,
+        `embedding` VECTOR(1536) GENERATED ALWAYS AS (EMBED_TEXT(
+            "cohere/embed-v4.0",
+            `content`
+        )) STORED
+    );
+    ```
+
+### Step 4: Insert data into the table
+
+=== "Python"
+
+    Use the `table.insert()` or `table.bulk_insert()` API to add data:
+
+    ```python
+    documents = [
+        Document(id=1, content="Python: High-level programming language for data science and web development."),
+        Document(id=2, content="Python snake: Non-venomous constrictor found in tropical regions."),
+        Document(id=3, content="Python framework: Django and Flask are popular web frameworks."),
+        Document(id=4, content="Python libraries: NumPy and Pandas for data analysis."),
+        Document(id=5, content="Python ecosystem: Rich collection of packages and tools."),
+    ]
+    table.bulk_insert(documents)
+    ```
+
+=== "SQL"
+
+    Insert data using the `INSERT INTO` statement:
+
+    ```sql
+    INSERT INTO sample_documents (id, content)
+    VALUES
+        (1, "Python: High-level programming language for data science and web development."),
+        (2, "Python snake: Non-venomous constrictor found in tropical regions."),
+        (3, "Python framework: Django and Flask are popular web frameworks."),
+        (4, "Python libraries: NumPy and Pandas for data analysis."),
+        (5, "Python ecosystem: Rich collection of packages and tools.");
+    ```
+
+### Step 5: Search for similar documents
+
+=== "Python"
+
+    Use the `table.search()` API to perform vector search:
+
+    ```python
+    results = table.search("How to learn Python programming?") \
+        .limit(2) \
+        .to_list()
+    print(results)
+    ```
+
+=== "SQL"
+
+    Use the `VEC_EMBED_COSINE_DISTANCE` function to perform vector search based on the cosine distance metric:
+
+    ```sql
+    SELECT
+        `id`,
+        `content`,
+        VEC_EMBED_COSINE_DISTANCE(embedding, "How to learn Python programming?") AS _distance
+    FROM sample_documents
+    ORDER BY _distance ASC
+    LIMIT 2;
+    ```
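
Note (not part of the diff): the Step 5 query orders rows by cosine distance between the stored embedding and the embedded query text. The sketch below illustrates that ranking in plain Python, assuming the usual definition of cosine distance as 1 minus cosine similarity. The helper names `cosine_distance` and `top_k` and the 3-dimensional toy vectors are hypothetical stand-ins for illustration; they are not part of `pytidb` or TiDB, and real `cohere/embed-v4.0` vectors would have 1536 dimensions by default.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: list[float], rows: list[tuple[int, list[float]]], k: int):
    """Rank (id, vector) rows by ascending cosine distance to the query."""
    scored = [(row_id, cosine_distance(query, vec)) for row_id, vec in rows]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy vectors standing in for stored embeddings (assumption for illustration).
rows = [
    (1, [1.0, 0.0, 0.0]),
    (2, [0.0, 1.0, 0.0]),
    (3, [1.0, 1.0, 0.0]),
]
query = [1.0, 0.1, 0.0]

# Mirrors ORDER BY _distance ASC LIMIT 2 in the SQL example.
nearest = top_k(query, rows, k=2)
print(nearest)
```

A smaller distance means the two vectors point in a more similar direction, which is why the SQL example sorts in ascending order.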