Merge branch 'wenxuan/auto-embed' of https://github.com/pingcap/docs into wenxuan/auto-embed

breezewish · breezewish · commit 7dafe5bf3f1e · 2025-08-21T20:01:43.000+08:00
Signed-off-by: Wish <breezewish@outlook.com>
diff --git a/tidb-cloud/vector-search-auto-embedding-amazon-titan.md b/tidb-cloud/vector-search-auto-embedding-amazon-titan.md
@@ -23,7 +23,7 @@ TiDB Cloud provides the following [Amazon Titan embedding model](https://docs.aw
 - Hosted by TiDB Cloud: ✅
 - Bring Your Own Key: ❌
 
-You may learn more from [its official documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html).
+For more details, see [its official documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html).
 
 ## Availability
 
@@ -78,7 +78,7 @@ Result:
 
 ## Options
 
-Additional options may be specified via the `additional_json_options` parameter of the `EMBED_TEXT()` function.
+You can specify additional options via the `additional_json_options` parameter of the `EMBED_TEXT()` function.
 
 - `normalize` – (optional) Flag indicating whether or not to normalize the output embedding. Defaults to true.
 - `dimensions` – (optional) The number of dimensions the output embedding should have. The following values are accepted: 1024 (default), 512, 256.
diff --git a/tidb-cloud/vector-search-auto-embedding-cohere.md b/tidb-cloud/vector-search-auto-embedding-cohere.md
@@ -74,7 +74,7 @@ CREATE TABLE sample (
 
 > **Note**:
 >
-> For Cohere model, you must specify `input_type` in the `EMBED_TEXT()`. `'{"input_type": "search_document", "input_type@search": "search_query"}'` means `input_type` is set to `search_document` when inserting data, and is set to `search_query` when performing vector search queries.
+> For the Cohere model, you must specify `input_type` in the `EMBED_TEXT()` function. For example, `'{"input_type": "search_document", "input_type@search": "search_query"}'` means that `input_type` is set to `search_document` for data insertion and `search_query` for vector searches.
 >
 > The `@search` suffix is used to mark that field to take effect only when it is used for vector search queries.
 
@@ -113,9 +113,9 @@ Result:
 
 ## Options (TiDB Cloud Hosted)
 
-Both Embed v3 and Multilingual Embed v3 models supports following options, which need to specified via the `additional_json_options` parameter of the `EMBED_TEXT()` function.
+Both the Embed v3 and Multilingual Embed v3 models support the following options, which you can specify via the `additional_json_options` parameter of the `EMBED_TEXT()` function.
 
-- `input_type` – **Required**. Prepends special tokens to differentiate each type from one another. You should not mix different types together, except when mixing types for for search and retrieval. In this case, embed your corpus with the `search_document` type and embedded queries with type `search_query` type.
+- `input_type` – **Required**. Prepends special tokens to differentiate each type from one another. You should not mix different types together, except when mixing types for search and retrieval. In this case, embed your corpus with the `search_document` type and embed queries with the `search_query` type.
 
   - `search_document` – In search use-cases, use `search_document` when you encode documents for embeddings that you store in a vector database.
   - `search_query` – Use `search_query` when querying your vector DB to find relevant documents.
diff --git a/tidb-cloud/vector-search-auto-embedding-gemini.md b/tidb-cloud/vector-search-auto-embedding-gemini.md
@@ -6,19 +6,19 @@ aliases: ["/tidb/stable/vector-search-auto-embedding-gemini"]
 
 # Gemini Embeddings
 
-All Gemini models are available for use under the `gemini/` prefix when you bring your own Gemini API key. To name a few:
+All Gemini models are available for use under the `gemini/` prefix when you bring your own Gemini API key.
 
 **gemini-embedding-001**
 
 - Name: `gemini/gemini-embedding-001`
-- Dimensions: 128 - 3072 (default: 3072)
+- Dimensions: 128–3072 (default: 3072)
 - Distance Metric: Cosine / L2
 - Max input text tokens: 2048
 - Price: Charged by Google
 - Hosted by TiDB Cloud: ❌
 - Bring Your Own Key: ✅
 
-For a full list of available models, please refer to [Gemini Documentation](https://ai.google.dev/gemini-api/docs/embeddings).
+For a full list of available models, please refer to [Gemini documentation](https://ai.google.dev/gemini-api/docs/embeddings).
 
 ## Availability
 
@@ -108,7 +108,7 @@ CREATE TABLE sample (
 );
 ```
 
-For all available options, please refer to [Gemini Documentation](https://ai.google.dev/gemini-api/docs/embeddings).
+For all available options, please refer to [Gemini documentation](https://ai.google.dev/gemini-api/docs/embeddings).
 
 ## Python Usage Example
 
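Note on the `@search` suffix touched by the Cohere hunk above: the reworded note says a key such as `input_type@search` takes effect only for vector search queries, while the plain `input_type` applies to data insertion. A minimal Python sketch of that resolution rule follows, for illustration only. The helper `resolve_options` and its `for_search` flag are hypothetical names invented here; this is not how TiDB Cloud implements `EMBED_TEXT()`, only a model of the behavior the note describes.

```python
import json

SEARCH_SUFFIX = "@search"


def resolve_options(additional_json_options: str, for_search: bool) -> dict:
    """Hypothetical helper: compute the effective options for one context.

    Plain keys apply to both insertion and search. A key carrying the
    "@search" suffix replaces the plain key of the same name, but only
    when the options are resolved for a vector search query.
    """
    raw = json.loads(additional_json_options)

    # Start with the plain keys, which are always in effect.
    effective = {
        key: value
        for key, value in raw.items()
        if not key.endswith(SEARCH_SUFFIX)
    }

    if for_search:
        # Apply the search-only overrides under their unsuffixed names.
        for key, value in raw.items():
            if key.endswith(SEARCH_SUFFIX):
                effective[key[: -len(SEARCH_SUFFIX)]] = value

    return effective


# The options string quoted in the Cohere note.
options = '{"input_type": "search_document", "input_type@search": "search_query"}'

print(resolve_options(options, for_search=False))
print(resolve_options(options, for_search=True))
```

Under this model, the same options string yields `search_document` when rows are inserted and `search_query` when a vector search runs, which matches the wording of the updated note.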
