README.md: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ Kubernetes: `>= 1.21.0`
 | Key | Type | Default | Description |
 |-----|------|---------|-------------|
 | affinity | object | `{}` | |
-| appConfig | object | `{}` | Application configuration of the service. You can supply a list of key-value pairs to be used as the application configuration. Currently, the only supported config field is `modelList`. Via the `modelList` field, you can specify a list of LLM models that the service supports. Although you can specify multiple models, only one model will be used at this moment. Each model item has the following fields: - `name` (string): The huggingface registered model name. We only support ONNX models at this moment. This field is required. - `default` (bool): Optional; whether this model is the default model. If not specified, the first model in the list will be the default model. Only the default model will be loaded. - `quantized` (bool): Optional; whether the quantized version of the model will be used. If not specified, the quantized version of the model will be loaded. - `config` (object): Optional; the configuration object that will be passed to the model. - `cache_dir` (string): Optional; the cache directory for the downloaded models. If not specified, the default cache directory will be used. - `local_files_only` (bool): Optional; whether to only load the model from local files. If not specified, the model will be downloaded from the huggingface model hub. - `revision` (string): Optional, defaults to 'main'; the specific model version to use. It can be a branch name, a tag name, or a commit id. Since we use a git-based system for storing models and other artifacts on huggingface.co, `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests. - `model_file_name` (string): Optional; - `extraction_config` (object): Optional; the configuration object that will be passed to the model extraction function for embedding generation. - `pooling`: ('none'|'mean'|'cls') Defaults to 'none'. The pooling method to use. - `normalize`: (bool) Defaults to `true`. Whether or not to normalize the embeddings in the last dimension. - `quantize`: (bool) Defaults to `false`. Whether or not to quantize the embeddings. - `precision`: ("binary"|"ubinary") Defaults to "binary". The precision to use for quantization. Only used when `quantize` is `true`. Please note: the released docker image only contains the "Alibaba-NLP/gte-base-en-v1.5" model. If you specify other models, the server will download the model from the huggingface model hub at startup. You might want to adjust the `startupProbe` settings to accommodate the model download time. Depending on the model size, you might also want to adjust the `resources.limits.memory` & `resources.requests.memory` values. |
+| appConfig | object | `{}` | Application configuration of the service. You can supply a list of key-value pairs to be used as the application configuration. Currently, the only supported config field is `modelList`. Via the `modelList` field, you can specify a list of LLM models that the service supports. Although you can specify multiple models, only one model will be used at this moment. Each model item has the following fields: <ul> <li> `name` (string): The huggingface registered model name. We only support ONNX models at this moment. This field is required. </li> <li> `default` (bool): Optional; whether this model is the default model. If not specified, the first model in the list will be the default model. Only the default model will be loaded. </li> <li> `quantized` (bool): Optional; whether the quantized version of the model will be used. If not specified, the quantized version of the model will be loaded. </li> <li> `config` (object): Optional; the configuration object that will be passed to the model. </li> <li> `cache_dir` (string): Optional; the cache directory for the downloaded models. If not specified, the default cache directory will be used. </li> <li> `local_files_only` (bool): Optional; whether to only load the model from local files. If not specified, the model will be downloaded from the huggingface model hub. </li> <li> `revision` (string): Optional, defaults to 'main'; the specific model version to use. It can be a branch name, a tag name, or a commit id. Since we use a git-based system for storing models and other artifacts on huggingface.co, `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests. </li> <li> `model_file_name` (string): Optional; </li> <li> `extraction_config` (object): Optional; the configuration object that will be passed to the model extraction function for embedding generation. <br/> <ul> <li> `pooling`: ('none' or 'mean' or 'cls') Defaults to 'none'. The pooling method to use. </li> <li> `normalize`: (bool) Defaults to `true`. Whether or not to normalize the embeddings in the last dimension. </li> <li> `quantize`: (bool) Defaults to `false`. Whether or not to quantize the embeddings. </li> <li> `precision`: ("binary" or "ubinary") Defaults to "binary". The precision to use for quantization. Only used when `quantize` is `true`. </li> </ul> </li> </ul> Please note: the released docker image only contains the "Alibaba-NLP/gte-base-en-v1.5" model. If you specify other models, the server will download the model from the huggingface model hub at startup. You might want to adjust the `startupProbe` settings to accommodate the model download time. Depending on the model size, you might also want to adjust the `resources.limits.memory` & `resources.requests.memory` values. |
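
For reference, here is a minimal sketch of what an `appConfig` override using these fields might look like. The option values shown are illustrative assumptions rather than chart defaults; only `name` is required, and the model named is the one already bundled in the released docker image:

```yaml
appConfig:
  modelList:
    - name: "Alibaba-NLP/gte-base-en-v1.5" # required; huggingface registered ONNX model name
      default: true      # optional; the first item is the default when omitted
      quantized: true    # optional; the quantized version is loaded when omitted
      revision: "main"   # optional; branch name, tag name, or commit id
      extraction_config: # optional; passed to the embedding extraction function
        pooling: "mean"  # 'none' | 'mean' | 'cls'; defaults to 'none'
        normalize: true  # normalize embeddings in the last dimension; defaults to true
        quantize: false  # quantize the embeddings; defaults to false
        # precision: "binary" # "binary" | "ubinary"; only used when quantize is true
```

Since only the default model is loaded, listing additional models has no effect at this moment.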
deploy/magda-embedding-api/values.yaml: 20 additions & 15 deletions
@@ -31,21 +31,26 @@ closeGraceDelay: 25000
 # Currently, the only supported config field is `modelList`.
 # Via the `modelList` field, you can specify a list of LLM models that the service supports.
 # Although you can specify multiple models, only one model will be used at this moment.
-# Each model item has the following fields:
-# - `name` (string): The huggingface registered model name. We only support ONNX models at this moment. This field is required.
-# - `default` (bool): Optional; whether this model is the default model. If not specified, the first model in the list will be the default model. Only the default model will be loaded.
-# - `quantized` (bool): Optional; whether the quantized version of the model will be used. If not specified, the quantized version of the model will be loaded.
-# - `config` (object): Optional; the configuration object that will be passed to the model.
-# - `cache_dir` (string): Optional; the cache directory for the downloaded models. If not specified, the default cache directory will be used.
-# - `local_files_only` (bool): Optional; whether to only load the model from local files. If not specified, the model will be downloaded from the huggingface model hub.
-# - `revision` (string): Optional, defaults to 'main'; the specific model version to use. It can be a branch name, a tag name, or a commit id.
-#   Since we use a git-based system for storing models and other artifacts on huggingface.co, `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests.
-# - `model_file_name` (string): Optional;
-# - `extraction_config` (object): Optional; the configuration object that will be passed to the model extraction function for embedding generation.
-#   - `pooling`: ('none'|'mean'|'cls') Defaults to 'none'. The pooling method to use.
-#   - `normalize`: (bool) Defaults to `true`. Whether or not to normalize the embeddings in the last dimension.
-#   - `quantize`: (bool) Defaults to `false`. Whether or not to quantize the embeddings.
-#   - `precision`: ("binary"|"ubinary") Defaults to "binary". The precision to use for quantization. Only used when `quantize` is `true`.
+# Each model item has the following fields:
+# <ul>
+# <li> `name` (string): The huggingface registered model name. We only support ONNX models at this moment. This field is required. </li>
+# <li> `default` (bool): Optional; whether this model is the default model. If not specified, the first model in the list will be the default model. Only the default model will be loaded. </li>
+# <li> `quantized` (bool): Optional; whether the quantized version of the model will be used. If not specified, the quantized version of the model will be loaded. </li>
+# <li> `config` (object): Optional; the configuration object that will be passed to the model. </li>
+# <li> `cache_dir` (string): Optional; the cache directory for the downloaded models. If not specified, the default cache directory will be used. </li>
+# <li> `local_files_only` (bool): Optional; whether to only load the model from local files. If not specified, the model will be downloaded from the huggingface model hub. </li>
+# <li> `revision` (string): Optional, defaults to 'main'; the specific model version to use. It can be a branch name, a tag name, or a commit id.
+# Since we use a git-based system for storing models and other artifacts on huggingface.co, `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests. </li>
+# <li> `model_file_name` (string): Optional; </li>
+# <li> `extraction_config` (object): Optional; the configuration object that will be passed to the model extraction function for embedding generation. <br/>
+# <ul>
+# <li> `pooling`: ('none' or 'mean' or 'cls') Defaults to 'none'. The pooling method to use. </li>
+# <li> `normalize`: (bool) Defaults to `true`. Whether or not to normalize the embeddings in the last dimension. </li>
+# <li> `quantize`: (bool) Defaults to `false`. Whether or not to quantize the embeddings. </li>
+# <li> `precision`: ("binary" or "ubinary") Defaults to "binary". The precision to use for quantization. Only used when `quantize` is `true`. </li>
+# </ul>
+# </li>
+# </ul>
 # Please note: the released docker image only contains the "Alibaba-NLP/gte-base-en-v1.5" model.
 # If you specify other models, the server will download the model from the huggingface model hub at startup.
 # You might want to adjust the `startupProbe` settings to accommodate the model download time.
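
To tie that note together, below is a hedged sketch of the adjustments implied when you specify a model that is not bundled in the image. The model name is hypothetical, the probe and memory figures are illustrative assumptions to be tuned to your model's size and download time, and the probe fields follow the standard Kubernetes `startupProbe` spec:

```yaml
appConfig:
  modelList:
    # Hypothetical non-bundled model; it will be downloaded from the
    # huggingface model hub at startup.
    - name: "some-org/some-onnx-embedding-model"

# Give the container longer to start while the model downloads:
# up to 60 x 10s = 10 minutes before the startup probe gives up.
startupProbe:
  failureThreshold: 60
  periodSeconds: 10

# Larger models need more memory headroom.
resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "4Gi"
```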