Commit ea2df19

adjust docs
1 parent 4e813ca commit ea2df19

2 files changed: 21 additions, 16 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ Kubernetes: `>= 1.21.0`
 | Key | Type | Default | Description |
 |-----|------|---------|-------------|
 | affinity | object | `{}` | |
-| appConfig | object | `{}` | Application configuration of the service. You can supply a list of key-value pairs to be used as the application configuration. Currently, the only supported config field is `modelList`. Via the `modelList` field, you can specify a list of LLM models that the service supports. Although you can specify multiple models, only one model will be used at this moment. Each model item have the following fields: - `name` (string): The huggingface registered model name. We only support ONNX model at this moment. This field is required. - `default` (bool): Optional; Whether this model is the default model. If not specified, the first model in the list will be the default model. Only default model will be loaded. - `quantized` (bool): Optional; Whether the quantized version of model will be used. If not specified, the quantized version model will be loaded. - `config` (object): Optional; The configuration object that will be passed to the model. - `cache_dir` (string): Optional; The cache directory of the downloaded models. If not specified, the default cache directory will be used. - `local_files_only` (bool): Optional; Whether to only load the model from local files. If not specified, the model will be downloaded from the huggingface model hub. - `revision` (string) Optional, Default to 'main'; The specific model version to use. It can be a branch name, a tag name, or a commit id. Since we use a git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests. - `model_file_name` (string) Optional; - `extraction_config` (object) Optional; The configuration object that will be passed to the model extraction function for embedding generation. - `pooling`: ('none'|'mean'|'cls') Default to 'none'. The pooling method to use. - `normalize`: (bool) Default to true. Whether or not to normalize the embeddings in the last dimension. - `quantize`: (bool) Default to `false`. Whether or not to quantize the embeddings. - `precision`: ("binary" | "ubinary") default to "binary". The precision to use for quantization. Only used when `quantize` is true. Please note: The released docker image only contains "Alibaba-NLP/gte-base-en-v1.5" model. If you specify other models, the server will download the model from the huggingface model hub at the startup. You might want to adjust the `startupProbe` settings to accommodate the model downloading time. Depends on the model size, you might also want to adjust the `resources.limits.memory` & `resources.requests.memory`value. |
+| appConfig | object | `{}` | Application configuration of the service. You can supply a list of key-value pairs to be used as the application configuration. Currently, the only supported config field is `modelList`. Via the `modelList` field, you can specify a list of LLM models that the service supports. Although you can specify multiple models, only one model will be used at this moment. Each model item has the following fields: <ul> <li> `name` (string): The huggingface registered model name. Only ONNX models are supported at this moment. This field is required. </li> <li> `default` (bool): Optional; whether this model is the default model. If not specified, the first model in the list will be the default model. Only the default model will be loaded. </li> <li> `quantized` (bool): Optional; whether the quantized version of the model will be used. If not specified, the quantized version will be loaded. </li> <li> `config` (object): Optional; the configuration object that will be passed to the model. </li> <li> `cache_dir` (string): Optional; the cache directory for the downloaded models. If not specified, the default cache directory will be used. </li> <li> `local_files_only` (bool): Optional; whether to load the model from local files only. If not specified, the model will be downloaded from the huggingface model hub. </li> <li> `revision` (string): Optional, defaults to 'main'; the specific model version to use. It can be a branch name, a tag name, or a commit id. Since huggingface.co uses a git-based system for storing models and other artifacts, `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests. </li> <li> `model_file_name` (string): Optional. </li> <li> `extraction_config` (object): Optional; the configuration object that will be passed to the model extraction function for embedding generation. <br/> <ul> <li> `pooling`: ('none', 'mean' or 'cls') Defaults to 'none'. The pooling method to use. </li> <li> `normalize`: (bool) Defaults to true. Whether or not to normalize the embeddings in the last dimension. </li> <li> `quantize`: (bool) Defaults to `false`. Whether or not to quantize the embeddings. </li> <li> `precision`: ("binary" or "ubinary") Defaults to "binary". The precision to use for quantization. Only used when `quantize` is true. </li> </ul> </li> </ul> Please note: the released docker image only contains the "Alibaba-NLP/gte-base-en-v1.5" model. If you specify other models, the server will download them from the huggingface model hub at startup. You might want to adjust the `startupProbe` settings to accommodate the model download time. Depending on the model size, you might also want to adjust the `resources.limits.memory` and `resources.requests.memory` values. |
 | autoscaling.hpa.enabled | bool | `false` | |
 | autoscaling.hpa.maxReplicas | int | `3` | |
 | autoscaling.hpa.minReplicas | int | `1` | |
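To make the table above concrete, here is a minimal sketch of an `appConfig` override using the documented `modelList` fields. It assumes the chart is installed with a custom values file; the model shown is the one bundled in the released docker image, and the `extraction_config` choices are illustrative rather than recommended defaults.

```yaml
# Illustrative Helm values override for the documented `appConfig.modelList` fields.
# "Alibaba-NLP/gte-base-en-v1.5" is the only model bundled in the released image;
# any other `name` would be downloaded from the huggingface model hub at startup.
appConfig:
  modelList:
    - name: "Alibaba-NLP/gte-base-en-v1.5"
      default: true            # only the default model is loaded
      quantized: true          # matches the documented default behaviour
      revision: "main"         # branch name, tag name, or commit id
      extraction_config:
        pooling: "mean"        # one of 'none', 'mean', 'cls' (illustrative choice)
        normalize: true        # normalize embeddings in the last dimension
        quantize: false        # keep full-precision embeddings
```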

deploy/magda-embedding-api/values.yaml

Lines changed: 20 additions & 15 deletions
@@ -31,21 +31,26 @@ closeGraceDelay: 25000
 # Currently, the only supported config field is `modelList`.
 # Via the `modelList` field, you can specify a list of LLM models that the service supports.
 # Although you can specify multiple models, only one model will be used at this moment.
-# Each model item have the following fields:
-# - `name` (string): The huggingface registered model name. We only support ONNX model at this moment. This field is required.
-# - `default` (bool): Optional; Whether this model is the default model. If not specified, the first model in the list will be the default model. Only default model will be loaded.
-# - `quantized` (bool): Optional; Whether the quantized version of model will be used. If not specified, the quantized version model will be loaded.
-# - `config` (object): Optional; The configuration object that will be passed to the model.
-# - `cache_dir` (string): Optional; The cache directory of the downloaded models. If not specified, the default cache directory will be used.
-# - `local_files_only` (bool): Optional; Whether to only load the model from local files. If not specified, the model will be downloaded from the huggingface model hub.
-# - `revision` (string) Optional, Default to 'main'; The specific model version to use. It can be a branch name, a tag name, or a commit id.
-# Since we use a git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests.
-# - `model_file_name` (string) Optional;
-# - `extraction_config` (object) Optional; The configuration object that will be passed to the model extraction function for embedding generation.
-# - `pooling`: ('none'|'mean'|'cls') Default to 'none'. The pooling method to use.
-# - `normalize`: (bool) Default to true. Whether or not to normalize the embeddings in the last dimension.
-# - `quantize`: (bool) Default to `false`. Whether or not to quantize the embeddings.
-# - `precision`: ("binary" | "ubinary") default to "binary". The precision to use for quantization. Only used when `quantize` is true.
+# Each model item has the following fields:
+# <ul>
+# <li> `name` (string): The huggingface registered model name. Only ONNX models are supported at this moment. This field is required. </li>
+# <li> `default` (bool): Optional; whether this model is the default model. If not specified, the first model in the list will be the default model. Only the default model will be loaded. </li>
+# <li> `quantized` (bool): Optional; whether the quantized version of the model will be used. If not specified, the quantized version will be loaded. </li>
+# <li> `config` (object): Optional; the configuration object that will be passed to the model. </li>
+# <li> `cache_dir` (string): Optional; the cache directory for the downloaded models. If not specified, the default cache directory will be used. </li>
+# <li> `local_files_only` (bool): Optional; whether to load the model from local files only. If not specified, the model will be downloaded from the huggingface model hub. </li>
+# <li> `revision` (string): Optional, defaults to 'main'; the specific model version to use. It can be a branch name, a tag name, or a commit id.
+# Since huggingface.co uses a git-based system for storing models and other artifacts, `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests. </li>
+# <li> `model_file_name` (string): Optional. </li>
+# <li> `extraction_config` (object): Optional; the configuration object that will be passed to the model extraction function for embedding generation. <br/>
+# <ul>
+# <li> `pooling`: ('none', 'mean' or 'cls') Defaults to 'none'. The pooling method to use. </li>
+# <li> `normalize`: (bool) Defaults to true. Whether or not to normalize the embeddings in the last dimension. </li>
+# <li> `quantize`: (bool) Defaults to `false`. Whether or not to quantize the embeddings. </li>
+# <li> `precision`: ("binary" or "ubinary") Defaults to "binary". The precision to use for quantization. Only used when `quantize` is true. </li>
+# </ul>
+# </li>
+# </ul>
 # Please note: The released docker image only contains "Alibaba-NLP/gte-base-en-v1.5" model.
 # If you specify other models, the server will download the model from the huggingface model hub at the startup.
 # You might want to adjust the `startupProbe` settings to accommodate the model downloading time.
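Because a non-bundled model is downloaded at startup, the comments above suggest loosening the startup probe and raising memory. A hedged sketch of such an override follows; it assumes `startupProbe` and `resources` are passed through to the pod spec as in conventional Helm charts, and the numbers are placeholders to size against your model.

```yaml
# Hypothetical overrides for a larger, non-bundled model (values are placeholders).
startupProbe:
  periodSeconds: 10
  failureThreshold: 60       # allow up to 60 * 10s = 10 minutes for the download
resources:
  requests:
    memory: "2Gi"            # raise together with the limit for bigger models
  limits:
    memory: "4Gi"
```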
