Commit ea2df19

adjust docs
1 parent 4e813ca commit ea2df19

2 files changed: 21 additions, 16 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ Kubernetes: `>= 1.21.0`
 | Key | Type | Default | Description |
 |-----|------|---------|-------------|
 | affinity | object | `{}` | |
-| appConfig | object | `{}` | Application configuration of the service. You can supply a list of key-value pairs to be used as the application configuration. Currently, the only supported config field is `modelList`. Via the `modelList` field, you can specify a list of LLM models that the service supports. Although you can specify multiple models, only one model will be used at this moment. Each model item have the following fields: - `name` (string): The huggingface registered model name. We only support ONNX model at this moment. This field is required. - `default` (bool): Optional; Whether this model is the default model. If not specified, the first model in the list will be the default model. Only default model will be loaded. - `quantized` (bool): Optional; Whether the quantized version of model will be used. If not specified, the quantized version model will be loaded. - `config` (object): Optional; The configuration object that will be passed to the model. - `cache_dir` (string): Optional; The cache directory of the downloaded models. If not specified, the default cache directory will be used. - `local_files_only` (bool): Optional; Whether to only load the model from local files. If not specified, the model will be downloaded from the huggingface model hub. - `revision` (string) Optional, Default to 'main'; The specific model version to use. It can be a branch name, a tag name, or a commit id. Since we use a git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests. - `model_file_name` (string) Optional; - `extraction_config` (object) Optional; The configuration object that will be passed to the model extraction function for embedding generation. - `pooling`: ('none'|'mean'|'cls') Default to 'none'. The pooling method to use. - `normalize`: (bool) Default to true. Whether or not to normalize the embeddings in the last dimension. - `quantize`: (bool) Default to `false`. Whether or not to quantize the embeddings. - `precision`: ("binary" | "ubinary") default to "binary". The precision to use for quantization. Only used when `quantize` is true. Please note: The released docker image only contains "Alibaba-NLP/gte-base-en-v1.5" model. If you specify other models, the server will download the model from the huggingface model hub at the startup. You might want to adjust the `startupProbe` settings to accommodate the model downloading time. Depends on the model size, you might also want to adjust the `resources.limits.memory` & `resources.requests.memory`value. |
+| appConfig | object | `{}` | Application configuration of the service. You can supply a list of key-value pairs to be used as the application configuration. Currently, the only supported config field is `modelList`. Via the `modelList` field, you can specify a list of LLM models that the service supports. Although you can specify multiple models, only one model will be used at this moment. Each model item has the following fields: <ul> <li> `name` (string): The huggingface registered model name. Only ONNX models are supported at this moment. This field is required. </li> <li> `default` (bool): Optional; whether this model is the default model. If not specified, the first model in the list will be the default model. Only the default model will be loaded. </li> <li> `quantized` (bool): Optional; whether the quantized version of the model will be used. If not specified, the quantized version will be loaded. </li> <li> `config` (object): Optional; the configuration object that will be passed to the model. </li> <li> `cache_dir` (string): Optional; the cache directory for the downloaded models. If not specified, the default cache directory will be used. </li> <li> `local_files_only` (bool): Optional; whether to load the model from local files only. If not specified, the model will be downloaded from the huggingface model hub. </li> <li> `revision` (string): Optional, defaults to 'main'; the specific model version to use. It can be a branch name, a tag name, or a commit id. Since huggingface.co uses a git-based system for storing models and other artifacts, `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests. </li> <li> `model_file_name` (string): Optional. </li> <li> `extraction_config` (object): Optional; the configuration object that will be passed to the model extraction function for embedding generation. <br/> <ul> <li> `pooling`: ('none', 'mean' or 'cls') Defaults to 'none'. The pooling method to use. </li> <li> `normalize`: (bool) Defaults to true. Whether or not to normalize the embeddings in the last dimension. </li> <li> `quantize`: (bool) Defaults to `false`. Whether or not to quantize the embeddings. </li> <li> `precision`: ("binary" or "ubinary") Defaults to "binary". The precision to use for quantization. Only used when `quantize` is true. </li> </ul> </li> </ul> Please note: the released docker image only contains the "Alibaba-NLP/gte-base-en-v1.5" model. If you specify other models, the server will download them from the huggingface model hub at startup. You might want to adjust the `startupProbe` settings to accommodate the model download time. Depending on the model size, you might also want to adjust the `resources.limits.memory` and `resources.requests.memory` values. |
 | autoscaling.hpa.enabled | bool | `false` | |
 | autoscaling.hpa.maxReplicas | int | `3` | |
 | autoscaling.hpa.minReplicas | int | `1` | |
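To make the table above concrete, here is a minimal sketch of an `appConfig` override using the documented `modelList` fields. It assumes the chart is installed with a custom values file; the model shown is the one bundled in the released docker image, and the `extraction_config` choices are illustrative rather than recommended defaults.

```yaml
# Illustrative Helm values override for the documented `appConfig.modelList` fields.
# "Alibaba-NLP/gte-base-en-v1.5" is the only model bundled in the released image;
# any other `name` would be downloaded from the huggingface model hub at startup.
appConfig:
  modelList:
    - name: "Alibaba-NLP/gte-base-en-v1.5"
      default: true            # only the default model is loaded
      quantized: true          # matches the documented default behaviour
      revision: "main"         # branch name, tag name, or commit id
      extraction_config:
        pooling: "mean"        # one of 'none', 'mean', 'cls' (illustrative choice)
        normalize: true        # normalize embeddings in the last dimension
        quantize: false        # keep full-precision embeddings
```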

deploy/magda-embedding-api/values.yaml

Lines changed: 20 additions & 15 deletions
@@ -31,21 +31,26 @@ closeGraceDelay: 25000
 # Currently, the only supported config field is `modelList`.
 # Via the `modelList` field, you can specify a list of LLM models that the service supports.
 # Although you can specify multiple models, only one model will be used at this moment.
-# Each model item have the following fields:
-# - `name` (string): The huggingface registered model name. We only support ONNX model at this moment. This field is required.
-# - `default` (bool): Optional; Whether this model is the default model. If not specified, the first model in the list will be the default model. Only default model will be loaded.
-# - `quantized` (bool): Optional; Whether the quantized version of model will be used. If not specified, the quantized version model will be loaded.
-# - `config` (object): Optional; The configuration object that will be passed to the model.
-# - `cache_dir` (string): Optional; The cache directory of the downloaded models. If not specified, the default cache directory will be used.
-# - `local_files_only` (bool): Optional; Whether to only load the model from local files. If not specified, the model will be downloaded from the huggingface model hub.
-# - `revision` (string) Optional, Default to 'main'; The specific model version to use. It can be a branch name, a tag name, or a commit id.
-# Since we use a git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests.
-# - `model_file_name` (string) Optional;
-# - `extraction_config` (object) Optional; The configuration object that will be passed to the model extraction function for embedding generation.
-# - `pooling`: ('none'|'mean'|'cls') Default to 'none'. The pooling method to use.
-# - `normalize`: (bool) Default to true. Whether or not to normalize the embeddings in the last dimension.
-# - `quantize`: (bool) Default to `false`. Whether or not to quantize the embeddings.
-# - `precision`: ("binary" | "ubinary") default to "binary". The precision to use for quantization. Only used when `quantize` is true.
+# Each model item has the following fields:
+# <ul>
+# <li> `name` (string): The huggingface registered model name. Only ONNX models are supported at this moment. This field is required. </li>
+# <li> `default` (bool): Optional; whether this model is the default model. If not specified, the first model in the list will be the default model. Only the default model will be loaded. </li>
+# <li> `quantized` (bool): Optional; whether the quantized version of the model will be used. If not specified, the quantized version will be loaded. </li>
+# <li> `config` (object): Optional; the configuration object that will be passed to the model. </li>
+# <li> `cache_dir` (string): Optional; the cache directory for the downloaded models. If not specified, the default cache directory will be used. </li>
+# <li> `local_files_only` (bool): Optional; whether to load the model from local files only. If not specified, the model will be downloaded from the huggingface model hub. </li>
+# <li> `revision` (string): Optional, defaults to 'main'; the specific model version to use. It can be a branch name, a tag name, or a commit id.
+# Since huggingface.co uses a git-based system for storing models and other artifacts, `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests. </li>
+# <li> `model_file_name` (string): Optional. </li>
+# <li> `extraction_config` (object): Optional; the configuration object that will be passed to the model extraction function for embedding generation. <br/>
+# <ul>
+# <li> `pooling`: ('none', 'mean' or 'cls') Defaults to 'none'. The pooling method to use. </li>
+# <li> `normalize`: (bool) Defaults to true. Whether or not to normalize the embeddings in the last dimension. </li>
+# <li> `quantize`: (bool) Defaults to `false`. Whether or not to quantize the embeddings. </li>
+# <li> `precision`: ("binary" or "ubinary") Defaults to "binary". The precision to use for quantization. Only used when `quantize` is true. </li>
+# </ul>
+# </li>
+# </ul>
 # Please note: The released docker image only contains "Alibaba-NLP/gte-base-en-v1.5" model.
 # If you specify other models, the server will download the model from the huggingface model hub at the startup.
 # You might want to adjust the `startupProbe` settings to accommodate the model downloading time.
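Because a non-bundled model is downloaded at startup, the comments above suggest loosening the startup probe and raising memory. A hedged sketch of such an override follows; it assumes `startupProbe` and `resources` are passed through to the pod spec as in conventional Helm charts, and the numbers are placeholders to size against your model.

```yaml
# Hypothetical overrides for a larger, non-bundled model (values are placeholders).
startupProbe:
  periodSeconds: 10
  failureThreshold: 60       # allow up to 60 * 10s = 10 minutes for the download
resources:
  requests:
    memory: "2Gi"            # raise together with the limit for bigger models
  limits:
    memory: "4Gi"
```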
