
Commit 6f461d0

increase the default memory limit to 2G
1 parent: bdc759f

File tree

2 files changed, +4 -2 lines changed


README.md

Lines changed: 1 addition & 1 deletion
@@ -100,7 +100,7 @@ Kubernetes: `>= 1.21.0`
 | readinessProbe.successThreshold | int | `1` | |
 | readinessProbe.timeoutSeconds | int | `5` | |
 | replicas | int | `2` | |
-| resources.limits.memory | string | `"1100M"` | the memory limit of the container. Due to [this issue of ONNX runtime](https://github.com/microsoft/onnxruntime/issues/15080), the peak memory usage of the service is much higher than the model file size. When changing the default model, be sure to test the peak memory usage of the service before setting the memory limit. |
+| resources.limits.memory | string | `"2000M"` | the memory limit of the container. Due to [this issue of ONNX runtime](https://github.com/microsoft/onnxruntime/issues/15080), the peak memory usage of the service is much higher than the model file size. When changing the default model, be sure to test the peak memory usage of the service before setting the memory limit. When testing your model's memory requirements, please note that memory usage often goes much higher with longer context lengths. E.g. the default model supports up to 8192 tokens, but once the content goes beyond 512 tokens, the memory usage will be much higher (around 2G is required). |
 | resources.requests.cpu | string | `"100m"` | |
 | resources.requests.memory | string | `"850M"` | the memory request of the container. Once the model is loaded, the memory usage of the service for serving requests will be much lower. Set to 850M for the default model. |
 | service.annotations | object | `{}` | |

deploy/magda-embedding-api/values.yaml

Lines changed: 3 additions & 1 deletion
@@ -209,4 +209,6 @@ resources:
     # -- (string) the memory limit of the container
     # Due to [this issue of ONNX runtime](https://github.com/microsoft/onnxruntime/issues/15080), the peak memory usage of the service is much higher than the model file size.
     # When changing the default model, be sure to test the peak memory usage of the service before setting the memory limit.
-    memory: "1100M"
+    # When testing your model's memory requirements, please note that memory usage often goes much higher with longer context lengths.
+    # E.g. the default model supports up to 8192 tokens, but once the content goes beyond 512 tokens, the memory usage will be much higher (around 2G is required).
+    memory: "2000M"
