Commit 27266cb

Add stable diffusion inference
1 parent 353a179 commit 27266cb

File tree

19 files changed (+1884, -2 lines)

.dockerignore

Lines changed: 4 additions & 2 deletions
@@ -11,14 +11,16 @@ venv/
 
 # Development
 models
-weights
-weights_tf
 models_tf
 models_sd
+weights
+weights_tf
+weights_sd
 .ipynb_checkpoints
 deb
 *plan
 *onnx
 *pt
 *pth
 inference_notebook
+bash_scripts

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ venv/
 # Development
 weights
 weights_tf
+weights_sd
 .ipynb_checkpoints
 *.plan
 *.onnx

README.md

Lines changed: 9 additions & 0 deletions
@@ -72,6 +72,10 @@ I have also prepared some notes here in README, you can explore them too.
 1. Make sure to use the same input and output names while creating the Onnx model and during client inference.
 1. Take care of the dtypes you use to compile to Onnx and the ones specified in `config.pbtxt`. For instance, the transformers tokenizer returns dtype int64; if you specify int32 (otherwise preferable) in `config.pbtxt`, it will fail.
 
+### Torchscript Backend
+
+1. You can configure the following [parameters](https://github.com/triton-inference-server/pytorch_backend#parameters) when using the torchscript platform.
+
 ### TensorRT Backend
 
 #### Installation
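The dtype and parameter notes in the hunk above both come down to entries in the model's `config.pbtxt`. A minimal sketch for a tokenizer-fed Torchscript model; the tensor name and dims are hypothetical, not taken from this commit:

```
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
parameters: {
  key: "INFERENCE_MODE"
  value: { string_value: "true" }
}
```

`TYPE_INT64` matches what the transformers tokenizer emits, and `INFERENCE_MODE` is one of the pytorch_backend parameters linked above.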
@@ -91,6 +95,11 @@ Personal recommendation is to run this within a docker container.
 1. TensorRT does not support every operation, which can cause issues. In that case, try upgrading its version, but keep your system's CUDA and Triton versions in mind. If possible, update the CUDA version.
 1. The FP16 version takes time to compile, so take a break.
 
+## Stable Diffusion
+
+1. While compiling the Unet with ONNX, it will create multiple files because the model size is >2GB.
+1. If loading all of these models for a pipeline doesn't work and the logs don't show anything significant, try loading them individually with `--log-verbose=1`.
+
 ## Features
 
 ### Dynamic Batching
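The multi-file Unet export called out in the new Stable Diffusion section is ONNX's external-data format: protobuf caps a single file at 2GB, so exports above that write the weights as sibling files next to the graph. A minimal sketch of such an export, assuming diffusers is available and using a hypothetical checkpoint id and output path (neither is from this commit):

```python
import torch
from diffusers import UNet2DConditionModel  # assumed dependency


class UNetWrapper(torch.nn.Module):
    """Returns a plain tuple so torch.onnx.export can trace the forward."""

    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states):
        return self.unet(sample, timestep, encoder_hidden_states, return_dict=False)


# Checkpoint id and output path are illustrative only.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
wrapper = UNetWrapper(unet.eval())

# Dummy inputs matching the SD v1 Unet: latents, timestep, text embeddings.
sample = torch.randn(1, 4, 64, 64)
timestep = torch.tensor([1.0])
hidden_states = torch.randn(1, 77, 768)

# Because the weights exceed protobuf's 2GB limit, the export falls back to
# ONNX's external-data format, writing extra files alongside unet.onnx.
torch.onnx.export(
    wrapper,
    (sample, timestep, hidden_states),
    "weights_sd/unet/unet.onnx",
    input_names=["sample", "timestep", "encoder_hidden_states"],
    output_names=["out_sample"],
    opset_version=14,
)
```

All of those sibling files have to be shipped into the Triton model repository together with the `.onnx` graph; a missing one is a plausible cause of the kind of silent pipeline load failure the notes mention.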

bash_scripts/triton_server_sd.sh

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
@@ -0,0 +1 @@
+docker run --shm-size=16g --ulimit memlock=-1 --ulimit stack=67108864 -p 8000:8000 -p 8001:8001 -p 8002:8002 --rm -it -v ${PWD}/models_sd/:/project/models_sd/ -v ${PWD}/weights_sd/:/project/weights_sd/ triton_cc_sd:0.0.1 tritonserver --model-repository models_sd/torchscript --log-verbose=2 --model-control-mode=poll
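Once that container is up, a quick way to confirm the polled repository actually loaded its models is a few lines of tritonclient from the host. A sketch, not part of this commit:

```python
import tritonclient.http as httpclient

# The HTTP port published by the docker run command above.
client = httpclient.InferenceServerClient(url="localhost:8000")

print("server ready:", client.is_server_ready())

# With --model-control-mode=poll, entries show up here as Triton
# picks them up from models_sd/torchscript.
for model in client.get_model_repository_index():
    print(model["name"], model.get("state"))
```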
File renamed without changes.

dockers/Dockerfile.cpu.sd

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+FROM nvcr.io/nvidia/tritonserver:22.08-py3
+
+ARG PROJECT_PATH=/project
+
+WORKDIR ${PROJECT_PATH}
+SHELL ["/bin/bash", "-c"]
+
+COPY requirements ${PROJECT_PATH}/requirements/
+
+RUN pip install --upgrade pip && \
+    pip install torch==1.13.1 --extra-index-url https://download.pytorch.org/whl/cpu && \
+    pip install -r ${PROJECT_PATH}/requirements/sd.txt
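This Dockerfile presumably produces the `triton_cc_sd:0.0.1` image used by `bash_scripts/triton_server_sd.sh`, i.e. built with something like `docker build -f dockers/Dockerfile.cpu.sd -t triton_cc_sd:0.0.1 .`; the build command itself is an assumption, as it is not part of this commit.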
