6 changes: 4 additions & 2 deletions .dockerignore
@@ -11,14 +11,16 @@ venv/

# Development
models
weights
weights_tf
models_tf
models_sd
weights
weights_tf
weights_sd
.ipynb_checkpoints
deb
*plan
*onnx
*pt
*pth
inference_notebook
bash_scripts
1 change: 1 addition & 0 deletions .gitignore
@@ -13,6 +13,7 @@ venv/
# Development
weights
weights_tf
weights_sd
.ipynb_checkpoints
*.plan
*.onnx
22 changes: 22 additions & 0 deletions README.md
@@ -72,6 +72,10 @@ I have also prepared some notes here in README, you can explore them too.
1. Make sure to use the same input and output names when creating the ONNX model and during client inference.
1. Take care that the dtypes you use when compiling to ONNX match the ones declared in `config.pbtxt`. For instance, the transformers tokenizer returns dtype int64; if you declare int32 (otherwise preferred) in `config.pbtxt`, inference will fail. See the sketch below.
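
As a sketch (not taken from this repo), a `config.pbtxt` for an ONNX model fed by a transformers tokenizer might pin the names and dtypes like this; the model and tensor names are hypothetical placeholders:

```protobuf
name: "text_encoder_onnx"    # hypothetical model name
platform: "onnxruntime_onnx"
input [
  {
    name: "input_ids"        # must match the name used at ONNX export time
    data_type: TYPE_INT64    # the tokenizer emits int64; TYPE_INT32 here would fail
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"           # must match the exported output name
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
```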

### Torchscript Backend

1. You can configure the following [parameters](https://github.com/triton-inference-server/pytorch_backend#parameters) when using the TorchScript platform; see the sketch below.
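
For illustration, a parameter from that page is set in the model's `config.pbtxt`; this sketch assumes the documented `INFERENCE_MODE` parameter and uses a hypothetical model name:

```protobuf
name: "unet_ts"                    # hypothetical model name
platform: "pytorch_libtorch"
parameters: {
  key: "INFERENCE_MODE"            # one of the parameters documented for pytorch_backend
  value: { string_value: "true" }
}
```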

### TensorRT Backend

#### Installation
@@ -91,6 +95,24 @@ Personal recommendation is to run this within a docker container.
1. TensorRT does not support every operation and can cause issues. In that case, try upgrading its version, but keep the CUDA and Triton versions of your system in mind. If possible, update the CUDA version.
1. The FP16 version takes time to compile, so take a break; see the example after this list.
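
For reference, an FP16 engine build with `trtexec` typically looks like the sketch below; the file names are placeholders:

```bash
# Compile an ONNX model into an FP16 TensorRT engine (this can take a long while)
trtexec --onnx=model.onnx --saveEngine=model.plan --fp16
```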

## Stable Diffusion

1. Build the image:

```bash
docker build -t triton_cc_sd:0.0.1 -f dockers/Dockerfile.cpu.sd .
```

1. Start a container:

```bash
bash bash_scripts/triton_server_sd.sh
```

1. When compiling the UNet to ONNX, the export creates multiple files because the model is larger than 2 GB (protobuf's size limit); keep the extra weight files alongside the `.onnx` file.

1. If loading all of these models as a pipeline fails without showing any significant information in the logs, try loading them individually with `--log-verbose=1`, as in the sketch below.
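
A sketch of loading a single model at a time, assuming explicit model control and a hypothetical model name `unet`:

```bash
# Load only one model from the repository, with verbose logging
tritonserver --model-repository models_sd/torchscript \
    --model-control-mode=explicit --load-model=unet --log-verbose=1
```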

## Features

### Dynamic Batching
1 change: 1 addition & 0 deletions bash_scripts/triton_server_sd.sh
@@ -0,0 +1 @@
docker run --shm-size=16g --ulimit memlock=-1 --ulimit stack=67108864 -p 8000:8000 -p 8001:8001 -p 8002:8002 --rm -it -v ${PWD}/models_sd/:/project/models_sd/ -v ${PWD}/weights_sd/:/project/weights_sd/ triton_cc_sd:0.0.1 tritonserver --model-repository models_sd/torchscript --log-verbose=2 --model-control-mode=poll
12 changes: 12 additions & 0 deletions dockers/Dockerfile.cpu.sd
@@ -0,0 +1,12 @@
FROM nvcr.io/nvidia/tritonserver:22.08-py3

ARG PROJECT_PATH=/project

WORKDIR ${PROJECT_PATH}
SHELL ["/bin/bash", "-c"]

COPY requirements ${PROJECT_PATH}/requirements/

RUN pip install --upgrade pip && \
pip install torch==1.13.1 --extra-index-url https://download.pytorch.org/whl/cpu && \
pip install -r ${PROJECT_PATH}/requirements/sd.txt