# ERNIE 3.0 Serving Deployment Example

## Prepare the Models

Download the ERNIE 3.0 news classification model and sequence labeling model (skip this step if you already have trained models):
```bash
# Download and unpack the news classification model
wget https://paddlenlp.bj.bcebos.com/models/transformers/ernie_3.0/tnews_pruned_infer_model.zip
unzip tnews_pruned_infer_model.zip

# Move the downloaded model into the classification task's model repository
mv tnews_pruned_infer_model/float32.pdmodel models/ernie_seqcls_model/1/model.pdmodel
mv tnews_pruned_infer_model/float32.pdiparams models/ernie_seqcls_model/1/model.pdiparams

# Download and unpack the sequence labeling model
wget https://paddlenlp.bj.bcebos.com/models/transformers/ernie_3.0/msra_ner_pruned_infer_model.zip
unzip msra_ner_pruned_infer_model.zip

# Move the downloaded model into the sequence labeling task's model repository
mv msra_ner_pruned_infer_model/float32.pdmodel models/ernie_tokencls_model/1/model.pdmodel
mv msra_ner_pruned_infer_model/float32.pdiparams models/ernie_tokencls_model/1/model.pdiparams
```
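
If your checkout does not already contain the model repository skeleton (the serving example directory normally does), the version directories can be created before re-running the `mv` commands above. A minimal sketch:
```bash
# Create the version-1 directories the mv commands above expect
mkdir -p models/ernie_seqcls_model/1 models/ernie_tokencls_model/1
```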

With the models in place, the models directory for the classification task is laid out as follows:
```
models
├── ernie_seqcls               # pipeline for the classification task
│   ├── 1
│   └── config.pbtxt           # wires pre-processing, model inference, and post-processing together
├── ernie_seqcls_model         # model inference for the classification task
│   ├── 1
│   │   ├── model.pdmodel
│   │   └── model.pdiparams
│   └── config.pbtxt
├── ernie_seqcls_postprocess   # post-processing for the classification task
│   ├── 1
│   │   └── model.py
│   └── config.pbtxt
└── ernie_tokenizer            # tokenization pre-processing
    ├── 1
    │   └── model.py
    └── config.pbtxt
```
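
A quick way to verify that the weights landed where the server expects them (a sketch, assuming the layout above):
```bash
# Each task should report model.pdmodel and model.pdiparams under its version-1 directory
find models -maxdepth 3 -name "model.pd*"
```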

## Pull and Run the Image
```bash
# CPU image: serves Paddle/ONNX models on CPU only; supported inference backends are OpenVINO, Paddle Inference, and ONNX Runtime
docker pull paddlepaddle/fastdeploy:0.3.0-cpu-only-21.10

# GPU image: serves Paddle/ONNX models on GPU/CPU; supported inference backends are OpenVINO, TensorRT, Paddle Inference, and ONNX Runtime
docker pull paddlepaddle/fastdeploy:0.3.0-gpu-cuda11.4-trt8.4-21.10

# Run the container
docker run -it --net=host --name fastdeploy_server --shm-size="1g" -v /path/serving/models:/models paddlepaddle/fastdeploy:0.3.0-cpu-only-21.10 bash
```
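
The `docker run` line above starts the CPU image. For the GPU image, a sketch assuming the NVIDIA Container Toolkit is installed on the host (`--gpus all` exposes the host GPUs to the container):
```bash
docker run -it --net=host --gpus all --name fastdeploy_server --shm-size="1g" -v /path/serving/models:/models paddlepaddle/fastdeploy:0.3.0-gpu-cuda11.4-trt8.4-21.10 bash
```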

## Deploy the Models
The serving directory contains the configuration needed to start the pipeline service and the code for sending prediction requests, including:

```
models                     # model repository the service starts from, holding the models and service configuration files
seq_cls_grpc_client.py     # script that sends pipeline prediction requests for the news classification task
token_cls_grpc_client.py   # script that sends pipeline prediction requests for the sequence labeling task
```

*Note*: when the service starts, each Python backend process on the server requests `64MB` of shared memory by default, so a container started with the default settings cannot run more than one Python backend node. There are two workarounds:
- 1. Set the `shm-size` flag when starting the container, e.g.: `docker run -it --net=host --name fastdeploy_server --shm-size="1g" -v /path/serving/models:/models paddlepaddle/fastdeploy:0.3.0-gpu-cuda11.4-trt8.4-21.10 bash`
- 2. Set the Python backend's `shm-default-byte-size` flag when starting the service, lowering its default shared memory to 10MB (10485760 bytes): `tritonserver --model-repository=/models --backend-config=python,shm-default-byte-size=10485760`

### Classification Task
Run the following command inside the container to start the service:
```
# By default, every model under models is started
fastdeployserver --model-repository=/models

# Alternatively, these flags start only the classification task
fastdeployserver --model-repository=/models --model-control-mode=explicit --load-model=ernie_seqcls
```
The output looks like this:
```
I1019 09:41:15.375496 2823 model_repository_manager.cc:1183] successfully loaded 'ernie_tokenizer' version 1
I1019 09:41:15.375987 2823 model_repository_manager.cc:1022] loading: ernie_seqcls:1
I1019 09:41:15.477147 2823 model_repository_manager.cc:1183] successfully loaded 'ernie_seqcls' version 1
I1019 09:41:15.477325 2823 server.cc:522]
...
I0613 08:59:20.577820 10021 server.cc:592]
+----------------------------+---------+--------+
| Model                      | Version | Status |
+----------------------------+---------+--------+
| ernie_seqcls               | 1       | READY  |
| ernie_seqcls_model         | 1       | READY  |
| ernie_seqcls_postprocess   | 1       | READY  |
| ernie_tokenizer            | 1       | READY  |
+----------------------------+---------+--------+
...
I0601 07:15:15.923270 8059 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001
I0601 07:15:15.923604 8059 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000
I0601 07:15:15.964984 8059 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002
```
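
As the log shows, the server exposes the standard Triton endpoints: gRPC on port 8001, HTTP on 8000, and metrics on 8002. Readiness can therefore be checked from another shell with plain HTTP; a sketch:
```bash
# 200 means the server / the named model is ready to accept requests
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/models/ernie_seqcls/ready
```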

### Sequence Labeling Task
Run the following command inside the container to start the sequence labeling service:
```
fastdeployserver --model-repository=/models --model-control-mode=explicit --load-model=ernie_tokencls --backend-config=python,shm-default-byte-size=10485760
```
The output looks like this:
```
I1019 09:41:15.375496 2823 model_repository_manager.cc:1183] successfully loaded 'ernie_tokenizer' version 1
I1019 09:41:15.375987 2823 model_repository_manager.cc:1022] loading: ernie_seqcls:1
I1019 09:41:15.477147 2823 model_repository_manager.cc:1183] successfully loaded 'ernie_seqcls' version 1
I1019 09:41:15.477325 2823 server.cc:522]
...
I0613 08:59:20.577820 10021 server.cc:592]
+----------------------------+---------+--------+
| Model                      | Version | Status |
+----------------------------+---------+--------+
| ernie_tokencls             | 1       | READY  |
| ernie_tokencls_model       | 1       | READY  |
| ernie_tokencls_postprocess | 1       | READY  |
| ernie_tokenizer            | 1       | READY  |
+----------------------------+---------+--------+
...
I0601 07:15:15.923270 8059 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001
I0601 07:15:15.923604 8059 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000
I0601 07:15:15.964984 8059 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002
```

## Client Requests
Client requests can be made either by running the scripts locally or from inside the container.

Running the scripts locally requires installing the dependencies first:
```
pip install grpcio
pip install tritonclient[all]

# If your shell cannot parse the brackets, install with:
pip install tritonclient\[all\]
```
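
A quick check that the client dependencies are importable (`tritonclient[all]` ships both the gRPC and HTTP client submodules):
```bash
python -c "import tritonclient.grpc, tritonclient.http; print('tritonclient OK')"
```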

### Classification Task
Turn off any proxy before sending client requests, and change the IP address in the main function to the machine where the service is running:
```
python seq_cls_grpc_client.py
```
The output looks like this:
```
{'label': array([5, 9]), 'confidence': array([0.6425664 , 0.66534853], dtype=float32)}
{'label': array([4]), 'confidence': array([0.53198355], dtype=float32)}
acc: 0.5731
```

### Sequence Labeling Task
Turn off any proxy before sending client requests, and change the IP address in the main function to the machine where the service is running:
```
python token_cls_grpc_client.py
```
The output looks like this:
```
input data: 北京的涮肉,重庆的火锅,成都的小吃都是极具特色的美食。
The model detects all entities:
entity: 北京 label: LOC pos: [0, 1]
entity: 重庆 label: LOC pos: [6, 7]
entity: 成都 label: LOC pos: [12, 13]
input data: 原产玛雅故国的玉米,早已成为华夏大地主要粮食作物之一。
The model detects all entities:
entity: 玛雅 label: LOC pos: [2, 3]
entity: 华夏 label: LOC pos: [14, 15]
```

## Modifying the Configuration

The classification task (ernie_seqcls_model/config.pbtxt) currently defaults to running the OpenVINO engine on CPU, while the sequence labeling task defaults to running the Paddle engine on GPU. To run on a different device or with a different inference engine, modify the configuration; see the [configuration documentation](../../../../../serving/docs/zh_CN/model_configuration.md) for details.
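
For orientation, the per-task engine settings live in each model's `config.pbtxt` inside the repository (paths as in the layout above); a sketch of where to look before editing:
```bash
# Inspect the current engine/device settings for each task
cat models/ernie_seqcls_model/config.pbtxt
cat models/ernie_tokencls_model/config.pbtxt
```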