
Commit 587ffd4

[Serving]add ernie-3.0 demo (#399)
serving add ernie-3.0 demo
1 parent f450917 commit 587ffd4

File tree

18 files changed: +1116 −0 lines changed

Diff for: examples/text/ernie-3.0/serving/README.md

@@ -0,0 +1,171 @@
# Ernie-3.0 Serving Deployment Example

## Prepare the Models

Download the ERNIE 3.0 news classification model and sequence labeling model (skip this step if you already have trained models):

```bash
# Download and extract the news classification model
wget https://paddlenlp.bj.bcebos.com/models/transformers/ernie_3.0/tnews_pruned_infer_model.zip
unzip tnews_pruned_infer_model.zip

# Move the downloaded model into the model repository directory of the classification task
mv tnews_pruned_infer_model/float32.pdmodel models/ernie_seqcls_model/1/model.pdmodel
mv tnews_pruned_infer_model/float32.pdiparams models/ernie_seqcls_model/1/model.pdiparams

# Download and extract the sequence labeling model
wget https://paddlenlp.bj.bcebos.com/models/transformers/ernie_3.0/msra_ner_pruned_infer_model.zip
unzip msra_ner_pruned_infer_model.zip

# Move the downloaded model into the model repository directory of the sequence labeling task
mv msra_ner_pruned_infer_model/float32.pdmodel models/ernie_tokencls_model/1/model.pdmodel
mv msra_ner_pruned_infer_model/float32.pdiparams models/ernie_tokencls_model/1/model.pdiparams
```

Once the models have been downloaded and moved into place, the models directory for the classification task is structured as follows (the `ernie_seqcls_model/1/` directory holds the `model.pdmodel`/`model.pdiparams` files moved above):
```
models
├── ernie_seqcls                 # Pipeline for the classification task
│   ├── 1
│   └── config.pbtxt             # Composes pre/post-processing and model inference
├── ernie_seqcls_model           # Model inference for the classification task
│   ├── 1
│   │   ├── model.pdmodel
│   │   └── model.pdiparams
│   └── config.pbtxt
├── ernie_seqcls_postprocess     # Post-processing for the classification task
│   ├── 1
│   │   └── model.py
│   └── config.pbtxt
└── ernie_tokenizer              # Pre-processing tokenizer
    ├── 1
    │   └── model.py
    └── config.pbtxt
```

## Pull and Run the Image
```bash
# CPU image: supports serving Paddle/ONNX models on CPU only; supported inference backends include OpenVINO, Paddle Inference, and ONNX Runtime
docker pull paddlepaddle/fastdeploy:0.3.0-cpu-only-21.10

# GPU image: supports serving Paddle/ONNX models on GPU/CPU; supported inference backends include OpenVINO, TensorRT, Paddle Inference, and ONNX Runtime
docker pull paddlepaddle/fastdeploy:0.3.0-gpu-cuda11.4-trt8.4-21.10

# Run the container
docker run -it --net=host --name fastdeploy_server --shm-size="1g" -v /path/serving/models:/models paddlepaddle/fastdeploy:0.3.0-cpu-only-21.10 bash
```

## Deploy the Models
The serving directory contains the configuration for starting the pipeline service and the code for sending prediction requests, including:

```
models                     # Model repository needed to start the service, containing the models and serving configuration files
seq_cls_grpc_client.py     # Script that sends pipeline prediction requests for the news classification task
token_cls_grpc_client.py   # Script that sends pipeline prediction requests for the sequence labeling task
```

*Note*: When the service starts, each Python backend process of the server allocates `64M` of shared memory by default, so a container started with default settings cannot run more than one Python backend node. There are two workarounds:
- 1. Set the `shm-size` parameter when starting the container, for example: `docker run -it --net=host --name fastdeploy_server --shm-size="1g" -v /path/serving/models:/models paddlepaddle/fastdeploy:0.3.0-gpu-cuda11.4-trt8.4-21.10 bash`
- 2. Set the Python backend's `shm-default-byte-size` parameter when starting the service, for example setting the Python backend's default shared memory to 10M: `tritonserver --model-repository=/models --backend-config=python,shm-default-byte-size=10485760`

### Classification Task
Run the following command inside the container to start the service:
```
# By default, all models under /models are started
fastdeployserver --model-repository=/models

# Alternatively, start only the classification task with these flags
fastdeployserver --model-repository=/models --model-control-mode=explicit --load-model=ernie_seqcls
```
The output looks like this:
```
I1019 09:41:15.375496 2823 model_repository_manager.cc:1183] successfully loaded 'ernie_tokenizer' version 1
I1019 09:41:15.375987 2823 model_repository_manager.cc:1022] loading: ernie_seqcls:1
I1019 09:41:15.477147 2823 model_repository_manager.cc:1183] successfully loaded 'ernie_seqcls' version 1
I1019 09:41:15.477325 2823 server.cc:522]
...
I0613 08:59:20.577820 10021 server.cc:592]
+----------------------------+---------+--------+
| Model                      | Version | Status |
+----------------------------+---------+--------+
| ernie_seqcls               | 1       | READY  |
| ernie_seqcls_model         | 1       | READY  |
| ernie_seqcls_postprocess   | 1       | READY  |
| ernie_tokenizer            | 1       | READY  |
+----------------------------+---------+--------+
...
I0601 07:15:15.923270 8059 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001
I0601 07:15:15.923604 8059 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000
I0601 07:15:15.964984 8059 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002
```

### Sequence Labeling Task
Run the following command inside the container to start the sequence labeling service:
```
fastdeployserver --model-repository=/models --model-control-mode=explicit --load-model=ernie_tokencls --backend-config=python,shm-default-byte-size=10485760
```
The output looks like this:
```
I1019 09:41:15.375496 2823 model_repository_manager.cc:1183] successfully loaded 'ernie_tokenizer' version 1
I1019 09:41:15.375987 2823 model_repository_manager.cc:1022] loading: ernie_tokencls:1
I1019 09:41:15.477147 2823 model_repository_manager.cc:1183] successfully loaded 'ernie_tokencls' version 1
I1019 09:41:15.477325 2823 server.cc:522]
...
I0613 08:59:20.577820 10021 server.cc:592]
+----------------------------+---------+--------+
| Model                      | Version | Status |
+----------------------------+---------+--------+
| ernie_tokencls             | 1       | READY  |
| ernie_tokencls_model       | 1       | READY  |
| ernie_tokencls_postprocess | 1       | READY  |
| ernie_tokenizer            | 1       | READY  |
+----------------------------+---------+--------+
...
I0601 07:15:15.923270 8059 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001
I0601 07:15:15.923604 8059 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000
I0601 07:15:15.964984 8059 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002
```

## Client Requests
Client requests can be sent by running the scripts locally, or from inside the container.

Running the scripts locally requires installing the dependencies first:
```
pip install grpcio
pip install tritonclient[all]

# If your shell cannot parse the brackets, install with:
pip install tritonclient\[all\]
```

### Classification Task
Note: disable any proxy before sending client requests, and change the IP address in the main function to match your setup (the machine where the service runs). A minimal request sketch follows the example output below.
```
python seq_cls_grpc_client.py
```
The output looks like this:
```
{'label': array([5, 9]), 'confidence': array([0.6425664 , 0.66534853], dtype=float32)}
{'label': array([4]), 'confidence': array([0.53198355], dtype=float32)}
acc: 0.5731
```
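
For orientation, here is a minimal sketch of what such a gRPC request can look like. It is not the shipped `seq_cls_grpc_client.py`; the tensor names (`INPUT`, `label`, `confidence`) and the model name `ernie_seqcls` come from the ensemble `config.pbtxt` further down in this diff, while the server URL and the sample sentence are assumptions to adjust for your deployment:

```python
# Minimal request sketch (not the shipped client). Tensor names and the
# model name come from the ensemble config in this commit; the URL and
# the sample text are assumptions.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:8001")  # gRPC port from the server log

# The ensemble declares a TYPE_STRING input "INPUT" with dims [ 1 ] and
# max_batch_size 64, so a batch of N sentences has shape [N, 1].
texts = np.array([["这是一条测试新闻"]], dtype=np.object_)  # hypothetical sample sentence
infer_input = grpcclient.InferInput("INPUT", list(texts.shape), "BYTES")
infer_input.set_data_from_numpy(texts)

outputs = [
    grpcclient.InferRequestedOutput("label"),
    grpcclient.InferRequestedOutput("confidence"),
]
result = client.infer("ernie_seqcls", inputs=[infer_input], outputs=outputs)
print(result.as_numpy("label"), result.as_numpy("confidence"))
```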

### Sequence Labeling Task
Note: disable any proxy before sending client requests, and change the IP address in the main function to match your setup (the machine where the service runs). A sketch of how the entity spans are decoded follows the example output below.
```
python token_cls_grpc_client.py
```
The output looks like this:
```
input data: 北京的涮肉,重庆的火锅,成都的小吃都是极具特色的美食。
The model detects all entities:
entity: 北京 label: LOC pos: [0, 1]
entity: 重庆 label: LOC pos: [6, 7]
entity: 成都 label: LOC pos: [12, 13]
input data: 原产玛雅故国的玉米,早已成为华夏大地主要粮食作物之一。
The model detects all entities:
entity: 玛雅 label: LOC pos: [2, 3]
entity: 华夏 label: LOC pos: [14, 15]
```
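
The `pos: [0, 1]` values above are inclusive character indices into the input sentence. As an illustration of how such spans are typically recovered from per-token predictions, the sketch below decodes BIO labels into entities; it assumes the standard MSRA-NER label scheme (`B-LOC`/`I-LOC`, etc.), and the shipped `token_cls_grpc_client.py` and the postprocess model may differ in detail:

```python
# Sketch: decode per-token BIO labels into (entity, type, [start, end]) triples,
# assuming the standard MSRA-NER scheme; simplified (it does not check that
# I- tags match the opening B- tag's type).
def decode_bio(tokens, labels):
    entities, start = [], None
    for i, label in enumerate(labels):
        if label.startswith("B-"):
            start = i                     # open a new entity
        elif not (label.startswith("I-") and start is not None):
            start = None                  # "O" or a stray I- tag closes nothing
        # close the entity when the next label no longer continues it
        if start is not None and (i + 1 == len(labels) or not labels[i + 1].startswith("I-")):
            entities.append(("".join(tokens[start:i + 1]), label.split("-")[-1], [start, i]))
            start = None
    return entities

tokens = list("北京的涮肉,重庆的火锅,成都的小吃都是极具特色的美食。")
labels = ["B-LOC", "I-LOC"] + ["O"] * (len(tokens) - 2)  # hypothetical model output
print(decode_bio(tokens, labels))  # [('北京', 'LOC', [0, 1])]
```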

## Modifying the Configuration

The classification task (ernie_seqcls_model/config.pbtxt) is currently configured to run the OpenVINO engine on CPU by default; the sequence labeling task is configured to run the Paddle engine on GPU by default. To run on CPU/GPU or with other inference engines, modify the configuration; see the [configuration docs](../../../../../serving/docs/zh_CN/model_configuration.md) for details.

Diff for: examples/text/ernie-3.0/serving/models/ernie_seqcls/1/README.md

Whitespace-only changes.
Diff for: examples/text/ernie-3.0/serving/models/ernie_seqcls/config.pbtxt
@@ -0,0 +1,75 @@
name: "ernie_seqcls"
platform: "ensemble"
max_batch_size: 64
input [
  {
    name: "INPUT"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "label"
    data_type: TYPE_INT64
    dims: [ 1 ]
  },
  {
    name: "confidence"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
ensemble_scheduling {
  step [
    {
      # Step 1: tokenize the raw text into input_ids / token_type_ids
      model_name: "ernie_tokenizer"
      model_version: 1
      input_map {
        key: "INPUT_0"
        value: "INPUT"
      }
      output_map {
        key: "OUTPUT_0"
        value: "tokenizer_input_ids"
      }
      output_map {
        key: "OUTPUT_1"
        value: "tokenizer_token_type_ids"
      }
    },
    {
      # Step 2: run model inference on the tokenized inputs
      model_name: "ernie_seqcls_model"
      model_version: 1
      input_map {
        key: "input_ids"
        value: "tokenizer_input_ids"
      }
      input_map {
        key: "token_type_ids"
        value: "tokenizer_token_type_ids"
      }
      output_map {
        key: "linear_113.tmp_1"
        value: "OUTPUT_2"
      }
    },
    {
      # Step 3: post-process the logits into label and confidence
      model_name: "ernie_seqcls_postprocess"
      model_version: 1
      input_map {
        key: "POST_INPUT"
        value: "OUTPUT_2"
      }
      output_map {
        key: "POST_label"
        value: "label"
      }
      output_map {
        key: "POST_confidence"
        value: "confidence"
      }
    }
  ]
}
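
Once this ensemble is loaded, its wiring can be sanity-checked from a client before sending real traffic. A small sketch, assuming the gRPC endpoint from the README above (localhost:8001) and the `tritonclient` package:

```python
# Readiness/metadata sanity check for the ensemble (sketch; the endpoint
# is an assumption).
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:8001")
assert client.is_model_ready("ernie_seqcls")

# The metadata should mirror the config above: one BYTES input "INPUT",
# outputs "label" (INT64) and "confidence" (FP32).
meta = client.get_model_metadata("ernie_seqcls")
print([i.name for i in meta.inputs], [o.name for o in meta.outputs])
```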
@@ -0,0 +1 @@
This directory stores the Ernie-3.0 model.
Diff for: examples/text/ernie-3.0/serving/models/ernie_seqcls_model/config.pbtxt
@@ -0,0 +1,42 @@
backend: "fastdeploy"
max_batch_size: 64
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "token_type_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "linear_113.tmp_1"
    data_type: TYPE_FP32
    dims: [ 15 ]
  }
]

instance_group [
  {
    # Create one instance
    count: 1
    # Run inference on CPU (options: KIND_CPU, KIND_GPU)
    kind: KIND_CPU
  }
]

optimization {
  execution_accelerators {
    cpu_execution_accelerator : [
      {
        # Use the OpenVINO backend
        name: "openvino"
        parameters { key: "cpu_threads" value: "5" }
      }
    ]
  }
}
