MindOCR inference supports Ascend310/Ascend310P devices and the MindSpore Lite and ACL inference backends. It integrates text detection, angle classification, and text recognition into an end-to-end OCR inference process, and uses pipeline parallelism to optimize inference performance.
The overall process of MindOCR Lite inference is as follows:
```mermaid
graph LR;
    A[MindOCR models] -- export --> B[MindIR] -- converter_lite --> C[MindSpore Lite MindIR];
    D[ThirdParty models] -- xx2onnx --> E[ONNX] -- converter_lite --> C;
    C -- input --> F[MindOCR Infer] -- outputs --> G[Evaluation];
    H[images] -- input --> F[MindOCR Infer];
```
Please refer to the environment installation guide to configure the inference runtime environment for MindOCR, and take care to select the ACL or Lite environment according to the model.
MindOCR inference supports not only models exported from trained ckpt files, but also third-party models, as listed in the MindOCR Models Support List and the Third-party Models Support List (PaddleOCR, MMOCR, etc.).
Please refer to the Conversion Tutorial to convert them into a model format supported by MindOCR inference.
Enter the inference directory: `cd deploy/py_infer`.

- detection + classification + recognition
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
    --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
    --cls_model_path=/path/to/mindir/cls_mv3.mindir \
    --cls_model_name_or_config=ch_pp_mobile_cls_v2.0 \
    --rec_model_path=/path/to/mindir/crnn_resnet34.mindir \
    --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
    --res_save_dir=det_cls_rec \
    --vis_pipeline_save_dir=det_cls_rec
```
The visualization images are stored in det_cls_rec, as shown in the picture.
Visualization of text detection and recognition result
The results are saved in det_cls_rec/pipeline_results.txt in the following format:
```text
img_182.jpg [{"transcription": "cocoa", "points": [[14.0, 284.0], [222.0, 274.0], [225.0, 325.0], [17.0, 335.0]]}, {...}]
```
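Downstream scripts can read this file by splitting each line into the image name and a JSON payload. A minimal parsing sketch in Python (the `{...}` in the sample above stands for further entries of the same shape):

```python
import json

def parse_pipeline_line(line: str):
    """Split one pipeline_results.txt line into (image_name, result entries)."""
    name, payload = line.strip().split(maxsplit=1)
    return name, json.loads(payload)

sample = ('img_182.jpg [{"transcription": "cocoa", '
          '"points": [[14.0, 284.0], [222.0, 274.0], [225.0, 325.0], [17.0, 335.0]]}]')
name, entries = parse_pipeline_line(sample)
print(name, entries[0]["transcription"])
```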
- detection + recognition
If you do not pass the classification-related parameters, classification is skipped and only detection + recognition is performed.
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
    --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
    --rec_model_path=/path/to/mindir/crnn_resnet34.mindir \
    --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
    --res_save_dir=det_rec \
    --vis_pipeline_save_dir=det_rec
```
The visualization images are stored in the det_rec folder, as shown in the picture.
Visualization of text detection and recognition result
The recognition results are saved in det_rec/pipeline_results.txt in the following format:
```text
img_498.jpg [{"transcription": "keep", "points": [[819.0, 71.0], [888.0, 67.0], [891.0, 104.0], [822.0, 108.0]]}, {...}]
```
- detection
Run text detection alone.
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
    --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
    --res_save_dir=det \
    --vis_det_save_dir=det
```
The visualization results are stored in the det folder, as shown in the picture.
Visualization of text detection result
The detection results are saved in the det/det_results.txt file in the following format:
```text
img_108.jpg [[[226.0, 442.0], [402.0, 416.0], [404.0, 433.0], [228.0, 459.0]], [...]]
```
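Each line pairs an image name with a JSON list of quadrilaterals, so it can be parsed the same way. As an illustration, the sketch below also derives an axis-aligned bounding box from one quadrilateral (assuming, as in the sample above, that each point is an `[x, y]` pair):

```python
import json

def parse_det_line(line: str):
    """Split one det_results.txt line into (image_name, list of quadrilaterals)."""
    name, payload = line.strip().split(maxsplit=1)
    return name, json.loads(payload)

def bounding_box(quad):
    """Axis-aligned (xmin, ymin, xmax, ymax) enclosing a quadrilateral."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return min(xs), min(ys), max(xs), max(ys)

sample = 'img_108.jpg [[[226.0, 442.0], [402.0, 416.0], [404.0, 433.0], [228.0, 459.0]]]'
name, quads = parse_det_line(sample)
print(name, bounding_box(quads[0]))
```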
- classification
Run text angle classification alone.
```shell
# cls_mv3.mindir is converted from ppocr
python infer.py \
    --input_images_dir=/path/to/images \
    --cls_model_path=/path/to/mindir/cls_mv3.mindir \
    --cls_model_name_or_config=ch_pp_mobile_cls_v2.0 \
    --res_save_dir=cls
```
The results will be saved in cls/cls_results.txt, with the following format:
```text
word_867.png ["180", 0.5176]
word_1679.png ["180", 0.6226]
word_1189.png ["0", 0.9360]
```
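Each result is the image name followed by an `["angle", confidence]` pair, which `json.loads` can decode directly; a small parsing sketch:

```python
import json

def parse_cls_line(line: str):
    """Return (image_name, predicted_angle, confidence) for one cls_results.txt line."""
    name, payload = line.strip().split(maxsplit=1)
    angle, conf = json.loads(payload)
    return name, angle, conf

name, angle, conf = parse_cls_line('word_867.png ["180", 0.5176]')
print(name, angle, conf)
```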
- recognition
Run text recognition alone.
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --rec_model_path=/path/to/mindir/crnn_resnet34.mindir \
    --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
    --res_save_dir=rec
```
The results will be saved in rec/rec_results.txt, with the following format:
```text
word_421.png "under"
word_1657.png "candy"
word_1814.png "cathay"
```
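Since the transcription is a JSON-quoted string, the same split-then-`json.loads` pattern applies; a minimal sketch:

```python
import json

def parse_rec_line(line: str):
    """Return (image_name, transcription) for one rec_results.txt line."""
    name, payload = line.strip().split(maxsplit=1)
    return name, json.loads(payload)

name, text = parse_rec_line('word_421.png "under"')
print(name, text)
```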
- Basic settings

  | name | type | default | description |
  |:----|:----|:----|:----|
  | input_images_dir | str | None | Image or folder path for inference |
  | device | str | Ascend | Device type, supports Ascend |
  | device_id | int | 0 | Device id |
  | backend | str | lite | Inference backend, supports acl, lite |
  | parallel_num | int | 1 | Number of parallel workers in each stage of pipeline parallelism |
  | precision_mode | str | None | Precision mode; currently it can only be set via Model Conversion and takes no effect here |

- Saving Result

  | name | type | default | description |
  |:----|:----|:----|:----|
  | res_save_dir | str | inference_results | Saving dir for inference results |
  | vis_det_save_dir | str | None | Saving dir for images with detection boxes |
  | vis_pipeline_save_dir | str | None | Saving dir for images with detection boxes and text |
  | vis_font_path | str | None | Font path for drawing text |
  | crop_save_dir | str | None | Saving dir for cropped images after detection |
  | show_log | bool | False | Whether to show logs during inference |
  | save_log_dir | str | None | Log saving dir |

- Text detection

  | name | type | default | description |
  |:----|:----|:----|:----|
  | det_model_path | str | None | Model path for text detection |
  | det_model_name_or_config | str | None | Model name or YAML config file path for text detection |

- Text angle classification

  | name | type | default | description |
  |:----|:----|:----|:----|
  | cls_model_path | str | None | Model path for text angle classification |
  | cls_model_name_or_config | str | None | Model name or YAML config file path for text angle classification |

- Text recognition

  | name | type | default | description |
  |:----|:----|:----|:----|
  | rec_model_path | str | None | Model path for text recognition |
  | rec_model_name_or_config | str | None | Model name or YAML config file path for text recognition |
  | character_dict_path | str | None | Dict file for text recognition; the default supports only digits and lowercase letters |
Notes:

`*_model_name_or_config` can be the model name or a YAML config file path; please refer to the MindOCR Models Support List and the Third-party Models Support List (PaddleOCR, MMOCR, etc.).

Currently, only the Chinese DBNet, CRNN, and SVTR models in the PP-OCR series are supported.
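For reference, dict files such as PaddleOCR's ppocr_keys_v1.txt (used with `character_dict_path`) list one character per line. A sketch of loading one into an index-to-character table; note that whether an index is reserved for the CTC blank token depends on the model, so treat the mapping offset here as an assumption:

```python
import os
import tempfile

def load_char_dict(path):
    """Map line index -> character for a one-character-per-line dict file."""
    with open(path, encoding="utf-8") as f:
        return {i: line.rstrip("\n") for i, line in enumerate(f)}

# Tiny illustrative dict file (not a real ppocr_keys_v1.txt)
tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8")
tmp.write("a\nb\nc\n")
tmp.close()
table = load_char_dict(tmp.name)
os.unlink(tmp.name)
print(table)
```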
Enter the inference directory: `cd deploy/cpp_infer`, then execute the compilation script: `bash build.sh`. Once the build process is complete, an executable file named `infer` will be generated in the `dist` directory under the current path.
- detection + classification + recognition
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
    --cls_model_path /path/to/mindir/cls_mv3.mindir \
    --rec_model_path /path/to/mindir/crnn_resnet34.mindir \
    --character_dict_path /path/to/ppocr_keys_v1.txt \
    --res_save_dir det_cls_rec
```
The results will be saved in det_cls_rec/pipeline_results.txt, with the following format:
```text
img_478.jpg [{"transcription": "spa", "points": [[1114, 35], [1200, 0], [1234, 52], [1148, 97]]}, {...}]
```
- detection + recognition
If you do not pass the classification-related parameters, classification is skipped and only detection + recognition is performed.
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
    --rec_model_path /path/to/mindir/crnn_resnet34.mindir \
    --character_dict_path /path/to/ppocr_keys_v1.txt \
    --res_save_dir det_rec
```
The results will be saved in det_rec/pipeline_results.txt, with the following format:
```text
img_478.jpg [{"transcription": "spa", "points": [[1114, 35], [1200, 0], [1234, 52], [1148, 97]]}, {...}]
```
- detection
Run text detection alone.
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
    --res_save_dir det
```
The results will be saved in det/det_results.txt, with the following format:
```text
img_478.jpg [[[1114, 35], [1200, 0], [1234, 52], [1148, 97]], [...]]
```
- classification
Run text angle classification alone.
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --cls_model_path /path/to/mindir/cls_mv3.mindir \
    --res_save_dir cls
```
The results will be saved in cls/cls_results.txt, with the following format:
```text
word_867.png ["180", 0.5176]
word_1679.png ["180", 0.6226]
word_1189.png ["0", 0.9360]
```
- Basic settings

  | name | type | default | description |
  |:----|:----|:----|:----|
  | input_images_dir | str | None | Image or folder path for inference |
  | device | str | Ascend | Device type, supports Ascend |
  | device_id | int | 0 | Device id |
  | backend | str | acl | Inference backend, supports acl, lite |
  | parallel_num | int | 1 | Number of parallel workers in each stage of pipeline parallelism |

- Saving Result

  | name | type | default | description |
  |:----|:----|:----|:----|
  | res_save_dir | str | inference_results | Saving dir for inference results |

- Text detection

  | name | type | default | description |
  |:----|:----|:----|:----|
  | det_model_path | str | None | Model path for text detection |

- Text angle classification

  | name | type | default | description |
  |:----|:----|:----|:----|
  | cls_model_path | str | None | Model path for text angle classification |

- Text recognition

  | name | type | default | description |
  |:----|:----|:----|:----|
  | rec_model_path | str | None | Model path for text recognition |
  | rec_config_path | str | None | Config file for text recognition |
  | character_dict_path | str | None | Dict file for text recognition; the default supports only digits and lowercase letters |