Python/C++ Inference on Ascend 310

1. Introduction

MindOCR inference supports Ascend 310/Ascend 310P devices with the MindSpore Lite and ACL inference backends. It integrates text detection, text angle classification, and text recognition into an end-to-end OCR inference pipeline and uses pipeline parallelism to optimize inference performance.

The overall process of MindOCR Lite inference is as follows:

graph LR;
    A[MindOCR models] -- export --> B[MindIR] -- converter_lite --> C[MindSpore Lite MindIR];
    D[ThirdParty models] -- xx2onnx --> E[ONNX] -- converter_lite --> C;
    C --input --> F[MindOCR Infer] -- outputs --> G[Evaluation];
    H[images] --input --> F[MindOCR Infer];

2. Environment

Please refer to the environment installation to configure the inference runtime environment for MindOCR, taking care to select the ACL or Lite environment according to the model.

3. Model conversion

MindOCR inference supports not only models exported from trained ckpt files but also third-party models, as listed in the MindOCR Models Support List and the Third-party Models Support List (PaddleOCR, MMOCR, etc.).

Please refer to the Conversion Tutorial to convert these models into a format supported by MindOCR inference.
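
As a rough orientation only (the authoritative steps are in the Conversion Tutorial), a converter_lite call targeting the MindSpore Lite MindIR format typically looks like the sketch below; the model file, output name, and config.txt are placeholders, and the exact flags and config contents for a given model may differ.

    # Hypothetical conversion sketch; paths and the config file are placeholders.
    # Converts an exported MindIR into a MindSpore Lite MindIR optimized for Ascend.
    converter_lite \
        --saveType=MINDIR \
        --fmk=MINDIR \
        --optimize=ascend_oriented \
        --modelFile=dbnet_resnet50.mindir \
        --outputFile=dbnet_resnet50_lite \
        --configFile=config.txt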

4. Inference (Python)

Enter the inference directory: cd deploy/py_infer.

4.1 Command example

  • detection + classification + recognition

    python infer.py \
        --input_images_dir=/path/to/images \
        --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
        --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
        --cls_model_path=/path/to/mindir/cls_mv3.mindir \
        --cls_model_name_or_config=ch_pp_mobile_cls_v2.0 \
        --rec_model_path=/path/to/mindir/crnn_resnet34.mindir \
        --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
        --res_save_dir=det_cls_rec \
        --vis_pipeline_save_dir=det_cls_rec

    The visualization images are stored in det_cls_rec, as shown in the picture.

    Visualization of text detection and recognition result

    The results are saved in det_cls_rec/pipeline_results.txt in the following format (a short parsing sketch is given after this list of examples):

    img_182.jpg	[{"transcription": "cocoa", "points": [[14.0, 284.0], [222.0, 274.0], [225.0, 325.0], [17.0, 335.0]]}, {...}]
    
  • detection + recognition

    If the classification-related parameters are not set, the classification step is skipped and only detection + recognition is performed.

    python infer.py \
        --input_images_dir=/path/to/images \
        --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
        --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
        --rec_model_path=/path/to/mindir/crnn_resnet34.mindir \
        --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
        --res_save_dir=det_rec \
        --vis_pipeline_save_dir=det_rec

    The visualization images are stored in the det_rec folder, as shown in the picture.

    Visualization of text detection and recognition result

    The recognition results are saved in det_rec/pipeline_results.txt in the following format:

    img_498.jpg	[{"transcription": "keep", "points": [[819.0, 71.0], [888.0, 67.0], [891.0, 104.0], [822.0, 108.0]]}, {...}]
    
  • detection

    Run text detection alone.

    python infer.py \
        --input_images_dir=/path/to/images \
        --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
        --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
        --res_save_dir=det \
        --vis_det_save_dir=det

    The visualization results are stored in the det folder, as shown in the picture.

    Visualization of text detection result

    The detection results are saved in the det/det_results.txt file in the following format:

    img_108.jpg	[[[226.0, 442.0], [402.0, 416.0], [404.0, 433.0], [228.0, 459.0]], [...]]
    
  • classification

    Run text angle classification alone.

    # cls_mv3.mindir is converted from ppocr
    python infer.py \
        --input_images_dir=/path/to/images \
        --cls_model_path=/path/to/mindir/cls_mv3.mindir \
        --cls_model_name_or_config=ch_pp_mobile_cls_v2.0 \
        --res_save_dir=cls

    The results will be saved in cls/cls_results.txt, with the following format:

    word_867.png   ["180", 0.5176]
    word_1679.png  ["180", 0.6226]
    word_1189.png  ["0", 0.9360]
    
  • recognition

    Run text recognition alone.

    python infer.py \
        --input_images_dir=/path/to/images \
        --rec_model_path=/path/to/mindir/crnn_resnet34.mindir \
        --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
        --res_save_dir=rec

    The results will be saved in rec/rec_results.txt, with the following format:

    word_421.png   "under"
    word_1657.png  "candy"
    word_1814.png  "cathay"
    

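All of the result files shown above use one line per image: the image name, then the result. For the JSON-style pipeline and detection results, a quick way to inspect them is sketched below; this is only a convenience sketch, assuming the two fields are tab-separated (as in the samples above) and that jq is installed.

    # Count detected text instances per image in pipeline_results.txt.
    # Assumes "<image>\t<JSON list>" lines and that jq is available.
    while IFS=$'\t' read -r img res; do
        echo "$img: $(echo "$res" | jq 'length') instances"
    done < det_cls_rec/pipeline_results.txt
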
4.2 Inference parameter details

  • Basic settings

    | name | type | default | description |
    | --- | --- | --- | --- |
    | input_images_dir | str | None | Image or folder path for inference |
    | device | str | Ascend | Device type, supports Ascend |
    | device_id | int | 0 | Device id |
    | backend | str | lite | Inference backend, supports acl, lite |
    | parallel_num | int | 1 | Number of parallel workers in each stage of the pipeline |
    | precision_mode | str | None | Precision mode; currently it can only be set during model conversion and has no effect here |
  • Saving Result

    | name | type | default | description |
    | --- | --- | --- | --- |
    | res_save_dir | str | inference_results | Saving dir for inference results |
    | vis_det_save_dir | str | None | Saving dir for images with detection boxes |
    | vis_pipeline_save_dir | str | None | Saving dir for images with detection boxes and recognized text |
    | vis_font_path | str | None | Font path for drawing text |
    | crop_save_dir | str | None | Saving dir for cropped images after detection |
    | show_log | bool | False | Whether to show logs during inference |
    | save_log_dir | str | None | Log saving dir |
  • Text detection

    | name | type | default | description |
    | --- | --- | --- | --- |
    | det_model_path | str | None | Model path for text detection |
    | det_model_name_or_config | str | None | Model name or YAML config file path for text detection |
  • Text angle classification

    | name | type | default | description |
    | --- | --- | --- | --- |
    | cls_model_path | str | None | Model path for text angle classification |
    | cls_model_name_or_config | str | None | Model name or YAML config file path for text angle classification |
  • Text recognition

    | name | type | default | description |
    | --- | --- | --- | --- |
    | rec_model_path | str | None | Model path for text recognition |
    | rec_model_name_or_config | str | None | Model name or YAML config file path for text recognition |
    | character_dict_path | str | None | Character dictionary file for text recognition; by default, only digits and lowercase letters are supported |

Notes:

*_model_name_or_config can be either a model name or a YAML config file path; please refer to the MindOCR Models Support List and the Third-party Models Support List (PaddleOCR, MMOCR, etc.).
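
These parameters can be combined with any of the commands in section 4.1. For example, below is a hedged sketch of detection + recognition on the ACL backend with two parallel workers per stage and cropped text regions saved for inspection; the .om model paths are placeholders and assume models converted for the ACL backend (see section 3).

    # Hypothetical ACL-backend run; all paths are placeholders.
    python infer.py \
        --input_images_dir=/path/to/images \
        --backend=acl \
        --device_id=0 \
        --parallel_num=2 \
        --det_model_path=/path/to/om/dbnet_resnet50.om \
        --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
        --rec_model_path=/path/to/om/crnn_resnet34.om \
        --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
        --crop_save_dir=crops \
        --res_save_dir=det_rec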

5. Inference (C++)

Currently, only the Chinese DBNet, CRNN, and SVTR models in the PP-OCR series are supported.

Enter the inference directory: cd deploy/cpp_infer, then execute the compilation script bash build.sh. Once the build completes, an executable file named infer is generated in the dist directory under the current path.
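
For reference, the same steps as a single block (run from the MindOCR repository root):

    # Build the C++ inference runner; the binary ends up at ./dist/infer.
    cd deploy/cpp_infer
    bash build.sh
    ls dist/infer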

5.1 Command example

  • detection + classification + recognition

    ./dist/infer \
        --input_images_dir /path/to/images \
        --backend lite \
        --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
        --cls_model_path /path/to/mindir/cls_mv3.mindir \
        --rec_model_path /path/to/mindir/crnn_resnet34.mindir \
        --character_dict_path /path/to/ppocr_keys_v1.txt \
        --res_save_dir det_cls_rec

    The results will be saved in det_cls_rec/pipeline_results.txt, with the following format:

    img_478.jpg	[{"transcription": "spa", "points": [[1114, 35], [1200, 0], [1234, 52], [1148, 97]]}, {...}]
    
  • detection + recognition

    If the classification-related parameters are not set, the classification step is skipped and only detection + recognition is performed.

    ./dist/infer \
        --input_images_dir /path/to/images \
        --backend lite \
        --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
        --rec_model_path /path/to/mindir/crnn_resnet34.mindir \
        --character_dict_path /path/to/ppocr_keys_v1.txt \
        --res_save_dir det_rec

    The results will be saved in det_rec/pipeline_results.txt, with the following format:

    img_478.jpg	[{"transcription": "spa", "points": [[1114, 35], [1200, 0], [1234, 52], [1148, 97]]}, {...}]
    
  • detection

    Run text detection alone.

    ./dist/infer \
        --input_images_dir /path/to/images \
        --backend lite \
        --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
        --res_save_dir det

    The results will be saved in det/det_results.txt, with the following format:

    img_478.jpg    [[[1114, 35], [1200, 0], [1234, 52], [1148, 97]], [...]]
    
  • classification

    Run text angle classification alone.

    ./dist/infer \
        --input_images_dir /path/to/images \
        --backend lite \
        --cls_model_path /path/to/mindir/cls_mv3.mindir \
        --res_save_dir cls

    The results will be saved in cls/cls_results.txt, with the following format:

    word_867.png   ["180", 0.5176]
    word_1679.png  ["180", 0.6226]
    word_1189.png  ["0", 0.9360]
    

5.2 Inference parameter details

  • Basic settings

    | name | type | default | description |
    | --- | --- | --- | --- |
    | input_images_dir | str | None | Image or folder path for inference |
    | device | str | Ascend | Device type, supports Ascend |
    | device_id | int | 0 | Device id |
    | backend | str | acl | Inference backend, supports acl, lite |
    | parallel_num | int | 1 | Number of parallel workers in each stage of the pipeline |
  • Saving Result

    | name | type | default | description |
    | --- | --- | --- | --- |
    | res_save_dir | str | inference_results | Saving dir for inference results |
  • Text detection

    | name | type | default | description |
    | --- | --- | --- | --- |
    | det_model_path | str | None | Model path for text detection |
  • Text angle classification

    | name | type | default | description |
    | --- | --- | --- | --- |
    | cls_model_path | str | None | Model path for text angle classification |
  • Text recognition

    | name | type | default | description |
    | --- | --- | --- | --- |
    | rec_model_path | str | None | Model path for text recognition |
    | rec_config_path | str | None | Config file path for text recognition |
    | character_dict_path | str | None | Character dictionary file for text recognition; by default, only digits and lowercase letters are supported |
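
As with the Python runner, these flags can be combined with the commands in section 5.1. Below is a hedged sketch of the full pipeline pinned to a specific device with two parallel workers per stage; model and dictionary paths are placeholders, as in the examples above.

    # Hypothetical C++ run with explicit device and parallelism settings.
    ./dist/infer \
        --input_images_dir /path/to/images \
        --backend lite \
        --device_id 0 \
        --parallel_num 2 \
        --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
        --cls_model_path /path/to/mindir/cls_mv3.mindir \
        --rec_model_path /path/to/mindir/crnn_resnet34.mindir \
        --character_dict_path /path/to/ppocr_keys_v1.txt \
        --res_save_dir det_cls_rec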