OCR-Toolkit

A small, cute toolkit for OCR that covers image preprocessing and text recognition. It works out of the box.

Download

A prebuilt exe file is available for download: OCR Toolkit 2023.03.02 (new)

1. Preprocessing

1.1 Binary

Binarize the image by threshold segmentation, which also removes simple noise.
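Threshold binarization can be sketched as below. This is a minimal pure-Python illustration, not the code in pre_binary.py; the function name `binarize` and the default threshold of 128 are assumptions, and a real pipeline would more likely use an image library on array data.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to pure black (0) or pure white (255).

    `gray` is a list of rows, each row a list of integer intensities.
    Pixels at or above `threshold` become white; darker pixels become black.
    Faint specks that stay above the threshold are whitened away, which is
    why thresholding doubles as a simple denoising step.
    """
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in gray]


# Usage: a 2x2 image with one dark stroke pixel per row.
sample = [[10, 200],
          [128, 127]]
print(binarize(sample))
```

The fixed threshold is the simplest choice. Pages with uneven lighting usually need a threshold chosen per image or per region instead.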

1.2 Split

Detect the central divider line with the Hough transform and split the image into two parts along it. This is useful for two-column layouts such as dictionaries.
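The idea of voting for a near-vertical divider and then cutting along it can be sketched as follows. This is a from-scratch illustration under stated assumptions, not the code in pre_split.py: the names `find_divider_x` and `split_columns` are hypothetical, the input is assumed to be a binarized image stored as rows of 0/1 values (1 = ink), and only angles within a couple of degrees of vertical are searched.

```python
import math


def find_divider_x(binary, max_tilt_deg=2):
    """Return the rho of the strongest near-vertical line, or None if no ink.

    Each ink pixel votes for every line rho = x*cos(theta) + y*sin(theta)
    that passes through it, with theta limited to a few degrees around 0
    (theta = 0 is an exactly vertical line, where rho equals the column x).
    """
    votes = {}
    for y, row in enumerate(binary):
        for x, ink in enumerate(row):
            if not ink:
                continue
            for theta in range(-max_tilt_deg, max_tilt_deg + 1):
                rad = math.radians(theta)
                rho = round(x * math.cos(rad) + y * math.sin(rad))
                votes[(theta, rho)] = votes.get((theta, rho), 0) + 1
    if not votes:
        return None
    # Most votes wins; ties are broken in favour of the least tilted line.
    theta, rho = max(votes, key=lambda k: (votes[k], -abs(k[0])))
    return rho


def split_columns(image, x):
    """Cut every row at column x, dropping the divider column itself."""
    left = [row[:x] for row in image]
    right = [row[x + 1:] for row in image]
    return left, right
```

With a full-height divider, the line collects one vote per row and outvotes the scattered text pixels. In practice a library routine such as OpenCV's `cv2.HoughLinesP` would normally replace the hand-written accumulator.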

2. OCR

2.1 Offline: PaddleOCR

Run PaddleOCR models locally to recognize the text in an image. No API key is needed. The result is saved as a structured CSV file.
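Saving recognition results as structured CSV might look like the sketch below. It assumes each recognized line arrives as a `(box, (text, confidence))` pair, where `box` is a list of four `[x, y]` corner points. That shape and the column names are assumptions for illustration, not the layout produced by ocr_paddle.py.

```python
import csv
import io


def ocr_lines_to_csv(lines):
    """Serialize OCR lines to CSV text with one row per recognized line.

    Each entry in `lines` is (box, (text, confidence)); the four corner
    points of `box` are reduced to an axis-aligned bounding rectangle.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "confidence", "x_min", "y_min", "x_max", "y_max"])
    for box, (text, confidence) in lines:
        xs = [point[0] for point in box]
        ys = [point[1] for point in box]
        writer.writerow([text, f"{confidence:.3f}",
                         min(xs), min(ys), max(xs), max(ys)])
    return buffer.getvalue()


# Usage with one hand-made result line.
sample = [([[10, 5], [90, 5], [90, 25], [10, 25]], ("hello", 0.98))]
print(ocr_lines_to_csv(sample))
```

Keeping the coordinates next to the text lets later steps sort lines by position or rebuild columns.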

2.2 Online: Baidu API

Call the Baidu AI high-accuracy text recognition API to run OCR on the image and parse the response. The result is saved as a structured CSV file.
Users must provide their own API_KEY and SECRET_KEY.
Support for more APIs is planned.
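The request flow for the online mode can be sketched as follows. This is a hedged illustration, not the code in ocr_baidu.py. The two endpoint URLs, the form fields, and the `words_result` response shape reflect my understanding of Baidu's public OCR documentation and should be checked against it before use. The helper names are hypothetical.

```python
import base64
import urllib.parse
import urllib.request

# Assumed endpoints; verify against the current Baidu AI documentation.
TOKEN_URL = "https://aip.baidubce.com/oauth/2.0/token"
OCR_URL = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate"


def build_token_request(api_key, secret_key):
    """Build the request that exchanges API_KEY/SECRET_KEY for an access token."""
    query = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,
        "client_secret": secret_key,
    })
    return urllib.request.Request(f"{TOKEN_URL}?{query}", method="POST")


def build_ocr_request(image_bytes, access_token):
    """Build the OCR request: the image travels base64-encoded in a form body."""
    body = urllib.parse.urlencode(
        {"image": base64.b64encode(image_bytes).decode("ascii")}
    ).encode("ascii")
    query = urllib.parse.urlencode({"access_token": access_token})
    return urllib.request.Request(
        f"{OCR_URL}?{query}",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_words(response_json):
    """Flatten the assumed response shape into (text, left, top, width, height) rows."""
    rows = []
    for item in response_json.get("words_result", []):
        location = item.get("location", {})
        rows.append((item["words"], location.get("left"), location.get("top"),
                     location.get("width"), location.get("height")))
    return rows
```

Each request object can be sent with `urllib.request.urlopen` and the reply decoded with `json.load`. The rows returned by `parse_words` are then ready to be written out as CSV.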
