Update config & doc for paddleocr_vl #1359
Open
forBlank wants to merge 49 commits into PaddlePaddle:develop from forBlank:ocr_vl
49 commits
29da03c image preprocessor for ocr (forBlank)
bbf8f81 modeling ppocrvl (forBlank)
a37d543 modeling siglip for ocr (forBlank)
88463b9 ocr workflow & only DP dataflow (forBlank)
3586730 ocr 8k training yaml (forBlank)
f7ca4e2 update sigliip & GELUTanh & format correction (forBlank)
114a20c update dataflow (forBlank)
755519a constant learning rate scheduler (forBlank)
1e2f709 cross entropy by hand (forBlank)
7a5b260 update ocr 8k training yaml (forBlank)
4c94b84 make lint & remove unused code (forBlank)
b5bd996 update ocr 8k training yaml (forBlank)
4f81675 update image augmentation (forBlank)
d867546 update config (forBlank)
697ee0a make lint & remove unused code (forBlank)
3bc4696 remove tp&sp & no fused mlp&attn for hf ckpt (forBlank)
713aee6 support padding&packing_size setting & freeze vit (forBlank)
524da01 Unified model name (forBlank)
89447d7 update doc & requirement (forBlank)
066b62d update doc (forBlank)
e74c0c7 update doc for paddleocr_vl_sft (forBlank)
6ff11f1 use "PaddleOCR-VL-0.9B" for paddleocr_vl_sft doc (forBlank)
a044f4e update packing setting (forBlank)
5cb1282 update Bengali dataset for paddleocr_vl_sft doc (forBlank)
07afabf Merge remote-tracking branch 'upstream/develop' into ocr_vl (forBlank)
b9fe7bf update Bengali dataset link for paddleocr doc&yaml (forBlank)
4ce36a4 support paddleocr vl sft with single GPU (forBlank)
d915e92 make lint (forBlank)
3680e0a fix paddleocr_vl_sft with single GPU & update doc (forBlank)
a9257c8 Merge branch 'develop' into ocr_vl (forBlank)
9b60be4 update cli for downloading hf model (forBlank)
887cd1f Merge remote-tracking branch 'origin/ocr_vl' into ocr_vl (forBlank)
2780691 update cli model path for paddleocr_vl_sft (forBlank)
ca45067 Update the paddleocr_vl config for single GPU (forBlank)
5fe0560 Update the paddleocr_vl config & doc (forBlank)
6f18125 Merge remote-tracking branch 'upstream/develop' into ocr_vl (forBlank)
479974e Update the paddleocr_vl config & doc (forBlank)
c93e078 Update the paddleocr_vl doc (forBlank)
55f01d9 Update the paddleocr_vl doc (forBlank)
448235b Merge remote-tracking branch 'upstream/develop' into ocr_vl (forBlank)
b73e20b Fix typo errors in paddleocr_vl doc (forBlank)
5e09651 Merge remote-tracking branch 'upstream/develop' into ocr_vl (forBlank)
34241e8 Update the paddleocr_vl doc for single GPU (forBlank)
9d48804 Update the paddleocr_vl doc (forBlank)
115a9ad Update the huggingface line in paddleocr_vl doc (forBlank)
f842f09 update config for paddleocr_vl (forBlank)
3ab7f39 Merge remote-tracking branch 'upstream/develop' into ocr_vl (forBlank)
78dbab8 Update markdown links in paddleocr_vl doc (forBlank)
1d9a30d Update links of modelcard in paddleocr_vl doc (forBlank)
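@@ -7,21 +7,14 @@ PaddleOCR-VL: tailor-made for document parsing, top-performing (SOTA)
 
 The model not only supports 109 languages efficiently, it also excels at recognizing complex elements such as text, tables, formulas, and charts, while keeping resource usage very low. On multiple authoritative public and internal benchmarks, PaddleOCR-VL reaches industry-leading performance in both page-level document parsing and element-level recognition. It far outperforms existing solutions, is highly competitive with top vision-language models, and offers fast inference. These properties make it an ideal choice for deployment in real-world scenarios.
 
-Although PaddleOCR-VL-0.9B performs well in common scenarios, its performance hits a bottleneck in many specific or complex business scenarios. For example:
-- Specific industries and professional domains
-    - Finance and accounting: recognizing invoices, receipts, bank statements, financial reports, etc.
-    - Healthcare: recognizing medical records, lab reports, doctors' handwritten prescriptions, drug package inserts, etc.
-    - Legal: recognizing contracts, legal instruments, court documents, certificates, etc.
-
-- Non-standard text and fonts
-    - Handwriting recognition: recognizing handwritten forms, notes, letters, questionnaires, etc.
-    - Artistic and design fonts: recognizing artistic fonts on posters, billboards, product packaging, menus, etc.
-    - Ancient books and historical documents: recognizing ancient manuscripts, old newspapers, historical archives, etc.
-
+Although PaddleOCR-VL-0.9B performs well in common scenarios, its recognition quality may hit a bottleneck in specific or complex business scenarios. For example:
+- Non-standard text and symbols
+    - Artistic and design fonts: recognizing fonts on posters, billboards, product packaging, cards and certificates, seals, etc.
+    - Special symbols: recognizing organic chemistry notation
 - Specific tasks and output formats
-    - Table recognition and structured output: converting tables in images to Excel, CSV, or JSON
-    - Mathematical formula recognition: recognizing formulas in textbooks and papers and outputting them as LaTeX or similar formats
-
+    - Fine-grained text localization and grounding output
+    - Flowchart recognition and structured output
+    - Data in specific low-resource languages: Tibetan, Bengali, ...
 
 In these cases, SFT (Supervised Fine-Tuning) is needed to improve the model's accuracy and robustness.
 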
@@ -58,7 +51,7 @@ python -m pip install numpy==1.26.4
 
 ### 3.1. Model preparation
 
-The PaddleOCR-VL-0.9B model can be downloaded from [huggingface](https://huggingface.co/PaddlePaddle/PaddleOCR-VL/tree/main/PaddleOCR-VL-0.9B) or [modelscope](https://modelscope.cn/models/PaddlePaddle/PaddleOCR-VL/files).
+The PaddleOCR-VL-0.9B model can be downloaded from [huggingface](https://huggingface.co/PaddlePaddle/PaddleOCR-VL/tree/main) or [modelscope](https://modelscope.cn/models/PaddlePaddle/PaddleOCR-VL/files).

Review comment (Collaborator): Suggested change

 
 ```bash
 huggingface-cli download PaddlePaddle/PaddleOCR-VL --local-dir PaddlePaddle/PaddleOCR-VL
@@ -93,7 +86,7 @@ wget https://paddleformers.bj.bcebos.com/datasets/ocr_vl_sft-train_Bengali.jsonl
 ```json
 {
     "image_info": [
-        {"matched_text_index": 0, "image_url": "./assets/table_example.jps"},
+        {"matched_text_index": 0, "image_url": "./assets/bengali_train_example.png"},
     ],
     "text_info": [
         {"text": "OCR:", "tag": "mask"},
@@ -237,7 +230,7 @@ paddleocr doc_parser -i https://paddle-model-ecology.bj.bcebos.com/PPOCRVL/datas
 ```json
 {
     "image_info": [
-        {"matched_text_index": 0, "image_url": "./assets/table_example.jps"},
+        {"matched_text_index": 0, "image_url": "./assets/table_example.png"},
     ],
     "text_info": [
         {"text": "Table Recognition:", "tag": "mask"},
@@ -255,7 +248,7 @@ paddleocr doc_parser -i https://paddle-model-ecology.bj.bcebos.com/PPOCRVL/datas
 ```json
 {
     "image_info": [
-        {"matched_text_index": 0, "image_url": "./assets/formula_example.jps"},
+        {"matched_text_index": 0, "image_url": "./assets/formula_example.jpg"},
     ],
     "text_info": [
         {"text": "Formula Recognition:", "tag": "mask"},
@@ -282,7 +275,7 @@ paddleocr doc_parser -i https://paddle-model-ecology.bj.bcebos.com/PPOCRVL/datas
 }
 ```
 
-### FAQ
+### 8.2. FAQ
 
 If you run into the problem below while using the commands above, it is usually caused by a conflict between cv2 and the environment; installing `opencv-python-headless` resolves it.
 
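Three of the hunks in this diff correct a mistyped image extension (`.jps`) in the sample records. A check along the following lines could catch that class of typo in a dataset file before training. This is only a minimal sketch: it assumes one JSON object per line (as the `.jsonl` dataset filename suggests) and uses the `image_info` / `image_url` field names shown in the diff; the function name and the extension allowlist are hypothetical and not part of the PR.

```python
import json
from pathlib import PurePosixPath

# Assumption: extensions accepted as image files for this check (not from the PR).
KNOWN_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def find_suspicious_image_urls(jsonl_text):
    """Return (line_number, image_url) pairs whose extension is not in the allowlist."""
    problems = []
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines between records
        record = json.loads(line)
        for item in record.get("image_info", []):
            url = item.get("image_url", "")
            suffix = PurePosixPath(url).suffix.lower()
            if suffix not in KNOWN_IMAGE_EXTENSIONS:
                problems.append((line_number, url))
    return problems


# Two single-line records modeled on the diff: the old (mistyped) and the corrected path.
sample = "\n".join([
    json.dumps({"image_info": [{"matched_text_index": 0, "image_url": "./assets/table_example.jps"}]}),
    json.dumps({"image_info": [{"matched_text_index": 0, "image_url": "./assets/table_example.png"}]}),
])
print(find_suspicious_image_urls(sample))  # [(1, './assets/table_example.jps')]
```

Note that the pretty-printed snippets in the documentation carry a trailing comma after the `image_info` entry, so as displayed they would not parse with `json.loads`; the sketch therefore builds its own single-line records.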