Commit a5ff8ac

committed Jun 13, 2024
update paddleocr to 2.8+ and add layout score output
1 parent f80560f commit a5ff8ac

2 files changed: +5 -2 lines changed


magic_pdf/model/doc_analyze_by_pp_structurev2.py

Lines changed: 4 additions & 1 deletion
@@ -90,7 +90,10 @@ def doc_analyze(pdf_bytes: bytes, ocr: bool = False, show_log: bool = False):
                 line['category_id'] = 2
             else:
                 logger.warning(f"unknown type: {line['type']}")
-            line['score'] = 0.5 + random.random() * 0.5
+
+            # Stay compatible with paddleocr versions that do not output a score
+            if line.get("score") is None:
+                line['score'] = 0.5 + random.random() * 0.5
 
             res = line.pop('res', None)
             if res is not None and len(res) > 0:
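
The behavior of this hunk can be sketched in isolation. The helper name `ensure_score` and the sample dictionaries below are hypothetical and exist only for illustration; the commit itself performs the check inline inside `doc_analyze`. The sketch assumes each layout result is a plain dict, as the diff suggests.

```python
import random


def ensure_score(line: dict) -> dict:
    """Fill in a placeholder score only when the layout result lacks one.

    Hypothetical helper: the commit does this check inline, not in a function.
    """
    if line.get("score") is None:
        # Same expression as the diff: a random placeholder between 0.5 and 1.0.
        line["score"] = 0.5 + random.random() * 0.5
    return line


# A result that already carries a score (newer paddleocr) is left untouched.
with_score = ensure_score({"type": "text", "score": 0.87})

# A result without a score (older paddleocr) receives a placeholder.
without_score = ensure_score({"type": "text"})

# A score of 0.0 is preserved, because the check is "is None", not falsiness.
zero_score = ensure_score({"type": "text", "score": 0.0})
```

Because the check uses `is None` rather than truthiness, a legitimate score of `0.0` reported by the model is kept, while a missing key or an explicit `None` falls back to the random placeholder.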

requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -16,4 +16,4 @@ nltk==3.8.1
 s3pathlib>=2.1.1
 pytest
 paddlepaddle
-paddleocr>=2.6.0.3
+paddleocr @ https://github.com/myhloli/PaddleOCR/releases/download/paddleocr-2.8.2-released/paddleocr-2.8.2-py3-none-any.whl
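
Since the requirement now points at a specific wheel rather than a version range, it may be useful to confirm at runtime that the installed `paddleocr` is in the 2.8+ range named in the commit message. The following is a minimal sketch using only the standard library; the function names `meets_minimum` and `installed_paddleocr_ok` are hypothetical, and the version parsing is deliberately simple (it reads leading numeric components only and does not implement full PEP 440 comparison).

```python
from importlib.metadata import PackageNotFoundError, version


def meets_minimum(installed: str, minimum: tuple = (2, 8)) -> bool:
    """Compare the leading numeric components of a version string to a minimum."""
    parts = []
    for piece in installed.split("."):
        if not piece.isdigit():
            break  # stop at non-numeric suffixes such as "rc1" or "dev0"
        parts.append(int(piece))
    return tuple(parts[: len(minimum)]) >= minimum


def installed_paddleocr_ok() -> bool:
    """Return True if paddleocr is installed and looks like 2.8 or newer."""
    try:
        return meets_minimum(version("paddleocr"))
    except PackageNotFoundError:
        return False  # paddleocr is not installed in this environment


print(meets_minimum("2.8.2"))    # the wheel pinned in this commit
print(meets_minimum("2.6.0.3"))  # the previous lower bound
```

Comparing integer tuples rather than raw strings avoids the common mistake of treating `2.10` as older than `2.8`.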
