Landscape PDFs parse poorly: image and text layout is jumbled #465

Closed
CyfZsj opened this issue Aug 20, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@CyfZsj

CyfZsj commented Aug 20, 2024

Description of the bug

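File parsed: 说明书.pdf (an instruction manual)
When a page in the PDF is very wide, parsing has these problems:
1. Images and text are misplaced relative to each other in the parsed layout
2. Some images and text are not recognized at all
Parsing results (partial):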
layout.pdf
spans.pdf

How to reproduce the bug

Run magic-pdf -p ....... -o ......... -m auto

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.7.x

Device mode

cuda

@CyfZsj CyfZsj added the bug Something isn't working label Aug 20, 2024
@myhloli
Collaborator

myhloli commented Jan 22, 2025

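The training data does not include extra-wide page formats like this one. For accurate recognition, please first split the document yourself into single pages with an aspect ratio of about 4:3, arranged in page order.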
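One way to script this workaround is sketched below. It is a hypothetical helper, not part of magic-pdf: split_boxes and split_pdf are illustrative names, and split_pdf assumes PyMuPDF (imported as fitz) is installed, pages are unrotated with a mediabox starting at (0, 0), and the pieces of a wide page read left to right. The comment does not say whether 4:3 means width:height or height:width, so the target width/height ratio is a parameter; the default follows the literal 4:3, and 0.75 gives portrait-shaped pieces.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in the page's own units (PDF points when used with PyMuPDF).
Box = Tuple[float, float, float, float]


def split_boxes(width: float, height: float, target_ratio: float = 4 / 3) -> List[Box]:
    """Cut a width x height page into equal, full-height vertical strips.

    target_ratio is the desired width/height of each strip. The strip count is
    (width / height) / target_ratio rounded to the nearest whole number, and
    never less than 1, so pages that are not extra-wide come back as one box.
    """
    if width <= 0 or height <= 0 or target_ratio <= 0:
        raise ValueError("width, height and target_ratio must be positive")
    n = max(1, round((width / height) / target_ratio))
    step = width / n
    # Pin the last edge to width so float error never pushes it past the page.
    xs = [i * step for i in range(n)] + [width]
    # Left to right, assumed to be the reading order of the pieces.
    return [(xs[i], 0.0, xs[i + 1], height) for i in range(n)]


def split_pdf(src_path: str, dst_path: str, target_ratio: float = 4 / 3) -> None:
    """Write a copy of src_path in which each page is replaced by its strips.

    Assumes unrotated pages whose mediabox starts at (0, 0). Only the crop box
    of each copied page changes, so content outside a strip is hidden, not deleted.
    """
    import fitz  # PyMuPDF, imported lazily so split_boxes works without it

    src = fitz.open(src_path)
    out = fitz.open()  # new, empty PDF
    for pno in range(src.page_count):
        box = src[pno].mediabox
        for x0, y0, x1, y1 in split_boxes(box.width, box.height, target_ratio):
            # Copy the whole page, then restrict its visible area to one strip.
            out.insert_pdf(src, from_page=pno, to_page=pno)
            out[out.page_count - 1].set_cropbox(fitz.Rect(x0, y0, x1, y1))
    out.save(dst_path)
```

For example, split_pdf("input.pdf", "output.pdf") would write a split copy that can then be passed to magic-pdf -p in place of the original. Cropping was chosen because it keeps the text layer intact, but since the hidden content is still in the file, it is worth checking the layout output to confirm nothing outside each strip is picked up.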

@myhloli myhloli closed this as completed Jan 22, 2025