
class magic_pdf.data.dataset.Doc(doc: Page): confusion about the Page type of the doc parameter #1583

Closed
toyn0015 opened this issue Jan 20, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@toyn0015

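Description of the bug

I am trying to learn what class magic_pdf.data.dataset.Doc(doc: Page) does, but the official documentation describes neither this class nor its parent class, class magic_pdf.data.dataset.PageableData.

Reading a file through read_api directly produces a dataset object for the document (PymuDocDataset, ImageDataset).
Reading a file through the IOData Reader/Writer yields the file's raw binary data.

Neither result is an object of type Page.

I tried taking a PymuDocDataset object, calling get_page() to obtain the per-page data, and passing that to class magic_pdf.data.dataset.Doc(doc: Page). This appears to work, but PyCharm warns: Expected type 'Page', got 'PymuDocDataset' instead.

So where does an object of this Page type come from?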

How to reproduce the bug

from magic_pdf.data.read_api import read_local_office
from magic_pdf.data.dataset import Doc

file = r'e:\桌面\新建文件夹\test.doc'
ds = read_local_office(file)


something = Doc(ds[0].get_page(0))

print(something)

PyCharm warning:

Expected type 'Page', got 'PageableData' instead

Operating system

Windows

Python version

3.10

Software version (magic-pdf --version)

1.0.x

Device mode

cpu

@toyn0015 toyn0015 added the bug Something isn't working label Jan 20, 2025
@myhloli
Collaborator

myhloli commented Jan 20, 2025

https://pymupdf.readthedocs.io/en/latest/page.html#page
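
The linked page documents PyMuPDF's Page class, which suggests that the Page in the Doc(doc: Page) signature is a raw PyMuPDF page rather than a magic_pdf object. The sketch below is a stdlib-only illustration of why PyCharm warns while the code still runs. Every class in it is a hypothetical stand-in, not magic_pdf's or PyMuPDF's actual source, and it assumes (as the PyCharm message implies) that get_page() is annotated to return PageableData.

```python
from typing import List


class Page:
    """Hypothetical stand-in for the PDF library's raw page type."""

    def __init__(self, number: int) -> None:
        self.number = number


class PageableData:
    """Hypothetical stand-in for the abstract per-page wrapper."""


class Doc(PageableData):
    """Hypothetical stand-in: wraps exactly one raw Page."""

    def __init__(self, doc: Page) -> None:
        # No runtime type check, so Python accepts any object here;
        # only a static checker compares the argument against `Page`.
        self._doc = doc


class Dataset:
    """Hypothetical stand-in for a dataset that wraps each raw page on load."""

    def __init__(self, raw_pages: List[Page]) -> None:
        self._records = [Doc(p) for p in raw_pages]

    def get_page(self, page_id: int) -> PageableData:
        return self._records[page_id]


ds = Dataset([Page(0), Page(1)])

# Annotated as PageableData, but the object is already a Doc.
wrapped = ds.get_page(0)

# Runs without error, yet a type checker reports:
# expected 'Page', got 'PageableData'.
rewrapped = Doc(wrapped)

already_a_doc = isinstance(wrapped, Doc)
double_wrapped = isinstance(rewrapped._doc, Doc)
```

Under these assumptions, the value returned by get_page() is already the per-page wrapper, so passing it to Doc again only wraps a wrapper. Using the get_page() result directly avoids both the extra layer and the warning.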
