class magic_pdf.data.dataset.Doc(doc: Page): confusion about the Page type of the doc parameter #1583

toyn0015 · 2025-01-20T09:08:46Z

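Description of the bug

I am trying to learn what the class magic_pdf.data.dataset.Doc(doc: Page) does, but the official documentation describes neither it nor its parent class, magic_pdf.data.dataset.PageableData.

Reading a file through read_api directly produces a dataset object for the document (PymuDocDataset or ImageDataset).
Reading a file through IO or the Data Reader Writer yields the file's binary data.

In neither case is the result an object of type Page.

I tried getting page data from a PymuDocDataset object with get_page() and passing it to class magic_pdf.data.dataset.Doc(doc: Page). This seems to work, but PyCharm warns: Expected type 'Page', got 'PymuDocDataset' instead.

So where does an object of this Page type come from?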

How to reproduce the bug

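from magic_pdf.data.read_api import read_local_office
from magic_pdf.data.dataset import Doc

file = r'e:\桌面\新建文件夹\test.doc'
# Read the office file; the result is indexed below, so ds[0] is the first dataset
ds = read_local_office(file)

# get_page(0) returns a PageableData object, which is then passed to Doc
something = Doc(ds[0].get_page(0))

print(something)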

PyCharm warning:

Expected type 'Page', got 'PageableData' instead
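One likely reason the call runs even though PyCharm flags it: Python does not enforce type annotations at call time, so a mismatched argument is accepted and only a static checker complains. The sketch below illustrates this with hypothetical stand-in classes named after the ones in this issue. They are not the real magic_pdf or PyMuPDF classes, and their bodies are assumptions made purely for illustration.

```python
class Page:
    """Hypothetical stand-in for PyMuPDF's Page class."""

    def __init__(self, number: int) -> None:
        self.number = number


class PageableData:
    """Hypothetical stand-in for magic_pdf.data.dataset.PageableData."""


class Doc(PageableData):
    """Hypothetical stand-in for magic_pdf.data.dataset.Doc."""

    def __init__(self, doc: Page) -> None:
        # The annotation documents the expected type;
        # Python itself does not check it when the call happens.
        self._doc = doc


page = Page(0)
wrapped = Doc(page)       # matches the annotation, no warning
rewrapped = Doc(wrapped)  # a static checker warns here, but the call still runs

print(type(wrapped._doc).__name__)
print(type(rewrapped._doc).__name__)
```

In this sketch the second call stores a Doc where a Page was expected. Nothing fails at construction time; a problem would only surface later, if a method of the wrapper tried to use Page-specific attributes on the stored object.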

Operating system

Windows

Python version

3.10

Software version (magic-pdf --version)

1.0.x

Device mode

cpu


myhloli · 2025-01-20T09:45:45Z

https://pymupdf.readthedocs.io/en/latest/page.html#page

toyn0015 added the bug (Something isn't working) label on Jan 20, 2025

toyn0015 closed this as completed Jan 20, 2025
