Question about pipe_ocr_mode #1585

Closed

toyn0015 opened this issue Jan 20, 2025 · 1 comment

toyn0015 commented Jan 20, 2025

ds.apply(doc_analyze, ocr=True).pipe_ocr_mode(image_writer).dump_md(md_writer, f"{name_without_suff}.md", image_dir)

Step 1: .apply(doc_analyze, ocr=True)
Step 2: pipe_ocr_mode(image_writer)
Step 3: dump_md(md_writer, f"{name_without_suff}.md", image_dir)

I would like to ask what the image_writer parameter does in this chain of calls:
1. Does image_writer write files to the target directory?
2. What is the purpose of the image_writer parameter?

I ask because in my testing I found that without step 3, no result files are saved to the target directory. For example:

ds.apply(doc_analyze, ocr=True).pipe_ocr_mode(image_writer)

With this, no files are written to the target directory.

Collaborator

myhloli commented Jan 20, 2025

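Step 2 needs the image writer in order to crop images out of the pages. Your step 2 produced no output because there were no images to crop. Step 3 needs a writer object to export the Markdown, so it simply reuses the same kind of writer as the earlier image writer.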
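
To make the behaviour described in this reply concrete, here is a minimal toy sketch in Python. It does not use or reproduce the library's code: `ToyWriter`, `ToyPipeResult`, `toy_pipe_ocr_mode` and `run_demo` are hypothetical stand-ins invented for illustration, and the page dictionaries are made-up data. It only models the idea that step 2 writes through the image writer solely when a page contains an image to crop, while step 3 always writes the Markdown file through its own writer.

```python
import os
import tempfile


class ToyWriter:
    """Hypothetical stand-in for a file-based writer rooted at base_dir."""

    def __init__(self, base_dir):
        self.base_dir = base_dir
        os.makedirs(base_dir, exist_ok=True)

    def write(self, name, data):
        with open(os.path.join(self.base_dir, name), "wb") as f:
            f.write(data)


class ToyPipeResult:
    """Hypothetical stand-in for the object returned by step 2."""

    def __init__(self, texts, image_names):
        self.texts = texts
        self.image_names = image_names

    def dump_md(self, md_writer, file_name, image_dir):
        # Step 3: always writes one Markdown file through md_writer.
        lines = list(self.texts)
        lines += [f"![]({image_dir}/{n})" for n in self.image_names]
        md_writer.write(file_name, "\n\n".join(lines).encode("utf-8"))


def toy_pipe_ocr_mode(pages, image_writer):
    # Step 2: writes through image_writer only when a page has images.
    texts, image_names = [], []
    for page in pages:
        texts.append(page["text"])
        for name, data in page.get("images", []):
            image_writer.write(name, data)
            image_names.append(name)
    return ToyPipeResult(texts, image_names)


def run_demo(pages):
    """Return (image files, md files) listings after step 2 and after step 3."""
    with tempfile.TemporaryDirectory() as root:
        image_dir = os.path.join(root, "images")
        md_dir = os.path.join(root, "md")
        image_writer = ToyWriter(image_dir)
        md_writer = ToyWriter(md_dir)
        result = toy_pipe_ocr_mode(pages, image_writer)
        after_step2 = (sorted(os.listdir(image_dir)), sorted(os.listdir(md_dir)))
        result.dump_md(md_writer, "doc.md", "images")
        after_step3 = (sorted(os.listdir(image_dir)), sorted(os.listdir(md_dir)))
    return after_step2, after_step3


text_only = [{"text": "plain page"}]
with_figure = [{"text": "page with a figure", "images": [("fig0.png", b"\x89PNG")]}]
print(run_demo(text_only))
print(run_demo(with_figure))
```

In this toy model a text-only document leaves both directories empty after step 2, which matches the observation in the question, and a file only appears once `dump_md` runs. Whether the real pipeline behaves identically in every case is not confirmed here.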

toyn0015 closed this as completed
