Can model loading be separated from the processing pipeline, so that models are not reloaded every time and processing runs faster? #932
Labels
enhancement
New feature or request
Comments
Models do support preloading: you can load them ahead of time, or let them load when processing starts. |
Thanks for the reply. I found this code:

from loguru import logger

def init_model():
    from magic_pdf.model.doc_analyze_by_custom_model import ModelSingleton
    try:
        model_manager = ModelSingleton()
        txt_model = model_manager.get_model(False, False)
        logger.info("txt_model init final")
        ocr_model = model_manager.get_model(True, False)
        logger.info("ocr_model init final")
        return 0
    except Exception as e:
        logger.exception(e)
        return -1

model_init = init_model()
logger.info(f"model_init: {model_init}")

Is there any documentation explaining what the False and True arguments to get_model mean? Also, how are the table and layout models loaded?

import os
from loguru import logger
from magic_pdf.pipe.UNIPipe import UNIPipe
from magic_pdf.rw.DiskReaderWriter import DiskReaderWriter

try:
    current_script_dir = os.path.dirname(os.path.abspath(__file__))
    demo_name = "demo1"
    pdf_path = os.path.join(current_script_dir, f"{demo_name}.pdf")
    # Read the PDF with a context manager so the file handle is closed.
    with open(pdf_path, "rb") as f:
        pdf_bytes = f.read()
    jso_useful_key = {"_pdf_type": "", "model_list": []}
    local_image_dir = os.path.join(current_script_dir, "images")
    image_dir = str(os.path.basename(local_image_dir))
    image_writer = DiskReaderWriter(local_image_dir)
    pipe = UNIPipe(pdf_bytes, jso_useful_key, image_writer)
    pipe.pipe_classify()
    pipe.pipe_analyze()
    pipe.pipe_parse()
    md_content = pipe.pipe_mk_markdown(image_dir, drop_mode="none")
    with open(f"{demo_name}.md", "w", encoding="utf-8") as f:
        f.write(md_content)
except Exception as e:
    logger.exception(e)

Once the models are preloaded, how do I integrate that with this processing pipeline? Also, I need to write the output for different files to different directories, but this pipeline fixes the directory up front. How can I change it dynamically? Looking forward to your reply, thanks :)
|
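Just call the model-loading code at any point before the pipeline runs. The model object is one complete bundle that contains all the models: layout, formula, OCR, table, and so on.

with open(f"{demo_name}.md", "w", encoding="utf-8") as f:
    f.write(md_content)
|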
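For anyone landing here later, a minimal sketch of how the two pieces might be combined: preload once, then run the pipeline per file with an output directory chosen at call time. It only uses the calls already quoted in this thread (`ModelSingleton().get_model`, `UNIPipe`, `DiskReaderWriter`, the `pipe_*` methods). The helper names `output_paths`, `preload_models`, and `convert`, and the one-folder-per-PDF layout, are assumptions made for illustration and are not part of magic_pdf.

```python
import os


def output_paths(pdf_path, output_root):
    """Derive a per-document output layout from the PDF file name (hypothetical helper)."""
    name = os.path.splitext(os.path.basename(pdf_path))[0]
    doc_dir = os.path.join(output_root, name)
    return {
        "doc_dir": doc_dir,
        "image_dir": os.path.join(doc_dir, "images"),
        "md_path": os.path.join(doc_dir, f"{name}.md"),
    }


def preload_models():
    """Warm the models once, using the same calls as init_model() above."""
    # Imported lazily so this module can be imported without magic_pdf installed.
    from magic_pdf.model.doc_analyze_by_custom_model import ModelSingleton

    manager = ModelSingleton()
    manager.get_model(False, False)
    manager.get_model(True, False)


def convert(pdf_path, output_root):
    """Run the pipeline from this thread for one PDF, writing under output_root."""
    from magic_pdf.pipe.UNIPipe import UNIPipe
    from magic_pdf.rw.DiskReaderWriter import DiskReaderWriter

    paths = output_paths(pdf_path, output_root)
    os.makedirs(paths["image_dir"], exist_ok=True)
    with open(pdf_path, "rb") as f:
        pdf_bytes = f.read()

    # The writer and the markdown path are created per call, so each file
    # can go to its own directory.
    image_writer = DiskReaderWriter(paths["image_dir"])
    pipe = UNIPipe(pdf_bytes, {"_pdf_type": "", "model_list": []}, image_writer)
    pipe.pipe_classify()
    pipe.pipe_analyze()
    pipe.pipe_parse()
    md_content = pipe.pipe_mk_markdown(
        os.path.basename(paths["image_dir"]), drop_mode="none"
    )
    with open(paths["md_path"], "w", encoding="utf-8") as f:
        f.write(md_content)
    return paths["md_path"]
```

With this shape, `preload_models()` would be called once at startup and `convert(path, root)` once per file. Whether the preload actually saves time depends on the pipeline requesting the same model configuration that was warmed.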
Thanks for the lightning-fast reply. I have a few more questions:
Looking forward to your reply, thanks :) |
|
Thanks! |
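This approach doesn't seem to take effect. After init has run, the model initialization messages still only appear once recognition is actually called. However, when running in a loop, the messages no longer appear after the first pass.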
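The behavior described here is what a lazily populated, argument-keyed cache would produce. The toy class below illustrates that general pattern only. It is not magic_pdf's `ModelSingleton`, whose internals are not shown in this thread, and the name `LazyModelCache` is made up. The assumption is that a load happens the first time a given argument combination is requested, so a warm-up call with different arguments than the pipeline uses would not prevent a later load.

```python
class LazyModelCache:
    """Toy stand-in for a singleton model manager (illustration only)."""

    _instance = None

    def __new__(cls):
        # Every construction returns the same shared instance.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._models = {}
            cls._instance.load_log = []
        return cls._instance

    def get_model(self, *key):
        # Load only on the first request for this exact argument tuple.
        if key not in self._models:
            self.load_log.append(key)      # stands in for the init messages
            self._models[key] = object()   # placeholder for an expensive load
        return self._models[key]
```

In this sketch, repeated calls with the same arguments reuse the cached object, which matches the "no messages after the first pass" observation, while a call with a new argument combination triggers one more load.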
|