Paddle error #952

zhangtianhong-1998 · 2024-11-14T02:46:27Z

Description of the bug

[11/14 10:44:21 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /home/gaoyuan/.cache/modelscope/hub/opendatalab/PDF-Extract-Kit-1___0/models/Layout/LayoutLMv3/model_final.pth ...
[11/14 10:44:21 fvcore.common.checkpoint]: [Checkpointer] Loading from /home/gaoyuan/.cache/modelscope/hub/opendatalab/PDF-Extract-Kit-1___0/models/Layout/LayoutLMv3/model_final.pth ...
[libprotobuf ERROR /paddle/third_party/protobuf/src/google/protobuf/message_lite.cc:133] Can't parse message of type "paddle.framework.proto.VarType.TensorDesc" because it is missing required fields: data_type
2024-11-14 10:44:22.074 | ERROR | __main__:<module>:25 - (InvalidArgument) Cannot parse tensor desc
[Hint: Expected desc.ParseFromArray(buf.get(), size) == true, but received desc.ParseFromArray(buf.get(), size):0 != true:1.] (at /paddle/paddle/fluid/framework/tensor_util.cc:689)
[operator < load_combine > error]
Traceback (most recent call last):

File "/data/gaoyuan/project_3/demo.py", line 19, in <module>
pipe.pipe_analyze()
│ └ <function UNIPipe.pipe_analyze at 0x7f7c20119630>
└ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x7f7d7735fc40>

File "/home/gaoyuan/miniconda3/envs/project_1/lib/python3.10/site-packages/magic_pdf/pipe/UNIPipe.py", line 37, in pipe_analyze
self.model_list = doc_analyze(self.pdf_bytes, ocr=True,
│ │ │ │ └ b'%PDF-1.3\r%\xe2\xe3\xcf\xd3\r\n314 0 obj\r<< \r/Linearized 1 \r/O 316 \r/H [ 1374 457 ] \r/L 1310264 \r/E 161639 \r/N 7 \r/...
│ │ │ └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x7f7d7735fc40>
│ │ └ <function doc_analyze at 0x7f7c21691240>
│ └ []
└ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x7f7d7735fc40>
File "/home/gaoyuan/miniconda3/envs/project_1/lib/python3.10/site-packages/magic_pdf/model/doc_analyze_by_custom_model.py", line 147, in doc_analyze
custom_model = model_manager.get_model(ocr, show_log, lang, layout_model, formula_enable, table_enable)
│ │ │ │ │ │ │ └ None
│ │ │ │ │ │ └ None
│ │ │ │ │ └ None
│ │ │ │ └ None
│ │ │ └ False
│ │ └ True
│ └ <function ModelSingleton.get_model at 0x7f7c216911b0>
└ <magic_pdf.model.doc_analyze_by_custom_model.ModelSingleton object at 0x7f7c2125fb80>
File "/home/gaoyuan/miniconda3/envs/project_1/lib/python3.10/site-packages/magic_pdf/model/doc_analyze_by_custom_model.py", line 75, in get_model
self._models[key] = custom_model_init(ocr=ocr, show_log=show_log, lang=lang, layout_model=layout_model,
│ │ │ │ │ │ │ └ None
│ │ │ │ │ │ └ None
│ │ │ │ │ └ False
│ │ │ │ └ True
│ │ │ └ <function custom_model_init at 0x7f7c21691090>
│ │ └ (True, False, None, None, None, None)
│ └ {}
└ <magic_pdf.model.doc_analyze_by_custom_model.ModelSingleton object at 0x7f7c2125fb80>
File "/home/gaoyuan/miniconda3/envs/project_1/lib/python3.10/site-packages/magic_pdf/model/doc_analyze_by_custom_model.py", line 126, in custom_model_init
custom_model = CustomPEKModel(**model_input)
│ └ {'ocr': True, 'show_log': False, 'models_dir': '/home/gaoyuan/.cache/modelscope/hub/opendatalab/PDF-Extract-Kit-1___0/models'...
└ <class 'magic_pdf.model.pdf_extract_kit.CustomPEKModel'>
File "/home/gaoyuan/miniconda3/envs/project_1/lib/python3.10/site-packages/magic_pdf/model/pdf_extract_kit.py", line 285, in __init__
self.ocr_model = atom_model_manager.get_atom_model(
│ │ └ <function AtomModelSingleton.get_atom_model at 0x7f7bcd0d9a20>
│ └ <magic_pdf.model.pdf_extract_kit.AtomModelSingleton object at 0x7f7d7735fd90>
└ <magic_pdf.model.pdf_extract_kit.CustomPEKModel object at 0x7f7c2125fc70>
File "/home/gaoyuan/miniconda3/envs/project_1/lib/python3.10/site-packages/magic_pdf/model/pdf_extract_kit.py", line 131, in get_atom_model
self._models[key] = atom_model_init(model_name=atom_model_name, **kwargs)
│ │ │ │ │ └ {'ocr_show_log': False, 'det_db_box_thresh': 0.3, 'lang': None}
│ │ │ │ └ 'ocr'
│ │ │ └ <function atom_model_init at 0x7f7bcd0d9750>
│ │ └ ('ocr', None, None)
│ └ {('mfd', None, None): YOLO(
│ (model): DetectionModel(
│ (model): Sequential(
│ (0): Conv(
│ (conv): Conv2d(3, 64...
└ <magic_pdf.model.pdf_extract_kit.AtomModelSingleton object at 0x7f7d7735fd90>
File "/home/gaoyuan/miniconda3/envs/project_1/lib/python3.10/site-packages/magic_pdf/model/pdf_extract_kit.py", line 159, in atom_model_init
atom_model = ocr_model_init(
└ <function ocr_model_init at 0x7f7bcd0d96c0>
File "/home/gaoyuan/miniconda3/envs/project_1/lib/python3.10/site-packages/magic_pdf/model/pdf_extract_kit.py", line 94, in ocr_model_init
model = ModifiedPaddleOCR(show_log=show_log, det_db_box_thresh=det_db_box_thresh, use_dilation=use_dilation, det_db_unclip_ratio=det_db_unclip_ratio)
│ │ │ │ └ 1.8
│ │ │ └ True
│ │ └ 0.3
│ └ False
└ <class 'magic_pdf.model.pek_sub_modules.self_modify.ModifiedPaddleOCR'>
File "/home/gaoyuan/miniconda3/envs/project_1/lib/python3.10/site-packages/paddleocr/paddleocr.py", line 616, in __init__
super().__init__(params)
└ Namespace(help='==SUPPRESS==', use_gpu=False, use_xpu=False, use_npu=False, ir_optim=True, use_tensorrt=False, min_subgraph_s...
File "/home/gaoyuan/miniconda3/envs/project_1/lib/python3.10/site-packages/paddleocr/tools/infer/predict_system.py", line 47, in __init__
self.text_recognizer = predict_rec.TextRecognizer(args)
│ │ │ └ Namespace(help='==SUPPRESS==', use_gpu=False, use_xpu=False, use_npu=False, ir_optim=True, use_tensorrt=False, min_subgraph_s...
│ │ └ <class 'tools.infer.predict_rec.TextRecognizer'>
│ └ <module 'tools.infer.predict_rec' from '/home/gaoyuan/miniconda3/envs/project_1/lib/python3.10/site-packages/paddleocr/tools/...
└ <magic_pdf.model.pek_sub_modules.self_modify.ModifiedPaddleOCR object at 0x7f7b16c47be0>
File "/home/gaoyuan/miniconda3/envs/project_1/lib/python3.10/site-packages/paddleocr/tools/infer/predict_rec.py", line 127, in __init__
utility.create_predictor(args, 'rec', logger)
│ │ │ └ <Logger ppocr (INFO)>
│ │ └ Namespace(help='==SUPPRESS==', use_gpu=False, use_xpu=False, use_npu=False, ir_optim=True, use_tensorrt=False, min_subgraph_s...
│ └ <function create_predictor at 0x7f7bd4318550>
└ <module 'tools.infer.utility' from '/home/gaoyuan/miniconda3/envs/project_1/lib/python3.10/site-packages/paddleocr/tools/infe...
File "/home/gaoyuan/miniconda3/envs/project_1/lib/python3.10/site-packages/paddleocr/tools/infer/utility.py", line 280, in create_predictor
predictor = inference.create_predictor(config)
│ │ └ <paddle.base.libpaddle.AnalysisConfig object at 0x7f7bcf40e0f0>
│ └ <built-in method create_predictor of PyCapsule object at 0x7f7bed511f20>
└ <module 'paddle.inference' from '/home/gaoyuan/miniconda3/envs/project_1/lib/python3.10/site-packages/paddle/inference/__init...

ValueError: (InvalidArgument) Cannot parse tensor desc
[Hint: Expected desc.ParseFromArray(buf.get(), size) == true, but received desc.ParseFromArray(buf.get(), size):0 != true:1.] (at /paddle/paddle/fluid/framework/tensor_util.cc:689)
[operator < load_combine > error]

How to reproduce the bug

Everything works on an Ubuntu machine with a desktop GUI, but the error appears when running in a virtual environment on the server.
CUDA 11.7
Package Version

absl-py 2.1.0
accelerate 1.1.1
aiohappyeyeballs 2.4.3
aiohttp 3.11.0
aiosignal 1.3.1
albucore 0.0.20
albumentations 1.4.21
annotated-types 0.7.0
antlr4-python3-runtime 4.9.3
anyio 4.6.2.post1
astor 0.8.1
async-timeout 5.0.1
attrdict 2.0.1
attrs 24.2.0
babel 2.16.0
bce-python-sdk 0.9.23
beautifulsoup4 4.12.3
black 24.10.0
blinker 1.9.0
boto3 1.35.60
botocore 1.35.60
braceexpand 0.1.7
Brotli 1.1.0
cachetools 5.5.0
certifi 2024.8.30
cffi 1.17.1
charset-normalizer 3.4.0
click 8.1.7
cloudpickle 3.1.0
colorlog 6.9.0
contourpy 1.3.1
cryptography 43.0.3
cssselect 1.2.0
cssutils 2.11.1
cycler 0.12.1
Cython 3.0.11
datasets 3.1.0
decorator 5.1.1
detectron2 0.6
dill 0.3.8
doclayout_yolo 0.0.2
einops 0.8.0
et_xmlfile 2.0.0
eva-decord 0.6.1
eval_type_backport 0.2.0
evaluate 0.4.3
exceptiongroup 1.2.2
fairscale 0.4.13
fast-langdetect 0.2.0
fasttext-wheel 0.9.2
filelock 3.16.1
fire 0.7.0
Flask 3.1.0
flask-babel 4.0.0
fonttools 4.54.1
frozenlist 1.5.0
fsspec 2024.9.0
ftfy 6.3.1
future 1.0.0
fvcore 0.1.5.post20221221
grpcio 1.67.1
h11 0.14.0
httpcore 1.0.6
httpx 0.27.2
huggingface-hub 0.26.2
hydra-core 1.3.2
idna 3.10
imageio 2.36.0
imgaug 0.4.0
iopath 0.1.9
itsdangerous 2.2.0
Jinja2 3.1.4
jmespath 1.0.1
joblib 1.4.2
kiwisolver 1.4.7
langdetect 1.0.9
lazy_loader 0.4
lmdb 1.5.1
loguru 0.7.2
lxml 5.3.0
magic-pdf 0.9.2
Markdown 3.7
MarkupSafe 3.0.2
matplotlib 3.9.2
more-itertools 10.5.0
mpmath 1.3.0
multidict 6.1.0
multiprocess 0.70.16
mypy-extensions 1.0.0
networkx 3.4.2
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.6.77
nvidia-nvtx-cu12 12.1.105
omegaconf 2.3.0
opencv-contrib-python 4.6.0.66
opencv-python 4.6.0.66
opencv-python-headless 4.10.0.84
openpyxl 3.1.5
opt-einsum 3.3.0
packaging 24.2
paddleocr 2.7.3
paddlepaddle 3.0.0b1
pandas 2.2.3
pathspec 0.12.1
pdf2docx 0.5.8
pdfminer.six 20231228
pillow 11.0.0
pip 24.2
platformdirs 4.3.6
portalocker 2.10.1
premailer 3.10.0
propcache 0.2.0
protobuf 5.28.3
psutil 6.1.0
py-cpuinfo 9.0.0
pyarrow 18.0.0
pybind11 2.13.6
pyclipper 1.3.0.post6
pycocotools 2.0.8
pycparser 2.22
pycryptodome 3.21.0
pydantic 2.7.4
pydantic_core 2.18.4
PyMuPDF 1.24.13
pyparsing 3.2.0
python-dateutil 2.9.0.post0
python-docx 1.1.2
pytz 2024.2
PyYAML 6.0.2
RapidFuzz 3.10.1
rarfile 4.2
regex 2024.11.6
requests 2.32.3
robust-downloader 0.0.2
s3transfer 0.10.3
safetensors 0.4.5
scikit-image 0.24.0
scikit-learn 1.5.2
scipy 1.14.1
seaborn 0.13.2
setuptools 75.1.0
shapely 2.0.6
simsimd 6.0.5
six 1.16.0
sniffio 1.3.1
soupsieve 2.6
stringzilla 3.10.10
struct-eqtable 0.3.2
sympy 1.13.3
tabulate 0.9.0
tensorboard 2.18.0
tensorboard-data-server 0.7.2
termcolor 2.5.0
thop 0.1.1.post2209072238
threadpoolctl 3.5.0
tifffile 2024.9.20
timm 0.9.16
tokenizers 0.19.1
tomli 2.1.0
torch 2.3.1
torchtext 0.18.0
torchvision 0.18.1
tqdm 4.67.0
transformers 4.42.4
triton 2.3.1
typing_extensions 4.12.2
tzdata 2024.2
ultralytics 8.3.30
ultralytics-thop 2.0.11
unimernet 0.2.1
urllib3 2.2.3
visualdl 2.5.3
Wand 0.6.13
wcwidth 0.2.13
webdataset 0.2.100
Werkzeug 3.1.3
wheel 0.44.0
xxhash 3.5.0
yacs 0.1.8
yarl 1.17.1

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.9.x

Device mode

cuda

myhloli · 2024-11-14T02:52:34Z

Try deleting the .paddleocr directory in your home directory, let the PaddleOCR models download again, and then retry.
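For anyone who prefers to script this step, here is a minimal sketch. It assumes the cache lives at `.paddleocr` directly under the user's home directory, as described in the comment above. The function name `reset_paddleocr_cache` and the `.paddleocr.bak` backup naming are hypothetical and not part of magic-pdf or PaddleOCR. It moves the directory aside rather than deleting it, which is a more cautious variation on the suggestion, so the old files can still be inspected or restored.

```python
import shutil
from pathlib import Path
from typing import Optional


def reset_paddleocr_cache(home: Optional[Path] = None) -> Optional[Path]:
    """Move <home>/.paddleocr aside so the models get downloaded again.

    Hypothetical helper: returns the backup location, or None when no
    cache directory exists.
    """
    home = Path.home() if home is None else home
    cache_dir = home / ".paddleocr"  # location taken from the comment above
    if not cache_dir.is_dir():
        return None

    # Pick a backup name that is not already taken (assumed naming scheme).
    backup = home / ".paddleocr.bak"
    suffix = 0
    while backup.exists():
        suffix += 1
        backup = home / f".paddleocr.bak{suffix}"

    # Rename instead of deleting so the old files stay available.
    shutil.move(str(cache_dir), str(backup))
    return backup
```

Calling `reset_paddleocr_cache()` with no arguments acts on the current user's home directory. Once a fresh run succeeds, the backup directory can be removed by hand.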

zhangtianhong-1998 · 2024-11-14T05:00:53Z

Thanks, that did fix it.

zhangtianhong-1998 added the bug label Nov 14, 2024

zhangtianhong-1998 closed this as completed Nov 14, 2024
