The PDF file extracted many black images, but these black images are not actually present in the PDF file. #30963
Dify currently extracts all images from PDFs automatically using the pypdfium2 library, and there is no built-in option or environment variable to disable this behavior in v1.11.3. The black images you're seeing are likely artifacts of the extraction process, not actual images in your PDF (see the implementation).

To prevent image extraction, you would need to modify the code. Comment out the image extraction lines so that they read:

    # image_content = self._extract_images(page)
    # if image_content:
    #     content += "\n" + image_content

This will skip image extraction and only extract text from PDFs. If you want a more flexible solution, you could add a flag that makes image extraction configurable.

Let me know if you need step-by-step guidance for this code change or want help making it configurable.
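For anyone who prefers a switch over commenting lines out, here is a minimal sketch of what such a flag might look like. It is not Dify's actual code: the class `PdfExtractorSketch`, the stand-in `FakePage`, the helper `env_flag`, and the environment variable `PDF_EXTRACT_IMAGES` are all hypothetical names chosen for illustration. Only the three-line image step mirrors the snippet quoted above.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional


def env_flag(name: str, default: bool = True) -> bool:
    """Read a boolean from the environment; an unset variable yields `default`."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class FakePage:
    """Stand-in for a PDF page object, used only in this sketch."""
    text: str
    images: List[str] = field(default_factory=list)


class PdfExtractorSketch:
    """Hypothetical extractor showing where a flag could gate image extraction."""

    def __init__(self, extract_images: Optional[bool] = None):
        # An explicit argument wins; otherwise fall back to the
        # hypothetical PDF_EXTRACT_IMAGES environment variable.
        if extract_images is None:
            extract_images = env_flag("PDF_EXTRACT_IMAGES", default=True)
        self.extract_images = extract_images

    def _extract_images(self, page: FakePage) -> str:
        # Placeholder: render each image as a Markdown-style reference.
        return "\n".join(f"![image]({name})" for name in page.images)

    def extract_page(self, page: FakePage) -> str:
        content = page.text
        # Same three-line image step as in the reply, now behind the flag.
        if self.extract_images:
            image_content = self._extract_images(page)
            if image_content:
                content += "\n" + image_content
        return content
```

With this shape, `PdfExtractorSketch(extract_images=False)` returns text only, and leaving the argument unset defers to the environment variable, so the default behavior (images extracted) stays unchanged.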
Self Checks
1. Is this request related to a challenge you're experiencing? Tell me about your story.
The PDF file extracted many black images, but these black images are not actually present in the PDF file.
How can I turn off automatic extraction of image content from PDFs?
Dify version: 1.11.3
2. Additional context or comments
No response