PDF to MD Conversion with Docling v2.18 is Incomprehensible #888

@sallahbaksh

Bug

It looks like a bug was introduced in PDF-to-Markdown conversion in docling v2.18: the resulting md file is inaccurate to the point of being unreadable.

Steps to reproduce

Use DocumentConverter() to convert the PDF (Stafford County - VA Zoning Ordinance.pdf) to Markdown. The resulting md file looks like this:
[Image: screenshot of the resulting Markdown output]
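
For anyone trying to reproduce this, the step above can be sketched roughly as follows. This is a minimal sketch, not the reporter's actual script: it assumes the default DocumentConverter() options and that the PDF sits in the working directory. The helper names markdown_path_for and convert_pdf_to_markdown are made up for illustration.

```python
from pathlib import Path


def markdown_path_for(pdf_path):
    """Return the .md path next to the given PDF (hypothetical helper)."""
    return Path(pdf_path).with_suffix(".md")


def convert_pdf_to_markdown(pdf_path):
    """Convert one PDF to Markdown with docling's default pipeline."""
    # Imported here so the path helper above works without docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()  # assumption: no custom pipeline options
    result = converter.convert(pdf_path)
    markdown = result.document.export_to_markdown()

    # Write the Markdown next to the source PDF and return where it went.
    out_path = markdown_path_for(pdf_path)
    out_path.write_text(markdown, encoding="utf-8")
    return out_path
```

Calling convert_pdf_to_markdown("Stafford County - VA Zoning Ordinance.pdf") would write the Markdown next to the PDF, which makes it easy to diff the output between docling versions.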

Docling version

Docling version: 2.18.0
Docling Core version: 2.17.1
Docling IBM Models version: 3.3.0
Docling Parse version: 3.2.0
Python: cpython-312 (3.12.8)
Platform: Windows-11-10.0.22631-SP0
...
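
If it helps others compare environments, the package versions listed above can also be collected from Python. This is a small sketch using only the standard library; the package names are assumed to be the PyPI distribution names matching the entries above, and installed_versions is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the components listed in the report.
PACKAGES = ("docling", "docling-core", "docling-ibm-models", "docling-parse")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found
```

Printing installed_versions() on a working and a broken setup should show which of the four packages differ.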

Python version

Python 3.12.8

Labels: bug (Something isn't working); pdf parsing (PDF issue related to docling-parse)