Do you know if your PDF document is programmatic or scanned? In the first case, we are currently updating the set of supported fonts in the parser, so results might improve soon.
We also found that the previous parser sometimes works better (in rare cases). You could give it a try with
I solved part of the problem. When your PDF is encrypted, you need to set some parameters, such as `pipeline_options.ocr_options.lang = ["chi_sim"]` and `pipeline_options.ocr_options.force_full_page_ocr = True`.
With these settings you get better Markdown, but the result is still not satisfactory. English conversion is clearly better in this regard.
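For anyone landing here, this is a minimal sketch of how those two settings might be wired into a converter. It is an illustration, not a confirmed fix for this issue. The helper names `ocr_langs` and `build_converter` are made up for this example. The language codes are the ones the engines themselves use: EasyOCR expects `ch_sim`/`en`, while Tesseract expects `chi_sim`/`eng`, so a code that works for one engine may not be recognized by the other.

```python
# Language codes differ between OCR engines, so keep them in one place.
OCR_LANGS = {
    "easyocr": ["ch_sim", "en"],      # EasyOCR codes
    "tesseract": ["chi_sim", "eng"],  # Tesseract codes
}


def ocr_langs(engine: str) -> list[str]:
    """Return Simplified Chinese + English codes for the given OCR engine."""
    try:
        return list(OCR_LANGS[engine])
    except KeyError:
        raise ValueError(f"unsupported OCR engine: {engine!r}") from None


def build_converter(engine: str = "easyocr"):
    """Build a docling converter that OCRs every page in full (hypothetical helper)."""
    # Imported lazily so the helpers above work without docling installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        EasyOcrOptions,
        PdfPipelineOptions,
        TesseractOcrOptions,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption

    langs = ocr_langs(engine)
    options_cls = EasyOcrOptions if engine == "easyocr" else TesseractOcrOptions
    # force_full_page_ocr ignores the embedded text layer and OCRs the whole page.
    ocr_options = options_cls(lang=langs, force_full_page_ocr=True)

    pipeline_options = PdfPipelineOptions()
    pipeline_options.do_ocr = True
    pipeline_options.ocr_options = ocr_options

    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )
```

Usage would look something like `build_converter("easyocr").convert("chinese.pdf").document.export_to_markdown()`, where `chinese.pdf` is a placeholder path. Full-page OCR is slower than reading the embedded text, so it is probably only worth enabling when the text layer comes out garbled.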
Question
I have a Chinese PDF that I want to convert to a txt file, but the output is garbled. I already set lang=['en','ch_sim'] in EasyOcrOptions.
My code is
docling 2.15.1
docling-core 2.14.0
docling-ibm-models 3.1.2
docling-parse 3.0.0