[Bug]: #1477

Silgrond · 2025-02-10T07:28:09Z

Describe the bug

ocrmypdf -l jpn_vert input.pdf ocroutput.pdf ✔

Scanning contents ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% 0/26 -:--:--
An exception occurred while executing the pipeline _common.py:296
Traceback (most recent call last):
File "/usr/lib/python3.13/site-packages/ocrmypdf/_pipelines/_common.py", line 261, in
cli_exception_handler
return fn(options, plugin_manager)
File "/usr/lib/python3.13/site-packages/ocrmypdf/_pipelines/ocr.py", line 174, in
_run_pipeline
pdfinfo = do_get_pdfinfo(origin_pdf, executor, options)
File "/usr/lib/python3.13/site-packages/ocrmypdf/_pipelines/_common.py", line 318, in
do_get_pdfinfo
return get_pdfinfo(
pdf_path,
...<5 lines>...
check_pages=options.pages,
)
File "/usr/lib/python3.13/site-packages/ocrmypdf/_pipeline.py", line 199, in
get_pdfinfo
return PdfInfo(
input_file,
...<5 lines>...
executor=executor,
)
File "/usr/lib/python3.13/site-packages/ocrmypdf/pdfinfo/info.py", line 1179, in
__init__
self._pages = _pdf_pageinfo_concurrent(
~~~~~~~~~~~~~~~~~~~~~~~~^
pdf,
^^^^
...<7 lines>...
miner_state=miner_state,
^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/usr/lib/python3.13/site-packages/ocrmypdf/pdfinfo/info.py", line 821, in
_pdf_pageinfo_concurrent
executor(
~~~~~~~~^
use_threads=use_threads,
^^^^^^^^^^^^^^^^^^^^^^^^
...<12 lines>...
task_finished=update_pageinfo,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/usr/lib/python3.13/site-packages/ocrmypdf/_concurrent.py", line 78, in __call__
self._execute(
~~~~~~~~~~~~~^
use_threads=use_threads,
^^^^^^^^^^^^^^^^^^^^^^^^
...<5 lines>...
task_finished=task_finished,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/usr/lib/python3.13/site-packages/ocrmypdf/builtin_plugins/concurrency.py", line
144, in _execute
result = future.result()
File "/usr/lib/python3.13/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
~~~~~~~~~~~~~~~~~^^
File "/usr/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/usr/lib/python3.13/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/lib/python3.13/site-packages/ocrmypdf/pdfinfo/info.py", line 766, in
_pdf_pageinfo_sync
return PageInfo(
pdf, pageno, infile, check_pages, detailed_analysis, miner_state
)
File "/usr/lib/python3.13/site-packages/ocrmypdf/pdfinfo/info.py", line 886, in
__init__
self._gather_pageinfo(
~~~~~~~~~~~~~~~~~~~~~^
pdf, pageno, infile, check_pages, detailed_analysis, miner_state
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/usr/lib/python3.13/site-packages/ocrmypdf/pdfinfo/info.py", line 941, in
_gather_pageinfo
for info in _process_content_streams(
~~~~~~~~~~~~~~~~~~~~~~~~^
pdf=pdf, container=page, shorthand=userunit_shorthand
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
):
^
File "/usr/lib/python3.13/site-packages/ocrmypdf/pdfinfo/info.py", line 669, in
_process_content_streams
contentsinfo = _interpret_contents(container, initial_shorthand)
File "/usr/lib/python3.13/site-packages/ocrmypdf/pdfinfo/info.py", line 229, in
_interpret_contents
_normalize_stack(parse_content_stream(contentstream, operator_whitelist))
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/pikepdf/models/_content_stream.py", line 106,
in parse_content_stream
page._parse_page_contents_grouped(operators),
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
ValueError: overflow/underflow converting
-2157231713616891516413098094427726015594901769862730602100741084491831945181808952318889
73205445516242738206423723380283420801456828955611720528865654388281399315331778890329660
58780819344078419605516764288528517056790776940003197080411964201361804775065144566391317
408998138373203171479264026313950639423488 to 64-bit integer

Steps to reproduce

1. Run ocrmypdf -v1 ...arguments... input.pdf output.pdf
2. Open output.pdf
3. ...

Files

No response

How did you download and install the software?

PyPI (pip, poetry, pipx, etc.), source build

OCRmyPDF version

16.9.0

Relevant log output

jbarlow83 · 2025-02-10T08:34:38Z

I cannot investigate this without a reproducing example.
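For anyone hitting the same error and trying to put together a reproducing example, the following is a minimal sketch, not part of OCRmyPDF or pikepdf. It assumes the `ValueError` is caused by an integer literal in a page content stream that does not fit in 64 bits, which is only inferred from the error message above. The function name `find_oversized_integers` is hypothetical, and the input is assumed to be content stream bytes that have already been decompressed by some other tool. The scan is a heuristic: digits inside string literals or inline image data may also be reported.

```python
import re

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

# An integer literal that is not part of a name, a real number, or a longer token.
_INT_TOKEN = re.compile(rb"(?<![\w.+-])[+-]?\d+(?![\w.])")


def find_oversized_integers(content: bytes) -> list[tuple[int, str]]:
    """Return (byte offset, literal) for integers outside the signed 64-bit range.

    `content` is assumed to be an already-decoded PDF content stream.
    """
    hits = []
    for match in _INT_TOKEN.finditer(content):
        token = match.group()
        digits = token.lstrip(b"+-").lstrip(b"0")
        # More than 19 significant digits can never fit in 64 bits, so skip
        # the conversion; otherwise compare the exact value.
        if len(digits) > 19 or not INT64_MIN <= int(token) <= INT64_MAX:
            hits.append((match.start(), token.decode("ascii")))
    return hits
```

If such a literal turns up on a single page, extracting just that page into its own file would give a small attachment for the report.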

Silgrond added the triage label Feb 10, 2025

Silgrond assigned jbarlow83 Feb 10, 2025

jbarlow83 closed this as not planned Feb 10, 2025

github-actions bot removed the triage label Feb 10, 2025
