Why do files become bigger after optimization without OCR? #1486

homocomputeris · 2025-02-22T14:47:18Z

In many cases when I run

ocrmypdf --tesseract-timeout=0 --remove-background --optimize $LEVEL input.pdf output.pdf

where LEVEL is 2 or 3, the output files are actually larger, even compared to the -O1 output.

Is this a bug?

jbarlow83 (Maintainer) · 2025-02-22T19:17:54Z

It could be, all else being equal, but it's impossible to say without a reproducing test file and command line.
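One way such a reproduction could be put together is sketched below. It runs each optimization level on the same input, so the resulting sizes are directly comparable. The flags are the ones already used in this thread. The function name `compare_levels` and the `OCRMYPDF_BIN` override are hypothetical names introduced only for this illustration.

```shell
# Sketch: optimize one input at several levels and list the output sizes.
# OCRMYPDF_BIN is a hypothetical override so the command can be swapped out;
# it defaults to the real ocrmypdf executable.
compare_levels() {
    local input=$1 level output
    for level in 1 2 3; do
        output="${input%.pdf}_O${level}.pdf"
        "${OCRMYPDF_BIN:-ocrmypdf}" --skip-text --tesseract-timeout=0 \
            --optimize "$level" "$input" "$output" || return 1
        # wc -c gives the size in bytes; arithmetic expansion trims any padding.
        printf 'O%s\t%s bytes\n' "$level" "$(( $(wc -c < "$output") ))"
    done
}
```

Calling `compare_levels input.pdf` would write `input_O1.pdf`, `input_O2.pdf` and `input_O3.pdf` next to the input and print one size per level.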

homocomputeris (Author) · 2025-02-22T21:07:27Z

OK, I managed to make an MWE:
test_O3.pdf
test_O2.pdf

Here is the full file-generation chain; the -O3 output is always larger than the -O2 output:

# Settings for the reproduction
PAPERSIZE='A4'
# Named OCR_LANG so the shell's LANG locale variable is not overwritten
OCR_LANG='jpn+eng'
TITLE='test'
AUTHOR='test'
name='test'
PARTKEYWORDS='manual'
KEYWORDS="name ${name}; ${PARTKEYWORDS}"

# Wrap the scanned TIFFs into a single PDF
echo "Running img2pdf"
img2pdf -S "${PAPERSIZE}" --title "${TITLE}" --author "${AUTHOR}" --keywords "${KEYWORDS}" -o "./${name}.pdf" ./out/*.tif

# OCR pass at -O1
ocrmypdf --output-type pdf --oversample 600 -l "${OCR_LANG}" --title "${TITLE}" --author "${AUTHOR}" --keywords "${KEYWORDS}" -O1 --fast-web-view 10 "./${name}.pdf" "./${name}_O1.pdf"
# Re-optimize without OCR: level 2 on the -O1 output, then level 3 on the -O2 output
ocrmypdf --verbose --skip-text --tesseract-timeout=0 --remove-background --optimize 2 "./${name}_O1.pdf" "./${name}_O2.pdf"
ocrmypdf --verbose --skip-text --tesseract-timeout=0 --remove-background --optimize 3 "./${name}_O2.pdf" "./${name}_O3.pdf"

eza -l         
drwxr-xr-x     - user 22 Feb 21:30 out
.rwx------@  719 user 22 Feb 21:58 test.ocr.zsh
.rw-r--r--@  47M user 22 Feb 21:58 test.pdf
.rw-r--r--@  47M user 22 Feb 22:00 test_O1.pdf
.rw-r--r--@ 3.5M user 22 Feb 22:00 test_O2.pdf
.rw-r--r--@ 3.6M user 22 Feb 22:00 test_O3.pdf
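
To relate the sizes in this listing to the "Total file size ratio" and "savings" lines in the verbose output, here is a minimal sketch of the arithmetic. It assumes the ratio is the input size divided by the output size and the savings is one minus the inverse of that ratio. The helper name `size_report` is hypothetical and is not part of ocrmypdf.

```shell
# Hypothetical helper (not part of ocrmypdf): report how much smaller
# an output file is than its input, given both sizes in bytes.
# Assumes ratio = input / output and savings = 1 - output / input.
size_report() {
    awk -v in_size="$1" -v out_size="$2" 'BEGIN {
        printf "ratio: %.2f savings: %.1f%%\n",
               in_size / out_size,
               (1 - out_size / in_size) * 100
    }'
}

size_report 1000 500    # prints "ratio: 2.00 savings: 50.0%"
size_report 1000 1001   # prints "ratio: 1.00 savings: -0.1%"
```

A savings value below zero, as in the second call, means the output grew. Under the same assumption, the rounded 47M and 3.5M sizes from the listing give a ratio of roughly 13, in line with the 13.38 reported for the -O2 run.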

Verbose output

% zsh ./*.ocr.zsh
Running img2pdf
Scanning contents     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6 0:00:00
Start processing 6 pages concurrently                                  ocr.py:96
    4 [tesseract] lots of diacritics - possibly poor OCR        tesseract.py:241
    3 [tesseract] lots of diacritics - possibly poor OCR        tesseract.py:241
    2 [tesseract] Image too small to scale!! (2x36 vs min width tesseract.py:259
of 3)                                                                           
    2 [tesseract] Line cannot be recognized!!                   tesseract.py:259
    2 [tesseract] Image too small to scale!! (2x36 vs min width tesseract.py:259
of 3)                                                                           
    2 [tesseract] Line cannot be recognized!!                   tesseract.py:259
OCR                   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6 0:00:00
Postprocessing...                                                     ocr.py:144
Linearizing           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 100/100 0:00:00
Recompressing JPEGs   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
Deflating JPEGs       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
JBIG2                 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
Image optimization ratio: 1.00 savings: 0.1%                   _pipeline.py:1002
Total file size ratio: 1.00 savings: -0.1%                     _pipeline.py:1005
ocrmypdf 16.9.0                                                   __main__.py:59
Running: ['tesseract', '--version']                              __init__.py:133
Found tesseract 5.5.0                                            __init__.py:345
Running: ['tesseract', '--version']                              __init__.py:133
Running: ['tesseract', '--version']                              __init__.py:133
Running: ['pngquant', '--version']                               __init__.py:133
Found pngquant 3.0.3                                             __init__.py:345
Running: ['jbig2', '--version']                                  __init__.py:133
Found jbig2 0.30                                                 __init__.py:345
Running: ['gs', '--version']                                     __init__.py:133
Found gs 10.4.0                                                  __init__.py:345
Running: ['gs', '--version']                                     __init__.py:133
Running: ['tesseract', '--list-langs']                           __init__.py:133
stdout/stderr = List of available languages in                    __init__.py:73
"/usr/local/share/tessdata/" (163):                                             
afr                                                                             
amh                                                                             
ara                                                                             
asm                                                                             
aze                                                                             
aze_cyrl                                                                        
bel                                                                             
ben                                                                             
bod                                                                             
bos                                                                             
bre                                                                             
bul                                                                             
cat                                                                             
ceb                                                                             
ces                                                                             
chi_sim                                                                         
chi_sim_vert                                                                    
chi_tra                                                                         
chi_tra_vert                                                                    
chr                                                                             
cos                                                                             
cym                                                                             
dan                                                                             
deu                                                                             
div                                                                             
dzo                                                                             
ell                                                                             
eng                                                                             
enm                                                                             
epo                                                                             
equ                                                                             
est                                                                             
eus                                                                             
fao                                                                             
fas                                                                             
fil                                                                             
fin                                                                             
fra                                                                             
frk                                                                             
frm                                                                             
fry                                                                             
gla                                                                             
gle                                                                             
glg                                                                             
grc                                                                             
guj                                                                             
hat                                                                             
heb                                                                             
hin                                                                             
hrv                                                                             
hun                                                                             
hye                                                                             
iku                                                                             
ind                                                                             
isl                                                                             
ita                                                                             
ita_old                                                                         
jav                                                                             
jpn                                                                             
jpn_vert                                                                        
kan                                                                             
kat                                                                             
kat_old                                                                         
kaz                                                                             
khm                                                                             
kir                                                                             
kmr                                                                             
kor                                                                             
kor_vert                                                                        
lao                                                                             
lat                                                                             
lav                                                                             
lit                                                                             
ltz                                                                             
mal                                                                             
mar                                                                             
mkd                                                                             
mlt                                                                             
mon                                                                             
mri                                                                             
msa                                                                             
mya                                                                             
nep                                                                             
nld                                                                             
nor                                                                             
oci                                                                             
ori                                                                             
osd                                                                             
pan                                                                             
pol                                                                             
por                                                                             
pus                                                                             
que                                                                             
ron                                                                             
rus                                                                             
san                                                                             
script/Arabic                                                                   
script/Armenian                                                                 
script/Bengali                                                                  
script/Canadian_Aboriginal                                                      
script/Cherokee                                                                 
script/Cyrillic                                                                 
script/Devanagari                                                               
script/Ethiopic                                                                 
script/Fraktur                                                                  
script/Georgian                                                                 
script/Greek                                                                    
script/Gujarati                                                                 
script/Gurmukhi                                                                 
script/HanS                                                                     
script/HanS_vert                                                                
script/HanT                                                                     
script/HanT_vert                                                                
script/Hangul                                                                   
script/Hangul_vert                                                              
script/Hebrew                                                                   
script/Japanese                                                                 
script/Japanese_vert                                                            
script/Kannada                                                                  
script/Khmer                                                                    
script/Lao                                                                      
script/Latin                                                                    
script/Malayalam                                                                
script/Myanmar                                                                  
script/Oriya                                                                    
script/Sinhala                                                                  
script/Syriac                                                                   
script/Tamil                                                                    
script/Telugu                                                                   
script/Thaana                                                                   
script/Thai                                                                     
script/Tibetan                                                                  
script/Vietnamese                                                               
sin                                                                             
slk                                                                             
slv                                                                             
snd                                                                             
snum                                                                            
spa                                                                             
spa_old                                                                         
sqi                                                                             
srp                                                                             
srp_latn                                                                        
sun                                                                             
swa                                                                             
swe                                                                             
syr                                                                             
tam                                                                             
tat                                                                             
tel                                                                             
tgk                                                                             
tha                                                                             
tir                                                                             
ton                                                                             
tur                                                                             
uig                                                                             
ukr                                                                             
urd                                                                             
uzb                                                                             
uzb_cyrl                                                                        
vie                                                                             
yid                                                                             
yor                                                                             
                                                                                
pikepdf mmap enabled                                              helpers.py:328
os.symlink(./test_O1.pdf,                                         helpers.py:179
/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmypdf.io.k_2g               
p6st/origin)                                                                    
Gathering info with 1 thread workers                                 info.py:816
pikepdf mmap enabled                                              helpers.py:328
Scanning contents     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6 0:00:00
Using Tesseract OpenMP thread limit 1                       tesseract_ocr.py:199
Start processing 6 pages concurrently                                  ocr.py:96
pikepdf mmap enabled                                              helpers.py:328
pikepdf mmap enabled                                              helpers.py:328
pikepdf mmap enabled                                              helpers.py:328
    1 skipping all processing on this page                      _pipeline.py:343
pikepdf mmap enabled                                              helpers.py:328
pikepdf mmap enabled                                              helpers.py:328
    2 skipping all processing on this page                      _pipeline.py:343
pikepdf mmap enabled                                              helpers.py:328
    3 skipping all processing on this page                      _pipeline.py:343
    4 skipping all processing on this page                      _pipeline.py:343
    1 Text rotation: (text, autorotate, content) -> text           _graft.py:152
misalignment = (0, 0, 0) -> 0                                                   
    5 skipping all processing on this page                      _pipeline.py:343
    6 skipping all processing on this page                      _pipeline.py:343
    1 Page rotation: (content, auto) -> page = (0, 0) -> 0         _graft.py:177
    2 Text rotation: (text, autorotate, content) -> text           _graft.py:152
misalignment = (0, 0, 0) -> 0                                                   
    2 Page rotation: (content, auto) -> page = (0, 0) -> 0         _graft.py:177
    3 Text rotation: (text, autorotate, content) -> text           _graft.py:152
misalignment = (0, 0, 0) -> 0                                                   
    3 Page rotation: (content, auto) -> page = (0, 0) -> 0         _graft.py:177
    4 Text rotation: (text, autorotate, content) -> text           _graft.py:152
misalignment = (0, 0, 0) -> 0                                                   
    4 Page rotation: (content, auto) -> page = (0, 0) -> 0         _graft.py:177
    5 Text rotation: (text, autorotate, content) -> text           _graft.py:152
misalignment = (0, 0, 0) -> 0                                                   
    5 Page rotation: (content, auto) -> page = (0, 0) -> 0         _graft.py:177
    6 Text rotation: (text, autorotate, content) -> text           _graft.py:152
misalignment = (0, 0, 0) -> 0                                                   
    6 Page rotation: (content, auto) -> page = (0, 0) -> 0         _graft.py:177
Image processing      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6 0:00:00
Postprocessing...                                                     ocr.py:144
os.symlink(/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmy helpers.py:179
pdf.io.k_2gp6st/graft_layers.pdf,                                               
/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmypdf.io.k_2g               
p6st/fix_docinfo.pdf)                                                           
Running: ['gs', '--version']                                     __init__.py:133
Running: ['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER',               __init__.py:133
'-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite',                                
'-dAutoRotatePages=/None',                                                      
'-sColorConversionStrategy=LeaveColorUnchanged',                                
'-dPDFSTOPONERROR', '-dAutoFilterColorImages=true',                             
'-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2',                        
'-dPDFACompatibilityPolicy=1', '-o',                                            
'/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmypdf.io.k_                
2gp6st/pdfa.pdf', '-sstdout=%stderr',                                           
'/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmypdf.io.k_                
2gp6st/pdfa.ps',                                                                
'/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmypdf.io.k_                
2gp6st/fix_docinfo.pdf']                                                        
GPL Ghostscript 10.04.0 (2024-09-18)                             __init__.py:108
Copyright (C) 2024 Artifex Software, Inc.  All rights reserved.  __init__.py:108
This software is supplied under the GNU AGPLv3 and comes with NO __init__.py:108
WARRANTY:                                                                       
see the file COPYING for details.                                __init__.py:108
Processing pages 1 through 6.                                    __init__.py:108
Page 1                                                           __init__.py:108
Page 2                                                           __init__.py:108
Page 3                                                           __init__.py:108
Page 4                                                           __init__.py:108
Page 5                                                           __init__.py:108
Page 6                                                           __init__.py:108
PDF/A conversion      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6 0:00:00
Running: ['tesseract', '--version']                              __init__.py:133
Some input metadata could not be copied because it is not        _metadata.py:63
permitted in PDF/A. You may wish to examine the output PDF's XMP                
metadata.                                                                       
The following metadata fields were not copied:                   _metadata.py:68
{'{http://ns.adobe.com/xap/1.0/}MetadataDate'}                                  
Linearizing           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 100/100 0:00:00
xref 38: treating as an optimization candidate                   optimize.py:290
xref 40: treating as an optimization candidate                   optimize.py:290
xref 43: treating as an optimization candidate                   optimize.py:290
xref 45: treating as an optimization candidate                   optimize.py:290
xref 48: treating as an optimization candidate                   optimize.py:290
xref 50: treating as an optimization candidate                   optimize.py:290
XrefExt(xref=48, ext='.jpg')                                     optimize.py:355
XrefExt(xref=50, ext='.jpg')                                     optimize.py:355
XrefExt(xref=38, ext='.jpg')                                     optimize.py:355
XrefExt(xref=40, ext='.jpg')                                     optimize.py:355
XrefExt(xref=43, ext='.jpg')                                     optimize.py:355
Optimizable images: JPEGs: 5 PNGs: 0                             optimize.py:360
xref 48, jpeg, made larger - skip                                optimize.py:476
xref 40, jpeg, made larger - skip                                optimize.py:476
xref 43, jpeg, made larger - skip                                optimize.py:476
xref 50, jpeg, made larger - skip                                optimize.py:476
Recompressing JPEGs   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 5/5 0:00:00
xref 38: treating as an optimization candidate                   optimize.py:290
xref 40: treating as an optimization candidate                   optimize.py:290
xref 43: treating as an optimization candidate                   optimize.py:290
xref 45: treating as an optimization candidate                   optimize.py:290
xref 48: treating as an optimization candidate                   optimize.py:290
xref 50: treating as an optimization candidate                   optimize.py:290
xref 48: marking this JPEG as deflatable                         optimize.py:555
xref 50: marking this JPEG as deflatable                         optimize.py:555
xref 38: marking this JPEG as deflatable                         optimize.py:555
xref 40: marking this JPEG as deflatable                         optimize.py:555
xref 43: marking this JPEG as deflatable                         optimize.py:555
Deflating JPEGs       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 5/5 0:00:00
PNGs                  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
xref 38: treating as an optimization candidate                   optimize.py:290
xref 40: treating as an optimization candidate                   optimize.py:290
xref 43: treating as an optimization candidate                   optimize.py:290
xref 45: treating as an optimization candidate                   optimize.py:290
xref 48: treating as an optimization candidate                   optimize.py:290
xref 50: treating as an optimization candidate                   optimize.py:290
xref 48: found image compressed as /FlateDecode /DCTDecode,      optimize.py:103
marked for JPEG optimization                                                    
xref 50: found image compressed as /FlateDecode /DCTDecode,      optimize.py:103
marked for JPEG optimization                                                    
xref 38: found image compressed as /FlateDecode /DCTDecode,      optimize.py:103
marked for JPEG optimization                                                    
xref 40: found image compressed as /FlateDecode /DCTDecode,      optimize.py:103
marked for JPEG optimization                                                    
xref 43: found image compressed as /FlateDecode /DCTDecode,      optimize.py:103
marked for JPEG optimization                                                    
Running: ['jbig2', '--version']                                  __init__.py:133
Optimizable images: JBIG2 groups: 1                              optimize.py:371
Running: ['jbig2', '--pdf', '-t', '0.85',                        __init__.py:133
PosixPath('/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrm                
ypdf.io.k_2gp6st/images/00000045.prejbig2.tif')]                                
JBIG2                 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
os.symlink(/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmy helpers.py:179
pdf.io.k_2gp6st/optimize.opt.pdf,                                               
/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmypdf.io.k_2g               
p6st/optimize.pdf)                                                              
Running: ['jbig2', '--version']                                  __init__.py:133
Running: ['pngquant', '--version']                               __init__.py:133
Image optimization ratio: 1.08 savings: 7.3%                   _pipeline.py:1002
Total file size ratio: 13.38 savings: 92.5%                    _pipeline.py:1005
/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmypdf.io.k _pipeline.py:1077
_2gp6st/optimize.pdf -> ./test_O2.pdf                                           
Output file is a PDF/A-2B (as expected)                           _common.py:474
ocrmypdf 16.9.0                                                   __main__.py:59
Running: ['tesseract', '--version']                              __init__.py:133
Found tesseract 5.5.0                                            __init__.py:345
Running: ['tesseract', '--version']                              __init__.py:133
Running: ['tesseract', '--version']                              __init__.py:133
Running: ['pngquant', '--version']                               __init__.py:133
Found pngquant 3.0.3                                             __init__.py:345
Running: ['jbig2', '--version']                                  __init__.py:133
Found jbig2 0.30                                                 __init__.py:345
Running: ['gs', '--version']                                     __init__.py:133
Found gs 10.4.0                                                  __init__.py:345
Running: ['gs', '--version']                                     __init__.py:133
Running: ['tesseract', '--list-langs']                           __init__.py:133
stdout/stderr = List of available languages in                    __init__.py:73
"/usr/local/share/tessdata/" (163):                                             
afr                                                                             
amh                                                                             
ara                                                                             
asm                                                                             
aze                                                                             
aze_cyrl                                                                        
bel                                                                             
ben                                                                             
bod                                                                             
bos                                                                             
bre                                                                             
bul                                                                             
cat                                                                             
ceb                                                                             
ces                                                                             
chi_sim                                                                         
chi_sim_vert                                                                    
chi_tra                                                                         
chi_tra_vert                                                                    
chr                                                                             
cos                                                                             
cym                                                                             
dan                                                                             
deu                                                                             
div                                                                             
dzo                                                                             
ell                                                                             
eng                                                                             
enm                                                                             
epo                                                                             
equ                                                                             
est                                                                             
eus                                                                             
fao                                                                             
fas                                                                             
fil                                                                             
fin                                                                             
fra                                                                             
frk                                                                             
frm                                                                             
fry                                                                             
gla                                                                             
gle                                                                             
glg                                                                             
grc                                                                             
guj                                                                             
hat                                                                             
heb                                                                             
hin                                                                             
hrv                                                                             
hun                                                                             
hye                                                                             
iku                                                                             
ind                                                                             
isl                                                                             
ita                                                                             
ita_old                                                                         
jav                                                                             
jpn                                                                             
jpn_vert                                                                        
kan                                                                             
kat                                                                             
kat_old                                                                         
kaz                                                                             
khm                                                                             
kir                                                                             
kmr                                                                             
kor                                                                             
kor_vert                                                                        
lao                                                                             
lat                                                                             
lav                                                                             
lit                                                                             
ltz                                                                             
mal                                                                             
mar                                                                             
mkd                                                                             
mlt                                                                             
mon                                                                             
mri                                                                             
msa                                                                             
mya                                                                             
nep                                                                             
nld                                                                             
nor                                                                             
oci                                                                             
ori                                                                             
osd                                                                             
pan                                                                             
pol                                                                             
por                                                                             
pus                                                                             
que                                                                             
ron                                                                             
rus                                                                             
san                                                                             
script/Arabic                                                                   
script/Armenian                                                                 
script/Bengali                                                                  
script/Canadian_Aboriginal                                                      
script/Cherokee                                                                 
script/Cyrillic                                                                 
script/Devanagari                                                               
script/Ethiopic                                                                 
script/Fraktur                                                                  
script/Georgian                                                                 
script/Greek                                                                    
script/Gujarati                                                                 
script/Gurmukhi                                                                 
script/HanS                                                                     
script/HanS_vert                                                                
script/HanT                                                                     
script/HanT_vert                                                                
script/Hangul                                                                   
script/Hangul_vert                                                              
script/Hebrew                                                                   
script/Japanese                                                                 
script/Japanese_vert                                                            
script/Kannada                                                                  
script/Khmer                                                                    
script/Lao                                                                      
script/Latin                                                                    
script/Malayalam                                                                
script/Myanmar                                                                  
script/Oriya                                                                    
script/Sinhala                                                                  
script/Syriac                                                                   
script/Tamil                                                                    
script/Telugu                                                                   
script/Thaana                                                                   
script/Thai                                                                     
script/Tibetan                                                                  
script/Vietnamese                                                               
sin                                                                             
slk                                                                             
slv                                                                             
snd                                                                             
snum                                                                            
spa                                                                             
spa_old                                                                         
sqi                                                                             
srp                                                                             
srp_latn                                                                        
sun                                                                             
swa                                                                             
swe                                                                             
syr                                                                             
tam                                                                             
tat                                                                             
tel                                                                             
tgk                                                                             
tha                                                                             
tir                                                                             
ton                                                                             
tur                                                                             
uig                                                                             
ukr                                                                             
urd                                                                             
uzb                                                                             
uzb_cyrl                                                                        
vie                                                                             
yid                                                                             
yor                                                                             
                                                                                
pikepdf mmap enabled                                              helpers.py:328
os.symlink(./test_O2.pdf,                                         helpers.py:179
/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmypdf.io.iyiq               
01bi/origin)                                                                    
Gathering info with 1 thread workers                                 info.py:816
pikepdf mmap enabled                                              helpers.py:328
Scanning contents     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6 0:00:00
Using Tesseract OpenMP thread limit 1                       tesseract_ocr.py:199
Start processing 6 pages concurrently                                  ocr.py:96
pikepdf mmap enabled                                              helpers.py:328
    1 skipping all processing on this page                      _pipeline.py:343
pikepdf mmap enabled                                              helpers.py:328
pikepdf mmap enabled                                              helpers.py:328
pikepdf mmap enabled                                              helpers.py:328
    2 skipping all processing on this page                      _pipeline.py:343
pikepdf mmap enabled                                              helpers.py:328
    3 skipping all processing on this page                      _pipeline.py:343
pikepdf mmap enabled                                              helpers.py:328
    1 Text rotation: (text, autorotate, content) -> text           _graft.py:152
misalignment = (0, 0, 0) -> 0                                                   
    4 skipping all processing on this page                      _pipeline.py:343
    5 skipping all processing on this page                      _pipeline.py:343
    6 skipping all processing on this page                      _pipeline.py:343
    1 Page rotation: (content, auto) -> page = (0, 0) -> 0         _graft.py:177
    2 Text rotation: (text, autorotate, content) -> text           _graft.py:152
misalignment = (0, 0, 0) -> 0                                                   
    2 Page rotation: (content, auto) -> page = (0, 0) -> 0         _graft.py:177
    3 Text rotation: (text, autorotate, content) -> text           _graft.py:152
misalignment = (0, 0, 0) -> 0                                                   
    3 Page rotation: (content, auto) -> page = (0, 0) -> 0         _graft.py:177
    4 Text rotation: (text, autorotate, content) -> text           _graft.py:152
misalignment = (0, 0, 0) -> 0                                                   
    4 Page rotation: (content, auto) -> page = (0, 0) -> 0         _graft.py:177
    5 Text rotation: (text, autorotate, content) -> text           _graft.py:152
misalignment = (0, 0, 0) -> 0                                                   
    5 Page rotation: (content, auto) -> page = (0, 0) -> 0         _graft.py:177
    6 Text rotation: (text, autorotate, content) -> text           _graft.py:152
misalignment = (0, 0, 0) -> 0                                                   
    6 Page rotation: (content, auto) -> page = (0, 0) -> 0         _graft.py:177
Image processing      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6 0:00:00
Postprocessing...                                                     ocr.py:144
os.symlink(/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmy helpers.py:179
pdf.io.iyiq01bi/graft_layers.pdf,                                               
/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmypdf.io.iyiq               
01bi/fix_docinfo.pdf)                                                           
Running: ['gs', '--version']                                     __init__.py:133
Running: ['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER',               __init__.py:133
'-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite',                                
'-dAutoRotatePages=/None',                                                      
'-sColorConversionStrategy=LeaveColorUnchanged',                                
'-dPDFSTOPONERROR', '-dAutoFilterColorImages=true',                             
'-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2',                        
'-dPDFACompatibilityPolicy=1', '-o',                                            
'/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmypdf.io.iy                
iq01bi/pdfa.pdf', '-sstdout=%stderr',                                           
'/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmypdf.io.iy                
iq01bi/pdfa.ps',                                                                
'/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmypdf.io.iy                
iq01bi/fix_docinfo.pdf']                                                        
GPL Ghostscript 10.04.0 (2024-09-18)                             __init__.py:108
Copyright (C) 2024 Artifex Software, Inc.  All rights reserved.  __init__.py:108
This software is supplied under the GNU AGPLv3 and comes with NO __init__.py:108
WARRANTY:                                                                       
see the file COPYING for details.                                __init__.py:108
Processing pages 1 through 6.                                    __init__.py:108
Page 1                                                           __init__.py:108
Page 2                                                           __init__.py:108
Page 3                                                           __init__.py:108
Page 4                                                           __init__.py:108
Page 5                                                           __init__.py:108
Page 6                                                           __init__.py:108
PDF/A conversion      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6 0:00:00
Running: ['tesseract', '--version']                              __init__.py:133
Some input metadata could not be copied because it is not        _metadata.py:63
permitted in PDF/A. You may wish to examine the output PDF's XMP                
metadata.                                                                       
The following metadata fields were not copied:                   _metadata.py:68
{'{http://ns.adobe.com/xap/1.0/}MetadataDate'}                                  
Linearizing           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 100/100 0:00:00
xref 574: treating as an optimization candidate                  optimize.py:290
xref 980: treating as an optimization candidate                  optimize.py:290
xref 2124: treating as an optimization candidate                 optimize.py:290
xref 2204: treating as an optimization candidate                 optimize.py:290
xref 2246: treating as an optimization candidate                 optimize.py:290
xref 2248: treating as an optimization candidate                 optimize.py:290
XrefExt(xref=980, ext='.jpg')                                    optimize.py:355
XrefExt(xref=2246, ext='.jpg')                                   optimize.py:355
XrefExt(xref=2248, ext='.jpg')                                   optimize.py:355
XrefExt(xref=2124, ext='.jpg')                                   optimize.py:355
XrefExt(xref=574, ext='.jpg')                                    optimize.py:355
Optimizable images: JPEGs: 5 PNGs: 0                             optimize.py:360
Recompressing JPEGs   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 5/5 0:00:00
xref 574: treating as an optimization candidate                  optimize.py:290
xref 980: treating as an optimization candidate                  optimize.py:290
xref 2124: treating as an optimization candidate                 optimize.py:290
xref 2204: treating as an optimization candidate                 optimize.py:290
xref 2246: treating as an optimization candidate                 optimize.py:290
xref 2248: treating as an optimization candidate                 optimize.py:290
xref 980: marking this JPEG as deflatable                        optimize.py:555
xref 2246: marking this JPEG as deflatable                       optimize.py:555
xref 2248: marking this JPEG as deflatable                       optimize.py:555
xref 2124: marking this JPEG as deflatable                       optimize.py:555
xref 574: marking this JPEG as deflatable                        optimize.py:555
Deflating JPEGs       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 5/5 0:00:00
PNGs                  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
xref 574: treating as an optimization candidate                  optimize.py:290
xref 980: treating as an optimization candidate                  optimize.py:290
xref 2124: treating as an optimization candidate                 optimize.py:290
xref 2204: treating as an optimization candidate                 optimize.py:290
xref 2246: treating as an optimization candidate                 optimize.py:290
xref 2248: treating as an optimization candidate                 optimize.py:290
Running: ['jbig2', '--version']                                  __init__.py:133
xref 980: found image compressed as /FlateDecode /DCTDecode,     optimize.py:103
marked for JPEG optimization                                                    
xref 2246: found image compressed as /FlateDecode /DCTDecode,    optimize.py:103
marked for JPEG optimization                                                    
xref 2248: found image compressed as /FlateDecode /DCTDecode,    optimize.py:103
marked for JPEG optimization                                                    
xref 2124: found image compressed as /FlateDecode /DCTDecode,    optimize.py:103
marked for JPEG optimization                                                    
xref 574: found image compressed as /FlateDecode /DCTDecode,     optimize.py:103
marked for JPEG optimization                                                    
Optimizable images: JBIG2 groups: 1                              optimize.py:371
Running: ['jbig2', '--pdf', '-t', '0.85',                        __init__.py:133
PosixPath('/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrm                
ypdf.io.iyiq01bi/images/00002204.prejbig2.tif')]                                
JBIG2                 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
os.symlink(/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmy helpers.py:179
pdf.io.iyiq01bi/optimize.opt.pdf,                                               
/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmypdf.io.iyiq               
01bi/optimize.pdf)                                                              
Running: ['jbig2', '--version']                                  __init__.py:133
Running: ['pngquant', '--version']                               __init__.py:133
Image optimization ratio: 1.19 savings: 15.6%                  _pipeline.py:1002
Total file size ratio: 0.99 savings: -0.7%                     _pipeline.py:1005
/var/folders/8z/d8_btvkx7rjbr1q7zm_ynztc0000gn/T/ocrmypdf.io.i _pipeline.py:1077
yiq01bi/optimize.pdf -> ./test_O3.pdf                                           
Output file is a PDF/A-2B (as expected)                           _common.py:474
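
The last two summary figures in this log can look contradictory: images shrank by 15.6%, yet the whole file grew by 0.7%. As a rough sketch, both numbers are consistent with reading "ratio" as input size divided by output size and "savings" as 1 - 1/ratio. That reading is an assumption inferred from the rounded values printed above, not taken from OCRmyPDF's source, and the byte counts below are hypothetical, chosen only to reproduce those rounded values.

```python
def ratio_and_savings(input_size: int, output_size: int) -> tuple[float, float]:
    """Return (ratio, savings in percent) for a before/after size pair.

    Assumed definitions (inferred from the log, not from OCRmyPDF itself):
      ratio   = input_size / output_size
      savings = (1 - output_size / input_size) * 100
    """
    ratio = input_size / output_size
    savings = (1 - output_size / input_size) * 100
    return ratio, savings


# Hypothetical byte counts for the image streams only.
image_ratio, image_savings = ratio_and_savings(1_185_500, 1_000_000)

# Hypothetical byte counts for the complete PDF files.
total_ratio, total_savings = ratio_and_savings(3_500_000, 3_524_500)

print(f"Image optimization ratio: {image_ratio:.2f} savings: {image_savings:.1f}%")
print(f"Total file size ratio: {total_ratio:.2f} savings: {total_savings:.1f}%")
```

Under these assumptions a ratio below 1.00 always yields negative savings, so the two "total" values describe the same small growth. The image figure and the total figure are computed over different byte counts, which is why they can point in opposite directions.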

0 replies
