dps(1) -- a DjVu to PDF converter

SYNOPSIS

dpsprep [options] src [dest]

DESCRIPTION

This tool, initially made specifically for use with Sony's Digital Paper System (DPS), is now a general-purpose DjVu to PDF converter with a focus on small output size and the ability to preserve document outlines (e.g. TOC) and text layers (e.g. OCR).

OPTIONS

-q, --quality: Quality of images in output. Used only for JPEG compression, i.e. RGB and Grayscale images. Passed directly to Pillow and to OCRmyPDF's optimizer.
-p, --pool-size: Size of MultiProcessing pool for handling page-by-page operations.
-v, --verbose: Display debug messages.
-o, --overwrite: Overwrite destination file.
-w, --preserve-working: Preserve the working directory after script termination.
-d, --delete-working: Delete any existing files in the working directory prior to writing to it.
-t, --no-text: Disable the generation of text layers. Implied by --ocr.
-m, --mode: Image mode. The default is to ask libdjvu for the image mode of every page. It sometimes makes sense to force bitonal images since they compress well.
--ocr Perform OCR via OCRmyPDF rather than trying to convert the text layer. If this parameter has a value, it should be a JSON dictionary of options to be passed to OCRmyPDF.
-O1: Use the lossless PDF image optimization from OCRmyPDF (without performing OCR).
-O2: Use the PDF image optimization from OCRmyPDF.
-O3: Use the aggressive lossy PDF image optimization from OCRmyPDF.
--help: Show this message and exit.

EXAMPLES

Produce file.pdf in the current directory:

dpsprep /wherever/file.djvu

Produce output.pdf with reduced image quality and aggressive PDF image optimizations:

dpsprep ---quality=30 -O3 input.djvu output.pdf

Produce an output file using a large pool of workers:

dpsprep --pool=16 input.djvu

Force bitonal images:

dpsprep --mode bitonal input.djvu

Produce an output file by disregarding the text layer and running OCRmyPDF instead:

dpsprep --ocr '{"language": ["rus", "eng"]}' input.djvu

Or simply disregard the text layer without OCR:

dpsprep --no-text input.djvu

NOTE REGARDING COMPRESSION

We perform compression in two stages:

The first one is the default compression provided by Pillow. For bitonal images, the PDF generation code says that, if libtiff is available, group4 compression is used.
If OCRmyPDF is installed, its PDF optimization can be used via the flags -O1 to -O3 (this involves no OCR). This allows us to use advanced techniques, including JBIG2 compression via jbig2enc.

If manually running OCRmyPDF, note that the optimization command suggested in the documentation (setting --tesseract-timeout to 0) may ruin existing text layers. To perform only PDF optimization you can use the following undocumented tool instead:

python -m ocrmypdf.optimize <input_file> <level> <output_file>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dpsprep.1.ronn

dpsprep.1.ronn

dps(1) -- a DjVu to PDF converter

SYNOPSIS

DESCRIPTION

OPTIONS

EXAMPLES

NOTE REGARDING COMPRESSION

Files

dpsprep.1.ronn

Latest commit

History

dpsprep.1.ronn

File metadata and controls

dps(1) -- a DjVu to PDF converter

SYNOPSIS

DESCRIPTION

OPTIONS

EXAMPLES

NOTE REGARDING COMPRESSION