Pure-Rust DjVu codec — decode and encode DjVu documents. MIT licensed, no GPL dependencies. Written from the DjVu v3 public specification.
- IFF container parser — zero-copy, borrowing slices from input
- JB2 bilevel image decoder — adaptive arithmetic coding (ZP coder) with symbol dictionary
- JB2 bilevel image encoder — encode any
Bitmapinto a validSjbzchunk payload - IW44 wavelet image decoder — planar YCbCr storage, multiple refinement chunks
- IW44 wavelet image encoder — encode color (
Pixmap) or grayscale (GrayPixmap) intoBG44/FG44chunk payloads - G4/MMR bilevel image decoder — ITU-T T.6 Group 4 fax decoder (
Smmrchunks) - BZZ decompressor — ZP arithmetic coding + MTF + BWT (DIRM, NAVM, ANTz chunks)
- Text layer extraction — TXTz/TXTa chunk parsing with zone hierarchy (page/column/region/paragraph/line/word/character)
- Annotation parsing — ANTz/ANTa chunk parsing (hyperlinks, map areas, background color)
- Annotation encoding — serialize
Annotation+MapAreaslices into ANTa or ANTz chunk payloads - Bookmarks — NAVM table-of-contents parsing
- Bookmark encoding — serialize
DjVuBookmarktrees into NAVM chunk payloads - Multi-page documents — DJVM bundle format with DIRM directory chunk; indirect DJVM creation and loading from directory
- Page rendering — composite foreground + background into RGBA output
- PDF export — selectable text, lossless IW44/JB2 embedding, bookmarks, hyperlinks
- TIFF export — multi-page color and bilevel modes (feature flag
tiff) - hOCR / ALTO XML export — text layer as hOCR or ALTO XML for OCR toolchains and archives
- Serde support —
Serialize/Deserializeon all public data types (feature flagserde) - EPUB 3 export — page images + invisible text overlay + bookmarks as navigation (feature flag
epub) - WebAssembly (WASM) —
wasm-bindgenbindings for use in browsers and Node.js (feature flagwasm) - image-rs integration —
image::ImageDecoderimpl for use with theimagecrate (feature flagimage) - Async render and lazy loading — async render wrappers plus true per-page lazy loading over
AsyncRead + AsyncSeek(feature flagasync) - Workspace codec crates — standalone
djvu-iff,djvu-bzz,djvu-bitmap,djvu-jb2,djvu-pixmap,djvu-iw44, anddjvu-zpcrates for focused consumers - Fuzzing integration — libFuzzer targets and in-tree OSS-Fuzz project files
no_stdcompatible — IFF/BZZ/JB2/IW44/ZP codec modules work withalloconly
use djvu_rs::{DjVuDocument, djvu_render::{render_pixmap, RenderOptions}};
let data = std::fs::read("file.djvu")?;
let doc = DjVuDocument::parse(&data)?;
println!("{} pages", doc.page_count());
let page = doc.page(0)?;
println!("{}×{} @ {} dpi", page.width(), page.height(), page.dpi());
let target_dpi = 150u32;
let opts = RenderOptions {
width: ((page.width() as u32 * target_dpi) / page.dpi() as u32).max(1),
height: ((page.height() as u32 * target_dpi) / page.dpi() as u32).max(1),
..Default::default()
};
let pixmap = render_pixmap(page, &opts)?;
// pixmap.data — RGBA bytes (width × height × 4), row-majoruse djvu_rs::DjVuDocument;
let data = std::fs::read("scanned.djvu")?;
let doc = DjVuDocument::parse(&data)?;
let page = doc.page(0)?;
if let Some(text) = page.text()? {
println!("{text}");
}use djvu_rs::{DjVuDocument, pdf::djvu_to_pdf};
let data = std::fs::read("book.djvu")?;
let doc = DjVuDocument::parse(&data)?;
let pdf_bytes = djvu_to_pdf(&doc)?;
std::fs::write("book.pdf", pdf_bytes)?;Requires the tiff feature flag: djvu-rs = { version = "…", features = ["tiff"] }.
use djvu_rs::{DjVuDocument, tiff_export::{djvu_to_tiff, TiffOptions}};
let data = std::fs::read("scan.djvu")?;
let doc = DjVuDocument::parse(&data)?;
let tiff_bytes = djvu_to_tiff(&doc, &TiffOptions::default())?;
std::fs::write("scan.tiff", tiff_bytes)?;Requires the async feature flag: djvu-rs = { version = "…", features = ["async"] }.
The render entry points are synchronous and CPU-bound; run them on the
blocking thread pool with tokio::task::spawn_blocking so they stay off the
async runtime. The render error type stays the typed RenderError — there is
no wrapper enum.
use djvu_rs::{DjVuDocument, djvu_render::{self, RenderOptions}};
let data = std::fs::read("file.djvu")?;
let doc = DjVuDocument::parse(&data)?;
let page = doc.page(0)?.clone();
let target_dpi = 150u32;
let opts = RenderOptions {
width: ((page.width() as u32 * target_dpi) / page.dpi() as u32).max(1),
height: ((page.height() as u32 * target_dpi) / page.dpi() as u32).max(1),
..Default::default()
};
let pixmap = tokio::task::spawn_blocking(move || {
djvu_render::render_pixmap(&page, &opts)
})
.await??; // outer `?`: join error (panic); inner `?`: RenderErrorFor progressive (per-BG44-chunk) rendering, djvu_async::render_progressive_stream
yields a Stream of frames, each produced on the blocking pool.
Requires the async feature flag. The lazy loader keeps a seekable async
reader and fetches page/component byte ranges only when page_async(i) is
called. Parsed pages are cached as Arc<DjVuPage>.
use djvu_rs::djvu_async::from_async_reader_lazy;
let file = tokio::fs::File::open("book.djvu").await?;
let doc = from_async_reader_lazy(file).await?;
println!("{} pages", doc.page_count());
let page = doc.page_async(0).await?;
println!("first page: {}×{}", page.width(), page.height());Supported shapes: single-page FORM:DJVU and bundled FORM:DJVM, including
shared DJVI dictionaries referenced via INCL. For browser-local !Send
readers on wasm32, use from_async_reader_lazy_local.
See examples/async_lazy_first_page.rs
for a native first-page latency probe and
examples/wasm/range_lazy.md for the HTTP
Range: bytes=start-end integration shape.
use djvu_rs::iff::parse_form;
let data = std::fs::read("file.djvu")?;
let form = parse_form(&data)?;
println!("FORM type: {:?}", std::str::from_utf8(&form.form_type));
for chunk in &form.chunks {
println!(" chunk {:?} ({} bytes)", std::str::from_utf8(&chunk.id), chunk.data.len());
}use djvu_rs::{bitmap::Bitmap, jb2_encode::encode_jb2};
let mut bm = Bitmap::new(800, 1000);
// ... fill bitmap pixels ...
let sjbz_payload = encode_jb2(&bm);
// Wrap in a Sjbz IFF chunk and embed in a DjVu FORM:DJVU.use djvu_rs::{Pixmap, iw44_encode::{encode_iw44_color, Iw44EncodeOptions}};
let pixmap: Pixmap = /* ... your RGBA/YCbCr image ... */;
let chunks: Vec<Vec<u8>> = encode_iw44_color(&pixmap, &Iw44EncodeOptions::default());
// Each Vec<u8> is a BG44 chunk payload; wrap each in a BG44 IFF tag.Grayscale:
use djvu_rs::{GrayPixmap, iw44_encode::{encode_iw44_gray, Iw44EncodeOptions}};
let gray: GrayPixmap = /* ... */;
let chunks: Vec<Vec<u8>> = encode_iw44_gray(&gray, &Iw44EncodeOptions::default());Iw44EncodeOptions fields (all have sensible defaults):
| Field | Default | Description |
|---|---|---|
slices_per_chunk |
10 | Slices packed into each BG44/FG44 chunk |
total_slices |
100 | Total refinement slices to encode |
chroma_delay |
0 | Y slices before Cb/Cr encoding begins |
chroma_half |
true | Encode chroma at half resolution |
use djvu_rs::{djvu_document::DjVuBookmark, navm_encode::encode_navm};
let bookmarks = vec![
DjVuBookmark { title: "Chapter 1".into(), url: "#page=1".into(), children: vec![] },
];
let navm_payload = encode_navm(&bookmarks);use djvu_rs::annotation::{Annotation, MapArea, encode_annotations, encode_annotations_bzz};
let ann = Annotation::default();
let areas: Vec<MapArea> = vec![/* ... */];
let anta_payload = encode_annotations(&ann, &areas); // uncompressed ANTa
let antz_payload = encode_annotations_bzz(&ann, &areas); // BZZ-compressed ANTzCreate an indirect DJVM index file that references per-page .djvu files:
use djvu_rs::djvm::create_indirect;
let index = create_indirect(&["page001.djvu", "page002.djvu", "page003.djvu"])?;
std::fs::write("book.djvu", index)?;
// Distribute book.djvu alongside the individual page files.Load an indirect document by resolving component files from a directory:
use djvu_rs::DjVuDocument;
let index = std::fs::read("book.djvu")?;
let doc = DjVuDocument::parse_from_dir(&index, "/path/to/pages")?;
println!("{} pages", doc.page_count());Mutation of indirect DJVM documents is not supported by DjVuDocumentMut yet.
The current strategy decision is to add a resolver-backed rebundling path first;
see docs/indirect-djvm-mutation.md.
The djvu binary is enabled by the cli feature.
# Install
cargo install djvu-rs --features cli
# Document info
djvu info file.djvu
# Render page 1 to PNG at 200 DPI
djvu render file.djvu --dpi 200 --output page1.png
# Render all pages to a PDF
djvu render file.djvu --all --format pdf --output out.pdf
# Export all pages to CBZ
djvu render file.djvu --all --format cbz --output out.cbz
# Extract text from page 2
djvu text file.djvu --page 2
# Extract text from all pages
djvu text file.djvu --all
# Encode a PNG image into a single-page DjVu (bilevel JB2, lossless)
djvu encode scan.png --output scan.djvu --dpi 300
# Encode a PNG image into a layered lossy DjVu (JB2 mask + IW44 background + FGbz foreground color)
djvu encode scan.png --quality quality --output scan.djvu --dpi 300
# Use the conservative archival color profile for a single PNG
djvu encode scan.png --quality archival --output scan.djvu --dpi 300
# Opt into adaptive mask segmentation for uneven scans
djvu encode scan.png --quality quality --binarization sauvola --bg-inpaint --output scan.djvu
# Encode a directory of PNGs into a bundled DJVM with shared Djbz
djvu encode pages/ --output book.djvu --shared-dict-pages 2For single PNG input, --quality lossless luminance-thresholds the image into a
JB2 mask and writes INFO + Sjbz; --quality quality uses the layered encoder
(INFO + Sjbz + BG44... plus FGbz when colored foreground is detected) for
color input. --quality archival uses the same layered shape with a denser
background sample grid. Directory input supports all three profiles: lossless
keeps the shared-Djbz multi-page JB2 path, while quality / archival bundle
independently encoded layered pages so each page keeps its own Sjbz, BG44,
and optional FGbz chunks. The --shared-dict-pages knob only affects the
lossless directory path.
Layered quality / archival encodes default to fixed BT.601 thresholding.
--binarization sauvola opts into adaptive local thresholding for mixed or
uneven lighting; tune it with --sauvola-window and --sauvola-k.
--bg-inpaint fills fully masked background blocks from neighbouring unmasked
pixels, which can reduce dark boxes under heavy text strokes. These knobs are
opt-in, only affect layered profiles, and do not change lossless JB2 defaults.
Library callers can use the same controls with PageEncoder::with_segment_options.
use djvu_rs::{DjVuDocument, text_serialize::{to_hocr, to_alto, HocrOptions, AltoOptions}};
let data = std::fs::read("scanned.djvu")?;
let doc = DjVuDocument::parse(&data)?;
// hOCR — compatible with Tesseract, ABBYY, and most OCR toolchains
let hocr = to_hocr(&doc, &HocrOptions::default())?;
std::fs::write("output.hocr", hocr)?;
// ALTO XML — used by libraries and archives (DFG, Europeana, etc.)
let alto = to_alto(&doc, &AltoOptions::default())?;
std::fs::write("output.xml", alto)?;The supported OCR recognition path is the ocr-tesseract feature, which uses a
system Tesseract installation and tessdata files:
cargo build --features cli,ocr-tesseract
# Requires Tesseract + the requested language data, e.g. eng.traineddata.
# Text-layer injection is still pending; the CLI reports recognized text chunks
# and writes a copy of the input file for now.
djvu ocr scanned.djvu --backend tesseract --lang eng --output copy.djvuocr-onnx is an experimental library-level CTC helper; the CLI does not treat it
as a stable backend because no specific model family, preprocessing contract, or
fixture is guaranteed yet. ocr-neural is a placeholder only: CandleBackend now
returns a clear unsupported-backend error instead of constructing a backend that
always fails at recognition time. The compatibility feature name
ocr-neural-candle is a no-op and no longer pulls Candle/tokenizers into
--all-features builds.
Requires the serde feature flag: djvu-rs = { version = "…", features = ["serde"] }.
All public data types (DjVuBookmark, TextZone, MapArea, PageInfo, etc.) implement
Serialize and Deserialize.
use djvu_rs::DjVuDocument;
let data = std::fs::read("book.djvu")?;
let doc = DjVuDocument::parse(&data)?;
let json = serde_json::to_string_pretty(doc.bookmarks())?;
println!("{json}");Requires the image feature flag: djvu-rs = { version = "…", features = ["image"] }.
use djvu_rs::{DjVuDocument, image_compat::DjVuDecoder};
use image::DynamicImage;
let data = std::fs::read("file.djvu")?;
let doc = DjVuDocument::parse(&data)?;
let page = doc.page(0)?;
let decoder = DjVuDecoder::new(page)?.with_size(1200, 1600);
let img = DynamicImage::from_decoder(decoder)?;
img.save("page.png")?;Requires the epub feature flag: djvu-rs = { version = "…", features = ["epub"] }.
use djvu_rs::{DjVuDocument, epub::{djvu_to_epub, EpubOptions}};
let data = std::fs::read("book.djvu")?;
let doc = DjVuDocument::parse(&data)?;
let epub_bytes = djvu_to_epub(&doc, &EpubOptions::default())?;
std::fs::write("book.epub", epub_bytes)?;CLI:
djvu render book.djvu --format epub --output book.epubBuild with wasm-pack:
wasm-pack build --target bundler --features wasmThen use in JavaScript/TypeScript:
import init, { WasmDocument } from './pkg/djvu_rs.js';
await init();
const doc = WasmDocument.from_bytes(new Uint8Array(arrayBuffer));
console.log(doc.page_count());
const page = doc.page(0);
const pixels = page.render(150); // Uint8ClampedArray, RGBA
const img = new ImageData(pixels, page.width_at(150), page.height_at(150));
ctx.putImageData(img, 0, 0);See examples/wasm/ for a complete drag-and-drop demo.
The generated npm package follows the Rust crate version; there is no separate
WASM release train. The local pkg/ directory is ignored wasm-pack output, so
regenerate it from the checked-in Cargo.toml before publishing instead of
editing generated pkg/package.json by hand.
The local Node.js harness builds two wasm-pack --target nodejs bundles and
compares scalar wasm32 against RUSTFLAGS="-C target-feature=+simd128":
ITERATIONS=50 WARMUP=10 DPI=150 ./scripts/bench_wasm_simd128.shThe script uses tests/fixtures/boy.djvu by default and reports parse,
full-render, cached-render, and first progressive-render timings. CI
syntax-checks the harness and build-checks both wasm targets, but does not run
timing comparisons because hosted runner variance is too high for stable
regression gates.
| Flag | Default | Description |
|---|---|---|
std |
enabled | DjVuDocument, file I/O, rendering, PDF export |
cli |
disabled | Build the djvu command-line binary |
tiff |
disabled | TIFF export via the tiff crate |
async |
disabled | Async render API and lazy AsyncRead + AsyncSeek document loading |
parallel |
disabled | Parallel multi-page render via rayon (render_pages_parallel) |
jpeg |
disabled | Standalone JPEG decode without full std (JPEG is included in std by default) |
mmap |
disabled | Memory-mapped file I/O via memmap2 (DjVuDocument::from_mmap) |
serde |
disabled | Serialize + Deserialize for all public data types |
image |
disabled | image::ImageDecoder impl via DjVuDecoder — integrates with the image crate |
epub |
disabled | EPUB 3 export via djvu_to_epub — page images, text overlay, bookmarks as nav |
wasm |
disabled | WebAssembly bindings via wasm-bindgen (WasmDocument, WasmPage) |
Without std, the crate provides IFF parsing, BZZ decompression, JB2/IW44 decoding,
text/annotation parsing — all codec primitives that work on byte slices.
See BENCHMARKS_RESULTS.md for Criterion numbers,
methodology, and a DjVuLibre comparison (run via
scripts/bench_djvulibre.sh +
scripts/djvulibre_compare.py).
Historical multi-platform results are in BENCHMARKS.md.
Recent targeted experiments are recorded in PERF_EXPERIMENTS.md, including:
- #233 lazy async loading: a 100 MiB padded 520-page DJVM reached first pixel in 491.469 ms while reading only 28,578 bytes at simulated 12.5 MiB/s throughput.
- #189 x86-64-v3 AVX2 validation: existing AVX2 decode paths showed
iw44_decode_corpus_color-18.88% andiw44_decode_first_chunk-4.85% on GitHub-hosted x86_64, with one sub4 partial-decode regression recorded for follow-up. - #258 shared-Djbz clustering: Hamming shared clustering was rejected as default; byte-exact shared-Djbz remains the measured safe path.
Rust 1.88 (edition 2024 — let-chains stabilized in 1.88)
See GitHub milestones for the full roadmap and progress tracking.
MIT. See LICENSE.
Written from the public DjVu v3 specification:
- https://www.sndjvu.org/spec.html
- https://djvu.sourceforge.net/spec/DjVu3Spec.djvu (the spec is itself a DjVu file)
No code derived from GPL-licensed DjVuLibre or any other GPL source. All algorithms are independent implementations from the spec.