Java App to generate PDF/UA and PDF/A-3A compliant PDFs from HTML using the OpenHTMLtoPDF library.
PDF/UA is a set of requirements for universally accessible PDF documents. To learn more about it, see "PDF/UA in a Nutshell", published by the PDF Association.
In the Example folder, you can see an example input HTML and output PDF.
The output PDF has been checked using the following tools and found to be PDF/UA and PDF/A-3A compliant:
- PDF Accessibility Check 3 (Accessibility report included in Example folder)
- Adobe Acrobat Preflight Tool
- VeraPDF Conformance Checker
You can download the compiled Java app from the Releases page.
- Make sure you have Java installed.
- In the same folder as the HTML file you want to convert, create a folder called "fonts". This folder must contain every font used by the HTML in TrueType (.ttf) format, including fonts that are already installed locally on your PC. The app uses the font name reported by the font file, and it checks whether the filename contains certain keywords to determine the font weight (e.g. thin, extra-light, light, regular, medium, semi-bold, bold, extra-bold, black) and font style (e.g. italic).
- Make sure to set the page size and margins of the generated PDF by including this style block in your HTML file:
  `@page { margin: 0; size: a4; }`
  See this example for more options.
- Make sure to define these metadata in the `<head>` of your HTML file:
  `<meta name="subject" content="the subject" /> <meta name="author" content="the author" /> <meta name="description" content="the description" />`
- Remember to define the bookmarks in the `<head>` of your HTML file. They function as a table of contents for your PDF.
  `<bookmarks> <bookmark name="Section 1" href="#section-1" /> <bookmark name="Section 2" href="#section-2" /> <bookmark name="Section 3" href="#section-3" /> </bookmarks>`
- Remember to include the `lang` and `dir` attributes in your `<html>` tag: `<html lang="en" dir="ltr">`
- Ensure that all links contain a `title` attribute to describe the link.
- Run `java -jar html-to-pdf-ua.jar "path/to/your/html/file"` on your console.
- Append `pdf/a-4` to the command to use PDF/A-4 standards instead of PDF/A-3a.
- A file called `output.pdf` will be generated in the same folder as your HTML file.
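The filename-keyword rule for fonts described above can be sketched roughly as follows. This is only an illustration and not the app's actual code: the class `FontFileNames`, its method names, and the exact matching strategy are assumptions. The numeric weights follow the usual CSS convention (thin = 100 up to black = 900).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not the app's real code): guesses weight and style from a font filename. */
final class FontFileNames {
    // Insertion order matters: compound keywords are checked first so that
    // "extra-bold" is not mistaken for "bold", or "extra-light" for "light".
    private static final Map<String, Integer> WEIGHT_KEYWORDS = new LinkedHashMap<>();
    static {
        WEIGHT_KEYWORDS.put("extralight", 200);
        WEIGHT_KEYWORDS.put("extrabold", 800);
        WEIGHT_KEYWORDS.put("semibold", 600);
        WEIGHT_KEYWORDS.put("thin", 100);
        WEIGHT_KEYWORDS.put("light", 300);
        WEIGHT_KEYWORDS.put("medium", 500);
        WEIGHT_KEYWORDS.put("bold", 700);
        WEIGHT_KEYWORDS.put("black", 900);
        WEIGHT_KEYWORDS.put("regular", 400);
    }

    // Lower-cases the name and drops everything except letters, so that
    // "Semi-Bold", "semi_bold" and "SemiBold" all become "semibold".
    private static String normalize(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
    }

    /** Returns a CSS-style weight (100 to 900); falls back to 400 when no keyword matches. */
    static int weightOf(String fileName) {
        String name = normalize(fileName);
        for (Map.Entry<String, Integer> entry : WEIGHT_KEYWORDS.entrySet()) {
            if (name.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return 400;
    }

    /** Returns true when the filename contains the keyword "italic". */
    static boolean isItalic(String fileName) {
        return normalize(fileName).contains("italic");
    }

    public static void main(String[] args) {
        String[] samples = {"OpenSans-ExtraBold.ttf", "Roboto-LightItalic.ttf", "Roboto-Italic.ttf"};
        for (String sample : samples) {
            System.out.println(sample + " -> weight " + weightOf(sample) + ", italic " + isItalic(sample));
        }
    }
}
```

One consequence of plain substring matching is that a family name which itself contains a keyword (for example a font whose name includes "light") would be misread, so naming files along the lines of `Family-WeightStyle.ttf` is probably the safest choice.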
Make sure you have a Java Development Kit (version 21 or above) and Apache Maven installed.
I used a slightly modified copy of OpenHTMLtoPDF 1.0.10 (the latest version at the time of writing), because PDFs created with the official OpenHTMLtoPDF 1.0.10 library violated clause 7.18.5 of PDF/UA.
- Download the source code for OpenHTMLtoPDF 1.0.10 (later versions might work too).
- Extract it and open your console in the created folder.
- Do `git init`.
- Copy the patches into the folder and apply them by doing `git am *.patch`.
- Run `mvn clean install` to compile, test, package and install it into your local Maven repository.
- Clone this repository.
- Run `mvn clean package` to compile, test and package this project.
Don't expect to plug in any random HTML file and receive a nice PDF.
OpenHTMLtoPDF uses its own engine to render the HTML file, so expect email-client-level feature support (e.g. flexbox and calc are not supported) and plenty of inconsistencies with mainstream browsers.
Chances are, you'll need to rebuild your HTML to accommodate it.
As unfortunate as the situation is, I've yet to find anything free that's better than OpenHTMLtoPDF for creating PDF/UA compliant PDFs from HTML.
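Before rebuilding a page, it may help to scan the HTML for the two features named above. The sketch below is a hypothetical helper, not part of this app: the class `UnsupportedCssScan` and its regular expressions are assumptions, and it only looks for flexbox and `calc()`, so a clean result does not guarantee a faithful render.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical pre-check (not part of the app): flags CSS features the text above lists as unsupported. */
final class UnsupportedCssScan {
    // Label -> pattern. Only the two features mentioned in this README are covered.
    private static final Map<String, Pattern> CHECKS = new LinkedHashMap<>();
    static {
        CHECKS.put("flexbox",
                Pattern.compile("display\\s*:\\s*(inline-)?flex\\b", Pattern.CASE_INSENSITIVE));
        CHECKS.put("calc()",
                Pattern.compile("\\bcalc\\s*\\(", Pattern.CASE_INSENSITIVE));
    }

    /** Returns the labels of all checked features found in the given HTML or CSS text. */
    static List<String> findUnsupported(String html) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, Pattern> check : CHECKS.entrySet()) {
            if (check.getValue().matcher(html).find()) {
                found.add(check.getKey());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "<div style=\"display: flex; width: calc(100% - 2em)\">Hello</div>";
        System.out.println("Possibly unsupported: " + findUnsupported(sample));
    }
}
```

The scan is a plain text search, so it will also match these strings inside comments or body text. It is meant as a quick hint before conversion, not a validator.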