fix: emit valid PDF binary marker and omit empty /Contents #344

Merged: oneplus1000 merged 1 commit into signintech:master from qiwzee:fix/pdf-binary-marker-and-empty-contents on May 19, 2026

qiwzee commented May 18, 2026

Fix two PDF compliance defects that cause Adobe Acrobat and Microsoft Edge to reject generated files

Defect 1: Corrupted PDF binary marker comment

gopdf.go:1051 writes "%PDF-1.7\n%����\n\n", where ���� are four literal
U+FFFD (replacement) characters embedded in the source code. Go writes this
string as UTF-8, so each U+FFFD becomes the 3-byte sequence EF BF BD, and the
marker line carries 12 bytes after the % (% EF BF BD EF BF BD EF BF BD EF BF BD \n)
instead of the conventional 4 bytes (% E2 E3 CF D3 \n). ISO 32000-1:2008 §7.5.2
calls for a comment line containing at least four bytes with values of 128 or
greater immediately after the header line.
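
The byte counts above can be checked with a short standalone program. This is only an illustrative sketch: `markerBytes` is a hypothetical helper, not part of gopdf, and the `\uFFFD` escapes stand in for the literal replacement characters found in the source file.

```go
package main

import "fmt"

// markerBytes returns the raw bytes of a Go string, i.e. exactly what
// would reach the output file. Hypothetical helper for illustration only.
func markerBytes(s string) []byte {
	return []byte(s)
}

func main() {
	// Four U+FFFD characters; Go encodes each as the UTF-8 bytes EF BF BD.
	corrupted := "%\uFFFD\uFFFD\uFFFD\uFFFD\n"
	// Four raw bytes written with \x escapes; no UTF-8 encoding is applied.
	canonical := "%\xe2\xe3\xcf\xd3\n"

	fmt.Printf("% X\n", markerBytes(corrupted)) // 25 EF BF BD EF BF BD EF BF BD EF BF BD 0A
	fmt.Printf("% X\n", markerBytes(canonical)) // 25 E2 E3 CF D3 0A
}
```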

Strict PDF parsers (Adobe Acrobat error 110, Microsoft Edge) sniff these
bytes as malformed UTF-8 and refuse to open the file. Lenient parsers like
PDFium (Chrome) tolerate it, which has hidden the bug.

This was reported in issue #225 (and others).

Fix: write the four canonical bytes using \x escape sequences:

fmt.Fprint(writer, "%PDF-1.7\n%\xe2\xe3\xcf\xd3\n\n")
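
A minimal way to confirm the result is to write the header into a buffer and inspect the marker line. The sketch below assumes nothing about gopdf beyond the `fmt.Fprint` call shown above; `writeHeader`, `headerBytes`, `binaryMarker`, and `allHighBytes` are hypothetical names introduced for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// writeHeader is a hypothetical stand-in for the header write in gopdf.go.
func writeHeader(w io.Writer) error {
	_, err := fmt.Fprint(w, "%PDF-1.7\n%\xe2\xe3\xcf\xd3\n\n")
	return err
}

// headerBytes captures the header in memory so it can be inspected.
func headerBytes() []byte {
	var buf bytes.Buffer
	if err := writeHeader(&buf); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// binaryMarker returns the bytes after '%' on the second header line,
// or nil if the second line is not a comment.
func binaryMarker(header []byte) []byte {
	lines := bytes.SplitN(header, []byte("\n"), 3)
	if len(lines) < 2 || len(lines[1]) == 0 || lines[1][0] != '%' {
		return nil
	}
	return lines[1][1:]
}

// allHighBytes reports whether b holds at least four bytes, all >= 128.
func allHighBytes(b []byte) bool {
	for _, c := range b {
		if c < 128 {
			return false
		}
	}
	return len(b) >= 4
}

func main() {
	marker := binaryMarker(headerBytes())
	fmt.Printf("% X high=%v\n", marker, allHighBytes(marker)) // E2 E3 CF D3 high=true
}
```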

Defect 2: Empty /Contents key in Page objects

page_obj.go:55 writes /Contents unconditionally. When a page has no
content stream (no native drawing commands — e.g. when only an imported
template is used), p.Contents is the empty string, producing the line:

/Contents 

This is invalid PDF syntax (a dictionary key with no value). Per
ISO 32000-1:2008 §7.7.3.3, /Contents is OPTIONAL — pages without it
are spec-legal and render blank.

Fix: emit the /Contents line only when there is a value to emit.
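
The conditional write might look roughly like the sketch below. It is not the actual page_obj.go code: `pageObj`, its `write` method, and `render` are simplified, hypothetical stand-ins, and the dictionary omits entries such as /Parent that a real page object carries. Only the `Contents` field name and the "  /Contents %s\n" format come from this PR.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// pageObj is a simplified, hypothetical stand-in for gopdf's PageObj.
type pageObj struct {
	Contents string // e.g. "5 0 R"; empty when the page has no content stream
}

// write emits a minimal page dictionary and skips /Contents when it is empty,
// so the output never contains a key without a value.
func (p *pageObj) write(w io.Writer) error {
	var sb strings.Builder
	sb.WriteString("<<\n")
	sb.WriteString("  /Type /Page\n")
	if p.Contents != "" {
		fmt.Fprintf(&sb, "  /Contents %s\n", p.Contents)
	}
	sb.WriteString(">>\n")
	_, err := io.WriteString(w, sb.String())
	return err
}

// render returns the dictionary as a string for inspection.
func render(p *pageObj) string {
	var sb strings.Builder
	if err := p.write(&sb); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	fmt.Print(render(&pageObj{}))                  // no /Contents line
	fmt.Print(render(&pageObj{Contents: "5 0 R"})) // includes "  /Contents 5 0 R"
}
```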

- gopdf.go: replace mojibake bytes in the PDF binary-comment marker
  with the canonical 0xE2 0xE3 0xCF 0xD3 sequence so transfer tools
  and viewers reliably detect the file as binary.
- page_obj.go: skip the "  /Contents %s\n" line when PageObj.Contents
  is empty, preventing a malformed "  /Contents \n" entry (seen e.g.
  on imported pages that have no content stream).
- Add tests covering both fixes:
  - TestPdfBinaryHeader asserts the exact header bytes and that every
    marker byte is >= 128 per the PDF spec.
  - TestPageObjWriteOmitsContentsWhenEmpty / ...IncludesContentsWhenSet
    pin down PageObj.write behavior for both empty and populated
    Contents.

oneplus1000 merged commit 325e193 into signintech:master on May 19, 2026
2 checks passed

oneplus1000 (Collaborator) commented

thank you
