fix: emit valid PDF binary marker and omit empty /Contents #344
Merged
oneplus1000 merged 1 commit on May 19, 2026
- gopdf.go: replace mojibake bytes in the PDF binary-comment marker
  with the canonical 0xE2 0xE3 0xCF 0xD3 sequence so transfer tools
  and viewers reliably detect the file as binary.
- page_obj.go: skip the " /Contents %s\n" line when PageObj.Contents
  is empty, preventing a malformed " /Contents \n" entry (seen e.g.
  on imported pages that have no content stream).
- Add tests covering both fixes:
  - TestPdfBinaryHeader asserts the exact header bytes and that every
    marker byte is >= 128 per the PDF spec.
  - TestPageObjWriteOmitsContentsWhenEmpty / ...IncludesContentsWhenSet
    pin down PageObj.write behavior for both empty and populated
    Contents.
Collaborator
thank you
Fix two PDF compliance defects rejected by Adobe Acrobat / Microsoft Edge
Defect 1: Corrupted PDF binary marker comment
gopdf.go:1051 writes "%PDF-1.7\n%����\n\n" where ���� are four
U+FFFD characters embedded in the source code. When Go writes this string
as UTF-8, each U+FFFD becomes the 3-byte sequence EF BF BD, so the output
file contains 12 bytes (% EF BF BD EF BF BD EF BF BD EF BF BD \n) instead
of the canonical 4-byte binary marker % E2 E3 CF D3 \n recommended by
ISO 32000-1:2008 §7.5.2.
Strict PDF parsers (Adobe Acrobat error 110, Microsoft Edge) sniff these
bytes as malformed UTF-8 and refuse to open the file. Lenient parsers like
PDFium (Chrome) tolerate it, which has hidden the bug.
This was reported in issue #225 (and others).
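To make the byte counts concrete, here is a minimal Go sketch comparing the two encodings. It is not the gopdf source: markerBytes is a hypothetical helper written for this illustration, and the two header strings are reconstructed from the description above (four U+FFFD characters versus the bytes E2 E3 CF D3 written with \x escapes).

```go
package main

import (
	"fmt"
	"strings"
)

// markerBytes is a hypothetical helper: it returns the bytes on the
// comment line that follows "%PDF-1.7\n%", up to the next newline.
func markerBytes(header string) []byte {
	const prefix = "%PDF-1.7\n%"
	if !strings.HasPrefix(header, prefix) {
		return nil
	}
	rest := header[len(prefix):]
	end := strings.IndexByte(rest, '\n')
	if end < 0 {
		end = len(rest)
	}
	return []byte(rest[:end])
}

func main() {
	// Four U+FFFD characters in the source: Go stores each as EF BF BD.
	broken := "%PDF-1.7\n%\uFFFD\uFFFD\uFFFD\uFFFD\n\n"
	// \x escapes put the raw bytes E2 E3 CF D3 into the string unchanged.
	fixed := "%PDF-1.7\n%\xe2\xe3\xcf\xd3\n\n"

	fmt.Printf("broken (%d bytes): % X\n", len(markerBytes(broken)), markerBytes(broken))
	fmt.Printf("fixed  (%d bytes): % X\n", len(markerBytes(fixed)), markerBytes(fixed))
	// Prints:
	// broken (12 bytes): EF BF BD EF BF BD EF BF BD EF BF BD
	// fixed  (4 bytes): E2 E3 CF D3
}
```

The \x escape form matters because a Go source file is itself UTF-8: bytes such as E2 E3 CF D3 cannot be typed literally into a string without an editor or tool reinterpreting them, whereas escapes survive any re-encoding of the source.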
Fix: write the four canonical bytes (E2 E3 CF D3) via \x escape sequences.
Defect 2: Empty /Contents key in Page objects
page_obj.go:55 writes /Contents unconditionally. When a page has no
content stream (no native drawing commands, e.g. when only an imported
template is used), p.Contents is the empty string, producing the line
" /Contents \n".
This is invalid PDF syntax (a dictionary key with no value). Per
ISO 32000-1:2008 §7.7.3.3, /Contents is OPTIONAL: pages without it
are spec-legal and render blank.
Fix: emit the /Contents line only when there is a value to emit.