fix: replace fixed-size utf8::append buffers with std::back_inserter to prevent segfaults #224

PeterStaar-IBM merged 8 commits into main from fix/problems-on-linux on Feb 20, 2026

Conversation

PeterStaar-IBM (Member) commented Feb 19, 2026

Pre-allocated std::string buffers passed as output iterators to utf8::append
and utf8::utf16to8 could overflow if the encoded UTF-8 exceeded the allocated
size, causing segfaults. Replace all such patterns with std::back_inserter so
the string grows dynamically as needed.

Affected locations: cmap_value::codepoint_to_utf8, cmap_parser::to_utf8,
cmap_parser::populate_range_mapping_legacy, and page_font encoding fallbacks
(IDENTITY_H/V and CMAP_RESOURCES).
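
The overflow described above can be illustrated with a minimal sketch. It does not reproduce the repository's code: `append_utf8` is a hypothetical stand-in for `utf8::append` (an encoder that writes through an output iterator), and this `codepoint_to_utf8` is an illustrative free function, not the actual `cmap_value::codepoint_to_utf8`. The sketch assumes the input is a valid Unicode scalar value.

```cpp
#include <cstdint>
#include <iterator>
#include <string>

// Hypothetical stand-in for utf8::append: encodes one code point as UTF-8
// and writes the bytes through an output iterator. It performs no bounds
// checking on the destination, which is why the destination must be able
// to grow. Assumes cp is a valid Unicode scalar value.
template <typename OutIt>
OutIt append_utf8(std::uint32_t cp, OutIt out) {
    if (cp < 0x80) {
        *out++ = static_cast<char>(cp);
    } else if (cp < 0x800) {
        *out++ = static_cast<char>(0xC0 | (cp >> 6));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        *out++ = static_cast<char>(0xE0 | (cp >> 12));
        *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        *out++ = static_cast<char>(0xF0 | (cp >> 18));
        *out++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Risky pattern (shown only as a comment, never executed):
//   std::string buf(2, '\0');
//   append_utf8(0x1F600, buf.begin());  // 4 bytes into a 2-byte buffer:
//                                       // undefined behavior, may corrupt the heap

// Safer pattern: std::back_inserter calls push_back for every byte,
// so the string grows to whatever length the encoding needs.
std::string codepoint_to_utf8(std::uint32_t cp) {
    std::string out;
    append_utf8(cp, std::back_inserter(out));
    return out;
}
```

With the back-inserter form the caller no longer has to guess the encoded length in advance, at the cost of possible reallocations as the string grows.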

Resolves docling-project/docling#1531

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

github-actions (Contributor)

DCO Check Passed

Thanks @PeterStaar-IBM, all your commits are properly signed off. 🎉

mergify bot commented Feb 19, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

PeterStaar-IBM changed the title from "fix: problems on linux" to "fix: replace fixed-size utf8::append buffers with std::back_inserter to prevent segfaults" on Feb 19, 2026
cau-git previously approved these changes Feb 19, 2026
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
dolfim-ibm previously approved these changes Feb 19, 2026
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
dolfim-ibm dismissed stale reviews from cau-git and themself via bdcb6b5 on February 19, 2026 19:33
dolfim-ibm previously approved these changes Feb 19, 2026
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
PeterStaar-IBM merged commit 237cef6 into main on Feb 20, 2026
34 checks passed
PeterStaar-IBM deleted the fix/problems-on-linux branch on February 20, 2026 06:26

Development

Successfully merging this pull request may close these issues.

Conversion fails with munmap_chunk(): invalid pointer

3 participants