fix: merge mixed-font symbols into line cells #230

Open
serge-medvedev wants to merge 2 commits into docling-project:main from serge-medvedev:fix/line-cell-merge-mixed-font-symbols

@serge-medvedev

Use the order-independent line merge path when building textline cells and disable same-font enforcement for line-cell merges. This prevents fallback-font symbols (for example arrows) from being emitted as standalone line cells when PDF content-stream order differs from visual order.

Also make v2 line merging handle reverse adjacency symmetrically and add a targeted synthetic regression fixture/test that asserts the exact textline-cell output.
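
A minimal sketch of the behaviour described above, for illustration only. It is not the docling-parse implementation: the `Cell` struct, `merge_line_cells`, and its parameters are hypothetical names, cells are reduced to a horizontal extent on one baseline, and order independence is obtained here by sorting on the left edge, which is a stand-in for whatever the v2 merge path actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified cell: one text run on a single baseline.
struct Cell {
    double x0;         // left edge
    double x1;         // right edge
    std::string text;
    std::string font;
};

// Merge cells into line cells regardless of content-stream order.
// Cells are sorted by left edge first, so a glyph that appears late in the
// stream (e.g. a fallback-font arrow) still joins its visual neighbours.
std::vector<Cell> merge_line_cells(std::vector<Cell> cells,
                                   double tolerance,
                                   bool enforce_same_font) {
    assert(tolerance >= 0.0);

    std::sort(cells.begin(), cells.end(),
              [](const Cell& a, const Cell& b) { return a.x0 < b.x0; });

    std::vector<Cell> lines;
    for (const Cell& c : cells) {
        if (!lines.empty()) {
            Cell& last = lines.back();
            const bool adjacent = (c.x0 - last.x1) <= tolerance;
            const bool font_ok = !enforce_same_font || c.font == last.font;
            if (adjacent && font_ok) {
                // Extend the current line cell; it keeps the first cell's font.
                last.text += c.text;
                last.x1 = std::max(last.x1, c.x1);
                continue;
            }
        }
        lines.push_back(c);
    }
    return lines;
}
```

With the stream-order input `foo` (0 to 10, font `Body`), `bar` (22 to 30, font `Body`), `->` (11 to 21, font `Fallback`) and a tolerance of 1.5, this sketch yields a single line cell `foo->bar` when `enforce_same_font` is `false`, and three separate cells when it is `true`.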

@github-actions

github-actions bot commented Feb 23, 2026

DCO Check Passed

Thanks @serge-medvedev, all your commits are properly signed off. 🎉

@mergify

mergify bot commented Feb 23, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
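
For reference, the title rule above can be checked locally with a small sketch like the one below. The function name `is_conventional_title` is hypothetical, and the sketch assumes the `~=` operator behaves like an unanchored regular-expression search (the pattern itself carries the `^` anchor).

```cpp
#include <regex>
#include <string>

// Returns true when `title` starts with a Conventional Commits prefix,
// using the pattern quoted in the merge-protection rule.
bool is_conventional_title(const std::string& title) {
    static const std::regex pattern(
        R"re(^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:)re");
    return std::regex_search(title, pattern);
}
```

Under that assumption, a title such as `fix: merge mixed-font symbols into line cells` passes, while `Fix merge of mixed-font symbols` does not, because the type is capitalized and the colon is missing.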

I, Serge Medvedev <thismoment.main@gmail.com>, hereby add my Signed-off-by to this commit: 81dbadb

Signed-off-by: Serge Medvedev <thismoment.main@gmail.com>
// Use the order-independent merge path for line construction and do not require font equality.
contract_cells_into_lines_v2(line_cells,
                             config.horizontal_cell_tolerance,
                             false,
Member

Why would you hard-code it when we already have a config parameter for this?

Author

Due to lack of understanding of the code base.

The bug surfaced in docling-serve and it took a bit of triage to dig this deep.

What about adding a new config option, e.g. decode_page_config.enforce_same_font_for_line_cells? The existing decode_page_config.enforce_same_font is too coarse for this.
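
A rough sketch of what such an option could look like, to make the proposal concrete. Everything here is an assumption for discussion: `DecodeConfigSketch`, `MergeKind`, `same_font_required`, and the default values are hypothetical and do not mirror the real config class; only the field names `horizontal_cell_tolerance`, `enforce_same_font`, and the proposed `enforce_same_font_for_line_cells` come from this thread.

```cpp
// Hypothetical stand-in for the page-decoding config.
struct DecodeConfigSketch {
    double horizontal_cell_tolerance = 1.0;         // default is an assumption
    bool enforce_same_font = true;                  // existing, coarse option
    bool enforce_same_font_for_line_cells = false;  // proposed option (not merged)
};

// Which kind of cells a merge pass is building.
enum class MergeKind { WordCells, LineCells };

// Selects the same-font flag a call site would pass instead of a literal
// `false`: line-cell merges read the new option, other merges keep using
// the existing one.
bool same_font_required(const DecodeConfigSketch& config, MergeKind kind) {
    return kind == MergeKind::LineCells
               ? config.enforce_same_font_for_line_cells
               : config.enforce_same_font;
}
```

The point of splitting the flag is that line-cell merging can tolerate fallback-font symbols without relaxing the font check for any other merge pass.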
