fix: handle math run with no text child in OMML->LaTeX conversion #2189

Open
S1MS4 wants to merge 1 commit into microsoft:main from S1MS4:fix/docx-math-run-missing-text

S1MS4 commented Jul 3, 2026

Summary

Fixes #2188.

  • do_r() in omml.py called elm.findtext("./m:t") and iterated over the result directly. When a math run (<m:r>) has no <m:t> text child (e.g. a run that only carries formatting properties), findtext() returns None, and iterating over None raises TypeError: 'NoneType' object is not iterable.
  • Because pre_process_docx() pre-processes math for the whole word/document.xml in a single pass wrapped in a blanket try/except, this one malformed run triggers the fallback path, which silently drops every equation in the document without surfacing an error to the caller.
  • Fix: use elm.findtext(...) or "" so that a missing text child is treated as an empty run instead of crashing.
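
The failure and the fix described above can be reproduced outside the project with the standard library alone. This is a minimal sketch, not the code in omml.py: the helper names run_text_unsafe and run_text_safe are hypothetical, and it assumes an xml.etree.ElementTree element with the m prefix resolved through an explicit namespace map.

```python
import xml.etree.ElementTree as ET

# OMML namespace used by Word math markup.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
NS = {"m": M_NS}


def run_text_unsafe(elm):
    # Hypothetical helper mirroring the pre-fix pattern:
    # iterate directly over whatever findtext() returns.
    return "".join(ch for ch in elm.findtext("./m:t", namespaces=NS))


def run_text_safe(elm):
    # Hypothetical helper mirroring the fix:
    # fall back to "" when the run has no <m:t> child.
    return "".join(ch for ch in (elm.findtext("./m:t", namespaces=NS) or ""))


# A normal run, and a run that only carries formatting properties.
with_text = ET.fromstring(f'<m:r xmlns:m="{M_NS}"><m:t>x</m:t></m:r>')
no_text = ET.fromstring(f'<m:r xmlns:m="{M_NS}"><m:rPr/></m:r>')

print(repr(run_text_safe(with_text)))
print(repr(run_text_safe(no_text)))
try:
    run_text_unsafe(no_text)
except TypeError as exc:
    print("pre-fix pattern raised:", exc)
```

The `or ""` form also leaves a present but empty <m:t> unchanged, since findtext() returns an empty string in that case.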

Verified against a real-world .docx with 62 native equations: before the fix, all 62 were silently dropped from the markdown output; after the fix, all render correctly as LaTeX (e.g. $l\_{1}$, $λ\_{1}\geq λ\_{2}\geq …\geq λ\_{k}$).

This has a different root cause from #1979 (math_root.find("oMath") returning None in _convert_omath_to_latex) and #1982 (unsupported function names): it involves a different function (do_r) and a different failure mode (a missing <m:t> on an otherwise valid run).

Test plan

  • Added packages/markitdown/tests/test_docx_omml.py with regression tests:
    • a math run with no <m:t> child no longer crashes and yields an empty string
    • a normal run with text still converts correctly
    • a subscript expression where one run has no text still converts correctly (mirrors the real-world case)
  • Confirmed tests fail with the pre-fix code (TypeError) and pass with the fix
  • Ran pytest packages/markitdown/tests/test_docx_omml.py -v, with all tests passing
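
The three regression cases listed above could look roughly like the following. This is a sketch under stated assumptions, not the contents of test_docx_omml.py: to_latex is a hypothetical stand-in converter that handles only <m:r> and <m:sSub>, and parse is a hypothetical helper that wraps a fragment in an <m:oMath> element.

```python
import xml.etree.ElementTree as ET

M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
NS = {"m": M_NS}


def to_latex(elm):
    """Hypothetical stand-in converter: handles only <m:r> and <m:sSub>."""
    tag = elm.tag.split("}")[-1]
    if tag == "r":
        # Same guard as the fix: a run without <m:t> is an empty run.
        return elm.findtext("./m:t", namespaces=NS) or ""
    if tag == "sSub":
        base = to_latex(elm.find("./m:e", NS))
        sub = to_latex(elm.find("./m:sub", NS))
        return f"{base}_{{{sub}}}"
    # Containers such as <m:oMath>, <m:e>, <m:sub>: concatenate children.
    return "".join(to_latex(child) for child in elm)


def parse(fragment):
    """Hypothetical helper: wrap an OMML fragment in a namespaced <m:oMath>."""
    return ET.fromstring(f'<m:oMath xmlns:m="{M_NS}">{fragment}</m:oMath>')


def test_run_without_text_yields_empty_string():
    assert to_latex(parse("<m:r><m:rPr/></m:r>")) == ""


def test_run_with_text_still_converts():
    assert to_latex(parse("<m:r><m:t>x</m:t></m:r>")) == "x"


def test_subscript_with_one_textless_run():
    fragment = (
        "<m:sSub>"
        "<m:e><m:r><m:t>l</m:t></m:r><m:r><m:rPr/></m:r></m:e>"
        "<m:sub><m:r><m:t>1</m:t></m:r></m:sub>"
        "</m:sSub>"
    )
    assert to_latex(parse(fragment)) == "l_{1}"
```

Saved to a file, these functions are collected by pytest as written. Removing the `or ""` guard makes the first and third tests fail, which matches the pre-fix behavior reported above.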

S1MS4 commented Jul 3, 2026

@microsoft-github-policy-service agree

do_r() called elm.findtext("./m:t") and iterated over the result
directly. When a math run (<m:r>) has no <m:t> text child (e.g. a
run that only carries formatting properties, produced by some Word
equation editors), findtext() returns None and iterating over it
raises TypeError: 'NoneType' object is not iterable.

Because equation pre-processing is applied at the whole-document.xml
level with a blanket try/except, this single malformed run aborts
LaTeX conversion for every equation in the document, silently
dropping all native Word equations from the output.
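
To illustrate why one bad run takes every equation with it, here is a minimal sketch of a whole-document pass guarded by a blanket try/except. All names here (preprocess_document, strict_convert, tolerant_convert) are hypothetical stand-ins, and the empty-list fallback is an assumption made for illustration; the real pre-processing code is not reproduced.

```python
def preprocess_document(runs, convert):
    """Hypothetical stand-in for a single pass over all equations."""
    try:
        return [f"${convert(text)}$" for text in runs]
    except Exception:
        # Blanket fallback: nothing is converted and no error reaches the caller.
        return []


def strict_convert(text):
    # Pre-fix pattern: iterating over None raises TypeError.
    return "".join(ch for ch in text)


def tolerant_convert(text):
    # Post-fix pattern: a missing text value is treated as an empty run.
    return "".join(ch for ch in (text or ""))


# None models a math run with no <m:t> child.
runs = ["x", None, "y"]

print(preprocess_document(runs, strict_convert))
print(preprocess_document(runs, tolerant_convert))
```

With the strict converter the single None aborts the whole pass, so the valid runs on either side of it are lost as well.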

S1MS4 force-pushed the fix/docx-math-run-missing-text branch from 77f40af to c8f2308 on July 3, 2026 10:41

Development

Successfully merging this pull request may close these issues.

bug: DOCX math converter crashes (and silently drops all equations) when a math run has no text child