Add support for mathematical formulas in DOCX conversion #289

Open
cafferychen777 opened this issue Jan 16, 2025 · 0 comments

Issue Description
When MarkItDown converts a DOCX file that contains mathematical formulas to Markdown, the formulas are not converted properly. This limits the utility of MarkItDown for academic and technical documents, where mathematical expressions are common.

Current Behavior

  • Mathematical formulas in DOCX files are either skipped or converted incorrectly
  • No support for converting Office Math Markup Language (OMML) to LaTeX or other Markdown-compatible formats

Expected Behavior

  • Mathematical formulas should be converted to LaTeX format, which is widely supported in Markdown renderers
  • The conversion should preserve the mathematical meaning and formatting of the original formulas
  • Support for both inline and display math modes
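
To make the inline/display distinction concrete, here is a minimal sketch of how converted LaTeX might be emitted into Markdown. The function name `to_markdown_math` is hypothetical, and the sketch assumes the target renderer accepts `$...$` for inline math and `$$...$$` for display math, which is common but not universal.

```python
def to_markdown_math(latex: str, display: bool = False) -> str:
    """Wrap a LaTeX string in Markdown math delimiters.

    Assumption: the renderer understands $...$ (inline) and $$...$$ (display).
    """
    latex = latex.strip()
    if display:
        # Display math goes on its own lines so it renders as a block.
        return f"\n$$\n{latex}\n$$\n"
    # Inline math stays within the surrounding sentence.
    return f"${latex}$"


if __name__ == "__main__":
    print("Energy is " + to_markdown_math("E = mc^2") + ".")
    print(to_markdown_math(r"\frac{a}{b}", display=True))
```

In OMML itself, display equations are wrapped in an `m:oMathPara` element, while inline equations appear as a bare `m:oMath` inside a paragraph, so the `display` flag could be derived from that.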

Use Cases

  1. Converting academic papers with mathematical content
  2. Processing technical documentation containing equations
  3. Converting educational materials with mathematical expressions

Suggested Implementation

  1. Add support for parsing Office Math Markup Language (OMML) from DOCX files
  2. Implement conversion from OMML to LaTeX
  3. Properly handle both inline and display math modes in the output Markdown
  4. Consider using existing libraries like omml2mathml for initial conversion to MathML, then convert to LaTeX
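
As a rough illustration of steps 1 to 3, the sketch below reads `word/document.xml` from a DOCX archive with the standard library and converts a very small subset of OMML (text runs, fractions, superscripts, subscripts) directly to LaTeX. It is not MarkItDown's implementation and it skips the MathML intermediate step from item 4; the names `omml_to_latex` and `extract_formulas` are hypothetical, and real documents use many more OMML elements (radicals, matrices, n-ary operators) than are handled here.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by Office Math elements (m:oMath, m:f, m:sSup, ...).
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def _tag(name: str) -> str:
    return f"{{{M_NS}}}{name}"


def _part(node: ET.Element, name: str) -> str:
    """Convert the named child (e.g. m:num) of a node, or return ''."""
    child = node.find(_tag(name))
    return omml_to_latex(child) if child is not None else ""


def omml_to_latex(node: ET.Element) -> str:
    """Recursively convert a small, illustrative subset of OMML to LaTeX."""
    tag = node.tag
    if tag == _tag("t"):  # literal text inside a math run
        return node.text or ""
    if tag == _tag("f"):  # fraction: numerator and denominator
        return rf"\frac{{{_part(node, 'num')}}}{{{_part(node, 'den')}}}"
    if tag == _tag("sSup"):  # superscript: base and exponent
        return f"{{{_part(node, 'e')}}}^{{{_part(node, 'sup')}}}"
    if tag == _tag("sSub"):  # subscript: base and index
        return f"{{{_part(node, 'e')}}}_{{{_part(node, 'sub')}}}"
    if tag.endswith("Pr"):  # property elements carry formatting only
        return ""
    # Containers and unhandled elements: concatenate converted children.
    return "".join(omml_to_latex(child) for child in node)


def extract_formulas(docx_path):
    """Return (latex, is_display) pairs for each m:oMath in a DOCX file."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    # Equations nested in m:oMathPara are display math; others are inline.
    display_ids = {
        id(m)
        for para in root.iter(_tag("oMathPara"))
        for m in para.iter(_tag("oMath"))
    }
    return [(omml_to_latex(m), id(m) in display_ids)
            for m in root.iter(_tag("oMath"))]
```

Unhandled elements fall through to the generic branch, so their text is kept but their structure is lost; a fuller converter would need an explicit case for each OMML construct or a fallback that flags unsupported formulas.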

Additional Context
This feature would significantly enhance MarkItDown's utility for academic and technical users who frequently work with mathematical content.

Environment

  • MarkItDown version: [latest]
  • Python version: 3.x
  • OS: All platforms