feat: Enable markdown text formatting for docx #630
base: main
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🔴 Require two reviewer for test updates
This rule is failing. When test data is updated, we require two reviewers.
🟢 Enforce conventional commit
Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
Signed-off-by: SimJeg <[email protected]>
a221428 to 7f9464b
Note: for underline I used the |
@maxmnemonic @PeterStaar-IBM do you need any additional info for this PR?
@SimJeg this is an interesting feature, but we should introduce it with an option for enable/disable, because not all output formats will be compatible with markdown styling. There could also be some consideration on whether to propagate text styling in the Docling document format, but the option will be needed.
Hi @dolfim-ibm, indeed, a different function should be applied for HTML, for instance. I can add an argument to the convert function (e.g. style=[None, "markdown", "html"]). As there are several ways to do this and I don't know the docling API very well, I'll wait for your confirmation before pushing updates.
@dolfim-ibm any update on this?
We are actually considering something similar to what you are proposing. Adding the option for the format at convert time (with default None) is good, but we would like to have it in the PipelineOptions for the MS Word backend, since it will be specific to that backend. We will soon post more details, but the above is the general idea.
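To make the discussion above concrete, here is a minimal sketch of an opt-in style switch with default None. It is not docling code: the function format_run, its parameters, and the choice of an HTML <u> tag for underline (Markdown has no native underline syntax) are all assumptions made for illustration only.

```python
from typing import Optional


def format_run(
    text: str,
    bold: bool = False,
    italic: bool = False,
    underline: bool = False,
    hyperlink: Optional[str] = None,
    style: Optional[str] = None,
) -> str:
    """Hypothetical helper: wrap one text run according to the requested style.

    style=None (the default) disables formatting, so existing output is unchanged.
    """
    # Formatting is opt-in; whitespace-only runs are never wrapped.
    if style is None or not text.strip():
        return text
    if style != "markdown":
        # Other targets (e.g. HTML) would need their own formatter.
        raise ValueError(f"unsupported style: {style!r}")

    # Keep leading/trailing whitespace outside the markers so that
    # emphasis delimiters sit directly against the text.
    core = text.strip()
    lead = text[: len(text) - len(text.lstrip())]
    trail = text[len(text.rstrip()):]

    if italic:
        core = f"*{core}*"
    if bold:
        core = f"**{core}**"
    if underline:
        # Assumption: fall back to an inline HTML tag for underline.
        core = f"<u>{core}</u>"
    if hyperlink:
        core = f"[{core}]({hyperlink})"
    return f"{lead}{core}{trail}"
```

With this shape, a backend-specific options object could simply carry the style value and pass it through, so callers that never set it keep the current plain-text behaviour.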
Hi,
This PR adds markdown text formatting for docx documents (italic, bold, underline and hyperlinks). I included a new tests/data/docx/unit_test_formatting.docx document to illustrate it. Using the latest docling main, the output of export_to_markdown is:
With this PR it becomes: