I'm aware that the table has merged cells, and I believe they are the source of the problem. The test document is similar to the documents that I ingest. The output HTML repeats the same data across the columns, which is wrong. Here is the output HTML, pasted inline because I can't attach HTML files.
<!DOCTYPE html>
<html lang="en">
<head>
<link rel="icon" type="image/png"
href="https://ds4sd.github.io/docling/assets/logo.png"/>
<meta charset="UTF-8">
<title>
Powered by Docling
</title>
<style>
html {
background-color: LightGray;
}
body {
margin: 0 auto;
width:800px;
padding: 30px;
background-color: White;
font-family: Arial, sans-serif;
box-shadow: 10px 10px 10px grey;
}
figure{
display: block;
width: 100%;
margin: 0px;
margin-top: 10px;
margin-bottom: 10px;
}
img {
display: block;
margin: auto;
margin-top: 10px;
margin-bottom: 10px;
max-width: 640px;
max-height: 640px;
}
table {
min-width:500px;
background-color: White;
border-collapse: collapse;
cell-padding: 5px;
margin: auto;
margin-top: 10px;
margin-bottom: 10px;
}
th, td {
border: 1px solid black;
padding: 8px;
}
th {
font-weight: bold;
}
table tr:nth-child(even) td{
background-color: LightGray;
}
</style>
</head>
<p>Let’s make some annoying tables</p>
<p></p>
<p></p>
<table><tbody><tr><td>Summary</td><td>Some summary description</td><td>Some summary description</td></tr><tr><td>This is some text that will be repeated</td><td>This is some text that will be repeated</td><td>This is some text that will be repeated</td></tr><tr><td>Purpose</td><td>Some purpose description</td><td>Some purpose description</td></tr><tr><td>Second bundle of text to be repeated</td><td>Second bundle of text to be repeated</td><td>Second bundle of text to be repeated</td></tr><tr><td>Context</td><td>some context stuff</td><td>some context stuff</td></tr><tr><td>This is the 3rd section</td><td>This is the 3rd section</td><td>This is the 3rd section</td></tr><tr><td>Audience</td><td>Please provide the specific audience for your selected text.</td><td>Please provide the specific audience for your selected text.</td></tr><tr><td>So much audience</td><td>So much audience</td><td>So much audience</td></tr><tr><td>Appeals</td><td>stuff</td><td>stuff</td></tr><tr><td>stuff<br>even more stuff</td><td>stuff<br>even more stuff</td><td>stuff<br>even more stuff</td></tr><tr><td>Sources</td><td>Sources</td><td>So much stuff</td></tr><tr><td>Blarghhh stuff</td><td>Blarghhh stuff</td><td>Blarghhh stuff</td></tr></tbody></table>
<p></p>
</html>
Bug
Tables are not converted properly: when converting DOCX to HTML, cell contents are repeated across columns.
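To make the symptom easier to see, here is a minimal sketch that reads a table from the exported HTML and collapses each run of identical adjacent cells into a single `(text, span)` pair. It uses only the Python standard library and does not call Docling. The names `TableRows` and `collapse_repeats` are hypothetical helpers made up for this illustration, and the sample string is copied from the first two rows of the output above. Treating a run of identical neighbours as one merged cell is an assumption, not something Docling reports.

```python
from html.parser import HTMLParser
from itertools import groupby


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []
        elif tag == "br" and self._cell is not None:
            self._cell.append("\n")  # keep line breaks inside a cell

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def collapse_repeats(html):
    """Return rows as [(text, span), ...], merging runs of identical adjacent cells."""
    parser = TableRows()
    parser.feed(html)
    return [
        [(text, len(list(run))) for text, run in groupby(row)]
        for row in parser.rows
    ]


sample = (
    "<table><tbody>"
    "<tr><td>Summary</td><td>Some summary description</td>"
    "<td>Some summary description</td></tr>"
    "<tr><td>This is some text that will be repeated</td>"
    "<td>This is some text that will be repeated</td>"
    "<td>This is some text that will be repeated</td></tr>"
    "</tbody></table>"
)

for row in collapse_repeats(sample):
    print(row)
# [('Summary', 1), ('Some summary description', 2)]
# [('This is some text that will be repeated', 3)]
```

A span greater than 1 marks a place where the source table probably had a horizontally merged cell. This is only a heuristic: two neighbouring cells can legitimately hold the same text, so the result is a diagnostic aid rather than a fix.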
...
Steps to reproduce
...
Docling version
Docling version: 2.15.1
Docling Core version: 2.14.0
Docling IBM Models version: 3.1.2
Docling Parse version: 3.1.0
...
Python version
3.10.16
...
NOTES:
josh_test_document.docx