
Commit ca7d584

cau-git and dolfim-ibm authored
feat!: Update API, naming, and tests. Move data model to docling-core (#107)
Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Michele Dolfi <[email protected]>
Co-authored-by: Michele Dolfi <[email protected]>
1 parent e6225c9 commit ca7d584


48 files changed, +1794189 -1689434 lines changed


README.md (+5 -5)
@@ -65,7 +65,7 @@ pip install docling-parse
 Convert a PDF (look in the [visualize.py](docling_parse/visualize.py) for a more detailed information)
 
 ```python
-from docling_parse.document import SegmentedPdfPageLabel
+from docling_core.types.doc.page import TextCellUnit
 from docling_parse.pdf_parser import DoclingPdfParser, PdfDocument
 
 parser = DoclingPdfParser()
@@ -78,11 +78,11 @@ pdf_doc: PdfDocument = parser.load(
 for page_no, pred_page in pdf_doc.iterate_pages():
 
     # iterate over the word-cells
-    for word in pred_page.yield_cells(label=SegmentedPdfPageLabel.WORD):
-        print(word.rect, ": ", word.text)
+    for word in pred_page.iterate_cells(unit_type=TextCellUnit.WORD):
+        print(word.rect, ": ", word.text)
 
-    # create a PIL image with the char cells
-    img = pred_page.render(label=SegmentedPdfPageLabel.CHAR)
+    # create a PIL image with the char cells
+    img = pred_page.render_as_image(cell_unit=TextCellUnit.CHAR)
     img.show()
 ```
 
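The README change renames three things: the `SegmentedPdfPageLabel` import from `docling_parse.document` becomes `TextCellUnit` from `docling_core.types.doc.page`, `yield_cells(label=...)` becomes `iterate_cells(unit_type=...)`, and `render(label=...)` becomes `render_as_image(cell_unit=...)`. For scripts written against the old README, a minimal sketch of a text-level migration helper could look like the following. The `REWRITES` table and `migrate` function are hypothetical and not part of docling-parse or docling-core. The sketch assumes calls pass the label as a keyword argument exactly as the old README did, and it covers only the renames visible in this diff, not any other API changes in this commit.

```python
import re

# Hypothetical migration helper: NOT part of docling-parse or docling-core.
# The rename table mirrors only the README changes shown in this commit.
REWRITES = [
    # Import: the label enum now comes from docling-core under a new name.
    (
        re.compile(r"from docling_parse\.document import SegmentedPdfPageLabel"),
        "from docling_core.types.doc.page import TextCellUnit",
    ),
    # Cell iteration: yield_cells(label=...) -> iterate_cells(unit_type=...)
    (
        re.compile(r"\.yield_cells\(\s*label\s*="),
        ".iterate_cells(unit_type=",
    ),
    # Rendering: render(label=...) -> render_as_image(cell_unit=...)
    # `\(` right after `render` keeps this from matching render_as_image(.
    (
        re.compile(r"\.render\(\s*label\s*="),
        ".render_as_image(cell_unit=",
    ),
    # Enum references: the README uses WORD and CHAR on both sides of the
    # diff, so only the class name is swapped here (assumption for others).
    (
        re.compile(r"\bSegmentedPdfPageLabel\."),
        "TextCellUnit.",
    ),
]


def migrate(source: str) -> str:
    """Apply each rename in order and return the rewritten source text."""
    for pattern, replacement in REWRITES:
        source = pattern.sub(replacement, source)
    return source


if __name__ == "__main__":
    # Old README snippet, rewritten with the renames above.
    old_readme = (
        "from docling_parse.document import SegmentedPdfPageLabel\n"
        "for word in pred_page.yield_cells(label=SegmentedPdfPageLabel.WORD):\n"
        '    print(word.rect, ": ", word.text)\n'
        "img = pred_page.render(label=SegmentedPdfPageLabel.CHAR)\n"
    )
    print(migrate(old_readme))
```

Running `migrate` on already-migrated code leaves it unchanged, so it is safe to re-run. Regex rewriting is crude but dependency-free; positional calls such as `yield_cells(SegmentedPdfPageLabel.WORD)` only get the enum renamed, so review the output before committing it.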
