Commit

chore: update the with input formats and DoclingDocument (#188)

---------

Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Michele Dolfi <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Co-authored-by: Michele Dolfi <[email protected]>
Co-authored-by: Christoph Auer <[email protected]>
3 people authored Oct 30, 2024
1 parent f542460 commit 94a5290
Showing 7 changed files with 38 additions and 14 deletions.
14 changes: 14 additions & 0 deletions .github/workflows/cd-docs.yml
@@ -0,0 +1,14 @@
+name: "Run Docs CD"
+
+on:
+  push:
+    branches:
+      - "main"
+
+jobs:
+  build-deploy-docs:
+    uses: ./.github/workflows/docs.yml
+    with:
+      deploy: true
+    permissions:
+      contents: write
6 changes: 0 additions & 6 deletions .github/workflows/cd.yml
@@ -10,12 +10,6 @@ env:
 jobs:
   code-checks:
     uses: ./.github/workflows/checks.yml
-  build-deploy-docs:
-    uses: ./.github/workflows/docs.yml
-    with:
-      deploy: true
-    permissions:
-      contents: write
   pre-release-check:
     runs-on: ubuntu-latest
     outputs:
16 changes: 16 additions & 0 deletions .github/workflows/ci-docs.yml
@@ -0,0 +1,16 @@
+name: "Run Docs CI"
+
+on:
+  pull_request:
+    types: [opened, reopened, synchronize]
+  push:
+    branches:
+      - "**"
+      - "!gh-pages"
+
+jobs:
+  build-docs:
+    if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling') }}
+    uses: ./.github/workflows/docs.yml
+    with:
+      deploy: false
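
The `if:` guard on `build-docs` can be read as a simple predicate. The following is a minimal Python sketch of that reading, not part of the commit: the function name `should_build_docs` and its arguments are hypothetical, and the case-insensitive comparison is an assumption based on GitHub Actions expressions ignoring case when comparing strings.

```python
from typing import Optional

# Hypothetical constant: the upstream repository named in the guard, lowercased.
UPSTREAM_REPO = "ds4sd/docling"


def should_build_docs(event_name: str, head_repo_full_name: Optional[str] = None) -> bool:
    """Mirror the workflow guard: run on every push, or on pull requests
    whose head repository is not the upstream repository."""
    if event_name == "push":
        return True
    # Assumption: compare case-insensitively, so 'DS4SD/docling' and
    # 'ds4sd/docling' are treated as the same repository.
    return (head_repo_full_name or "").lower() != UPSTREAM_REPO


if __name__ == "__main__":
    print(should_build_docs("push"))
    print(should_build_docs("pull_request", "DS4SD/docling"))
    print(should_build_docs("pull_request", "someone/docling"))
```

Under this reading, a pull request opened from a branch inside the upstream repository is skipped by the `pull_request` trigger, presumably because the `push` trigger already covers that branch.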
6 changes: 1 addition & 5 deletions .github/workflows/ci.yml
@@ -6,6 +6,7 @@ on:
   push:
     branches:
       - "**"
+      - "!main"
       - "!gh-pages"

 env:
@@ -16,8 +17,3 @@ jobs:
   code-checks:
     if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling') }}
     uses: ./.github/workflows/checks.yml
-  build-docs:
-    if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling') }}
-    uses: ./.github/workflows/docs.yml
-    with:
-      deploy: false
5 changes: 3 additions & 2 deletions README.md
@@ -22,8 +22,9 @@ Docling parses documents and exports them to the desired format with ease and sp

 ## Features

-* 🗂️ Multi-format support for input (PDF, DOCX etc.) & output (Markdown, JSON etc.)
-* 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
+* 🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON
+* 📑 Advanced PDF document understanding including page layout, reading order & table structures
+* 🧩 Unified, expressive [DoclingDocument](https://ds4sd.github.io/docling/concepts/docling_document/) representation format
 * 📝 Metadata extraction, including title, authors, references & language
 * 🤖 Seamless LlamaIndex 🦙 & LangChain 🦜🔗 integration for powerful RAG / QA applications
 * 🔍 OCR support for scanned PDFs
2 changes: 2 additions & 0 deletions docs/concepts/docling_document.md
@@ -7,6 +7,8 @@ pydantic datatype, which can express several features common to documents, such
 * Layout information (i.e. bounding boxes) for all items, if available
 * Provenance information

+The definition of the Pydantic types is implemented in the module `docling_core.types.doc`, more details in [source code definitions](https://github.com/DS4SD/docling-core/tree/main/docling_core/types/doc).
+
 It also brings a set of document construction APIs to build up a `DoclingDocument` from scratch.

 ## Example document structures
3 changes: 2 additions & 1 deletion docs/index.md
@@ -19,8 +19,9 @@ Docling parses documents and exports them to the desired format with ease and sp

 ## Features

-* 🗂️ Multi-format support for input (PDF, DOCX etc.) & output (Markdown, JSON etc.)
+* 🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON
 * 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
+* 🧩 Unified, expressive [DoclingDocument](./concepts/docling_document.md) representation format
 * 📝 Metadata extraction, including title, authors, references & language
 * 🤖 Seamless LlamaIndex 🦙 & LangChain 🦜🔗 integration for powerful RAG / QA applications
 * 🔍 OCR support for scanned PDFs