from docling_haystack.converter import DoclingConverter, ExportType
from docling.chunking import HybridChunker

# example setup; instantiate chunker as needed
chunker = HybridChunker(tokenizer="sentence-transformers/all-MiniLM-L6-v2")

# set up converter
converter = DoclingConverter(
    export_type=ExportType.DOC_CHUNKS,
    chunker=chunker,
)
This already provides the capability of getting "splits" created by native Docling chunkers — you may just have to skip the explicit "splitting" pipeline step in this branch of the logic.
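To illustrate what skipping the splitting step in that branch could look like, here is a minimal, library-free sketch. The names `IngestConfig`, `plan_steps`, and the step labels are hypothetical and only stand in for however your pipeline is actually assembled; they are not part of Docling or Haystack.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IngestConfig:
    # Hypothetical flag: True when the converter already exports chunks
    # (e.g. DoclingConverter with ExportType.DOC_CHUNKS).
    converter_emits_chunks: bool


def plan_steps(config: IngestConfig) -> List[str]:
    """Return the ordered step names for an indexing pipeline."""
    steps = ["converter"]
    # Only add an explicit splitter when the converter does not chunk itself.
    if not config.converter_emits_chunks:
        steps.append("splitter")
    steps.extend(["embedder", "writer"])
    return steps


docling_branch = plan_steps(IngestConfig(converter_emits_chunks=True))
legacy_branch = plan_steps(IngestConfig(converter_emits_chunks=False))
```

In the Docling branch the converter output would feed the embedder directly, while the other branch keeps its existing splitter.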
Let me know if this answers the question, otherwise we can also have a quick chat.
This seems to do what we need, thank you!
ATM we cannot test this package because we're stuck with legacy versions of docling (<=2.8.3), but I think @ilya-kolchinsky should have more freedom to check this out. I will raise an enhancement issue on their repo for now.
Request to design a Haystack DocumentSplitter based on the Docling chunkers.
docling_splitter is a reference implementation using the HybridChunker, developed by @ilya-kolchinsky.