@lambda-science lambda-science commented Apr 7, 2025

This PR updates the converter to follow the latest Haystack conventions. Docling has great potential, but this implementation seems to have been dead for months.

  • Fixes issue #8 ("Passing custom metadata per document"), which asks for a way to pass custom metadata to the converter.
  • Renames the paths argument to sources, in line with the other converters.
  • Changes the tests to be real e2e tests without mocks, to make sure everything works. To be honest, it was hard to wrap my head around the mocks when checking whether my modifications were fine.
  • Adds support for Haystack ByteStream as a source type.
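
For readers unfamiliar with the per-document metadata convention mentioned above, here is a minimal sketch of how a single metadata dict or a list of dicts could be expanded to one dict per source. The helper name `normalize_meta` and its exact behavior are assumptions made for illustration; this is not the code in this PR, and the converter's real signature may differ.

```python
from typing import Any, Dict, List, Optional, Union

# Hypothetical type alias: metadata may be one dict shared by all sources,
# or a list holding one dict per source.
Meta = Optional[Union[Dict[str, Any], List[Dict[str, Any]]]]


def normalize_meta(meta: Meta, sources_count: int) -> List[Dict[str, Any]]:
    """Hypothetical helper: return exactly one metadata dict per source."""
    if meta is None:
        # No metadata given: every source gets its own empty dict.
        return [{} for _ in range(sources_count)]
    if isinstance(meta, dict):
        # A single dict is copied for every source so that later mutation
        # of one document's metadata does not leak into the others.
        return [dict(meta) for _ in range(sources_count)]
    if len(meta) != sources_count:
        raise ValueError(
            "The length of the metadata list must match the number of sources."
        )
    return [dict(m) for m in meta]
```

With a helper like this, a call such as `normalize_meta({"project": "demo"}, 2)` would attach the same metadata to both sources, while a list of two dicts would attach one dict to each source in order.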

…ine with other converter. Changed test to be real e2e tests without mock to make sure it works fine.

mergify bot commented Apr 7, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviewer for test updates

This rule is failing.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
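
As a quick illustration of the title rule above, the following sketch applies the same pattern with Python's `re` module. It assumes the `~=` operator behaves like a case-sensitive regular-expression search, which would explain why the title was changed from "Feat: ..." to "feat: ...". The function name `title_is_conventional` is hypothetical.

```python
import re

# Pattern copied from the merge protection rule above.
TITLE_RULE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def title_is_conventional(title: str) -> bool:
    """Return True if the title starts with a lowercase conventional-commit type."""
    return TITLE_RULE.search(title) is not None


# Lowercase type: matches the rule.
print(title_is_conventional("feat: Custom Metadata and rename arguments"))
# Capitalized type: does not match, since the pattern is case-sensitive.
print(title_is_conventional("Feat: Custom Metadata and rename arguments"))
```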

@lambda-science lambda-science changed the title from "Feat: Custom Metadata and rename arguments" to "feat: Custom Metadata and rename arguments" Apr 7, 2025
@lambda-science

However, with this PR I think we lose the ability to pass a URL to a file as a source.
