feat: add "auto" language for TesseractOcr #759
Signed-off-by: Pavel Denisov <[email protected]>
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
We can merge this PR and implement the optimized version in a follow-up PR.
Sorry for the delay! I was going to check it in the next few days, but I can make a follow-up PR too. The problem with CI is that the script OCR models are not installed: https://github.com/DS4SD/docling/actions/runs/12806245234/job/35994648426?pr=759#step:8:155
…rs lazily Signed-off-by: Pavel Denisov <[email protected]>
Ubuntu package I'm going to add the check if |
Looks good to me
Add a language-agnostic OCR option for the TesseractOcr module. It is invoked when the language option is set to `['auto']`. For more context, see the discussion: #640.
Please let me know what you think.
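To make the intended behavior easier to review, here is a minimal sketch of how the `['auto']` option could be resolved to a concrete model per page. It is not the code in this PR: `resolve_languages`, `DEFAULT_LANGS`, and the `script/<Name>` naming of installed script models are assumptions for illustration, and the detected script is passed in as a plain string rather than obtained from Tesseract.

```python
from typing import Iterable, List, Optional

# Hypothetical fallback, used when no script was detected or the
# matching script model is not installed (the situation seen in CI).
DEFAULT_LANGS: List[str] = ["eng"]


def resolve_languages(
    lang_option: Iterable[str],
    detected_script: Optional[str],
    installed_models: Iterable[str],
) -> List[str]:
    """Return the model names to use for one page.

    lang_option      -- the user-supplied language list, e.g. ["auto"]
    detected_script  -- script name reported by a detection step, or None
    installed_models -- names of the models available on this machine
    """
    langs = list(lang_option)

    # Explicit languages are passed through untouched.
    if langs != ["auto"]:
        return langs

    # Detection failed: fall back to the default languages.
    if detected_script is None:
        return list(DEFAULT_LANGS)

    # Assumed naming scheme for script models: "script/<ScriptName>".
    candidate = f"script/{detected_script}"
    if candidate in set(installed_models):
        return [candidate]

    # Script model missing: fall back instead of failing the page.
    return list(DEFAULT_LANGS)
```

With this shape, a missing script model degrades to the default languages rather than raising, which is one way to handle an environment where only the base language packages are installed.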
Checklist: