Cleanup of Data-Files.md? #152

@rubyFeedback

Your Feature Request

The page https://tesseract-ocr.github.io/tessdoc/Data-Files.html is a bit confusing. The first few links
point to other GitHub repositories, but I only wanted to download language data files.

I ended up using these two URLs:

https://github.com/tesseract-ocr/tessdata/raw/4.00/eng.traineddata

and for German:

https://github.com/tesseract-ocr/tessdata/raw/4.00/deu.traineddata

I am not sure whether these are outdated.
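
For reference, this is roughly how I would script the download. It is only a minimal sketch: the base URL is taken from the two links above (so the `4.00` tag may well be outdated), and the function names and the `tessdata` destination folder are my own assumptions, not anything the documentation prescribes.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL copied from the two links above; the "4.00" tag may be outdated.
BASE_URL = "https://github.com/tesseract-ocr/tessdata/raw/4.00"


def traineddata_url(lang, base_url=BASE_URL):
    """Build the download URL for one language code, e.g. 'eng' or 'deu'."""
    return f"{base_url}/{lang}.traineddata"


def download_languages(langs, dest_dir="tessdata"):
    """Download each language file into dest_dir (a hypothetical folder name)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for lang in langs:
        target = dest / f"{lang}.traineddata"
        # urlretrieve saves the response body to the given local path.
        urlretrieve(traineddata_url(lang), str(target))
        print(f"saved {target}")
```

Calling `download_languages(["eng", "deu"])` would fetch the English and German files into a local `tessdata` folder. If the documentation named a recommended URL up front, only `BASE_URL` would need to change.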

It would be nice if someone could go through the whole page systematically.

I think it would be better to show users, as early as possible, which trained data
they are RECOMMENDED to download. Ideally in the first paragraph, like "most users
may prefer the following language data: NEW_LINE add URL here".

The longer explanation could then follow in the later paragraphs. Right now I face a wall
of text, click on something at random, and end up in another GitHub repository,
when I only wanted to get the latest language file. Finding the proper language
data file feels like Russian roulette.

(I understand that the page refers to Linux distributions, but I am a solo hobbyist
dev, so I think the first priority should be "users wanting to download the
language data files as quickly as possible", before showing much else.)
