dougle/scan-OCR-pdf
This is a bash script that scans a series of pages from a document into a PDF. The OCR output for each page is appended at the end of the PDF. This gives you the original scan plus something searchable, e.g. for use in Google Desktop.

I played around with tesseract: very good OCR, but no layout analysis, so cuneiform it is.

This was built and tested on a GNOME machine, but I don't see any reason for it to fail on KDE or on non-Linux OSes as long as the dependencies are met. Feedback on this would be welcome.

To install (currently manual install only), download and copy the script to a location such as /usr/bin/. Note that wget needs an uppercase -O to write the download to a file; a lowercase -o would write wget's log there instead:

    $ wget http://github.com/dougle/scan-OCR-pdf/raw/master/scan_ocr_pdf -O /usr/bin/scan_ocr_pdf && \
      chmod +x /usr/bin/scan_ocr_pdf

You can then add it to your panel or applications menu, or run it in a terminal using the command:

    $ scan_ocr_pdf -d "$HOME/Documents/Scanned"

OPTIONS:
    -h  Show this message
    -d  Destination directory
    -r  Resolution to use while scanning (can cause cuneiform to buffer overflow when set too high)
    -u  Use this device for scanning, instead of the first scanner listed
        See $ scanimage -L for your devices
    -g  Interface mode, prompts use zenity
    -s  Options to pass to scanimage.
        Default is: "--mode 'Color' -l 0 -t 0 -x 210 -y 297" (colour A4 area)

To debug, simply add -x to the first line of the script:

    #!/bin/bash -x

or run:

    $ /bin/bash -x /usr/bin/scan_ocr_pdf -g -d "/home/dougle/Documents"

This prints a line-by-line trace of the execution, which is useful for tracking down weird or occasional bugs.

More options and features will be added soon, such as a device list when -u is not specified and more than one device is present.

Forks, patches, feature requests, bug reports and comments are welcome.
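For anyone forking the script, the OPTIONS list above could be parsed with the bash `getopts` builtin roughly as follows. This is a minimal sketch, not the script's actual code: the function name `parse_args`, the variable names and the return codes are assumptions made for illustration. Only the flags and the default scanimage option string come from this README.

```shell
#!/bin/bash
# Hypothetical sketch of option parsing for the flags listed in OPTIONS.
# Names such as parse_args, DEST_DIR and GUI_MODE are illustrative only.

DEST_DIR=""       # -d destination directory
RESOLUTION=""     # -r scan resolution (empty = leave the scanner default)
DEVICE=""         # -u scanner device (empty = first scanner listed)
GUI_MODE=0        # -g set to 1 to prompt through zenity
SCAN_OPTS="--mode 'Color' -l 0 -t 0 -x 210 -y 297"   # -s default from the README

parse_args() {
    local opt
    OPTIND=1   # reset so the function can safely be called more than once
    # Leading ':' selects silent error mode, so errors are reported below.
    while getopts ":hd:r:u:gs:" opt; do
        case "$opt" in
            h) echo "usage: scan_ocr_pdf [-h] [-d dir] [-r dpi] [-u device] [-g] [-s opts]"
               return 2 ;;
            d) DEST_DIR="$OPTARG" ;;
            r) RESOLUTION="$OPTARG" ;;
            u) DEVICE="$OPTARG" ;;
            g) GUI_MODE=1 ;;
            s) SCAN_OPTS="$OPTARG" ;;
            :) echo "option -$OPTARG needs a value" >&2; return 1 ;;
            \?) echo "unknown option -$OPTARG" >&2; return 1 ;;
        esac
    done
    return 0
}
```

A script built on this sketch would call `parse_args "$@"` near the top and stop if it returns non-zero. Keeping the parsing in a function makes it easy to exercise without a scanner attached.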