Tesseract is OCR software that was originally developed by HP, later released as open source, and more recently updated by Google. It is free and produces high-quality results, so it is worth noting here. It is driven from the command line; an example line for a .bat file in a Windows environment would be:
tesseract image.tif outputbase
To restrict recognition to a whitelist of characters, put the following in a text file called tessdata/configs/digits (here, digits is the name of the whitelist config):
tessedit_char_whitelist 0123456789
Then name that config file on the command line:
tesseract image.tif outputbase nobatch digits
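For scanning many pages it can be convenient to script these steps. The following is a minimal Python sketch, not part of the original instructions: the function names (write_whitelist_config, build_command, run_ocr) are hypothetical helpers, and it assumes that the tesseract executable is on the PATH and that the config file path you pass points into the tessdata/configs folder of your installation. The command it builds mirrors the one shown above.

```python
import shutil
import subprocess
from pathlib import Path


def write_whitelist_config(config_path, allowed_chars):
    """Write a one-line config file that limits the characters Tesseract may output."""
    path = Path(config_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"tessedit_char_whitelist {allowed_chars}\n")
    return path


def build_command(image, output_base, *configs):
    """Build the argument list: tesseract <image> <outputbase> [config ...]."""
    return ["tesseract", str(image), str(output_base), *configs]


def run_ocr(image, output_base, *configs):
    """Run Tesseract if it is installed; return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        return None  # assumption: tesseract must be on the PATH
    subprocess.run(build_command(image, output_base, *configs), check=True)
    return Path(f"{output_base}.txt")


if __name__ == "__main__":
    # Assumed location; adjust to the tessdata folder of your installation.
    write_whitelist_config("tessdata/configs/digits", "0123456789")
    result = run_ocr("image.tif", "outputbase", "nobatch", "digits")
    print(result if result else "tesseract not found on PATH")
```

Keeping the command construction separate from the call that runs it makes it easy to print or log the exact command line before processing a whole batch of page images.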
This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.