DIY Book Scanner/OCR

Tesseract is OCR software that was originally developed by HP, released as open source, and more recently updated by Google. It is free and produces high-quality results, so it is worth noting. It is driven from the command line; an example command for a .bat file in the Windows environment would be:

tesseract image.tif outputbase
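
A scanned book usually has many page images, so the same call may need to be repeated for each one. The following is a minimal Python sketch of that idea, assuming tesseract is installed and on the PATH. The function names build_command and ocr_folder and the folder name scans are hypothetical and chosen only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(image, output_base):
    """Return the argument list for: tesseract image outputbase"""
    return ["tesseract", str(image), str(output_base)]


def ocr_folder(folder):
    """Run tesseract on every .tif file in folder and return the expected text files."""
    results = []
    for image in sorted(Path(folder).glob("*.tif")):
        # Use the image path without its extension as the output base.
        output_base = image.with_suffix("")
        subprocess.run(build_command(image, output_base), check=True)
        # tesseract writes its plain-text result to <outputbase>.txt
        results.append(Path(str(output_base) + ".txt"))
    return results


if __name__ == "__main__":
    if shutil.which("tesseract") is None:
        print("tesseract not found on PATH")
    else:
        # "scans" is a hypothetical folder of page images.
        for text_file in ocr_folder("scans"):
            print(text_file)
```

Passing the arguments as a list avoids quoting problems with file names that contain spaces.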

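To restrict recognition to a whitelist of characters, create a config file and pass its name as the last argument to tesseract. In this example the whitelist config file is named digits.

 Put this in a text file called tessdata/configs/digits: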

 tessedit_char_whitelist 0123456789

tesseract image.tif outputbase nobatch digits
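
The whitelist steps can also be scripted. The sketch below is a hedged Python illustration, not a definitive recipe: the helper names write_whitelist_config and build_whitelist_command are hypothetical, and the nobatch argument is copied from the command above without checking which Tesseract versions require it. The demo writes into a temporary directory rather than a real tessdata folder, whose location depends on the installation.

```python
import tempfile
from pathlib import Path


def write_whitelist_config(tessdata_dir, name, characters):
    """Write <tessdata_dir>/configs/<name> with a tessedit_char_whitelist line."""
    config_path = Path(tessdata_dir) / "configs" / name
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text("tessedit_char_whitelist " + characters + "\n")
    return config_path


def build_whitelist_command(image, output_base, config_name):
    """Return the argument list for the whitelist form of the command."""
    # "nobatch" mirrors the command shown in the text (assumption: still needed).
    return ["tesseract", str(image), str(output_base), "nobatch", config_name]


if __name__ == "__main__":
    # A temporary directory stands in for the real tessdata folder.
    demo_dir = tempfile.mkdtemp()
    path = write_whitelist_config(demo_dir, "digits", "0123456789")
    print(path.read_text(), end="")
    print(" ".join(build_whitelist_command("image.tif", "outputbase", "digits")))
```

To use it for real, point tessdata_dir at the tessdata folder of the local Tesseract installation and run the returned command.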

This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.