I recently needed to do text recognition but was not allowed to call a third-party API, so the only option was an open-source library. Tesseract-OCR is an open-source text recognition project that originated at Hewlett-Packard; it lets us quickly build an OCR system that recognizes text in images. Because I develop on Windows, I also had to install it on Windows.
Step One: Download the installation package
Following https://github.com/tesseract-ocr/tesseract/wiki, I found an unofficial installer; as far as I could see there was only a 64-bit package: http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-4.00.00dev.exe. Download it and install it directly, but remember the installation directory, because we will need it when configuring the environment variables.
If you need to recognize text in languages other than English, you also have to download the corresponding language data files from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files.
Simplified Chinese language data: https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.00/chi_sim.traineddata
Traditional Chinese language data: https://github.com/tesseract-ocr/tessdata/raw/4.0/chi_tra.traineddata
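If you would rather script the download than click the links, something like the following minimal sketch might work. It assumes the URL pattern of the Simplified Chinese link above holds for other language codes too; the function names and the `downloads` folder are hypothetical, chosen only for illustration.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: URL pattern copied from the Simplified Chinese link above.
BASE_URL = "https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.00"


def traineddata_url(lang: str) -> str:
    """Build the download URL for one language code, e.g. 'chi_sim'."""
    return f"{BASE_URL}/{lang}.traineddata"


def download_language(lang: str, dest_dir: str) -> Path:
    """Download <lang>.traineddata into dest_dir and return the saved path."""
    target = Path(dest_dir) / f"{lang}.traineddata"
    target.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(traineddata_url(lang), str(target))
    return target


# Example (needs network access), saving into a hypothetical local folder:
# download_language("chi_sim", "downloads")
```

The files are fairly large, so the download may take a while; copy them into the Tesseract installation afterwards as described later in this post.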
Step Two: Install
Run the downloaded tesseract-ocr-setup-4.00.00dev.exe and click Next through the installer.
Step Three: Configure environment variables
Note: My system is Windows 7; other versions should be similar. The procedure is the same as configuring the environment variables for Java.
Copy your installation path; mine is C:\Program Files (x86)\Tesseract-OCR.
Go to "Control Panel \ System and Security \ System" and click "System Protection" to open the System Properties dialog.
On the Advanced tab, click "Environment Variables" to open the configuration dialog.
Append the installation path "C:\Program Files (x86)\Tesseract-OCR" to the Path variable. Separate it from the preceding entry with ";" and end it with ";". Here is a sample of my configuration:
C:\Users\Administrator\AppData\Roaming\Composer\vendor\bin;C:\Users\Administrator\AppData\Roaming\npm;C:\Program Files (x86)\Tesseract-OCR;
When you are done, click OK to save.
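The separator rule is easy to get wrong by hand, so here is a small sketch of the same logic as a pure string function. It does not change any system setting; `add_to_path` is a hypothetical helper that only shows how the new value of Path is assembled.

```python
def add_to_path(path_value: str, new_dir: str) -> str:
    """Return path_value with new_dir appended, using ';' as the separator."""
    # Split on ';' and drop empty pieces left by leading or trailing separators.
    entries = [e.strip() for e in path_value.split(";") if e.strip()]
    # Windows paths are case-insensitive, so compare in lower case.
    if new_dir.lower() not in (e.lower() for e in entries):
        entries.append(new_dir)
    # Join with ';' and finish with a trailing ';' as in the sample above.
    return ";".join(entries) + ";"
```

Calling it twice with the same directory leaves the value unchanged, which avoids duplicate entries if you repeat the step.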
Open a new command prompt and enter: tesseract -v. You should see the version information.
If an error occurs, the environment variable is probably not configured correctly.
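The same check can be done from a script. The sketch below is one possible way, not the only one: it looks the executable up on PATH first, so a missing PATH entry is reported as `None` instead of an exception. The function name `tesseract_version` is made up for this example.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(exe: str = "tesseract") -> Optional[str]:
    """Return the output of `<exe> -v`, or None if exe is not found on PATH."""
    found = shutil.which(exe)
    if found is None:
        return None
    result = subprocess.run([found, "-v"], capture_output=True, text=True)
    # Some builds print the version to stderr, so combine both streams.
    return (result.stdout + result.stderr).strip()


# Example:
# print(tesseract_version() or "tesseract is not on PATH")
```

If this returns `None` right after you edited Path, try a freshly opened terminal, since already running programs keep the old environment.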
At this point the installation is complete, but the system still cannot recognize Chinese. Download the Simplified Chinese and Traditional Chinese language data files (links given above) and place them in the tessdata directory under the installation directory.
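To confirm the language files ended up in the right place, you could list what is installed. This is a rough sketch that assumes the language data lives as `<lang>.traineddata` files in the tessdata folder under the installation directory; `installed_languages` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def installed_languages(tessdata_dir: str) -> List[str]:
    """Return sorted language codes that have a .traineddata file in tessdata_dir."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    # 'chi_sim.traineddata' -> 'chi_sim'
    return sorted(p.stem for p in folder.glob("*.traineddata"))


# Example, using the installation path from this post:
# print(installed_languages(r"C:\Program Files (x86)\Tesseract-OCR\tessdata"))
```

If `chi_sim` or `chi_tra` is missing from the result, the file was probably copied to a different folder.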
Addendum: Without a global variable pointing at the language data, recognition fails when Tesseract is run from another drive, so we add one more environment variable.
Under System variables, click New:
Add a variable named TESSDATA_PREFIX whose value is, again, my installation path C:\Program Files (x86)\Tesseract-OCR
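For a script that should work even before the system variable is set, the variable can also be passed only to the child process. The following is a minimal sketch under that assumption; `build_ocr_call`, `sample.png`, and `result` are hypothetical names used for illustration.

```python
import os
from typing import Dict, List, Tuple


def build_ocr_call(image: str, output_base: str, lang: str,
                   prefix: str) -> Tuple[List[str], Dict[str, str]]:
    """Return the command line and environment for one tesseract run."""
    env = dict(os.environ)           # copy, so the current process is untouched
    env["TESSDATA_PREFIX"] = prefix  # same value as the system variable above
    # Usage: tesseract <image> <output base> -l <language>
    cmd = ["tesseract", image, output_base, "-l", lang]
    return cmd, env


# To actually run it (requires tesseract on PATH and an existing image):
# import subprocess
# cmd, env = build_ocr_call("sample.png", "result", "chi_sim",
#                           r"C:\Program Files (x86)\Tesseract-OCR")
# subprocess.run(cmd, env=env, check=True)  # tesseract writes result.txt
```

Setting the variable system-wide, as described above, remains the simpler choice when you mostly use the command line.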
Originally posted on my blog: Installing Tesseract-OCR 4.00 on Windows and configuring environment variables
http://www.wangtuizhijia.com/archives/272