Installing Tesseract-OCR
Prerequisites:
Build environment: gcc, gcc-c++, make (these are already present on most machines; if so, this step can be skipped)
yum install gcc gcc-c++ make
Dependencies: autoconf, automake, libtool, libjpeg-devel, libpng-devel, libtiff-devel, zlib-devel, and Leptonica (version 1.67 or later)
1. autoconf, automake, libtool, libjpeg-devel, libpng-devel, libtiff-devel, and zlib-devel can be installed with yum:
yum install autoconf automake libtool
yum install libjpeg-devel libpng-devel libtiff-devel zlib-devel
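Before moving on to Leptonica, it can help to confirm that the build tools are actually on PATH. The following is a minimal POSIX sh sketch; missing_tools is a hypothetical helper name, and it only checks executables, not the -devel header packages.

```shell
#!/bin/sh
# Sketch: print each named tool that cannot be found on PATH (one per line).
# Only executables are checked; -devel packages (headers) are not covered.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the name cannot be resolved
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Example: the tools used by the steps above (gcc-c++ provides g++)
missing=$(missing_tools gcc g++ make autoconf automake libtool)
if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing"
else
    echo "All build tools found"
fi
```

For the -devel packages, `rpm -q libjpeg-devel libpng-devel libtiff-devel zlib-devel` reports "package ... is not installed" for any that are missing.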
2. Leptonica must be compiled and installed from source.
References:
http://paramountideas.com/tesseract-ocr-30-and-leptonica-installation-centos-55-and-opensuse-113
http://www.leptonica.org/source/README.html
Download the Leptonica source package: http://www.leptonica.org/source/leptonica-1.68.tar.gz
After extracting it, change to the leptonica-1.68 root directory and run:
./configure
make
make install
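Since Tesseract needs Leptonica 1.67 or later, a version check after installation may save a failed configure later. This sketch assumes a GNU sort that supports -V (version sort), and it assumes Leptonica registered a pkg-config file named lept.pc; older releases may not install one, in which case the check falls back to reporting 0. version_ge is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: version_ge A B succeeds when version A >= version B.
# Relies on GNU sort's -V option (version sort); assumption, not POSIX.
version_ge() {
    # The smaller version sorts first, so A >= B exactly when B sorts first.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: Leptonica installed lept.pc; if pkg-config cannot report a
# version, fall back to 0 so the check fails instead of erroring.
installed=$(pkg-config --modversion lept 2>/dev/null || echo 0)
if version_ge "$installed" 1.67; then
    echo "Leptonica $installed is new enough"
else
    echo "Leptonica 1.67 or later is required (found: $installed)"
fi
```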
Tesseract installation:
Once the dependencies above are installed, you can start installing Tesseract.
Download the tesseract-3.01 source package: http://tesseract-ocr.googlecode.com/files/tesseract-3.01.tar.gz
After extracting it, change to the tesseract-3.01 root directory and run the commands below.
(If make fails with an error similar to "strngs.h:1: error: stray '\357' in program", re-save tesseract-3.01/ccutil/strngs.h in ANSI encoding and recompile.)
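The octal escape \357 in that error is byte 0xEF, which at the very start of a file is most likely the first byte of a UTF-8 byte-order mark (EF BB BF); saving the file as ANSI drops it. If no suitable editor is at hand, the sketch below strips a leading BOM from the shell instead. It assumes the BOM is the only problem in the file, and strip_bom is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: remove a UTF-8 byte-order mark (EF BB BF) from the start of a file.
# Files that do not start with a BOM are left untouched.
strip_bom() {
    file=$1
    # Dump the first three bytes as hex, e.g. "efbbbf" for a BOM.
    first3=$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$first3" = "efbbbf" ]; then
        # Keep everything from the 4th byte onward, then replace the original.
        tail -c +4 "$file" > "$file.nobom" && mv "$file.nobom" "$file"
        echo "removed BOM from $file"
    fi
}

# Usage from the tesseract-3.01 root directory, before re-running make:
# strip_bom ccutil/strngs.h
```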
./autogen.sh
./configure
make
make install
ldconfig
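After `ldconfig`, one way to confirm that the new libraries are visible to the dynamic linker is to search the output of `ldconfig -p`, which prints the linker cache. This is a minimal sketch: lib_listed is a hypothetical helper that reads that listing on stdin, and the library names liblept (Leptonica) and libtesseract (Tesseract) are assumptions about what the two builds install.

```shell
#!/bin/sh
# Sketch: succeed if the `ldconfig -p` listing on stdin contains library $1.
# Matches cache entries such as
#   "<TAB>libtesseract.so.3 (libc6,x86-64) => /usr/local/lib/libtesseract.so.3"
lib_listed() {
    grep -q "^[[:space:]]*$1\.so"
}

# Usage (ldconfig usually lives in /sbin, which may not be on a normal user's PATH):
# /sbin/ldconfig -p | lib_listed libtesseract && echo "libtesseract is visible"
# /sbin/ldconfig -p | lib_listed liblept && echo "liblept is visible"
```

If an entry is missing, a common cause is that /usr/local/lib (the default install prefix for ./configure) is not listed in /etc/ld.so.conf or a file under /etc/ld.so.conf.d/.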
Installing the Tesseract English language pack:
Download the tesseract-3.01 English language pack: http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.01.eng.tar.gz
After extracting it, copy all files under tesseract-ocr/tessdata to /usr/local/share/tessdata.
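A quick way to confirm that the language pack landed in the right place is to look for the language's .traineddata file. This sketch assumes the 3.0x layout, where each language is a single <lang>.traineddata file in the tessdata directory; has_lang is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: has_lang LANG [DIR] succeeds if DIR/LANG.traineddata exists.
# DIR defaults to the tessdata directory used in this guide.
has_lang() {
    lang=$1
    dir=${2:-/usr/local/share/tessdata}
    [ -f "$dir/$lang.traineddata" ]
}

# Usage:
# has_lang eng && echo "English data installed" || echo "eng.traineddata not found"
```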
Installation is complete.
Test it:
Change to the extracted tesseract-3.01 root directory (it includes a sample image, phototest.tif, that can be used for testing).
Command:
tesseract phototest.tif phototest -l eng
Output:
Tesseract Open Source OCR Engine v3.01 with Leptonica
Page 0
A text file named phototest.txt should now be in the current directory, containing the text shown in phototest.tif.
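To make that test scriptable, this sketch wraps the same tesseract command and checks that the output file exists and is non-empty. run_ocr is a hypothetical helper, and it relies on tesseract appending .txt to the output base name, as in the phototest example.

```shell
#!/bin/sh
# Sketch: run_ocr IMAGE OUTBASE [LANG] runs tesseract and checks OUTBASE.txt.
# Returns non-zero if tesseract fails or the text file is missing or empty.
run_ocr() {
    image=$1
    outbase=$2
    lang=${3:-eng}
    tesseract "$image" "$outbase" -l "$lang" || return 1
    if [ -s "$outbase.txt" ]; then
        echo "OK: $outbase.txt"
    else
        echo "FAIL: $outbase.txt missing or empty" >&2
        return 1
    fi
}

# Usage, from the tesseract-3.01 root directory:
# run_ocr phototest.tif phototest eng && cat phototest.txt
```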
Reference Document: http://my.oschina.net/iceman/blog/40771
Full command sequence (summary):
# Install Leptonica
yum -y install gcc gcc-c++ make
yum -y install autoconf automake libtool
yum -y install libjpeg-devel libpng-devel libtiff-devel zlib-devel
wget http://www.leptonica.org/source/leptonica-1.72.tar.gz
tar zxvf leptonica-1.72.tar.gz
cd leptonica-1.72
./configure
make
make install
cd ..
# Install tesseract-ocr
wget https://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.02.tar.gz
tar zxvf tesseract-ocr-3.02.02.tar.gz
cd tesseract-ocr/
./autogen.sh
./configure
make
make install
ldconfig
cd /root/
wget https://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.eng.tar.gz
tar zxvf tesseract-ocr-3.02.eng.tar.gz
cp /root/tesseract-ocr/tessdata/eng.* /usr/local/share/tessdata/
# Test
cd tesseract-ocr/
tesseract phototest.tif phototest -l eng
ls -l phototest*
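The Leptonica and Tesseract parts of the sequence above follow the same pattern, so they can be folded into one function. This is a sketch under stated assumptions, not a replacement for the commands above: build_from_tarball and run are hypothetical helpers, downloading is still left to wget, and setting DRY_RUN=1 prints the commands instead of running them so the sequence can be checked first. Unlike the sequence above, it also runs ldconfig after Leptonica, which should be harmless.

```shell
#!/bin/sh
# Sketch: extract, optionally autogen, configure, make, make install, ldconfig.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_from_tarball TARBALL SRCDIR [autogen]
build_from_tarball() {
    tarball=$1   # e.g. leptonica-1.72.tar.gz
    srcdir=$2    # directory the tarball extracts to, e.g. leptonica-1.72
    mode=$3      # "autogen" for trees that need ./autogen.sh first (Tesseract)
    run tar zxvf "$tarball" || return 1
    (
        # Subshell: the caller's working directory is left unchanged.
        run cd "$srcdir" || exit 1
        if [ "$mode" = autogen ]; then
            run ./autogen.sh || exit 1
        fi
        run ./configure && run make && run make install
    ) || return 1
    # Refresh the shared-library cache so newly installed libraries are found.
    run ldconfig
}

# Usage (as root, after downloading the tarballs with wget):
# DRY_RUN=1 build_from_tarball leptonica-1.72.tar.gz leptonica-1.72
# build_from_tarball leptonica-1.72.tar.gz leptonica-1.72
# build_from_tarball tesseract-ocr-3.02.02.tar.gz tesseract-ocr autogen
```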