Tesseract installation
[1] Direct installation
1) On Ubuntu 14.04 you can install the packaged release, tesseract-ocr, directly:
sudo apt-get install tesseract-ocr
Installed this way, the program is in /usr/bin and the data files are in /usr/share/tesseract-ocr/tessdata (the eng language pack is already installed).
There is also a folder named pytesseract under /usr/local/lib/python*.*/dist-packages.
(I may have installed it without noticing. The GitHub page [https://github.com/madmaze/pytesseract] gives the installation command as sudo pip install pytesseract.)
With that in place, Tesseract can be used from Python. For example:
from PIL import Image
import pytesseract
# Recognize with the default language
print(pytesseract.image_to_string(Image.open('./Test/Python/t2.png')))
# Recognize with English selected explicitly
print(pytesseract.image_to_string(Image.open('./Test/Python/t2.png'), lang='eng'))
Next, copy num.traineddata, a digits-only model that I trained myself, into the data directory and select it with lang='num':
print(pytesseract.image_to_string(Image.open('./Test/Python/t2.png'), lang='num'))
The dedicated digit model recognizes digits very accurately!
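Before passing lang='num', it can help to confirm that the model file really is in the data directory. The following is a minimal sketch that only lists the .traineddata files in a folder; installed_languages is a hypothetical helper (not part of pytesseract), and the path in the example is the one from the package install described above.

```python
import os

def installed_languages(tessdata_dir):
    """Return the language codes that have a .traineddata file in tessdata_dir."""
    suffix = ".traineddata"
    if not os.path.isdir(tessdata_dir):
        return []
    return sorted(
        name[:-len(suffix)]
        for name in os.listdir(tessdata_dir)
        if name.endswith(suffix)
    )

if __name__ == "__main__":
    # Data directory used by the Ubuntu package (see above).
    print(installed_languages("/usr/share/tesseract-ocr/tessdata"))
```

If 'num' is missing from the printed list, the lang='num' call would be expected to fail until the file is copied in.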
2) With tesseract-ocr installed this way, the tesseract command cannot read the image when run from a terminal and reports the following errors (although recognition works from Python):
Tesseract Open Source OCR Engine v3.03 with Leptonica
Error in pixReadStreamPng: function not present
Error in pixReadStream: png: no pix returned
Error in pixRead: pix not read
Error in pixGetInputFormat: pix not defined
Reading ./Test/Python/t2.png as a list of filenames...
Error in fopenReadStream: file not found
Error in pixRead: image file not found: PNG
Image file. PNG cannot be read!
Error during processing.
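Reports online say that Leptonica does not recognize the png, tif, or jpg formats (in fact it seems to recognize practically no format at all, so why is Tesseract built on this library?). The "function not present" text is what Leptonica prints from the stub routines it compiles in when an image library such as libpng was unavailable at build time, so a Leptonica build without those libraries is a likely cause.
(I have not solved this problem yet.)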
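To see at a glance which Leptonica routines are being reported as absent, the error text can be filtered for the "function not present" lines. This is only a diagnostic sketch, not a fix; missing_leptonica_functions is a hypothetical helper, and the sample text is an excerpt of the log above.

```python
import re

# Matches lines such as "Error in pixReadStreamPng: function not present".
_STUB_ERROR = re.compile(r"^Error in (\w+): function not present[ \t]*$", re.MULTILINE)

def missing_leptonica_functions(stderr_text):
    """Return the Leptonica functions reported as 'function not present'."""
    return _STUB_ERROR.findall(stderr_text)

sample = """Tesseract Open Source OCR Engine v3.03 with Leptonica
Error in pixReadStreamPng: function not present
Error in pixReadStream: png: no pix returned
Error in pixRead: pix not read
"""
print(missing_leptonica_functions(sample))  # ['pixReadStreamPng']
```

In practice the text would come from the standard error stream of the tesseract command, for example captured with the subprocess module.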
--------------------------------------------------------------------------------------------
[2] Installation from source
1) First install Leptonica from www.leptonica.org/download.html; for example, download leptonica-1.68.tar.gz.
Then build and install it with the basic procedure below (if you want a customized Leptonica build, you can explore that separately):
./configure [build the Makefile]
make [builds the library and shared library versions of all the progs]
sudo make install [as root; this puts liblept.a into /usr/local/lib/ and all the progs into /usr/local/bin/]
2) Download Tesseract, which is now hosted on GitHub (https://github.com/tesseract-ocr). (No more need to get around the firewall to reach Google Code!)
Download the code from GitHub and extract it to a directory (for example, /tmp/tesseract).
3) Installation
./autogen.sh
./configure
make
sudo make install
sudo ldconfig
Note that a source build installs the program in /usr/local/bin and expects the data files in /usr/local/share/tessdata!
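The build steps of step 3 can also be driven from a small script that stops at the first failing command. This is a rough sketch under the assumption that the source was extracted to /tmp/tesseract as in the example above; build is a hypothetical helper, and by default it only prints the commands instead of running them.

```python
import subprocess

# The commands from step 3, in order.
BUILD_STEPS = [
    ["./autogen.sh"],
    ["./configure"],
    ["make"],
    ["sudo", "make", "install"],
    ["sudo", "ldconfig"],
]

def build(source_dir, dry_run=True):
    """Run each build step in source_dir, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    Returns the command lines that were printed or run.
    """
    done = []
    for step in BUILD_STEPS:
        line = " ".join(step)
        print("$ " + line)
        if not dry_run:
            # check_call raises CalledProcessError on a non-zero exit status.
            subprocess.check_call(step, cwd=source_dir)
        done.append(line)
    return done

if __name__ == "__main__":
    build("/tmp/tesseract")  # dry run only; pass dry_run=False to execute
```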
The following problems may occur:
[1] ./autogen.sh:
aclocal not found: sudo apt-get install automake
libtoolize not found: sudo apt-get install libtool
If some other tool is reported missing, try running that tool by name; Ubuntu will tell you how to install it.
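The two prerequisites listed above can be checked before running autogen.sh. A minimal sketch, assuming only the tool-to-package pairs named above; missing_packages is a hypothetical helper built on the standard shutil.which lookup.

```python
import shutil

# Tool -> Ubuntu package, as listed above.
TOOL_PACKAGES = {"aclocal": "automake", "libtoolize": "libtool"}

def missing_packages(tool_packages=TOOL_PACKAGES, which=shutil.which):
    """Return the packages whose tool is not found on PATH."""
    return sorted(pkg for tool, pkg in tool_packages.items() if which(tool) is None)

if __name__ == "__main__":
    pkgs = missing_packages()
    if pkgs:
        print("sudo apt-get install " + " ".join(pkgs))
    else:
        print("autogen.sh prerequisites found")
```

The which parameter exists only so the lookup can be replaced in a test; normal use leaves it at its default.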
[2] Data problems
A build from source comes with no language data. At least one language pack (usually eng) must be installed before Tesseract can run. To install one:
Download the language pack and extract it into /usr/local/share/tessdata.
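The extraction step might look like the sketch below. It assumes the downloaded language pack is a tar archive (optionally gzip-compressed) that contains one or more .traineddata files; the archive layout is an assumption, so the code drops any directory components and writes the files straight into the target folder. extract_traineddata is a hypothetical helper.

```python
import os
import tarfile

def extract_traineddata(archive_path, tessdata_dir):
    """Copy every *.traineddata member of a tar archive into tessdata_dir.

    Directory components inside the archive are dropped, so the files land
    directly in tessdata_dir whatever the archive layout is.
    """
    extracted = []
    with tarfile.open(archive_path) as archive:
        for member in archive.getmembers():
            name = os.path.basename(member.name)
            if not (member.isfile() and name.endswith(".traineddata")):
                continue
            source = archive.extractfile(member)
            with open(os.path.join(tessdata_dir, name), "wb") as target:
                target.write(source.read())
            extracted.append(name)
    return sorted(extracted)
```

Writing into /usr/local/share/tessdata normally requires root privileges, so a script like this would have to be run with sudo or pointed at a directory you own.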
[3] Test whether the installation succeeded
First check the program itself by running tesseract with no arguments. Output like the following means the installation succeeded!
searchware@ubuntu:/usr/local/share/tessdata$ tesseract
Usage: tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]
pagesegmode values are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
-l lang and/or -psm pagesegmode must occur before anyconfigfile.
Single options:
-v --version: version info
--list-langs: list available languages for tesseract engine
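The ordering rule in the usage text (language and page segmentation mode before any config file) is easy to get wrong when the command is assembled by a script. Here is a small sketch that builds the argument list in that order; tesseract_command is a hypothetical helper, and it uses the -psm spelling printed by this 3.x version (newer Tesseract releases use --psm instead).

```python
def tesseract_command(image, outputbase, lang=None, psm=None, configfiles=()):
    """Build an argument list following the usage text above.

    -l and -psm are placed before any config file, as the usage text requires.
    """
    cmd = ["tesseract", image, outputbase]
    if lang is not None:
        cmd += ["-l", lang]
    if psm is not None:
        cmd += ["-psm", str(psm)]
    cmd += list(configfiles)
    return cmd

print(tesseract_command("phototest.tif", "test1", lang="eng", psm=3))
# ['tesseract', 'phototest.tif', 'test1', '-l', 'eng', '-psm', '3']
```

The resulting list can be passed to subprocess.check_call to run the command.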
A common error is missing language data, as shown below. The fix is to install the language data as described earlier (it is best to install eng, because eng is the default language and must be present):
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
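The error message says that TESSDATA_PREFIX should name the parent of the tessdata directory. The sketch below applies that rule to compute where a language file is expected and checks whether it exists. traineddata_path is a hypothetical helper, and the default prefix /usr/local/share is an assumption taken from the path in the error above.

```python
import os

def traineddata_path(lang="eng", environ=os.environ, default_prefix="/usr/local/share"):
    """Return where a language file is expected, per the error message above.

    TESSDATA_PREFIX names the parent of the tessdata directory.
    default_prefix is an assumption matching the source install described here.
    """
    prefix = environ.get("TESSDATA_PREFIX", default_prefix)
    return os.path.join(prefix, "tessdata", lang + ".traineddata")

if __name__ == "__main__":
    path = traineddata_path("eng")
    print(path, "exists" if os.path.isfile(path) else "MISSING")
```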
Then test recognition on a file. The source directory contains a phototest.tif file that can be used for testing:
tesseract phototest.tif test1 -l eng
A common error here is a Leptonica mismatch, as shown below:
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
Error in findTiffCompression: function not present
Error in pixReadStreamTiff: function not present
Error in pixReadStream: tiff: no pix returned
Error in pixRead: pix not read
Unsupported image type.
I have not solved this problem either; the fixes described online did not work for me (they have not been tested on Ubuntu 14.04).
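One way to investigate a mismatch like this is to check which Leptonica shared library the tesseract binary actually loads, for example by passing the output of ldd /usr/local/bin/tesseract to a small parser. This is a diagnostic sketch only, not a confirmed fix; linked_leptonica is a hypothetical helper, and the sample text is illustrative rather than captured from the system described here.

```python
import re

# An ldd line looks like: "\tliblept.so.4 => /usr/local/lib/liblept.so.4 (0x...)"
_LDD_LINE = re.compile(
    r"^[ \t]*(liblept\S*)\s+=>\s+(.*?)(?:\s+\(0x[0-9a-fA-F]+\))?[ \t]*$",
    re.MULTILINE,
)

def linked_leptonica(ldd_output):
    """Return (library name, resolved path) pairs for Leptonica in ldd output."""
    return _LDD_LINE.findall(ldd_output)

# Illustrative input only.
sample = (
    "\tlibtesseract.so.3 => /usr/local/lib/libtesseract.so.3 (0x00007f0000000000)\n"
    "\tliblept.so.4 => /usr/local/lib/liblept.so.4 (0x00007f0000100000)\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000200000)\n"
)
print(linked_leptonica(sample))  # [('liblept.so.4', '/usr/local/lib/liblept.so.4')]
```

If the reported path is not the Leptonica copy that was just built, or that copy was built without libtiff, that would be consistent with the errors above.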