Installing pytesser for OCR with Python on Ubuntu 12.04

Source: Internet
Author: User

Pytesser calls tesseract, so tesseract must be installed first. Tesseract in turn depends on leptonica; if leptonica is missing, the error "configure: error: leptonica not found" appears while configuring the tesseract build.
After decompressing each source archive, compile and install it with the usual steps:
./configure
make -j4
sudo make install
Download and install leptonica

http://www.leptonica.org/download.html

or

http://code.google.com/p/leptonica/downloads/list

The file used here is leptonica-1.69.tar.bz2.
Download and install tesseract

http://code.google.com/p/tesseract-ocr/

The latest version at the time of writing is tesseract-ocr-3.02.02.tar.gz.
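Once the build has been installed, it may be worth confirming that the tesseract executable is actually on the PATH before moving on. The following is a minimal sketch in Python 3 (the article's own snippets target Python 2). The helper names find_tesseract and tesseract_version are hypothetical and not part of pytesser or tesseract.

```python
import shutil
import subprocess


def find_tesseract(binary="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(binary)


def tesseract_version(path):
    """Run `<path> --version` and return its output (stdout and stderr combined)."""
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some versions print the banner to stderr
        universal_newlines=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; check the install step")
    else:
        print(path)
        print(tesseract_version(path))
```

If the script reports that tesseract was not found, the "sudo make install" step probably did not complete, or its install prefix is not on the PATH.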
Download and install the tesseract language package

http://code.google.com/p/tesseract-ocr/downloads/list

The latest version at the time of writing is tesseract-ocr-3.01.eng.tar.gz.
Decompress it and copy the files in its tessdata directory (9 files) to the "/usr/local/share/tessdata" directory.
Note: the download at http://tesseract-ocr.googlecode.com/files/eng.traineddata.gz contains only a single file and does not work; using it causes tesseract to report an error.
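A missing or misplaced language file is a likely cause of errors at this step, so a quick look at the tessdata directory can help. The sketch below (Python 3) assumes the "/usr/local/share/tessdata" path from the text and that the English data file is named eng.traineddata. The helper names has_language_data and list_language_files are hypothetical.

```python
import os


def has_language_data(tessdata_dir, lang="eng"):
    """Return True if <lang>.traineddata exists in tessdata_dir."""
    return os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))


def list_language_files(tessdata_dir, lang="eng"):
    """Return the sorted names of files in tessdata_dir that start with '<lang>.'."""
    if not os.path.isdir(tessdata_dir):
        return []
    prefix = lang + "."
    return sorted(name for name in os.listdir(tessdata_dir) if name.startswith(prefix))


if __name__ == "__main__":
    tessdata = "/usr/local/share/tessdata"  # directory named in the text
    print(has_language_data(tessdata))
    print(list_language_files(tessdata))
```

With the full language package in place, the second call should list all the copied "eng." files rather than a single one.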
Download and install pytesser

http://code.google.com/p/pytesser/

The latest version is pytesser_v0.0.1.zip.
Test pytesser
Go to the pytesser installation directory, create test.py with the content below, and run "python test.py" to view the result.
from pytesser import *
# im = Image.open('fnord.tif')
# im = Image.open('phototest.tif')
# im = Image.open('eurotext.tif')
im = Image.open('fonts_test.png')
text = image_to_string(im)
print text
The tesseract directory contains other TIF files, which can be copied over and tested as well. The text in the TIF and PNG files tested above was recognized correctly.
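Because pytesser works by calling the tesseract executable, the same test can be approximated without it. The following Python 3 sketch is not pytesser's code. It assumes the command-line form "tesseract <image> <outputbase> -l <lang>", which writes its result to "<outputbase>.txt", and the function names build_command and ocr_file are hypothetical.

```python
import os
import subprocess
import tempfile


def build_command(image_path, output_base, lang="eng"):
    """Build the argument list for: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def ocr_file(image_path, lang="eng"):
    """Run tesseract on image_path and return the recognized text."""
    with tempfile.TemporaryDirectory() as tmp:
        output_base = os.path.join(tmp, "out")
        # tesseract appends ".txt" to the output base name itself.
        subprocess.check_call(build_command(image_path, output_base, lang))
        with open(output_base + ".txt", encoding="utf-8") as handle:
            return handle.read()
```

For example, calling ocr_file('fonts_test.png') from the pytesser directory should return roughly the same text as the image_to_string call above, provided tesseract and the English language data are installed.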
Pytesser's ability to recognize CAPTCHAs (verification codes) is relatively weak: it can only handle regular digits and letters. I tested the CAPTCHAs of several websites and got empty results every time, so recognizing them this way looks hopeless.
Testing shows that increasing the contrast can improve recognition accuracy.
import ImageEnhance  # part of PIL, like Image
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(4)
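To see why a factor of 4 can help, it is useful to look at what a contrast enhancement does to pixel values. The sketch below is a simplified pure-Python (Python 3) illustration on a list of grayscale values, not PIL's actual implementation: it assumes each pixel is pushed away from the image's mean brightness by the given factor and then clipped to the range 0..255. The function name stretch_contrast is hypothetical.

```python
def stretch_contrast(pixels, factor):
    """Scale each grayscale value's distance from the mean by factor, clipped to 0..255."""
    mean = sum(pixels) / len(pixels)
    result = []
    for value in pixels:
        stretched = mean + factor * (value - mean)
        result.append(int(min(255, max(0, round(stretched)))))
    return result


if __name__ == "__main__":
    # Faint text (100) on a slightly lighter background (140).
    print(stretch_contrast([100, 140, 140, 140], 4))  # [10, 170, 170, 170]
```

In this example the gap between text and background grows from 40 to 160 gray levels, which makes the later separation of text from background easier.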
References:

http://www.oschina.net/question/54100_59400

http://ubuntuforums.org/showthread.php?p=10248384
