Installing pytesseract on Linux and recognizing the login CAPTCHA of the central bank's Credit Reference Center

Source: Internet
Author: User

First, the installation. I followed this guide: http://blog.csdn.net/xinghun_4/article/details/47860645

I'm on CentOS, so I installed the dependencies with yum:

yum install python-devel libjpeg libjpeg-devel freetype freetype-devel zlib zlib-devel littlecms littlecms-devel libwebp libwebp-devel libfreetype libfreetype-devel giflib-devel automake libtool

For Tesseract I downloaded the 3.0.4 package. During installation it reported that Leptonica must be version 1.7.2, so Leptonica 1.6.9 cannot be used. Keep this in mind.
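To confirm which versions actually ended up installed, the output of `tesseract --version` can be inspected. The sketch below parses that output. It assumes the usual format, where one line reads like `tesseract 3.04.00` and another like `leptonica-1.72`; the helper names `parse_versions` and `installed_versions` are hypothetical and not part of the original post.

```python
import re
import subprocess


def parse_versions(output):
    """Extract (tesseract, leptonica) version strings from `tesseract --version` text.

    Assumes lines such as 'tesseract 3.04.00' and 'leptonica-1.72';
    returns None for a component that is not found.
    """
    tess = re.search(r'tesseract\s+v?(\d+(?:\.\d+)*)', output)
    lept = re.search(r'leptonica-(\d+(?:\.\d+)*)', output)
    return (tess.group(1) if tess else None,
            lept.group(1) if lept else None)


def installed_versions():
    """Run `tesseract --version` and parse the result (hypothetical helper).

    Some Tesseract builds print the version to stderr, so stderr is
    merged into stdout before parsing.
    """
    proc = subprocess.Popen(['tesseract', '--version'],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return parse_versions(out.decode('utf-8', 'replace'))
```

Calling `installed_versions()` on the machine where Tesseract was built shows at a glance whether the newer Leptonica was picked up.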

Here are some sample CAPTCHAs from the central bank's Credit Reference Center:

The characters look quite neat, yet calling image_to_string on the images directly almost never works; only a few of them produce any text at all.

Some preprocessing is needed. Looking for a pattern, you can see that the noise consists of faint, light-colored pixels, so it can be removed by turning those pixels white.

#!/root/miniconda3/envs/crcc/bin/python2.7
# -*- coding: utf-8 -*-

import re

import pytesseract
from PIL import Image

threshold = 140
table = []

name = 'test'
# Lookup table for binarization: values below the threshold map to 0 (black),
# everything else, including the faint noise, maps to 1 (white).
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)


def pic2text(name):
    im = Image.open(name + '.jpg')
    imgry = im.convert('L')  # convert to grayscale
    out = imgry.point(table, '1')  # binarize with the lookup table
    out.save(name + 'b.jpg')
    # i = Image.open(name + 'b.jpg')
    # i.show()
    text = pytesseract.image_to_string(out)
    print(text)
    text2 = re.sub('[^a-z0-9]', '', text)  # keep only lowercase letters and digits
    return text2


if __name__ == "__main__":
    print(pic2text('pictures/150656820893'))
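The `re.sub` call in `pic2text` keeps only lowercase letters and digits, which strips the whitespace and stray punctuation that OCR output tends to contain. Here is a small sketch of what that filter does, using made-up OCR strings. The `lowercase_first` option is an assumption added for illustration and is not part of the original script.

```python
import re


def clean_captcha(text, lowercase_first=False):
    """Drop every character that is not a lowercase letter or a digit.

    lowercase_first is a hypothetical option: without it, uppercase
    letters are removed rather than converted.
    """
    if lowercase_first:
        text = text.lower()
    return re.sub('[^a-z0-9]', '', text)


# Made-up examples of raw OCR output.
print(clean_captcha(' x7 k2p\n'))      # whitespace and newline removed
print(clean_captcha('a.b-9Q'))         # punctuation and the uppercase Q removed
print(clean_captcha('a.b-9Q', True))   # Q is lowercased first, so it survives
```

Note that the filter as written silently discards uppercase characters. Whether that is desirable depends on the character set the CAPTCHA actually uses, which the post does not state.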

I ran this from PyCharm against a remote Linux interpreter. To execute it directly on Linux as ./do_yzm.py, put the interpreter path on the first line of the script and make do_yzm.py executable.

In practice the recognition rate is close to 95%, which is good enough. If a result is wrong, just request a new CAPTCHA and try the login again.
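The retry idea can be sketched as a loop that fetches a fresh CAPTCHA whenever a login attempt is rejected. The three callables and the attempt limit below are hypothetical placeholders; the original post does not show its login code.

```python
def login_with_retries(fetch_captcha, recognize, submit, max_attempts=5):
    """Try to log in, requesting a new CAPTCHA after each failed attempt.

    fetch_captcha() -> image (or path), recognize(image) -> text,
    submit(text) -> True on success. All three are placeholders supplied
    by the caller. Returns the number of attempts used, or None if every
    attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        image = fetch_captcha()   # a new CAPTCHA on every attempt
        text = recognize(image)
        if text and submit(text):
            return attempt
    return None
```

In this setup `recognize` would be something like `pic2text`. If each attempt succeeds about 95% of the time and attempts are independent (an assumption), three consecutive failures would occur with probability 0.05^3 = 0.000125, so a small attempt limit should be sufficient.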

This is the original image.

The threshold has to be set sensibly. If it is too high, the noise pixels all turn black as well.

Setting it too low is no good either: the noise is removed, but the letters are left broken.

An image like this cannot be recognized.

This is the result with the threshold set to 140.
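The effect of the threshold can be seen on a few made-up grayscale values. The sketch below applies the same lookup-table rule as the script (values below the threshold become black, everything else white) to a synthetic row of pixels. The pixel values are assumptions chosen for illustration, not measurements from the real CAPTCHAs.

```python
def build_table(threshold):
    """Lookup table: 0 (black) for values below threshold, 1 (white) otherwise."""
    return [0 if value < threshold else 1 for value in range(256)]


def binarize(pixels, threshold):
    """Map each grayscale value (0-255) to 0 or 1 using the lookup table."""
    table = build_table(threshold)
    return [table[p] for p in pixels]


# Synthetic row: dark strokes (40-120), faint noise (160-200), white background (255).
row = [255, 40, 120, 180, 255, 90, 200, 255, 160, 60]

for threshold in (80, 140, 210):
    bits = binarize(row, threshold)
    # '#' marks a black pixel, '.' a white one.
    print('%3d %s' % (threshold, ''.join('#' if b == 0 else '.' for b in bits)))
```

On this synthetic row, a threshold of 80 loses two of the four stroke pixels, 210 turns the noise pixels black along with the strokes, and 140 keeps exactly the strokes. The list returned by `build_table(140)` is identical to the `table` the script passes to `point`.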


