First, installation. I followed this guide: http://blog.csdn.net/xinghun_4/article/details/47860645
I'm on CentOS, so I used yum:
```shell
yum install python-devel libjpeg libjpeg-devel freetype freetype-devel zlib zlib-devel littlecms littlecms-devel libwebp libwebp-devel libfreetype libfreetype-devel giflib-devel automake libtool
```
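Several of these packages (libjpeg, zlib, freetype, littlecms, libwebp) are optional image libraries that Pillow, the PIL used by the recognition script, can be compiled against. The sketch below is one rough way to confirm that Pillow actually picked them up. It assumes a Pillow release that provides `PIL.features.check`, which older releases from the Python 2.7 era may lack. The feature names are Pillow's own, and `missing_image_features` is a hypothetical helper name.

```python
def missing_image_features(names=('jpg', 'zlib', 'freetype2', 'littlecms2', 'webp'),
                           check=None):
    """Return the Pillow features from `names` that are NOT available.

    `check` defaults to PIL.features.check (assumed to exist in your Pillow);
    pass your own callable to try the logic without Pillow installed.
    """
    if check is None:
        from PIL import features  # assumption: this Pillow has features.check()
        check = features.check
    # Keep only the names the checker reports as unavailable.
    return [name for name in names if not check(name)]
```

Calling `missing_image_features()` after installing Pillow should return an empty list. Any name it returns points to a library Pillow was built without.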
For Tesseract I downloaded the 3.04 package. During installation it reported that Leptonica must be version 1.72, so Leptonica 1.69 cannot be used. Watch out for this.
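Once both are installed, you can confirm which Leptonica Tesseract was built against. The sketch below assumes that `tesseract --version` prints a line such as `leptonica-1.72`. Tesseract 3.x writes this to stderr, so both streams are captured. It also treats 1.72 as a minimum rather than an exact requirement, which is an assumption. `leptonica_version` and `check_leptonica` are hypothetical helper names.

```python
import re
import subprocess


def leptonica_version(version_output):
    """Return the Leptonica version as a (major, minor) tuple, or None if absent.

    Assumes a line such as 'leptonica-1.72' in `tesseract --version` output.
    """
    match = re.search(r'leptonica-(\d+)\.(\d+)', version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_leptonica(required=(1, 72)):
    """Run `tesseract --version` and compare its Leptonica with `required`."""
    # Tesseract 3.x prints its version info on stderr, so merge it into stdout.
    output = subprocess.check_output(['tesseract', '--version'],
                                     stderr=subprocess.STDOUT)
    found = leptonica_version(output.decode('utf-8', 'replace'))
    return found is not None and found >= required
```

Tuples compare element by element, so `(1, 69) < (1, 72)` holds. This avoids the string-comparison pitfalls of version numbers like "1.9" vs "1.72".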
Here are some sample CAPTCHAs from the Central Bank Credit Center:
As you can see, the characters are fairly clean. Even so, calling image_to_string on the raw images almost never works: only a few of them come out as text.
Some preprocessing is needed. Looking at the images, the noise turns out to be faint, light-colored pixels, so it can be removed by turning those pixels white.
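The idea is plain thresholding. First convert the image to grayscale. Then send every pixel darker than a threshold to black and every lighter one to white, so faint noise disappears while the dark strokes survive. The pure-Python sketch below applies that rule to a 3x4 patch of gray levels. The pixel values are made up for illustration, and `binarize` is a hypothetical helper, not part of PIL.

```python
def binarize(pixels, threshold=140):
    """Map 0-255 gray levels to 0 (black) or 1 (white).

    Pixels darker than `threshold` stay black (character strokes);
    lighter ones, including faint noise, become white.
    """
    return [[0 if p < threshold else 1 for p in row] for row in pixels]


# Made-up patch: dark stroke pixels (40-90) mixed with faint noise (170-220).
patch = [
    [250, 40, 200, 250],
    [180, 60, 90, 250],
    [250, 45, 220, 170],
]
for row in binarize(patch):
    print(row)
```

Only the stroke column and the 90-valued pixel remain black (0); every noise value maps to 1.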
```python
#!/root/miniconda3/envs/crcc/bin/python2.7
# coding=utf-8
import re

import pytesseract
from PIL import Image

threshold = 140
# Lookup table for Image.point(): gray levels below the threshold map to 0 (black),
# all others map to 1 (white), so the faint noise pixels turn white.
table = []
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)


def pic2text(name):
    im = Image.open(name + '.jpg')
    imgry = im.convert('L')        # convert to grayscale
    out = imgry.point(table, '1')  # binarize with the lookup table
    out.save(name + 'b.jpg')       # keep the cleaned image for inspection
    # i = Image.open(name + 'b.jpg')
    # i.show()
    text = pytesseract.image_to_string(out)
    print(text)
    text2 = re.sub('[^a-z0-9]', '', text)  # keep only lowercase letters and digits
    return text2


if __name__ == "__main__":
    print(pic2text('pictures/150656820893'))
```
The interpreter in the first line is the remote Linux environment that PyCharm uses. To run the script directly on Linux as ./do_yzm.py, the first line has to give the interpreter path, and do_yzm.py has to be made executable (for example with chmod +x do_yzm.py).
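Both requirements can also be checked or applied from Python, which may help if you deploy the script programmatically. This is a minimal sketch. `make_executable` and `has_shebang` are hypothetical helper names. `make_executable` adds the execute bit for user, group and others, roughly what `chmod +x` does (unlike `chmod +x`, it ignores the umask).

```python
import os
import stat


def has_shebang(path):
    """True if the file starts with '#!', i.e. names an interpreter on line 1."""
    with open(path, 'rb') as f:
        return f.read(2) == b'#!'


def make_executable(path):
    """Add execute permission for user, group and others (like chmod +x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

For example, `if has_shebang('do_yzm.py'): make_executable('do_yzm.py')` prepares the script so that `./do_yzm.py` works from the shell.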
In practice the recognition rate is close to 95%, which is good enough. When a code is misread, simply request a new CAPTCHA and log in again.
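The "just try another CAPTCHA" strategy fits in a small retry loop. This is a sketch under assumptions: `fetch_captcha`, `recognize` and `try_login` are hypothetical callables you would write against the real site (for instance, `recognize` could wrap `pic2text`). The loop itself does not depend on how they work.

```python
def login_with_retries(fetch_captcha, recognize, try_login, max_attempts=5):
    """Fetch, recognize and submit CAPTCHAs until a login succeeds.

    fetch_captcha() -> image, recognize(image) -> str, try_login(code) -> bool
    are hypothetical callables supplied by the caller.
    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    for attempt in range(1, max_attempts + 1):
        code = recognize(fetch_captcha())
        if try_login(code):
            return attempt
    return None
```

Suppose each attempt independently succeeds about 95% of the time (an assumption; misreads may correlate with particularly noisy images). Then three failures in a row would happen with probability 0.05^3 = 0.000125, so a small `max_attempts` is usually enough.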
This is the original image.
The threshold has to be set sensibly. If it is too high, all the noise pixels turn black.
Setting it too low is not good either: the noise is removed, but the characters get broken up,
and then they cannot be recognized.
This is the result with the threshold set to 140.
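Since a good threshold sits between these two failure modes, one way to pick it is to try a range of values on a handful of hand-labeled CAPTCHAs. Keep the value that reads the most of them correctly. This is a minimal sketch. `pick_threshold` and the `recognize(image, threshold)` callable are hypothetical. For example, `recognize` could be `pic2text` rewritten to build its lookup table from a threshold argument.

```python
def pick_threshold(samples, thresholds, recognize):
    """Return (threshold, hits) for the threshold with the most correct readings.

    samples: list of (image, expected_text) pairs labeled by hand.
    recognize(image, threshold) -> str is a hypothetical OCR wrapper.
    On ties, the first (lowest, if thresholds ascend) threshold wins.
    """
    best, best_hits = None, -1
    for t in thresholds:
        hits = sum(1 for image, expected in samples if recognize(image, t) == expected)
        if hits > best_hits:
            best, best_hits = t, hits
    return best, best_hits
```

A call such as `pick_threshold(samples, range(100, 200, 10), recognize)` sweeps thresholds from 100 to 190. Even a few dozen labeled images should be enough to see where the recognition count peaks.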
Installing pytesseract on Linux and recognizing the login CAPTCHA of the Central Bank Credit Center