Detailed explanation of Python verification code recognition

Source: Internet
Author: User
Over the past few days I needed to recognize verification codes in a program I was writing. Since the program is written in Python, I naturally wanted to do the recognition in Python as well. I had previously written a tool for the school intranet site in Java (I do not plan to write Java programs again), and it also used verification code recognition, although that part of the code was not written by me :-) The on-campus verification code is completely monochrome with no interference, so it is easy to recognize. Still, the basic approach to verification code recognition can be seen from that code.

I had never used Python to process images before and did not know much about PIL (the Python Imaging Library). After looking at PIL over the past few days, I found it very powerful, comparable to ImageMagick and Photoshop.

The verification code here is a 24-bit JPEG image with noise, so it needs to be denoised and converted to monochrome. I first tried one in Photoshop: the denoise filter there works well, but I could not find an equivalent denoise function in PIL. I later found that a median filter removes most of the noise, and PIL provides one ready-made. Next I tried converting the filtered image directly to monochrome. The result still showed some noise: the median filter had only faded much of the noise, and the conversion to monochrome made those faded pixels stand out again. So after the median filter I enhance the image (ImageEnhance.Contrast in the code below) and only then convert it to monochrome, which makes the verification code much easier to recognize.

In Python, the processing above takes only a few lines:

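    from PIL import Image, ImageEnhance, ImageFilter

    im = Image.open(image_name)                  # image_name: path to the verification code image
    im = im.filter(ImageFilter.MedianFilter())   # median filter removes most of the noise
    enhancer = ImageEnhance.Contrast(im)
    im = enhancer.enhance(2)                     # enhance before converting
    im = im.convert('1')                         # convert to 1-bit monochrome
    im.show()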
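To see why a median filter removes isolated noise pixels while leaving the digit strokes alone, here is a minimal standard-library sketch of a 3x3 median filter on a grid of gray values. It only illustrates the idea; the function name `median_filter_3x3` and the test grids are made up for this example, and it makes no claim to match how PIL handles image borders.

```python
from statistics import median

def median_filter_3x3(pixels):
    """Return a copy of a 2-D gray grid where each interior pixel is
    replaced by the median of its 3x3 neighbourhood (borders unchanged)."""
    height, width = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [pixels[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out

# White background (255) with a single dark noise pixel in the middle.
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0
cleaned = median_filter_3x3(noisy)

# A three-pixel-wide dark stroke, similar to part of a digit.
stroke = [[255, 255, 0, 0, 0, 255, 255] for _ in range(5)]
filtered_stroke = median_filter_3x3(stroke)

print(cleaned[2][2])              # the noise pixel is replaced by the background
print(filtered_stroke == stroke)  # the stroke is left as it was
```

A single dark pixel is outvoted by its eight white neighbours, while a stroke that is several pixels wide forms the majority in its own neighbourhood and so survives.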

Next, extract the glyph templates for the digits. I downloaded 100 images with a shell script and picked three of them to extract the templates from:

    #!/usr/bin/env python
    # encoding=utf-8
    from PIL import Image, ImageEnhance, ImageFilter

    image_name = "./images/81.jpeg"
    im = Image.open(image_name)
    im = im.filter(ImageFilter.MedianFilter())
    enhancer = ImageEnhance.Contrast(im)
    im = enhancer.enhance(2)
    im = im.convert('1')
    # im.show()

    # all values are in pixels
    s = 12  # start position of the first number
    w = 10  # width of each number
    h = 15  # end position from the top
    t = 2   # start position from the top
    im_new = []
    # split the four numbers in the picture
    for i in range(4):
        im1 = im.crop((s + w*i + i*2, t, s + w*(i+1) + i*2, h))
        im_new.append(im1)

    # append each digit to data.txt as a flat list of 0/1 values
    with open("data.txt", "a") as f:
        for k in range(4):
            l = []
            # im_new[k].show()
            for i in range(13):
                for j in range(10):
                    if im_new[k].getpixel((j, i)) == 255:
                        l.append(0)   # white background
                    else:
                        l.append(1)   # dark pixel belonging to the digit
            f.write("l=[")
            n = 0
            for i in l:
                if n % 10 == 0:
                    f.write("\n")     # start a new line every 10 values (one image row)
                f.write(str(i) + ",")
                n += 1
            f.write("]\n")
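The crop arithmetic in the script is easier to check when written out on its own. The sketch below recomputes the four boxes from the same numbers (start 12, width 10, a 2-pixel gap taken from the `i*2` term, top 2, bottom 15). The function name `digit_boxes` and its parameter names are only for illustration.

```python
def digit_boxes(start=12, width=10, gap=2, top=2, bottom=15, count=4):
    """Return (left, upper, right, lower) boxes, the order crop() expects."""
    boxes = []
    for i in range(count):
        left = start + (width + gap) * i   # same as s + w*i + i*2
        boxes.append((left, top, left + width, bottom))
    return boxes

boxes = digit_boxes()
for box in boxes:
    print(box)
# (12, 2, 22, 15)
# (24, 2, 34, 15)
# (36, 2, 46, 15)
# (48, 2, 58, 15)
```

Each box is 10 pixels wide and 13 pixels high, which is why the later loops run over `range(13)` rows and `range(10)` columns.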

Save the glyph templates as lists for later matching.
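Each saved template is a flat list of 0/1 values, 10 per image row. Before relying on one for matching, it helps to print it back as a picture and check that it really looks like the digit. This is a small sketch for that purpose; the helper name `render` and the tiny sample list are made up here and are not one of the real templates.

```python
def render(template, width=10):
    """Turn a flat 0/1 list into text rows, '#' for 1 and '.' for 0."""
    rows = [template[i:i + width] for i in range(0, len(template), width)]
    return "\n".join("".join("#" if bit else "." for bit in row) for row in rows)

# A made-up 3x3 sample, only to show the output format.
sample = [0, 1, 0,
          1, 1, 0,
          0, 1, 0]
picture = render(sample, width=3)
print(picture)
# .#.
# ##.
# .#.
```

For the real data, `render(template)` with the default width of 10 gives 13 text rows per digit.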

Once the templates are extracted, what remains is to match the image being processed against the templates in the library. The basic idea is to look at how many corresponding points overlap. Because of noise, however, the pairs (6, 8), (8, 3), and (5, 9) are easily confused. After analyzing the data from the 100 images I had, I adopted two-way matching (counting once with the image as the reference and once with the template as the reference). After half a day of testing, the recognition rate reached 100%.

    #!/usr/bin/env python
    # encoding=utf-8
    import sys
    from PIL import Image, ImageEnhance, ImageFilter
    import Data   # Data.N holds the digit templates saved earlier

    DEBUG = False

    def d_print(*msg):
        if DEBUG:
            print(*msg)

    def Get_Num(l):
        min1 = []   # per template: points set in the image but not in the template
        min2 = []   # per template: points set in the template but not in the image
        for n in Data.N:
            count1 = count2 = count3 = count4 = 0
            if len(l) != len(n):
                print("Wrong pic")
                sys.exit()
            # image as the reference
            for i in range(len(l)):
                if l[i] == 1:
                    count1 += 1
                    if n[i] == 1:
                        count2 += 1
            # template as the reference
            for i in range(len(l)):
                if n[i] == 1:
                    count3 += 1
                    if l[i] == 1:
                        count4 += 1
            d_print(count1, count2, count3, count4)
            min1.append(count1 - count2)
            min2.append(count3 - count4)
        d_print(min1, "\n", min2)
        # try the strictest rule first, then progressively looser ones
        for i in range(10):
            if min1[i] <= 2 or min2[i] <= 2:
                if abs(min1[i] - min2[i]) < 10:
                    return i
        for i in range(10):
            if min1[i] <= 4 or min2[i] <= 4:
                if abs(min1[i] - min2[i]) <= 2:
                    return i
        for i in range(10):
            flag = False
            if min1[i] <= 3 or min2[i] <= 3:
                # accept i only if no other digit is also a close match
                for j in range(10):
                    if j != i and (min1[j] < 5 or min2[j] < 5):
                        flag = True
                if not flag:
                    return i
        for i in range(10):
            if min1[i] <= 5 or min2[i] <= 5:
                if abs(min1[i] - min2[i]) <= 10:
                    return i
        for i in range(10):
            if min1[i] <= 10 or min2[i] <= 10:
                if abs(min1[i] - min2[i]) <= 3:
                    return i
    # end of function Get_Num

    def Pic_Reg(image_name=None):
        im = Image.open(image_name)
        im = im.filter(ImageFilter.MedianFilter())
        enhancer = ImageEnhance.Contrast(im)
        im = enhancer.enhance(2)
        im = im.convert('1')
        im.show()

        # all values are in pixels
        s = 12  # start position of the first number
        w = 10  # width of each number
        h = 15  # end position from the top
        t = 2   # start position from the top
        im_new = []
        # split the four numbers in the picture
        for i in range(4):
            im1 = im.crop((s + w*i + i*2, t, s + w*(i+1) + i*2, h))
            im_new.append(im1)

        result = ""
        for k in range(4):
            l = []
            # im_new[k].show()
            for i in range(13):
                for j in range(10):
                    if im_new[k].getpixel((j, i)) == 255:
                        l.append(0)
                    else:
                        l.append(1)
            result += str(Get_Num(l))
        return result

    print(Pic_Reg("./images/22.jpeg"))
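The two-way counting at the heart of Get_Num can be looked at in isolation. The sketch below computes the same two differences (points set in the image but not in the template, and the reverse) and then simply picks the template with the smallest total. Note that this "smallest total" rule is a simplification swapped in for illustration, not the cascade of thresholds used above, and the names `mismatch`, `best_match`, and the 3x3 toy templates are made up.

```python
def mismatch(a, b):
    """Count the positions that are set in a but not set in b."""
    return sum(1 for x, y in zip(a, b) if x == 1 and y != 1)

def best_match(glyph, templates):
    """Return the index of the template with the smallest two-way mismatch."""
    scores = []
    for template in templates:
        if len(template) != len(glyph):
            raise ValueError("glyph and template sizes differ")
        extra = mismatch(glyph, template)     # corresponds to count1 - count2
        missing = mismatch(template, glyph)   # corresponds to count3 - count4
        scores.append(extra + missing)
    return scores.index(min(scores))

# Toy 3x3 templates: index 0 is a ring, index 1 is a vertical bar.
templates = [
    [1, 1, 1,
     1, 0, 1,
     1, 1, 1],
    [0, 1, 0,
     0, 1, 0,
     0, 1, 0],
]
# A vertical bar with one stray noise pixel in the bottom-right corner.
noisy_bar = [0, 1, 0,
             0, 1, 0,
             0, 1, 1]
print(best_match(noisy_bar, templates))  # 1
```

Counting in both directions matters because one direction alone cannot tell a digit from another digit that contains it. In the toy data above, a clean bar has only one point outside the ring, yet the ring has six points that the bar lacks.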

This covers the basic steps of verification code recognition: binarization, median filtering, noise reduction, segmentation, tightening and rearranging (so that tall and short characters end up the same height), and matching against a character library.
All of this applies only to ordinary verification codes. Recognizing advanced verification codes is a large topic in its own right. Once things get that complicated, much more is involved, such as artificial intelligence (scary), and I am not interested in that. I only like simple things.
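The script in this article did not need the tightening step, because every digit sits in a fixed box. For verification codes where the characters move up and down, one possible reading of "tightening and rearranging" is to crop each character to its bounding box and then pad it back to a common height. The sketch below works under that assumption on a plain 0/1 grid; the function names `tighten` and `pad_to_height` are made up for this example.

```python
def tighten(grid):
    """Crop a 0/1 grid (list of rows) to the bounding box of its 1s."""
    rows = [y for y, row in enumerate(grid) if any(row)]
    cols = [x for x in range(len(grid[0])) if any(row[x] for row in grid)]
    if not rows:
        return []   # nothing set, nothing to keep
    return [row[cols[0]:cols[-1] + 1] for row in grid[rows[0]:rows[-1] + 1]]

def pad_to_height(grid, height):
    """Add blank rows at the bottom until the grid has the given height."""
    width = len(grid[0]) if grid else 0
    return grid + [[0] * width for _ in range(height - len(grid))]

# A small character floating inside a larger blank area.
glyph = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
tight = tighten(glyph)
normalized = pad_to_height(tight, 3)
print(tight)       # [[1, 1], [1, 0]]
print(normalized)  # [[1, 1], [1, 0], [0, 0]]
```

After this step every character starts at the same top-left corner, so the point-by-point comparison against the templates lines up.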
