Parsing the simplest verification code with Python
I have been learning Python recently. When it came time to choose a dormitory, I used Python to write a program that grabs a dormitory slot. One module handles logging in, and logging in requires entering a verification code. As it turns out, the verification code can be bypassed entirely, but that is another topic. I had not discovered this at the beginning, so I wrote this piece of Python code to parse the verification code.
Our school's verification code is about as simple as they come, in the following format:
The image is 60x24 pixels, and each digit occupies about 15x24 pixels.
Looking at the verification code, we can see that it contains only digits and that the digits use a standard font; the only variation is that each digit has a different color.
I had two ideas at the time:
1. Slice the whole image evenly into four parts, one digit per part, and scan every pixel of each part. Allocate a signature buffer for each digit with one bit per pixel, that is, 15x24 bits, or 45 bytes in total.
Take the background color first; the pixel at (0, 0) is always background. Then scan each pixel of the digit and compare it with the background color: if it differs, the bit is 1, otherwise it is 0. Doing this for the 10 characters 0-9 gives their standard feature values. To parse a verification code, split its image the same way, compute the feature value of each part, and compare it with the standard feature values.
2. The 10 characters 0-9 have different glyphs, so some pixel position may be covered only by one digit, say 9. If the pixel at such a position in a part differs from the background color, the part must be 9; if it matches the background color, the part cannot be 9.
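The first idea can be sketched roughly as follows. This is a minimal illustration, not the code used in this article: the helper names `signature` and `hamming` and the row-major grid `pixels[y][x]` are assumptions made for the example, and the slices are synthetic rather than real verification code images.

```python
WIDTH, HEIGHT = 15, 24  # slice size given in the text

def signature(pixels, background):
    """Pack a slice into bytes, one bit per pixel (1 = differs from background).

    `pixels` is a hypothetical row-major grid: pixels[y][x] is a color value.
    """
    bits = 0
    for y in range(HEIGHT):
        for x in range(WIDTH):
            bits = (bits << 1) | int(pixels[y][x] != background)
    return bits.to_bytes(WIDTH * HEIGHT // 8, "big")

def hamming(sig_a, sig_b):
    """Number of differing bits between two signatures."""
    return sum(bin(a ^ b).count("1") for a, b in zip(sig_a, sig_b))

# Synthetic slices: background is 0, ink is 1.
blank = [[0] * WIDTH for _ in range(HEIGHT)]
stroke = [[1 if x == 7 else 0 for x in range(WIDTH)] for _ in range(HEIGHT)]

sig_blank = signature(blank, 0)
sig_stroke = signature(stroke, 0)
print(len(sig_stroke), hamming(sig_blank, sig_stroke))  # 45 24
```

With real images, an unknown slice would be assigned the digit whose stored signature has the smallest bit distance to it.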
Both methods share one problem: the first digit of the image is offset by some amount. For example, digits at the other positions start at column 3, while the first one may start at column 4; I have not analyzed exactly why. There is a way around this, though: measure from the first column that contains a non-background pixel. However the digit is shifted, the x distance from any of its pixels to its leftmost pixel stays the same.
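The offset fix can be illustrated with a small sketch. The function names and the synthetic L-shaped glyph below are made up for the example; the point is only that coordinates measured from the first non-background column do not change when the glyph shifts.

```python
def first_ink_column(pixels, background):
    """Index of the first column containing a non-background pixel, or None."""
    for x in range(len(pixels[0])):
        for row in pixels:
            if row[x] != background:
                return x
    return None

def relative_ink(pixels, background):
    """Set of (x - offset, y) for every non-background pixel in the slice."""
    off = first_ink_column(pixels, background)
    if off is None:
        return set()
    return {(x - off, y)
            for y, row in enumerate(pixels)
            for x, value in enumerate(row)
            if value != background}

def make_slice(shift, width=15, height=24):
    """Synthetic glyph: an L shape whose leftmost column is `shift`."""
    grid = [[0] * width for _ in range(height)]
    for y in range(5, 20):
        grid[y][shift] = 1          # vertical stroke
    for x in range(shift, shift + 6):
        grid[19][x] = 1             # horizontal stroke at the bottom
    return grid

at_col_3 = relative_ink(make_slice(3), 0)
at_col_4 = relative_ink(make_slice(4), 0)
print(at_col_3 == at_col_4)  # True
```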
In the end, my implementation is based on the second idea, because it is the fastest: you only need to sample the pixels at the feature positions.
It works like this. First I picked three sample images that together contain all 10 characters 0-9. Then, for each pixel position, I checked whether the pixel differs from the background color; if it does, the digit is added to the hash table entry for that position.
Finally, I analyzed the hash table to find which positions are covered by only one digit, which by only two digits, and which by only three digits.
From this, find a set of pixels that uniquely determines each digit; for example, the two pixels (0, 18) and (0, 19) together uniquely determine the digit 2.
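The hash table analysis can be sketched as follows. The glyphs here are toy data, not the real font, and `pixel_owners` and `unique_pixels` are hypothetical helper names. Each glyph is given as a set of ink coordinates already measured from its leftmost ink column.

```python
from collections import defaultdict

def pixel_owners(glyphs):
    """Map each pixel position to the set of digits that have ink there.

    `glyphs` maps digit -> set of (x, y) ink coordinates.
    """
    owners = defaultdict(set)
    for digit, ink in glyphs.items():
        for position in ink:
            owners[position].add(digit)
    return owners

def unique_pixels(owners):
    """For each digit, list the positions that only that digit covers."""
    result = defaultdict(list)
    for position, digits in owners.items():
        if len(digits) == 1:
            (digit,) = digits
            result[digit].append(position)
    return {digit: sorted(positions) for digit, positions in result.items()}

# Toy glyphs (made up for illustration).
toy = {
    1: {(0, 0), (0, 1), (0, 2)},
    7: {(0, 0), (1, 0), (2, 0), (2, 1)},
}
owners = pixel_owners(toy)
print(unique_pixels(owners))
# {1: [(0, 1), (0, 2)], 7: [(1, 0), (2, 0), (2, 1)]}
```

Positions shared by two or three digits can be found the same way by testing `len(digits) == 2` or `== 3`.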
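This gives a table of key pixels indexed by digit (the entry for 3 is empty because its pixels are basically all shared with other digits):
NumberKeyPixel = [
    [(7, 10), (0, 12), (0, 10), (0, 11), (0, 8), (1, 14), (1, 15)],  # 0
    [(4, 8)],                                                        # 1
    [(0, 18), (0, 19)],                                              # 2
    [],                                                              # 3
    [(5, 7)],                                                        # 4
    [(0, 4), (0, 10)],                                               # 5
    [(2, 6)],                                                        # 6
    [(2, 16)],                                                       # 7
    [(0, 12)],                                                       # 8
    [(2, 13)],                                                       # 9
]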
To parse an image, you only need to check these pixels in order to determine its verification code value.
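The lookup can be sketched in isolation like this. It is a simplified illustration: `classify` is a hypothetical helper, its input is assumed to be the set of ink coordinates of one digit measured from its leftmost ink column, and the bounds check on the slice width is left out.

```python
# Key pixel table from this article, indexed by digit.
NUMBER_KEY_PIXEL = [
    [(7, 10), (0, 12), (0, 10), (0, 11), (0, 8), (1, 14), (1, 15)],
    [(4, 8)],
    [(0, 18), (0, 19)],
    [],
    [(5, 7)],
    [(0, 4), (0, 10)],
    [(2, 6)],
    [(2, 16)],
    [(0, 12)],
    [(2, 13)],
]

def classify(ink):
    """Return the first digit whose key pixels are all ink; fall back to 3.

    `ink` is a set of (x, y) positions that differ from the background.
    """
    for digit, keys in enumerate(NUMBER_KEY_PIXEL):
        if keys and all(position in ink for position in keys):
            return digit
    return 3  # 3 has no key pixels of its own, so it is the default

print(classify({(0, 12), (3, 3)}))  # 8
```

The order of the checks matters: 0 is tested before 8 because the key pixel of 8, (0, 12), is also one of the key pixels of 0.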
The code is described below.
1. First, the analysis code used to obtain the feature pixels of the digits:
from PIL import Image
import os

# Folder that holds the sample (material) images
path = "C:\\vaildpic\\"
# List the sample images
images = os.listdir(path)
# Digit slices cut from the sample images
nubimgs = []
# Background color of each slice
backpixels = []
# Table mapping a pixel position to the slices that have ink there
pixDir = {}
# Column of the first non-background pixel in each slice
pixBlankEndPos = []

# Return the column offset of the first non-background pixel
# (returns None if the slice is entirely background)
def GetLastBlankPosition(materialPic, x=0):
    bc = materialPic.getpixel((0, 0))
    for i in range(15):
        for j in range(24):
            if materialPic.getpixel((i + x, j)) != bc:
                return i

# Load every image in the folder (no strict validation is done here)
for name in images:
    if os.path.isdir(path + name):
        continue
    image = Image.open(path + name)
    # Cut each image into four slices and record, for each slice, its
    # background color and the offset of its first non-background column
    for i in range(4):
        ma = image.crop((i * 15, 0, (i + 1) * 15, 24))
        nubimgs.append(ma)
        backpixels.append(image.getpixel((0, 0)))
        pixBlankEndPos.append(GetLastBlankPosition(ma))
print(pixBlankEndPos)

# For every pixel of every slice: if the pixel is not the background color,
# record the slice under that position. The structure is:
#   pixDir[(x - x_offset, y)][imgSeq] = picture
for i in range(15):
    for j in range(24):
        for imgNum in range(len(nubimgs)):
            if nubimgs[imgNum].getpixel((i, j)) != backpixels[imgNum]:
                key = (i - pixBlankEndPos[imgNum], j)
                pixDir.setdefault(key, {})[imgNum] = nubimgs[imgNum]

# For every position covered by at most 6 slices, save those slices
# in a folder named after the number of slices that cover it
for pos, pics in pixDir.items():
    if len(pics) <= 6:
        print((pos, pics))
        folder = path + str(len(pics))
        if not os.path.exists(folder):
            os.mkdir(folder)
        i = 0
        for pic in pics.values():
            i += 1
            pic.save(os.path.join(folder, "%d_%d_%d.bmp" % (pos[0], pos[1], i)))
Material image:
The analysis result is as follows:
Folder n contains the slices for the pixels that are shared by n slices. I did the next step of the analysis by hand. It could also be done by a program, but the program would need to be told in advance which digit each slice shows, for example by using the verification code value as the image file name. I only thought of this later, so I did not implement it.
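For reference, the automated version mentioned above might look something like the sketch below. None of this is part of the original program: the file naming scheme (`4721.bmp` for an image showing 4721), the helper names, and the toy glyphs are all assumptions. For each digit it searches for the smallest set of its own ink pixels that no other digit fully covers.

```python
from itertools import combinations
import os

def labels_from_filename(filename):
    """Read the digits from a file named after its code, e.g. '4721.bmp'."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return [int(ch) for ch in stem]

def pick_key_pixels(glyphs, max_size=3):
    """For each digit, find a small pixel set that only this digit fully covers.

    `glyphs` maps digit -> set of (x, y) ink coordinates. A digit with no
    distinguishing set of up to `max_size` pixels gets an empty list.
    """
    keys = {}
    for digit, ink in glyphs.items():
        others = [g for d, g in glyphs.items() if d != digit]
        keys[digit] = []
        for size in range(1, max_size + 1):
            found = next(
                (list(combo) for combo in combinations(sorted(ink), size)
                 if not any(set(combo) <= other for other in others)),
                None,
            )
            if found:
                keys[digit] = found
                break
    return keys

# Toy glyphs (made up): every single pixel is shared, so pairs are needed.
toy = {
    1: {(0, 0), (0, 1)},
    7: {(0, 0), (1, 0)},
    4: {(0, 1), (1, 0)},
}
print(labels_from_filename("4721.bmp"))  # [4, 7, 2, 1]
print(pick_key_pixels(toy))
```

A digit that ends up with an empty list plays the same role as 3 in the table above: it can only be recognized as the fallback.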
2. Use the feature pixels obtained above to parse the verification code.
The following function obtains the background colors. As in the analysis step, the background is sampled from the image itself; here the colors are collected along the whole top row of the image, because nothing is drawn on the top row.
def getBackColors(bmp):
    colors = []
    for i in range(60):
        if bmp.getpixel((i, 0)) not in colors:
            colors.append(bmp.getpixel((i, 0)))
    return colors
Obtain the offset of the first non-background column, as in the analysis step above.
def GetLastBlankPosition(materialPic, x=0):
    bc = getBackColors(materialPic)
    for i in range(15):
        for j in range(24):
            if materialPic.getpixel((i + x, j)) not in bc:
                return i
Parse the verification code by testing the feature pixels:
def GetVaildJpgNumber(bmp):
    print('getvaildjpgnumber')
    vaildStr = ""
    backColors = getBackColors(bmp)
    # A verification code has four digits; digit n occupies the x range
    # n * 15 ~ (n + 1) * 15
    for pos in range(4):
        # Offset of the first non-background column at this position
        offset = GetLastBlankPosition(bmp, pos * 15)
        # For each digit 0-9, check whether its feature pixels are background.
        # If none of them is, the digit is found; otherwise try the next digit.
        # The pixels of 3 are basically all shared with other digits, so if
        # no digit matches in the end, the result is 3.
        for nr in range(len(NumberKeyPixel)):
            isthisNr = True
            for pix in NumberKeyPixel[nr]:
                if pix[0] + offset >= 15:
                    isthisNr = False
                    break
                if bmp.getpixel((pix[0] + offset + pos * 15, pix[1])) in backColors:
                    isthisNr = False
                    break
            if isthisNr and len(NumberKeyPixel[nr]) != 0:
                vaildStr += str(nr)
                break
        if len(vaildStr) == pos:
            vaildStr += '3'
    print(vaildStr)
    return vaildStr
Fetch the verification code image from the network using httplib (my school's name has been replaced with myschool):
try:
    import httplib                  # Python 2
except ImportError:
    import http.client as httplib   # Python 3
from io import BytesIO
from PIL import Image

# sessionId (the session cookie string) is assumed to be defined elsewhere
def GetVaildJpg():
    print('GetVaildJpg')
    headers = {
        'Accept': 'image/png, image/svg+xml, image/*;q=0.8, */*;q=0.5',
        'Referer': 'http://zcc.myschool.edu.cn/',
        'Accept-Language': 'zh-Hans-CN,zh-Hans;q=0.8,en-US;q=0.5,en;q=0.3',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko',
        'Accept-Encoding': 'gzip, deflate',
        'Host': 'zcc.myschool.edu.cn',
        'DNT': '1',
        'Connection': 'Keep-Alive',
        'Cookie': sessionId,
    }
    httpClient = httplib.HTTPConnection('zcc.myschool.edu.cn', 80, timeout=300)
    httpClient.request("GET", 'http://zcc.myschool.edu.cn/image.jsp', None, headers)
    response = httpClient.getresponse()
    # print(response.getheaders())
    stBmp = response.read()
    bmp = Image.open(BytesIO(stBmp))
    bmp.save(r'D:\PROJECT\PYTHON\catchDorm\catch.bmp')
    # bmp.show()
    return GetVaildJpgNumber(bmp)
That concludes this introduction to parsing the simplest verification code with Python; I hope you find it useful.