Cracking CAPTCHAs with Python: a worked example

Last Update:2017-01-18 Source: Internet

Author: User

Tags md5

1. Preface

This experiment uses a simple example to explain how CAPTCHA (verification code) cracking works. Along the way you will learn and practice the following:

Basic Python

Using the PIL module

2. Detailed example

Install the Pillow (PIL) library:

$ sudo apt-get update

$ sudo apt-get install python-dev

$ sudo apt-get install libtiff5-dev libjpeg8-dev zlib1g-dev \
libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python-tk

$ sudo pip install Pillow

Download the experimental files:

$ wget http://labfile.oss.aliyuncs.com/courses/364/python_captcha.zip
$ unzip python_captcha.zip
$ cd python_captcha

The CAPTCHA used in our experiment is captcha.gif.

Extracting the text image

Create a new file named crack.py in the working directory and edit it.

# -*- coding: utf8 -*-
from PIL import Image

im = Image.open("captcha.gif")
# convert the image to 8-bit palette mode
im = im.convert("P")

# print the color histogram
print(im.histogram())

Output:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0.1, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 1.0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0,  1, 2, 0, 0, 0, 1, 2, 0, 1, 0, 0, 1, 0.2, 0, 0, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 3, 1, 3, 3, 0, 0, 0, 1, 0, 3, 2, 132, 1, 1, 0.0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 15, 0, 1, 0, 1, 0, 0, 8, 1, 0, 0, 0, 0, 1,  0, 0, 0, 0, 18, 1, 1, 1, 1.1, 2, 365, 115, 0, 1, 0, 0, 0, 135, 186, 0, 0, 1, 0, 0, 0, 116, 3, 0, 0, 0, 0, 0, 21, 0, 0, 2, 10, 2, 0, 0, 0, 0, 2, 10, 0, 0, 0, 0, 1, 0, 625]

Each entry in the color histogram is the number of pixels in the image that have the corresponding color index.
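To make the idea concrete, here is a minimal pure-Python sketch of what such a histogram contains. It does not use PIL; the function name `palette_histogram` and the 12-pixel list are made up for illustration, and the list only stands in for the flattened pixel data of a "P" mode image.

```python
# Pure-Python sketch of a 256-bin palette histogram (no PIL required).
# `pixels` is a made-up stand-in for the pixel data of a "P" mode image.

def palette_histogram(pixels):
    """Return 256 counts; entry i is how many pixels have color index i."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    return hist

# Hypothetical image: mostly white (255), some pixels at index 220 and 227.
pixels = [255, 255, 220, 255, 227, 220, 255, 255, 220, 255, 227, 255]
hist = palette_histogram(pixels)

print(hist[255])                 # number of white pixels
print(hist[220])                 # number of pixels with index 220
print(sum(hist) == len(pixels))  # every pixel is counted exactly once
```

The list index plays the role of the color and the stored value is the pixel count, which is how the output above should be read.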

Each pixel can take one of 256 colors. White is the most common: index 255, the last entry, shows 625 white pixels. The red pixels sit at an index around 200. By sorting the histogram we can find the useful colors.

his = im.histogram()
values = {}

for i in range(256):
    values[i] = his[i]

# sort by pixel count, largest first, and keep the top ten
for j, k in sorted(values.items(), key=lambda x: x[1], reverse=True)[:10]:
    print("%s %s" % (j, k))

Output:

255 625
212 365
220 186
219 135
169 132
227 116
213 115
234 21
205 18
184 15

This gives the ten most common colors in the image. Of these, 220 and 227 are the red and gray we need. With them we can construct a black-and-white binary image.

# -*- coding: utf8 -*-
from PIL import Image

im = Image.open("captcha.gif")
im = im.convert("P")
# new image of the same size, filled with white (255)
im2 = Image.new("P", im.size, 255)

for x in range(im.size[1]):
    for y in range(im.size[0]):
        pix = im.getpixel((y, x))
        if pix == 220 or pix == 227:  # these are the colors we want
            im2.putpixel((y, x), 0)

im2.show()

The result is a black-and-white image in which only the characters remain.
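The same two-color filter can be sketched without PIL on a small hand-made grid of palette indices, which makes the effect visible as text. The grid values and the helper name `binarize` are assumptions for illustration only; the indices 220 and 227 are the ones named in the text.

```python
# Sketch of the two-color filter on a tiny made-up grid (no PIL required).
WANTED = {220, 227}   # palette indices of the character colors, from the text
WHITE, BLACK = 255, 0

def binarize(grid, wanted=WANTED):
    """Return a new grid: BLACK where the pixel is a wanted color, WHITE elsewhere."""
    return [[BLACK if pix in wanted else WHITE for pix in row] for row in grid]

# Hypothetical 3x4 image mixing character colors with background noise.
grid = [
    [255, 220, 255, 169],
    [212, 220, 227, 255],
    [255, 255, 227, 213],
]
binary = binarize(grid)

# Print black pixels as '#' and white pixels as '.'
for row in binary:
    print("".join("#" if pix == BLACK else "." for pix in row))
```

Every color other than the two wanted ones, including noise such as 169 or 212, collapses to white.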

Extracting single-character images

The next step is to get the set of pixels belonging to each character. Because this example is fairly simple, we cut the image vertically:

inletter = False
foundletter = False
start = 0
end = 0

letters = []

# scan column by column; a column belongs to a letter if it has any non-white pixel
for y in range(im2.size[0]):
    for x in range(im2.size[1]):
        pix = im2.getpixel((y, x))
        if pix != 255:
            inletter = True
    if not foundletter and inletter:
        foundletter = True
        start = y

    if foundletter and not inletter:
        foundletter = False
        end = y
        letters.append((start, end))

    inletter = False
print(letters)

Output:

[(6, 14), (15, 25), (27, 35), (37, 46), (48, 56), (57, 67)]

This gives the start and end column of each character.
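As a rough illustration of the vertical cut, the sketch below applies the same column scan to a small hand-made grid instead of a PIL image. The function name `find_letter_columns` and the grid are hypothetical. Like the loop above, it only closes a span when it reaches a blank column, so a character touching the right edge would not be reported.

```python
# Sketch of the column scan on a tiny made-up binary grid (no PIL required).
W, B = 255, 0  # white background, black ink

def find_letter_columns(grid, background=W):
    """Return (start, end) pairs; end is the first blank column after a letter."""
    width = len(grid[0])
    spans = []
    start = None
    for col in range(width):
        has_ink = any(row[col] != background for row in grid)
        if has_ink and start is None:
            start = col                  # first inked column of a letter
        elif not has_ink and start is not None:
            spans.append((start, col))   # blank column closes the letter
            start = None
    return spans

# Two hypothetical "letters" separated by a blank column.
grid = [
    [W, B, B, W, W, B, W, W, W],
    [W, B, W, W, B, B, B, W, W],
    [W, W, B, W, W, B, W, W, W],
]
spans = find_letter_columns(grid)
print(spans)
```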

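import hashlib
import time

count = 0
for letter in letters:
    m = hashlib.md5()
    # crop box is (left, upper, right, lower)
    im3 = im2.crop((letter[0], 0, letter[1], im2.size[1]))
    # hash the current time plus a counter to get a unique file name
    m.update(("%s%s" % (time.time(), count)).encode("utf8"))
    im3.save("./%s.gif" % (m.hexdigest()))
    count += 1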

(This code is appended to crack.py, after the column-scanning loop.)

It crops the image to get the part where each character is located.
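The MD5 step appears to serve only one purpose: producing a distinct file name for each cropped character. A minimal sketch of that naming scheme, with the hypothetical helper `unique_name`, could look like this:

```python
import hashlib
import time

def unique_name(count):
    """Build a file name from the MD5 hex digest of the current time plus a counter."""
    m = hashlib.md5()
    m.update(("%s%s" % (time.time(), count)).encode("utf8"))
    return "%s.gif" % m.hexdigest()

# Six names, one per character in the example CAPTCHA.
names = [unique_name(i) for i in range(6)]
print(len(set(names)) == len(names))  # all names differ
```

Because the counter is part of the hashed string, two crops saved within the same clock tick still get different names.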

AI and vector space image recognition

Here we use a vector space search engine for character recognition. It has many advantages:

It does not need many training iterations
It does not overtrain
You can add or remove incorrect data at any time and see the effect
It is easy to understand and to turn into code
It provides graded results, so you can view the closest several matches
Anything it cannot recognize becomes recognizable as soon as it is added to the search engine

Of course, it also has drawbacks: for example, classification is much slower than with a neural network, and it cannot find its own way to solve a problem.

The name "vector space search engine" sounds impressive, but the principle is very simple. Take the example from the article:

Suppose you have 3 documents. How do we calculate the similarity between them? The more words two documents have in common, the more similar they are. But there may be far too many words, so we choose a few key words. The chosen words are called features, and each feature acts like a dimension in space (x, y, z, and so on). A set of feature values is a vector, and we can compute such a vector for every document. The similarity of two documents then follows from the angle between their vectors.
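The document example can be sketched in a few lines. The three documents, the feature words, and the helper names `to_vector` and `cosine` below are all invented for illustration; the only technique shown is the cosine of the angle between two word-count vectors.

```python
import math
from collections import Counter

def to_vector(text, features):
    """Count how often each feature word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[f] for f in features]

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

features = ["python", "image", "pixel"]   # the chosen key words
doc1 = "python image pixel python"
doc2 = "python image image pixel"
doc3 = "pixel pixel pixel"

v1, v2, v3 = (to_vector(d, features) for d in (doc1, doc2, doc3))
print(round(cosine(v1, v2), 3))  # doc1 vs doc2
print(round(cosine(v1, v3), 3))  # doc1 vs doc3
```

Here doc1 and doc2 share all three feature words and score higher than doc1 and doc3, which only share one.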

Implementing the vector space comparison as a Python class:

import math

class VectorCompare:
    # compute the magnitude of a vector
    def magnitude(self, concordance):
        total = 0
        for word, count in concordance.items():
            total += count ** 2
        return math.sqrt(total)

    # compute the cosine of the angle between two vectors
    def relation(self, concordance1, concordance2):
        topvalue = 0
        for word, count in concordance1.items():
            if word in concordance2:
                topvalue += count * concordance2[word]
        return topvalue / (self.magnitude(concordance1) * self.magnitude(concordance2))

It compares two Python dictionaries and outputs their similarity (a number between 0 and 1).

Putting it all together

Extracting single-character images from a large number of CAPTCHAs to build a training set takes a lot of work, but anyone who has read this far should know how to do it, so it is omitted here. You can use the provided training set directly for the following steps.

The iconset directory is our training set.

The final additions:

import os  # needed for os.listdir below

# convert an image into a vector (pixel position -> pixel value)
def buildvector(im):
    d1 = {}
    count = 0
    for i in im.getdata():
        d1[count] = i
        count += 1
    return d1

v = VectorCompare()

iconset = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

# load the training set
imageset = []
for letter in iconset:
    for img in os.listdir('./iconset/%s/' % (letter)):
        temp = []
        if img != "Thumbs.db" and img != ".DS_Store":
            temp.append(buildvector(Image.open("./iconset/%s/%s" % (letter, img))))
        imageset.append({letter: temp})


count = 0
# cut the CAPTCHA image into characters
for letter in letters:
    m = hashlib.md5()
    im3 = im2.crop((letter[0], 0, letter[1], im2.size[1]))

    guess = []

    # compare each cut-out fragment with every training fragment
    for image in imageset:
        for x, y in image.items():
            if len(y) != 0:
                guess.append((v.relation(y[0], buildvector(im3)), x))

    # highest similarity first; the top entry is the best guess
    guess.sort(reverse=True)
    print(guess[0])
    count += 1
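The matching step can be tried out without the iconset files. The sketch below is a simplified stand-in, not the tutorial's code: the 3x3 glyphs, the helper names `build_vector` and `relation`, and the use of 1 for ink and 0 for background are all assumptions made so the example fits in a few lines.

```python
import math

def build_vector(pixels):
    """Map pixel position to pixel value, like buildvector() does for an image."""
    return {i: p for i, p in enumerate(pixels)}

def relation(v1, v2):
    """Cosine similarity between two position -> value dictionaries."""
    top = sum(val * v2[k] for k, val in v1.items() if k in v2)
    mag1 = math.sqrt(sum(val ** 2 for val in v1.values()))
    mag2 = math.sqrt(sum(val ** 2 for val in v2.values()))
    return top / (mag1 * mag2)

# Made-up 3x3 training glyphs, flattened row by row (1 = ink, 0 = background).
training = {
    "l": [0, 1, 0,  0, 1, 0,  0, 1, 0],
    "t": [1, 1, 1,  0, 1, 0,  0, 1, 0],
    "o": [1, 1, 1,  1, 0, 1,  1, 1, 1],
}
unknown = [1, 1, 1,  0, 1, 0,  0, 1, 1]  # a "t" with one noisy pixel

# Score the unknown glyph against every training glyph, best match first.
guesses = sorted(
    ((relation(build_vector(p), build_vector(unknown)), label)
     for label, p in training.items()),
    reverse=True,
)
best_score, best_label = guesses[0]
print(best_label)
```

Because every candidate gets a score, the second and third entries of `guesses` show how close the runners-up were, which is the graded-results property listed among the advantages.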

Getting the results

Everything is ready. Run the code to try it:

$ python crack.py

Output:

(0.96376811594202894, '7')
(0.96234028545977002, 's')
(0.9286884286888929, '9')
(0.98350370609844473, 't')
(0.96751165072506273, '9')
(0.96989711688772628, 'j')

The answer is correct. Nice work.

Summary

That is the entire content of this article. I hope it is of some help in your study or work. If you have questions, feel free to leave a message.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
