Cracking a Simple Captcha with Python: A Worked Example

Source: Internet
Author: User

1. Preface

This experiment uses a simple example to explain how captcha cracking works. Along the way you will learn and practice:

Python basics

Using the PIL (Pillow) module

2. A detailed example

Install the Pillow (PIL) library:

$ sudo apt-get update

$ sudo apt-get install python-dev

$ sudo apt-get install libtiff5-dev libjpeg8-dev zlib1g-dev \
  libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python-tk

$ sudo pip install Pillow

Download the experiment files:

$ wget http://labfile.oss.aliyuncs.com/courses/364/python_captcha.zip
$ unzip python_captcha.zip
$ cd python_captcha

This is the captcha image used in our experiment, captcha.gif:


Extracting the text image

Create a new file named crack.py in the working directory and open it for editing.

# -*- coding: utf8 -*-
from PIL import Image

im = Image.open("captcha.gif")
# Convert the picture to 8-bit palette ("P") mode
im = im.convert("P")

# Print the color histogram
print(im.histogram())

Output:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0.1, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 1.0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0,  1, 2, 0, 0, 0, 1, 2, 0, 1, 0, 0, 1, 0.2, 0, 0, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 3, 1, 3, 3, 0, 0, 0, 1, 0, 3, 2, 132, 1, 1, 0.0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 15, 0, 1, 0, 1, 0, 0, 8, 1, 0, 0, 0, 0, 1,  0, 0, 0, 0, 18, 1, 1, 1, 1.1, 2, 365, 115, 0, 1, 0, 0, 0, 135, 186, 0, 0, 1, 0, 0, 0, 116, 3, 0, 0, 0, 0, 0, 21, 0, 0, 2, 10, 2, 0, 0, 0, 0, 2, 10, 0, 0, 0, 0, 1, 0, 625]

Each entry in the color histogram is the number of pixels in the image that use the corresponding palette color.

In palette mode each pixel takes one of 256 colors. White is by far the most common: the entry at index 255 (the last one) shows 625 white pixels. The red pixels sit around index 200. By sorting the histogram we can pick out the useful colors.
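To make the counts concrete, here is a minimal pure-Python sketch of what a 256-bin histogram holds. It assumes the pixels are already a flat list of palette indices (for a "P"-mode image, `list(im.getdata())` yields such a list); `color_histogram` is a hypothetical helper, not a PIL function, and the 3x3 image is toy data.

```python
from typing import List


def color_histogram(pixels: List[int]) -> List[int]:
    """Count how many pixels use each of the 256 palette indices.

    `pixels` is assumed to be a flat list of palette indices (0..255).
    """
    counts = [0] * 256
    for index in pixels:
        counts[index] += 1
    return counts


# A toy 3x3 image, flattened row by row: seven white (255) and two red (220) pixels.
toy = [255, 255, 255, 220, 255, 220, 255, 255, 255]
hist = color_histogram(toy)
print(hist[255], hist[220], len(hist))  # 7 2 256
```

Entry `i` of the result plays the same role as entry `i` of the list printed above.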

his = im.histogram()
values = {}

for i in range(256):
    values[i] = his[i]

# Print the 10 most frequent colors as "index count"
for j, k in sorted(values.items(), key=lambda x: x[1], reverse=True)[:10]:
    print("%s %s" % (j, k))

Output:

255 625
212 365
220 186
219 135
169 132
227 116
213 115
234 21
205 18
184 15

This gives the 10 most frequent colors in the image. Of these, 220 and 227 are the red and gray that make up the characters, so we use them to build a black-and-white (binary) image.
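The thresholding idea itself does not depend on PIL. As a minimal sketch, assuming the image is a plain list of rows of palette indices, the hypothetical helper `binarize` turns the two character colors into black (0) and every other color into white (255), which is what the `putpixel` loop does with `im2`. The toy grid is made-up data.

```python
from typing import Iterable, List

Grid = List[List[int]]


def binarize(grid: Grid, keep: Iterable[int] = (220, 227)) -> Grid:
    """Return a black-and-white copy of `grid`.

    Pixels whose palette index is in `keep` become 0 (black);
    every other pixel becomes 255 (white).
    """
    keep_set = set(keep)
    return [[0 if pix in keep_set else 255 for pix in row] for row in grid]


toy = [
    [255, 220, 169],
    [227, 255, 212],
]
print(binarize(toy))  # [[255, 0, 255], [0, 255, 255]]
```

Passing the colors as a parameter makes it easy to try other entries from the top-10 list if a different captcha uses different palette indices.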

# -*- coding: utf8 -*-
from PIL import Image

im = Image.open("captcha.gif")
im = im.convert("P")
# New all-white (255) image of the same size
im2 = Image.new("P", im.size, 255)

for x in range(im.size[1]):      # x walks the rows (image height)
    for y in range(im.size[0]):  # y walks the columns (image width)
        pix = im.getpixel((y, x))
        if pix == 220 or pix == 227:  # these are the colors we want to keep
            im2.putpixel((y, x), 0)   # paint them black

im2.show()

The result:


Extracting single-character images

Next we need the set of pixels that belongs to each individual character. Because this example is simple, we can split the image vertically, column by column:

inletter = False
foundletter = False
start = 0
end = 0

letters = []

for y in range(im2.size[0]):      # y walks the columns (image width)
    for x in range(im2.size[1]):  # x walks the rows (image height)
        pix = im2.getpixel((y, x))
        if pix != 255:            # any non-white pixel means this column has ink
            inletter = True
    if not foundletter and inletter:
        foundletter = True        # first inked column: a character starts
        start = y

    if foundletter and not inletter:
        foundletter = False       # first blank column after a character: it ends
        end = y
        letters.append((start, end))

    inletter = False
print(letters)

Output:

[(6, 14), (15, 25), (27, 35), (37, 46), (48, 56), (57, 67)]

These are the start and end column indices of each character.

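import hashlib
import time

count = 0
for letter in letters:
    m = hashlib.md5()
    # Crop box is (left, upper, right, lower)
    im3 = im2.crop((letter[0], 0, letter[1], im2.size[1]))
    # Hash the current time plus a counter to get a unique file name
    m.update(("%s%s" % (time.time(), count)).encode("utf-8"))
    im3.save("./%s.gif" % (m.hexdigest()))
    count += 1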

(Append this after the code above.)

This crops the image into one piece per character and saves each piece as a GIF file.
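The column scan and the crop can also be written as two small pure functions, which makes the index conventions easy to check. This is a sketch under the assumption that the binary image is a list of rows with 255 as the background; `column_segments` and `crop_columns` are hypothetical names. As in the loop above, `end` is exclusive, which matches PIL's crop box `(left, upper, right, lower)`; unlike that loop, this version also closes a character that touches the right edge of the image.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]


def column_segments(grid: Grid, background: int = 255) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges that contain non-background pixels.

    `end` is exclusive: columns start..end-1 belong to one character.
    """
    if not grid:
        return []
    width = len(grid[0])
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for col in range(width):
        has_ink = any(row[col] != background for row in grid)
        if has_ink and start is None:
            start = col                    # first inked column: a character begins
        elif not has_ink and start is not None:
            segments.append((start, col))  # first blank column after it: it ends
            start = None
    if start is not None:                  # character runs up to the right edge
        segments.append((start, width))
    return segments


def crop_columns(grid: Grid, start: int, end: int) -> Grid:
    """Keep columns start..end-1 of every row, like im2.crop((start, 0, end, height))."""
    return [row[start:end] for row in grid]


# Toy binary image: 0 = black ink, 255 = white background.
grid = [
    [255, 0, 0, 255, 0],
    [255, 0, 255, 255, 0],
]
segments = column_segments(grid)
print(segments)  # [(1, 3), (4, 5)]
for start, end in segments:
    print(crop_columns(grid, start, end))
# [[0, 0], [0, 255]]
# [[0], [0]]
```

In the toy grid the second character sits in the last column; the final `if start is not None` check is what records it.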

AI and vector space image recognition

Here we use a vector space search engine for character recognition. It has several advantages:

    1. It does not need many training iterations
    2. It cannot be overtrained
    3. You can add or remove incorrect data at any time and see the effect immediately
    4. It is easy to understand and to turn into code
    5. It returns graded results, so you can inspect several of the closest matches
    6. Anything it cannot recognize yet can be recognized as soon as it is added to the search engine

Of course, it also has drawbacks: classification is much slower than with a neural network, and it cannot find its own way to solve a problem.

The name "vector space search engine" sounds impressive, but the principle is very simple. Consider this example:

Suppose you have 3 documents. How do you calculate how similar they are? The more words two documents have in common, the more similar they are. But there are far too many words, so we pick a few keywords; the chosen words are called features. Each feature is like a dimension in space (x, y, z, and so on), and a set of features forms a vector. We can build such a vector for every document, and the angle between two vectors then tells us how similar the two documents are.
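The document example translates directly into code. This sketch assumes that every distinct word is a feature (instead of a hand-picked keyword list) and uses `collections.Counter` for the word counts; `cosine` is a hypothetical helper and the three short documents are made-up examples.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors.

    Words missing from `b` count as 0 (Counter returns 0 for absent keys).
    """
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


docs = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stock prices fell sharply today",
]
vectors = [Counter(doc.split()) for doc in docs]
print(round(cosine(vectors[0], vectors[1]), 3))  # 0.875
print(round(cosine(vectors[0], vectors[2]), 3))  # 0.0
```

The first two documents share most of their words and score 0.875, while the third shares none and scores 0.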

Implementing the vector space as a Python class:

import math

class VectorCompare:
    # Compute the magnitude of a vector
    def magnitude(self, concordance):
        total = 0
        for word, count in concordance.items():
            total += count ** 2
        return math.sqrt(total)

    # Compute the cosine of the angle between two vectors
    def relation(self, concordance1, concordance2):
        topvalue = 0
        for word, count in concordance1.items():
            if word in concordance2:
                topvalue += count * concordance2[word]
        return topvalue / (self.magnitude(concordance1) * self.magnitude(concordance2))

It compares two Python dictionaries and returns their similarity as a number between 0 and 1.
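Written out, `relation` computes the cosine of the angle between the two vectors. Here c_1(w) and c_2(w) stand for the values stored under key w in `concordance1` and `concordance2`, taken as 0 when the key is absent (which is why the code only adds products for keys present in both):

```latex
\cos\theta = \frac{\sum_{w} c_1(w)\, c_2(w)}
                  {\sqrt{\sum_{w} c_1(w)^2}\;\sqrt{\sum_{w} c_2(w)^2}}
```

Because the counts (and, later, the pixel values) are never negative, the numerator is never negative, so the result lies between 0 and 1; two vectors pointing in the same direction give 1.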

Putting it all together

Building a training set means extracting single-character images from a large number of captchas. Anyone who has followed the steps above knows how to do this, so it is omitted here; you can use the provided training set directly for the following steps.

The iconset directory is our training set.

Finally, add the following code:
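To see what the loading code expects on disk, here is a small hypothetical helper, `list_training_files`, that walks the assumed layout `iconset/<character>/<image file>` (the layout implied by the `./iconset/%s/` paths) and skips the same OS metadata files. It only collects paths; opening the images with PIL is left to the caller.

```python
import os
from typing import Dict, List

# OS metadata files that are not training images
IGNORED = {"Thumbs.db", ".DS_Store"}


def list_training_files(root: str) -> Dict[str, List[str]]:
    """Map each label (sub-directory name) to the sorted image paths inside it.

    Assumes the layout root/<label>/<image file>; plain files directly
    under `root` are skipped.
    """
    training: Dict[str, List[str]] = {}
    for label in sorted(os.listdir(root)):
        label_dir = os.path.join(root, label)
        if not os.path.isdir(label_dir):
            continue
        training[label] = sorted(
            os.path.join(label_dir, name)
            for name in os.listdir(label_dir)
            if name not in IGNORED
        )
    return training
```

For example, `list_training_files("./iconset")["7"]` would list the sample images for the character 7, provided that directory exists.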

import os

# Convert an image into a vector: a dict mapping pixel index to pixel value
def buildvector(im):
    d1 = {}
    count = 0
    for i in im.getdata():
        d1[count] = i
        count += 1
    return d1

v = VectorCompare()

iconset = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
           'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
           'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

# Load the training set
imageset = []
for letter in iconset:
    for img in os.listdir('./iconset/%s/' % (letter)):
        temp = []
        if img != "Thumbs.db" and img != ".DS_Store":
            temp.append(buildvector(Image.open("./iconset/%s/%s" % (letter, img))))
        imageset.append({letter: temp})

count = 0
# Cut the captcha into single characters
for letter in letters:
    im3 = im2.crop((letter[0], 0, letter[1], im2.size[1]))

    guess = []

    # Compare each cut fragment with every training fragment
    for image in imageset:
        for x, y in image.items():
            if len(y) != 0:
                guess.append((v.relation(y[0], buildvector(im3)), x))

    # The highest cosine similarity is the best match
    guess.sort(reverse=True)
    print(guess[0])
    count += 1
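The matching step can be sketched without PIL or the training files. This is a minimal pure-Python sketch, assuming each image has already been flattened into a list of pixel values; `build_vector`, `relation` and `best_guess` are hypothetical names that mirror `buildvector`, `VectorCompare.relation` and the `guess` list, not any library API. The two "training" samples are made-up toy data.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[int, int]


def build_vector(pixels: List[int]) -> Vector:
    """Map each pixel position to its value, as buildvector() does with im.getdata()."""
    return {i: value for i, value in enumerate(pixels)}


def relation(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; positions missing from b count as 0."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def best_guess(fragment: Vector, training: List[Tuple[str, Vector]]) -> Tuple[float, str]:
    """Return (score, label) of the training sample most similar to `fragment`."""
    guesses = [(relation(sample, fragment), label) for label, sample in training]
    return max(guesses)


# Toy 2x3 images flattened row by row: 0 = black, 255 = white.
training = [
    ("1", build_vector([255, 0, 255, 255, 0, 255])),
    ("7", build_vector([0, 0, 0, 255, 255, 0])),
]
unknown = build_vector([255, 0, 255, 255, 0, 0])
score, label = best_guess(unknown, training)
print(label, round(score, 3))  # 1 0.866
```

Keying the vectors by pixel position means fragments of different sizes can still be compared, since a position missing from one vector simply adds 0 to the dot product. Note also that black is 0 and white is 255 here, matching how `im2` was built, so it is the overlapping white pixels that drive the score.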

Getting the results

Everything is ready, so let's run the code:

$ python crack.py

Output:

(0.96376811594202894, '7')
(0.96234028545977002, 's')
(0.9286884286888929, '9')
(0.98350370609844473, 't')
(0.96751165072506273, '9')
(0.96989711688772628, 'j')

That is the correct answer. Nice work!

Summary

That is all for this article. I hope it helps with your study or work; if you have questions, feel free to leave a message.
