Python Machine Learning: Captcha Recognition

Source: Internet
Author: User

Contents:
  Captcha recognition with a Python classification model
  Downloading captchas
  Image processing: binarizing the original images, declaring an image class, cutting the images
  Labeling the images
  Generating the training-set matrix CSV file
  Validating the training set
  Training the model
  Recognizing and computing the captcha
  Closing notes

Captcha recognition with a Python classification model

Downloading captchas

First, download enough captchas from the target site to build a training set, which is then used to generate the model. Here we take the Shenzhen Credit website (szcredit.org.cn) as the example and download 500 captchas.

import img  # custom package (img.py)


def download_image():
    """Download captcha images."""
    url = 'http://www.szcredit.org.cn/web/WebPages/Member/Checkcode.aspx'
    headers = {'User-Agent': ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) "
                              "AppleWebKit/536.3 (KHTML, like Gecko) "
                              "Chrome/19.0.1063.0 Safari/536.3")}
    for i in range(500):  # download 500 captchas
        image = img.download_image(url=url, headers=headers)
        with open('source_image/%s.png' % str(i), 'wb') as f:
            f.write(image)  # the with block closes the file automatically
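The custom img.download_image() helper is not shown in the article. A minimal sketch of what such a helper could look like, using only the standard library's urllib.request, is below. The names build_request and download_image here are hypothetical, and the sketch assumes the captcha endpoint returns the raw image bytes for a plain GET request.

```python
import urllib.request


# Hypothetical stand-in for the custom img.download_image() helper;
# its real implementation is not part of the article.
def build_request(url, headers):
    """Build a GET request that carries the given headers (e.g. User-Agent)."""
    return urllib.request.Request(url, headers=headers)


def download_image(url, headers, timeout=10):
    """Fetch one captcha and return the raw image bytes."""
    request = build_request(url, headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping request construction separate from the network call makes the header handling easy to check without contacting the site.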
Image processing

Image processing is the most complicated and difficult part of captcha recognition: it usually requires working out some algorithmic logic for cutting the captcha into individual characters.

Binarizing the original images

Before any further image processing, we first binarize all the original captcha images.

# -*- coding: utf-8 -*-
# Python 3.6

from os import listdir
from PIL import Image
import img  # custom package (img.py)


def two_value():
    """Binarize all original images of the training set."""
    file_list = listdir('source_image/')
    for each in file_list:
        image = Image.open('source_image/%s' % each)
        image = img.twovalueimage(image)
        image.save('two_value_image/%s' % each)


if __name__ == '__main__':
    two_value()
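img.twovalueimage() is not shown either. The usual idea behind binarization is a global threshold: pixels at or below a gray level become black, the rest white. Below is a sketch under the assumption that the image is a grayscale grid of 0-255 ints; the default threshold of 200 mirrors the g=200 passed in the class that follows, and the name binarize is hypothetical.

```python
# Sketch of global-threshold binarization, the assumed idea behind the
# custom img.twovalueimage() helper (its real code is not shown).
# Assumption: the image is a grayscale grid (list of rows of 0-255 ints).
def binarize(pixels, threshold=200):
    """Map each pixel to 0 (black) if it is <= threshold, else 255 (white)."""
    return [[0 if p <= threshold else 255 for p in row] for row in pixels]
```

With Pillow the same mapping can be applied to an image directly, e.g. Image.open(path).convert('L').point(lambda p: 0 if p <= 200 else 255).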
Declaring an image class

Next, the images need to be cut. Before cutting, we first define a captcha class; every image-processing step is implemented as a method of this class.

# -*- coding: utf-8 -*-
# Python 3.6

from os import listdir
from PIL import Image
import img  # custom package (img.py)


class Sz_captcha:
    """Captcha of szcredit.org.cn"""

    def __init__(self, image):
        """Initialize the captcha and declare the following attributes.

        :param image: PIL Image object
        """
        self.image = image            # the image itself
        self.size = image.size        # image size (width, height)
        self.all_chunks = []          # all cut pieces
        self.all_format_chunks = []   # all pieces after resizing

    def attributes(self):
        """Get the captcha type and the x-coordinates of the cutting lines."""
        if self.size[0] == 90:
            self.type = '11'  # '11': one digit + one digit
            self.node = (0, 19, 39, ...)  # remaining values lost in the source
        elif self.size[0] == 120:
            self.type = '22'  # '22': two digits + two digits
            self.node = (0, 17, 30, ...)  # remaining values lost in the source
        else:
            self.type = '12'  # '21': two digits + one digit; '12': one digit + two digits
            self.node = (0, 19, ...)  # remaining values lost in the source
            two_value_image = img.twovalueimage(image=self.image, g=200)
            for j in range(self.size[1]):
                # scan column 40 top to bottom: a black pixel means '21', otherwise '12'
                g = two_value_image.getpixel((40, j))
                if g == 0:
                    self.type = '21'
                    self.node = (0, ...)  # values lost in the source
                    break

    def crop(self):
        """Cut along the x-coordinates in self.node and store the pieces in self.all_chunks."""
        for i in range(len(self.node) - 1):
            chunk = self.image.crop((self.node[i], 0, self.node[i + 1], 30))
            self.all_chunks.append(chunk)
        if self.type == '11' or self.type == '12':
            self.symbol = self.all_chunks[1]
        else:
            self.symbol = self.all_chunks[2]

    def format(self):
        """Binarize each piece in self.all_chunks, remove the border, trim the
        surrounding whitespace, resize it, and store it in self.all_format_chunks."""
        for each in self.all_chunks:
            two_value_image = img.twovalueimage(each)
            remove_frame = img.clear_frame(two_value_image, 1)
            cut_around = img.cut_around(remove_frame)
            new_img = img.format_size(cut_around, 20, 30)
            self.all_format_chunks.append(new_img)

    def recognize(self, model):
        """Recognize the images in self.all_format_chunks; requires the model path.

        :param model: str, path of the model generated by sklearn
        :return: four digits and an operator
        """
        result = []
        for each_img in self.all_format_chunks:
            x = img.classify(each_img, model=model)
            result.append(x)
        # pad with zeros so the result is always [x1, x2, symbol, x3, x4]
        if self.type == '11':
            result.insert(2, 0)
            result.insert(0, 0)
        elif self.type == '12':
            result.insert(0, 0)
        elif self.type == '21':
            result.insert(3, 0)
        return result  # [x1, x2, symbol, x3, x4]

    def calculate(self, model):
        """Do the arithmetic for the captcha."""
        self.attributes()  # extract attributes
        self.crop()        # cut
        self.format()      # format the pieces
        x1, x2, symbol, x3, x4 = self.recognize(model=model)
        if symbol.upper() == 'X':
            result = (int(x1) * 10 + int(x2)) * (int(x3) * 10 + int(x4))
        else:
            result = (int(x1) * 10 + int(x2)) + (int(x3) * 10 + int(x4))
        return result
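The branching in attributes() and crop() can be pulled out into pure functions, which makes the decision easy to check without any images. A sketch follows; captcha_type and symbol_index are hypothetical names, and passing column 40 as a list of 0/255 values is an assumption. The widths, the column-40 rule and the symbol positions are taken from the class above.

```python
# Sketch of the decisions made in Sz_captcha.attributes() and crop(),
# written as pure functions. Passing column 40 as a list of 0/255 ints
# is an assumption made for illustration.
def captcha_type(width, column_40):
    """Return '11', '22', '21' or '12' (digits before/after the operator)."""
    if width == 90:
        return '11'  # one digit + one digit
    if width == 120:
        return '22'  # two digits + two digits
    # Any other width: a black pixel in column 40 means '21', otherwise '12'.
    if any(p == 0 for p in column_40):
        return '21'
    return '12'


def symbol_index(kind):
    """Position of the operator among the cut pieces."""
    return 1 if kind in ('11', '12') else 2
```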

With the image class in place, the following image-processing steps are all built on it.

Cutting the images

Cut the binarized images, then resize every resulting piece to the same dimensions. Giving all pieces the same size gives them feature matrices of the same shape, which the mathematical computation requires. Here every piece is normalized to 20 × 30 pixels and saved.
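The border trimming and resizing are done by the custom helpers img.cut_around() and img.format_size(), whose code the article does not include. A plain-Python sketch of the same two ideas follows, assuming a piece is a grid of 0 (black) / 255 (white) ints and using nearest-neighbour sampling for the resize; both function bodies are illustrations, not the author's implementation.

```python
# Sketch of "trim the surrounding whitespace" and "resize to 20 x 30".
# Assumptions: a piece is a grid of 0 (black) / 255 (white) ints;
# resizing uses nearest-neighbour sampling.
def cut_around(grid):
    """Trim white rows/columns so the grid tightly bounds the black pixels."""
    rows = [y for y, row in enumerate(grid) if 0 in row]
    cols = [x for x in range(len(grid[0])) if any(row[x] == 0 for row in grid)]
    if not rows:  # blank piece: nothing to trim
        return grid
    return [row[cols[0]:cols[-1] + 1] for row in grid[rows[0]:rows[-1] + 1]]


def resize(grid, width=20, height=30):
    """Nearest-neighbour resize to width x height."""
    src_h, src_w = len(grid), len(grid[0])
    return [[grid[y * src_h // height][x * src_w // width] for x in range(width)]
            for y in range(height)]
```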

# -*- coding: utf-8 -*-
# Python 3.6

from os import listdir
from PIL import Image
import img  # custom package (img.py)


def crop():
    """Walk through the binarized images, cut each one, and save the pieces."""
    file_list = listdir('two_value_image/')
    for each in file_list:
        image = Image.open('two_value_image/%s' % each)
        captcha = Sz_captcha(image)  # the class declared above, in the same file
        captcha.attributes()  # extract attributes
        captcha.crop()        # cut
        captcha.format()      # format the pieces
        for i, a in enumerate(captcha.all_format_chunks):
            a.save('train_image/%s' % (str(i) + each))


if __name__ == '__main__':
    crop()
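Inside Sz_captcha.crop(), each pair of neighbouring cut positions in self.node becomes one PIL crop box (left, upper, right, lower) with a fixed height of 30. A small sketch of that mapping; crop_boxes is a hypothetical name and the node values in the usage line are made up, since the article's real values were lost.

```python
# Sketch of how Sz_captcha.crop() turns cut positions into PIL crop boxes.
def crop_boxes(node, height=30):
    """Return (left, upper, right, lower) boxes between consecutive cut lines."""
    return [(node[i], 0, node[i + 1], height) for i in range(len(node) - 1)]


# Made-up cut positions, for illustration only.
boxes = crop_boxes((0, 19, 39, 60))
```

Each box is what gets passed to Image.crop(); with n cut positions you get n - 1 pieces. The pieces are then saved as str(i) + original name, so piece 2 of 17.png becomes 217.png.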
Labeling the images

This step is tedious but important: rename every piece so that the first character of its file name is the character the piece shows. For example, a piece showing 7 must get a file name starting with 7, and a multiplication sign is labeled x.
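The labeling itself is manual, but the renaming can be scripted. The hypothetical helper below prefixes a piece's file name with the label you type in, so that name[0] is the label (the convention the CSV step relies on). Prefixing rather than replacing the first character is an assumption; it keeps names unique.

```python
import os


# Hypothetical helper for the manual labeling step: prefix the file name
# with the character the piece shows, so that name[0] is the label.
def label_chunk(directory, filename, label):
    """Rename directory/filename to directory/<label><filename>; return the new name."""
    if len(label) != 1:
        raise ValueError('label must be a single character')
    new_name = label + filename
    os.rename(os.path.join(directory, filename), os.path.join(directory, new_name))
    return new_name
```

For example, label_chunk('train_image', '017.png', '7') renames the piece to 7017.png.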

Generating the training-set matrix CSV file

Writing the features to a CSV file makes it easy to build the feature matrix used to train the model.

# -*- coding: utf-8 -*-
# Python 3.6

from os import listdir
from PIL import Image
import img  # custom package (img.py)


def create_train_csv():
    """Generate the training set."""
    file_list = listdir('train_image/')
    for each in file_list:
        image = Image.open('train_image/%s' % each)
        x = img.two_value(image, 'list')  # pixel features as a flat list
        y = each[0]                       # label: first character of the file name
        x.insert(0, y)
        img.write_csv(filename='train_csv.csv', values=x)


if __name__ == '__main__':
    create_train_csv()
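Neither img.two_value(image, 'list') nor img.write_csv() is shown. Assuming they flatten a binarized piece into a 0/1 feature list and append it as one CSV row behind the label, the row format can be sketched with the standard csv module; make_row, append_row and the 0/1 encoding are assumptions.

```python
import csv


# Sketch of one training row: label first, then one feature per pixel.
# Assumption: black (0) pixels are encoded as 1 and white (255) as 0.
def make_row(filename, grid):
    features = [1 if p == 0 else 0 for row in grid for p in row]
    return [filename[0]] + features  # label = first character of the file name


def append_row(fileobj, row):
    """Append one row; open real files with mode 'a' and newline=''."""
    csv.writer(fileobj).writerow(row)
```

A 20 × 30 piece yields 600 features, so every row in train_csv.csv has 601 columns.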
Validating the training set

Cross-validation splits the data into a training set and a test set, so that the recognition accuracy can be measured on samples the model has not seen.
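Conceptually, an 8:2 hold-out split shuffles the sample indices with a fixed seed and puts the first 20% into the test set. The plain-Python sketch below illustrates that idea; split_train_test is a hypothetical name, and it does not reproduce scikit-learn's exact shuffle, only the return order of train_test_split.

```python
import math
import random


# Sketch of an 8:2 hold-out split (not scikit-learn's exact shuffle).
def split_train_test(features, labels, test_size=0.2, seed=0):
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible split
    n_test = math.ceil(test_size * len(indices))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    # same order as train_test_split: X_train, X_test, y_train, y_test
    return (pick(features, train_idx), pick(features, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))
```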

# -*- coding: utf-8 -*-
# Python 3.6


def verify():
    """Cross-validate to check the accuracy of the training set."""
    import csv
    from sklearn import metrics, neighbors
    # sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
    from sklearn.model_selection import train_test_split

    # read the feature information and the label information
    feature_list = []
    label_list = []
    with open('train_csv.csv', 'r', newline='') as csvfile:
        for row in csv.reader(csvfile):
            if not row:
                continue  # skip blank lines
            label_list.append(row[0])
            feature_list.append([int(v) for v in row[1:]])

    # split the data 8:2 into a training set and a test set
    train_data, test_data, train_target, test_target = train_test_split(
        feature_list, label_list, test_size=0.2, random_state=0)

    knn = neighbors.KNeighborsClassifier()  # default KNN model
    knn.fit(train_data, train_target)       # train the model
    predict_test = knn.predict(test_data)   # predict the test set
    # show the prediction results
    print(metrics.classification_report(test_target, predict_test))


if __name__ == '__main__':
    verify()

Output:

             precision    recall  f1-score   support

          +       1.00      1.00      1.00        39
          0       1.00      1.00      1.00        14
          1       1.00      1.00      1.00        37
          2       1.00      1.00      1.00        21
          3       0.91      1.00      0.95        10
          4       1.00      1.00      1.00        12
          5       0.83      1.00      0.91        10
          6       1.00      1.00      1.00        12
          7       1.00      1.00      1.00        11
          8       1.00      1.00      1.00         …
          9       1.00      0.80      0.89         …
          x       1.00      1.00      1.00         …

avg / total       0.99      0.99      0.99       254

The report shows about 99% accuracy on a test set of 254 samples. If you doubt this figure, try splitting the data 7:3 or 6:4 instead.

Training the model

Once the accuracy on the training set meets your requirements, train the classification model and save it.
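img.loadtrainset() is another custom helper whose source the article does not include. Assuming it simply reads back the CSV written earlier (label in the first column, pixel values after it), a stand-in could look like this; load_train_set is a hypothetical name. Converting the features to int matters, because recent scikit-learn versions reject string feature arrays.

```python
import csv


# Hypothetical stand-in for img.loadtrainset(): label in column 0,
# integer pixel features in the remaining columns.
def load_train_set(path):
    features, labels = [], []
    with open(path, newline='') as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            labels.append(row[0])
            features.append([int(v) for v in row[1:]])
    return features, labels
```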

# -*- coding: utf-8 -*-
# Python 3.6

import img  # custom package (img.py)


def create_model():
    """Train the classification model and save it to disk."""
    import joblib  # formerly sklearn.externals.joblib, removed in scikit-learn 0.23
    from sklearn import neighbors

    train_x, train_y = img.loadtrainset('train_csv.csv')
    knn_cly = neighbors.KNeighborsClassifier()  # KNN is used here; other classifiers also work
    knn_cly.fit(train_x, train_y)               # train on the data
    joblib.dump(knn_cly, 'classify_model.m')    # for Python 2, train and save the model in a Python 2 environment


if __name__ == '__main__':
    create_model()
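To make the choice of KNN concrete: with its defaults (n_neighbors=5, uniform weights, Minkowski distance with p=2, i.e. Euclidean), KNeighborsClassifier labels a piece by a majority vote among the 5 training pieces closest to it. The plain-Python illustration below is not scikit-learn's implementation; knn_predict is a hypothetical name, and its tie-breaking may differ from the library's.

```python
from collections import Counter


# Illustration of k-nearest-neighbours classification with a majority vote.
# Squared Euclidean distance gives the same ordering as Euclidean distance.
def knn_predict(train_x, train_y, query, k=5):
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(sample, query)), label)
        for sample, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```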
Recognizing and computing the captcha

The following code runs the calculation on the original images: it displays each image and prints the computed result, so you can check (for example in a debugger) whether the calculation is correct.

# -*- coding: utf-8 -*-
# Python 3.6

from os import listdir
from PIL import Image
import img  # custom package (img.py); Sz_captcha is defined in this same file

if __name__ == '__main__':
    file_list = listdir('source_image/')
    for each in file_list:
        image = Image.open('source_image/%s' % each)
        image.show()
        # use a name other than "img" so the custom package is not shadowed
        captcha = Sz_captcha(image=image)
        print(captcha.calculate('classify_model.m'))
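The last two steps behind calculate() are easy to check in isolation: recognize() pads the recognized characters with zeros into the fixed shape [x1, x2, symbol, x3, x4], and calculate() evaluates them as two two-digit numbers. The sketch below copies those rules into standalone functions; pad and evaluate are hypothetical names.

```python
# Padding rules copied from Sz_captcha.recognize(); arithmetic from calculate().
def pad(kind, result):
    result = list(result)
    if kind == '11':
        result.insert(2, 0)
        result.insert(0, 0)
    elif kind == '12':
        result.insert(0, 0)
    elif kind == '21':
        result.insert(3, 0)
    return result  # '22' already has five items


def evaluate(x1, x2, symbol, x3, x4):
    left = int(x1) * 10 + int(x2)
    right = int(x3) * 10 + int(x4)
    return left * right if symbol.upper() == 'X' else left + right
```

For example, a '12' captcha read as 3 x 12 is padded to [0, '3', 'x', '1', '2'] and evaluates to 36.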
Closing notes

At this point the captcha recognizer is essentially complete; in our observations its accuracy is close to 100%.
In the end, only three files need to be kept: img.py, sz_credit.py, and classify_model.m. In sz_credit.py, everything except class Sz_captcha can be deleted or kept as a record; the production environment does not use it anyway.
