CAPTCHA Recognition with a Python Classification Model

Contents: downloading the CAPTCHAs; image processing; binarizing the original images; declaring an image class; cutting the images; labeling the images; generating the training-set CSV file; validating the training set; training the model; recognizing and computing the CAPTCHA; closing notes

Downloading the CAPTCHAs
First, we download enough CAPTCHA images from the target site to build a training set, and then train the model on that set. Here we take the Shenzhen Credit Network as an example and download 500 CAPTCHAs.
import img  # custom package

def download_image():
    """Download CAPTCHA images."""
    url = 'http://www.szcredit.org.cn/web/WebPages/Member/Checkcode.aspx'
    headers = {'User-Agent': ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) "
                              "AppleWebKit/536.3 (KHTML, like Gecko) "
                              "Chrome/19.0.1063.0 Safari/536.3")}
    for i in range(0, 500):  # 500 images, as stated above
        image = img.download_image(url=url, headers=headers)
        # The with block closes the file, so no explicit close() is needed
        with open('source_image/%s.png' % str(i), 'wb') as f:
            f.write(image)
Image processing

Image processing is the most complicated and difficult part of CAPTCHA recognition. It usually requires working out some algorithmic logic for cutting the CAPTCHA into individual characters.

Binarizing the original images

Before any other processing, we first binarize all of the original CAPTCHA images.
# -*- coding: utf-8 -*-
# python 3.6
from os import listdir
from PIL import Image
import img  # custom package

def two_value():
    """Binarize every original image in the training set."""
    file_list = listdir('source_image/')
    for each in file_list:
        image = Image.open('source_image/%s' % each)
        image = img.twovalueimage(image)
        image.save('two_value_image/%s' % each)

if __name__ == '__main__':
    two_value()
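The custom helper img.twovalueimage is not shown in the article. As a rough illustration of what binarization means here, the following sketch uses Pillow directly. The function name binarize and the default threshold of 200 are assumptions (200 is borrowed from the g=200 argument that appears later in the class); the author's helper may work differently.

```python
from PIL import Image


def binarize(image, threshold=200):
    """Return a black-and-white copy of `image` (hypothetical helper).

    Pixels whose grayscale value is below `threshold` become 0 (black),
    all other pixels become 255 (white).
    """
    gray = image.convert("L")  # 8-bit grayscale
    # point() applies the function to every possible pixel value (0..255)
    return gray.point(lambda value: 0 if value < threshold else 255)


if __name__ == "__main__":
    # Tiny synthetic image: light background with one dark pixel.
    demo = Image.new("RGB", (3, 2), (230, 230, 230))
    demo.putpixel((1, 0), (40, 40, 40))
    result = binarize(demo)
    print(result.getpixel((1, 0)), result.getpixel((0, 0)))
```

A fixed threshold is the simplest choice; it works when the background is clearly lighter than the characters, which is an assumption about these particular CAPTCHAs.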
Declaring an image class

Next, the images need to be cut. Before cutting, we first define a class for the CAPTCHA; all subsequent image processing is done through the methods of this class.
# -*- coding: utf-8 -*-
# python 3.6
from os import listdir
from PIL import Image
import img  # custom package


class Sz_captcha:
    """Captcha of sz_credit.org"""

    def __init__(self, image):
        """
        Initialize the CAPTCHA and declare the following attributes.
        :param image: PIL Image object
        """
        self.image = image           # the image itself
        self.size = image.size       # image size
        self.all_chunks = []         # all cut chunks
        self.all_format_chunks = []  # all chunks after resizing

    def attributes(self):
        """Determine the CAPTCHA type and the x positions of the cutting lines."""
        # NOTE: the cut positions after the first few values are truncated in
        # the source; the "..." placeholders must be filled in before running.
        if self.size[0] == 90:
            self.type = '11'  # '11': one digit + one digit
            self.node = (0, 19, 39, ...)
        elif self.size[0] == 120:
            self.type = '22'  # '22': two digits + two digits
            self.node = (0, 17, 30, ...)
        else:
            self.type = '12'  # '21': two digits + one digit; '12': one digit + two digits
            self.node = (0, 19, ...)
            two_value_image = img.twovalueimage(image=self.image, g=200)
            for j in range(self.size[1]):
                # Scan column 40: a black pixel means type '21', otherwise '12'
                g = two_value_image.getpixel((40, j))
                if g == 0:
                    self.type = '21'
                    self.node = (0, ...)
                    break

    def crop(self):
        """Cut the image at the x positions in self.node and store the pieces in self.all_chunks."""
        for i in range(len(self.node) - 1):
            chunk = self.image.crop((self.node[i], 0, self.node[i + 1], 30))
            self.all_chunks.append(chunk)
        # When the first number has one digit, the operator is the second chunk
        if self.type == '11' or self.type == '12':
            self.symbol = self.all_chunks[1]
        else:
            self.symbol = self.all_chunks[2]

    def format(self):
        """
        Binarize the images in self.all_chunks, remove the border, trim the
        surrounding whitespace, resize them, and store the results in
        self.all_format_chunks.
        """
        for i, each in enumerate(self.all_chunks):
            two_value_image = img.twovalueimage(each)
            remove_frame = img.clear_frame(two_value_image, 1)
            cut_around = img.cut_around(remove_frame)
            new_img = img.format_size(cut_around, 20, 30)
            self.all_format_chunks.append(new_img)

    def recognize(self, model):
        """
        Recognize the images in self.all_format_chunks; the model path must be passed in.
        :param model: str, path of the model generated by sklearn
        :return: four digits and an operator
        """
        result = []
        for each_img in self.all_format_chunks:
            x = img.classify(each_img, model=model)
            result.append(x)
        # Pad missing tens digits with 0 so the result always has five entries
        if self.type == '11':
            result.insert(2, 0)
            result.insert(0, 0)
        elif self.type == '12':
            result.insert(0, 0)
        elif self.type == '21':
            result.insert(3, 0)
        else:
            pass
        return result  # [x1, x2, symbol, x3, x4]

    def calculate(self, model):
        """Evaluate the arithmetic expression shown in the CAPTCHA."""
        self.attributes()  # extract attributes
        self.crop()        # cut
        self.format()      # format the cut chunks
        x1, x2, symbol, x3, x4 = self.recognize(model=model)
        if symbol.upper() == 'X':
            result = (int(x1) * 10 + int(x2) * 1) * (int(x3) * 10 + int(x4) * 1)
        else:
            result = (int(x1) * 10 + int(x2) * 1) + (int(x3) * 10 + int(x4) * 1)
        return result
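The padding in recognize and the arithmetic in calculate can be tried in isolation, without images or a model. The sketch below restates that logic as two standalone functions; the names pad_result and evaluate are hypothetical and not part of the author's code, and the type labels follow the '11', '12', '21', '22' convention described in the class comments.

```python
def pad_result(result, captcha_type):
    """Pad recognized characters to the form [x1, x2, symbol, x3, x4]."""
    padded = list(result)
    if captcha_type == "11":    # one digit, operator, one digit
        padded.insert(2, 0)     # tens digit of the second number
        padded.insert(0, 0)     # tens digit of the first number
    elif captcha_type == "12":  # one digit, operator, two digits
        padded.insert(0, 0)
    elif captcha_type == "21":  # two digits, operator, one digit
        padded.insert(3, 0)
    return padded               # type "22" already has five entries


def evaluate(padded):
    """Compute the value of a padded [x1, x2, symbol, x3, x4] list."""
    x1, x2, symbol, x3, x4 = padded
    left = int(x1) * 10 + int(x2)
    right = int(x3) * 10 + int(x4)
    if str(symbol).upper() == "X":
        return left * right
    return left + right


if __name__ == "__main__":
    print(evaluate(pad_result(["7", "x", "8"], "11")))
    print(evaluate(pad_result(["3", "+", "1", "2"], "12")))
```

Padding every result to five entries keeps the arithmetic in one place, at the cost of inserting placeholder zeros for one-digit numbers.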
Now that the image class is in place, the following image-processing steps are all based on it.

Cutting the images

Each binarized image is cut into chunks, and every chunk is then resized so that all chunks have the same size. The purpose is to give every sample a feature matrix of the same shape for the later computation. Here we normalize the chunks to 20 * 30 and save them.
# -*- coding: utf-8 -*-
# python 3.6
from os import listdir
from PIL import Image
import img  # custom package

def crop():
    """Walk through the binarized images, cut each one, and save the chunks."""
    file_list = listdir('two_value_image/')
    for each in file_list:
        image = Image.open('two_value_image/%s' % each)
        captcha = Sz_captcha(image)
        captcha.attributes()  # extract attributes
        captcha.crop()        # cut
        captcha.format()      # format the cut chunks
        for i, chunk in enumerate(captcha.all_format_chunks):
            chunk.save('train_image/%s' % (str(i) + each))

if __name__ == '__main__':
    crop()
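The helpers img.cut_around and img.format_size used by the class are not shown in the article. One possible way to trim a binarized chunk to its character and bring it to 20 * 30 with Pillow is sketched below. The function name trim_and_resize is hypothetical, and the sketch assumes dark characters on a white background.

```python
from PIL import Image, ImageOps


def trim_and_resize(chunk, size=(20, 30)):
    """Crop a chunk to its dark pixels, then resize it (hypothetical helper)."""
    gray = chunk.convert("L")
    # getbbox() returns the bounding box of the non-zero pixels, so invert
    # first: dark character pixels become bright, white background becomes 0.
    bbox = ImageOps.invert(gray).getbbox()
    if bbox is not None:  # None means the chunk is entirely white
        gray = gray.crop(bbox)
    # NEAREST keeps the pixels strictly black or white after resizing
    return gray.resize(size, Image.NEAREST)


if __name__ == "__main__":
    # Synthetic chunk: white canvas with a black bar standing in for a digit.
    chunk = Image.new("L", (40, 30), 255)
    for x in range(10, 15):
        for y in range(5, 25):
            chunk.putpixel((x, y), 0)
    print(trim_and_resize(chunk).size)
```

Stretching every character to the same box discards its aspect ratio, but it guarantees that each sample yields the same number of features.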
Labeling the images

This is a tedious but important step: rename every image so that the first character of its file name is the character shown in the image, as shown in the figure:
Generating the training-set CSV file

The CSV file is generated so that the feature matrix can be loaded easily when training the model.
# -*- coding: utf-8 -*-
# python 3.6
from os import listdir
from PIL import Image
import img  # custom package

def create_train_csv():
    """Generate the training set."""
    file_list = listdir('train_image/')
    for each in file_list:
        image = Image.open('train_image/%s' % each)
        x = img.two_value(image, 'list')  # feature values as a list
        y = each[0]                       # label: first character of the file name
        x.insert(0, y)                    # the label goes in the first column
        img.write_csv(filename='train_csv.csv', values=x)

if __name__ == '__main__':
    create_train_csv()
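The row layout used above (label in the first column, pixel values after it) can be illustrated with the standard csv module alone. The sketch below is an assumption about what img.two_value(..., 'list') and img.write_csv do, not the author's implementation; feature_row and append_row are hypothetical names, and the pixel grid is a plain nested list so the example runs without images.

```python
import csv


def feature_row(label, pixels):
    """Flatten a 2-D grid of pixel values and put the label first."""
    row = [label]
    for line in pixels:
        row.extend(line)
    return row


def append_row(filename, row):
    """Append one sample to the CSV file."""
    with open(filename, "a", newline="") as csvfile:
        csv.writer(csvfile).writerow(row)


if __name__ == "__main__":
    # A 2 x 2 "image" labeled "7"; real chunks would be 20 x 30.
    row = feature_row("7", [[0, 1], [1, 0]])
    append_row("train_csv_demo.csv", row)
    print(row)
```

Note that everything read back from a CSV file is a string, so the feature values may need converting to numbers before training.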
Validating the training set

To validate, the labeled data is split into a training set and a test set, and the test set is used to check the recognition accuracy.
# -*- coding: utf-8 -*-
# python 3.6
def verify():
    """Hold-out validation to check the accuracy of the training set."""
    import csv
    # Named sklearn.cross_validation before scikit-learn 0.18; removed in 0.20
    import sklearn.model_selection as model_selection
    from sklearn import neighbors
    import sklearn.metrics as metrics

    # Read the feature information and the labels
    featureList = []
    labelList = []
    with open('train_csv.csv', 'r') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            labelList.append(row[0])
            # Values are read as strings; convert them to numbers here if
            # your scikit-learn version requires numeric input
            featureList.append(row[1:])
    # Split the data 8:2 into a training set and a test set
    train_data, test_data, train_target, test_target = model_selection.train_test_split(
        featureList, labelList, test_size=0.2, random_state=0)
    # Use the default model
    knn = neighbors.KNeighborsClassifier()
    # Train the model
    knn.fit(train_data, train_target)
    # Predict the test set
    predict_test = knn.predict(test_data)
    # Show the prediction results
    print(metrics.classification_report(test_target, predict_test))

if __name__ == '__main__':
    verify()

Output:

             precision    recall  f1-score   support
          +       1.00      1.00      1.00        39
          0       1.00      1.00      1.00        14
          1       1.00      1.00      1.00        37
          2       1.00      1.00      1.00        21
          3       0.91      1.00      0.95        10
          4       1.00      1.00      1.00        12
          5       0.83      1.00      0.91        10
          6       1.00      1.00      1.00        12
          7       1.00      1.00      1.00        11
          8       1.00      1.00      1.00       ...
          9       1.00      0.80      0.89       ...
          x       1.00      1.00      1.00       ...
avg / total       0.99      0.99      0.99       254
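One way to check that a reported accuracy is not an artifact of a single split is to repeat the hold-out evaluation with several test-set fractions. The sketch below does this with the current scikit-learn API. Because train_csv.csv is not available here, it uses scikit-learn's bundled digits dataset as a stand-in; the function name holdout_accuracy is hypothetical, and the numbers it prints say nothing about the CAPTCHA data.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def holdout_accuracy(features, labels, test_size, seed=0):
    """Train a default KNN on one split and return the test accuracy."""
    train_x, test_x, train_y, test_y = train_test_split(
        features, labels, test_size=test_size, random_state=seed)
    knn = KNeighborsClassifier()  # defaults: 5 neighbors, uniform weights
    knn.fit(train_x, train_y)
    return accuracy_score(test_y, knn.predict(test_x))


if __name__ == "__main__":
    digits = load_digits()  # stand-in data, not the CAPTCHA training set
    for test_size in (0.2, 0.3, 0.4):
        acc = holdout_accuracy(digits.data, digits.target, test_size)
        print("test_size=%.1f accuracy=%.3f" % (test_size, acc))
```

If the accuracy stays roughly the same as the test fraction grows, the model is less likely to be benefiting from one lucky split.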
The results show an accuracy of 99% on 254 test samples. If you doubt this figure, you can split the data 7:3 or 6:4 instead and validate again.

Training the model

Once the accuracy on the training set meets the requirement, train the classification model and save it.
# -*- coding: utf-8 -*-
# python 3.6
from os import listdir
from PIL import Image
import img  # custom package

def create_model():
    """Train the classification model and save it."""
    from sklearn import neighbors
    # Was sklearn.externals.joblib in old versions; removed in scikit-learn 0.23
    import joblib
    train_x, train_y = img.loadtrainset('train_csv.csv')
    # KNN is chosen here; other classification models can be used as well
    knn_cly = neighbors.KNeighborsClassifier()
    knn_cly.fit(train_x, train_y)  # train on the data
    # For Py2, the model must be trained and saved in a Py2 environment
    joblib.dump(knn_cly, "classify_model.m")

if __name__ == '__main__':
    create_model()
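Saving the model only pays off if it can be loaded again later. The sketch below shows a save and load round trip with joblib on toy data. The data and the temporary path are purely illustrative, and n_neighbors=1 is chosen only because the toy set has four samples.

```python
import os
import tempfile

import joblib
from sklearn.neighbors import KNeighborsClassifier

# Toy data, purely illustrative: four 2-pixel "images" in two classes.
train_x = [[0, 0], [0, 1], [1, 0], [1, 1]]
train_y = ["0", "0", "1", "1"]

model = KNeighborsClassifier(n_neighbors=1)
model.fit(train_x, train_y)

# Save the fitted model to a temporary directory, then load it back,
# as a separate recognition script would do.
path = os.path.join(tempfile.mkdtemp(), "classify_model.m")
joblib.dump(model, path)
restored = joblib.load(path)

print(restored.predict([[1, 1], [0, 0]]))
```

A saved model is generally only safe to load with the same library versions that produced it, which is consistent with the author's remark about Py2 environments.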
Recognizing and computing the CAPTCHA

The following code runs the calculation on the original images. It displays each image and prints the result, so you can step through it in debug mode to check whether the calculation is correct.
# -*- coding: utf-8 -*-
# python 3.6
from os import listdir
from PIL import Image
import img  # custom package

if __name__ == '__main__':
    file_list = listdir('source_image/')
    for each in file_list:
        image = Image.open('source_image/%s' % each)
        image.show()
        captcha = Sz_captcha(image=image)
        print(captcha.calculate('classify_model.m'))
Closing notes

At this point the CAPTCHA recognizer is basically complete. From observation, its accuracy is close to 100%.
In the end we only need to keep three files: img.py, sz_credit.py, and classify_model.m. In sz_credit.py, everything except the class Sz_captcha can be deleted, or kept as a record of the work; it is not used in the production environment anyway.