Python verification code recognition and processing example

Source: Internet
Author: User

I. Preparation and code example

1. PIL, pytesser, tesseract

 

PIL: the download is an .exe installer. Double-click it to install; it installs automatically to C:\Python27\Lib\site-packages.

 

pytesser: after downloading and unzipping the package, open C:\Python27\Lib\site-packages (adjust for the Python path you have installed) and create a new file named pytesser.pth whose content is the single word pytesser. Note that this content must match the name of the pytesser folder: the folder name, the pytesser.pth file name, and the content of the file must all be the same.

 

tesseract: download and decompress the package, then use its tessdata folder to replace the tessdata folder inside the unzipped pytesser folder (the pytesser folder mentioned above).

II. Verification

(1) Principle:

Verification Code Image Processing

Verification code image recognition mainly works by operating on the pixels of the image. A series of operations on those pixels produces, as output, the text matrix of each character in the verification code image.

1. Read Images
2. Image Noise Reduction
3. Image Cutting
4. Image text output
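
The steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the image is a tiny hand-made binary matrix (1 = character pixel, 0 = background) rather than a file read with PIL, and the helper names denoise, cut and to_text are hypothetical, not part of PIL or pytesser.

```python
# Minimal sketch of steps 2-4 on a hand-made binary image.
# 1 = character pixel, 0 = background. All names here are illustrative.

def denoise(img):
    """Clear character pixels that have no character pixel among their 8 neighbours."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1:
                continue
            neighbours = sum(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0
    return out

def cut(img):
    """Split the image into character matrices at all-background columns."""
    w = len(img[0])
    has_ink = [any(row[x] for row in img) for x in range(w)]
    pieces, start = [], None
    for x in range(w + 1):
        ink = x < w and has_ink[x]
        if ink and start is None:
            start = x                      # a character begins here
        elif not ink and start is not None:
            pieces.append([row[start:x] for row in img])
            start = None                   # the character ended
    return pieces

def to_text(matrix):
    """Render a character matrix as text, one line per pixel row."""
    return "\n".join("".join("#" if p else "." for p in row) for row in matrix)

image = [
    [1, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1, 0],   # the lone 1 at column 3 is noise
    [1, 1, 0, 0, 0, 0, 1, 0],
]

clean = denoise(image)
chars = cut(clean)
for matrix in chars:
    print(to_text(matrix))
    print()
```

Without the noise reduction step, the stray pixel would be cut out as a third "character", which is why noise reduction comes before cutting.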


(2) Verification code character recognition

Character recognition for verification codes is mainly implemented with machine learning classification algorithms. Currently we use KNN (the k-nearest neighbors algorithm) and SVM (support vector machine); I will describe the scenarios each algorithm suits in detail later.

1. Obtain the character matrix
2. Feed the matrix into the classification algorithm
3. Output the results
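
As a rough illustration of these three steps, here is a minimal KNN sketch in plain Python. It is an assumption-laden toy, not the classifier used in this article: the 3x3 training matrices are invented, the distance is a simple count of differing pixels, and the names flatten, pixel_distance and knn_classify are hypothetical.

```python
from collections import Counter

def flatten(matrix):
    """Turn a character matrix into a flat feature vector."""
    return [p for row in matrix for p in row]

def pixel_distance(a, b):
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_classify(sample, training, k=3):
    """Label a character matrix by majority vote of its k nearest training samples.

    training is a list of (matrix, label) pairs.
    """
    vec = flatten(sample)
    nearest = sorted(
        training, key=lambda item: pixel_distance(vec, flatten(item[0]))
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented 3x3 training samples for the digits 0 and 1.
training = [
    ([[1, 1, 1], [1, 0, 1], [1, 1, 1]], '0'),
    ([[0, 1, 1], [1, 0, 1], [1, 1, 1]], '0'),
    ([[0, 1, 0], [0, 1, 0], [0, 1, 0]], '1'),
    ([[1, 1, 0], [0, 1, 0], [0, 1, 0]], '1'),
]

# Step 1: the character matrix obtained from the image (here: hand-written).
unknown = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]

# Steps 2 and 3: feed it to the classifier and output the result.
result = knn_classify(unknown, training)
print(result)
```

Real verification code characters would need larger matrices normalized to one size, and many more training samples per character.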

 

The image to be verified is as follows:

(3) Simple commands:

 

from pytesser import *
image = Image.open('1.jpg')      # Open image object using PIL
print image_to_string(image)     # Run tesseract.exe on image
Then run:

 

 


 

Or directly:

 

 print image_file_to_string('fnord.tif')
This outputs the same result.

 

(4) A more complicated example

The commands above only work for relatively simple verification codes.

Principle: convert the color image to grayscale, binarize the grayscale image, then recognize the binary image.
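
The color-to-gray and gray-to-binary steps can be illustrated without any image library. The sketch below is a simplification under assumptions: pixels are hand-written (R, G, B) tuples, the gray value uses the ITU-R 601-2 luma weights that the PIL documentation gives for convert('L') but with plain integer division, and the function names to_gray and binarize are hypothetical. The threshold 140 is the value used in this article's program.

```python
THRESHOLD = 140  # split point between "dark" and "light"

def to_gray(rgb):
    """Approximate gray level of an (R, G, B) pixel using ITU-R 601-2 luma weights."""
    r, g, b = rgb
    return (r * 299 + g * 587 + b * 114) // 1000

# Lookup table with one entry per gray level 0..255:
# levels below the threshold map to 0, all others to 1.
table = [0 if level < THRESHOLD else 1 for level in range(256)]

def binarize(rgb_image):
    """Convert a matrix of (R, G, B) pixels into a matrix of 0/1 values."""
    return [[table[to_gray(px)] for px in row] for row in rgb_image]

rgb_image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 255, 0)],
]

binary = binarize(rgb_image)
for row in binary:
    print(row)
```

Pure red has a gray level below 140 and becomes 0, while pure green is above it and becomes 1, so the choice of threshold decides which colored strokes survive binarization.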

 

# Verification code recognition. This program can only recognize numeric verification codes.
import Image
import ImageEnhance
import ImageFilter
import sys
from pytesser import *

# Binarization threshold
threshold = 140
table = []
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)

# Because the codes consist only of digits, use this table to correct
# letters that were recognized in place of digits.
rep = {'O': '0',
       'I': '1',
       'L': '1',
       'Z': '2',
       'S': '8'}

def getverify1(name):
    # Open the image
    im = Image.open(name)
    # Convert to grayscale
    imgry = im.convert('L')
    # Save the grayscale image
    imgry.save('G' + name)
    # Binarize using threshold segmentation; threshold is the split point
    out = imgry.point(table, '1')
    out.save('B' + name)
    # Recognize
    text = image_to_string(out)
    # Clean up the recognized text
    text = text.strip()
    text = text.upper()
    for r in rep:
        text = text.replace(r, rep[r])
    # Save the binarized image under the recognized text
    out.save(text + '.jpg')
    print text
    return text

getverify1('1.jpg')  # Note: the image must be in the same directory as this file, or pass an absolute path.

Result after running:

 

 

 

 
