163 Album Verification Code Image Recognition, Note 2: Recognition

Source: Internet
Author: User

Statement: This article only records my thoughts on how to recognize the verification code images of the 163 album. It is for technical discussion only, so no source code download is provided with this article! I accept no liability arising from any use of the methods described here! If you reprint this article, please credit the original author and source!

 

Generally, handling a verification code takes three steps: "anti-interference", "word cutting", and "recognition".

 

I. Word cutting:

Each character in the verification code image is cut out so that it can be recognized on its own. The order of the pieces after "word cutting" also determines the order of the recognized characters. Take the following verification code sample image:

 

You must cut out the "7", "4", "3", "7", and "7" character images, in that order.

 

The method of "word cutting" differs from one kind of verification code image to another. For verification code images whose characters sit at fixed positions, you can read the character coordinates directly from the image and cut at those coordinates. For verification code images that use "displacement" interference (such as those of the 163 album), fixed coordinates cannot be used for "word cutting". And for verification code images whose characters are connected to each other (such as Google's), "word cutting" is an even bigger headache than "anti-interference"!! (-_-# I generally give up when I encounter such verification code images.)

 

Splitting the verification code image of the 163 album is still very easy, because the characters are not connected to each other and only "displacement" interference is used. For such images, "white-removal splitting" (hey, this is my own name for the method) is basically a universal method.

 

White-removal splitting:

That is, first remove the blank rows/columns at the edges of the whole image, then split the image into several sub-images at the blank columns, and finally remove the blank rows/columns at the edges of each sub-image. The sub-images that remain after these steps are the final "cut" character images.

 

1. White removal: remove the blank rows/columns at the edges of the verification code image.

For example, take the verification code image above (for ease of illustration, I opened the sample image in a paint program, enlarged it 6 times, and turned on the grid):

 

Remove the blank leading and trailing rows/columns, that is, remove the yellow area and keep only the middle area.
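The trimming step can be sketched in Python as follows. This is only a minimal illustration, not the code used for the 163 album: it assumes the image has already been binarized into a list of equal-length strings in which "1" is an ink pixel and "0" is background, and the name `trim_blank` is made up for this example.

```python
def trim_blank(grid):
    """Remove all-blank rows and columns from the edges of a 0/1 grid.

    `grid` is a list of equal-length strings; "1" is ink, "0" is background
    (an assumed representation, chosen only for this sketch).
    """
    # Rows that contain at least one ink pixel.
    rows = [y for y, row in enumerate(grid) if "1" in row]
    if not rows:
        return []  # the image is entirely blank
    # Columns that contain at least one ink pixel.
    cols = [x for x in range(len(grid[0])) if any(row[x] == "1" for row in grid)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    # Keep only the bounding box of the ink.
    return [row[left:right + 1] for row in grid[top:bottom + 1]]


sample = [
    "000000",
    "001100",
    "000100",
    "000000",
]
trimmed = trim_blank(sample)
print("\n".join(trimmed))
```

Only the outer blank border is removed; blank columns between characters are left in place so that they can be used for splitting afterwards.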

 

After white removal, the image looks like this:

 

2. Split: split the image at the blank columns. For example, splitting the image along the red lines cuts out all the character images (7, 4, 3, 7, and 7).

Note, however, that after this split each character image may still contain blank rows at the top and bottom, so white removal must be applied to it again (that is, remove the yellow areas), for example:

 

After this processing, the character images can be used for recognition :)
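A minimal Python sketch of splitting at blank columns and then trimming each piece is shown below. It assumes the same hypothetical representation as before (a list of equal-length "0"/"1" strings), and the function names are invented for illustration. Because every piece is trimmed on its own, the vertical "displacement" of each character no longer matters.

```python
def trim_rows(grid):
    """Drop blank rows from the top and bottom of a 0/1 grid."""
    rows = [y for y, row in enumerate(grid) if "1" in row]
    return grid[rows[0]:rows[-1] + 1] if rows else []


def split_characters(grid):
    """Split a 0/1 grid at blank columns, left to right.

    Each run of consecutive columns containing ink becomes one piece.
    Blank columns never end up inside a piece, so only the blank rows
    of each piece still have to be removed.
    """
    width = len(grid[0]) if grid else 0
    pieces, start = [], None
    # Iterate one step past the last column so a trailing run is closed.
    for x in range(width + 1):
        has_ink = x < width and any(row[x] == "1" for row in grid)
        if has_ink and start is None:
            start = x  # a new character begins here
        elif not has_ink and start is not None:
            piece = [row[start:x] for row in grid]
            pieces.append(trim_rows(piece))
            start = None
    return pieces


image = [
    "0110001",
    "0100001",
    "0000011",
]
chars = split_characters(image)
for piece in chars:
    print("\n".join(piece), end="\n\n")
```

The pieces come out in left-to-right order, which is the order in which the characters have to be recognized.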

 

Note: For images that are seriously damaged by interference, make sure that the width of each split character image matches the width of the corresponding source digit image.

For example, "5":

 

II. Verification code recognition:

After "anti-interference" and "word cutting", recognition is a fairly easy task. Generally, an image structure "similarity comparison method" is used for identification. This way, character images whose structure was damaged during "anti-interference" (such as the last two "7"s in the image above) can still be identified. However, because the comparison is based on the "similarity" of the graphic structure, recognition can still fail.

 

Similarity comparison method:

This method compares each split character image with all source digit images and computes a structural similarity value for each pair. The source digit image with the highest similarity value is then selected, and its character is taken as the character of the split image.
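The comparison can be sketched in Python as below. This is a hedged illustration, not the article's implementation: it assumes that the split image and every source digit image are non-empty 0/1 grids of the same size, it measures similarity as the fraction of matching positions, and the tiny 3x5 templates and the names `similarity` and `recognize` are invented for the example.

```python
def similarity(a, b):
    """Fraction of positions at which two equally sized 0/1 grids agree."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same size")
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def recognize(glyph, templates):
    """Return (character, score) of the template most similar to `glyph`."""
    return max(
        ((ch, similarity(glyph, tpl)) for ch, tpl in templates.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical 3x5 source digit images, for illustration only.
templates = {
    "1": ["010", "110", "010", "010", "111"],
    "7": ["111", "001", "010", "010", "010"],
}
glyph = ["111", "001", "010", "000", "010"]  # a "7" that lost one pixel
char, score = recognize(glyph, templates)
print(char, round(score, 3))
```

Real source digit images would be larger and would have to be prepared in advance, one per character that can appear in the verification code.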

 

Image structure similarity:

If you think of an image as a two-dimensional array (the first index corresponds to the X axis, the second index to the Y axis), the data in the array is the color value of each pixel. The structural similarity of two images is then equivalent to a similarity statistic over the data in the two two-dimensional arrays.

 

Assume that the data of two arrays is as follows:

Data in two-dimensional array A (the 0/1 map of the character "4"):

Code
00000110
00001110
00011110
00110110
01100110
11000110
11111111
00000110
00000110
00000110

 

Data in two-dimensional array B (the 0/1 map of the character "4" after it is damaged by interference; the damaged pixels, marked in red in the original, are in rows 7 and 8):

 

Code
00000110
00001110
00011110
00110110
01100110
11000110
11100111
00000010
00000110
00000110

 

To evaluate the similarity between A and B, compare the data in the corresponding rows of A and B and count the matching positions. There are 3 differences among the 80 values, so the similarity value is about 96%. Therefore, we can consider B to be the same character as A, that is, "4".
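The arithmetic for A and B can be checked with a few lines of Python. The sketch assumes the similarity value is simply the fraction of matching positions, which agrees with the "about 96%" figure quoted above; the article does not give an exact formula.

```python
A = [
    "00000110",
    "00001110",
    "00011110",
    "00110110",
    "01100110",
    "11000110",
    "11111111",
    "00000110",
    "00000110",
    "00000110",
]
B = [
    "00000110",
    "00001110",
    "00011110",
    "00110110",
    "01100110",
    "11000110",
    "11100111",
    "00000010",
    "00000110",
    "00000110",
]

# Count the positions where the two arrays disagree.
diff = sum(pa != pb for ra, rb in zip(A, B) for pa, pb in zip(ra, rb))
total = sum(len(row) for row in A)
similarity = (total - diff) / total
print(diff, total, f"{similarity:.2%}")  # 3 80 96.25%
```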

NOTE: The similarity value above which A and B can be considered "equal" is something you need to weigh yourself. If the threshold is set too high, the recognition rate drops; if it is set too low, the misrecognition rate becomes very high.
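One way to apply such a threshold is sketched below. The default of 0.90 and the scores for "1" and "7" are arbitrary values chosen for illustration, not values from the article, and `pick_character` is a hypothetical helper name.

```python
def pick_character(scores, threshold=0.90):
    """Return the best-scoring character, or None if it is below `threshold`.

    `scores` maps each candidate character to its similarity value (0..1).
    The default threshold is an assumed value and has to be tuned.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# 0.9625 is the A/B similarity from the example above; the other two
# scores are made up for illustration.
scores = {"4": 0.9625, "1": 0.70, "7": 0.65}
print(pick_character(scores))                  # accepted as "4"
print(pick_character(scores, threshold=0.99))  # rejected: None
```

Returning None for a rejected image leaves it to the caller to decide what to do, for example to request a new verification code image.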
