C# verification code recognition

Source: Internet
Author: User
1) Background and purpose of verification codes
In network security, the main purpose of a verification code is to force human-computer interaction as a defense against automated attacks. It prevents a program from simulating HTTP requests in order to scrape text directly or to submit login attempts directly. Now that bandwidth is plentiful, it no longer acts as a bottleneck for online password guessing. In a test over a 2 Mbps ADSL connection, 20 threads completed about 30 connections per second. At that rate a 6-digit password can be exhausted by a dictionary attack in only a few hours, which seriously threatens the security of network accounts. Making verification codes standard for login and registration is therefore imperative.
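As a rough check of the "few hours" estimate, the sketch below divides the number of candidate passwords by the attempt rate. It assumes the password is purely numeric (10^6 candidates) and that the measured rate of about 30 attempts per second is sustained; both are assumptions for illustration, and the class and method names are hypothetical.

```csharp
using System;

static class BruteForceEstimate
{
    // Hours needed to try every candidate at a fixed attempt rate.
    public static double HoursToExhaust(double candidates, double attemptsPerSecond)
    {
        return candidates / attemptsPerSecond / 3600.0;
    }

    static void Main()
    {
        double candidates = Math.Pow(10, 6); // assumption: 6 decimal digits
        double rate = 30.0;                  // attempts per second, from the ADSL test
        double worst = HoursToExhaust(candidates, rate);
        Console.WriteLine("Worst case: {0:F1} hours", worst);
        Console.WriteLine("Average:    {0:F1} hours", worst / 2);
    }
}
```

Under these assumptions the whole space takes about 9.3 hours, and on average a password is found after about half of that.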
Today, most verification code designers in China have not grasped the essentials. Either they do not understand the basic concepts of image processing, machine vision, pattern recognition, and artificial intelligence, or the verification codes they design are so hard to read that they hurt the user experience. For example, a bank's WAP mobile banking verification code has only four digits; a verification code this weak offers essentially no protection. Even with a low-end computer, an account with a six-digit password can be cracked quickly. Of course, there are also well-designed verification codes, such as those of Yahoo, Google, and Baidu.
2) Prospects for verification codes
Future verification codes may use gradient color layers, gray levels that differ only slightly from one another, and further methods such as curved distortion, a random number of characters, and touching characters to prevent machine recognition. However, protection and cracking are always twins, and neither side will disappear. We simply hope that this article gives some suggestions to the people who write verification code algorithms, so that our networks become safer without becoming cumbersome to use.

Algorithm Analysis
Verification code processing generally draws on the following fields: artificial intelligence, pattern recognition, machine vision, and image processing.
1) Main process: to identify a verification code in an image, or to detect and recognize a character in an image, the steps can be summarized as follows:
Image acquisition: fetch the HTML directly over HTTP, parse out the image URL, then download and save the verification code image.
Pre-processing: check that the image format is correct, convert it to a suitable format, compress it, crop the ROI, remove noise, convert to grayscale, and convert the color space.
Detection: locate the main area where the text lies.
Pre-processing for recognition: character segmentation, scaling, and distortion correction.
Training: select a suitable number of training samples and train on them with pattern recognition and machine learning algorithms. More samples generally help, but overfitting can occur here and lead to poor generalization.
Recognition: the image to be recognized is converted to the input format that the classifier requires. The output class and confidence level are used to decide which letter it most likely is. Recognition is classification in essence.
2) Key concepts
Image processing: generally refers to mathematical operations on digital images, such as projection, blurring, sharpening, thinning, edge detection, binarization, compression, and various data transformations.
Binarization: images are generally in color and may have many levels depending on their fidelity. To reduce computational complexity and simplify later processing, it is best to reduce the image to two colors, black and white, provided no key information is lost.
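A minimal sketch of binarization with a fixed global threshold is shown below. It works on a plain byte array of gray values rather than a bitmap so that it stays self-contained; the names BinarizeSketch and Binarize are hypothetical and not part of the article's program.

```csharp
using System;

static class BinarizeSketch
{
    // Returns true for foreground (dark) pixels: gray value below the threshold.
    public static bool[,] Binarize(byte[,] gray, byte threshold)
    {
        int h = gray.GetLength(0), w = gray.GetLength(1);
        var result = new bool[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                result[y, x] = gray[y, x] < threshold;
        return result;
    }

    static void Main()
    {
        byte[,] gray = { { 250, 30, 240 }, { 20, 200, 10 } };
        bool[,] bw = Binarize(gray, 128);
        for (int y = 0; y < bw.GetLength(0); y++)
        {
            for (int x = 0; x < bw.GetLength(1); x++)
                Console.Write(bw[y, x] ? '1' : '0');
            Console.WriteLine();
        }
    }
}
```

For the sample data this prints 010 and then 101: the dark pixels become 1 and the light ones 0.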
Thinning: finds the skeleton of the image. Lines in the image may be several pixels wide; thinning reduces the width to 1, although in some places it may remain greater than 1. Different thinning algorithms give different results, for example in whether the skeleton lies closer to the middle of the line and whether connectivity is preserved.
Edge detection: the key is to understand the concept of an edge. An edge is a place in the image where a pixel attribute changes sharply. It can be found with a fixed threshold or an adaptive one, and the threshold can be global or local. Neither is always better, but most of the time an adaptive local threshold works better. The images analyzed may be color or grayscale.
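The adaptive local threshold mentioned above can be sketched as follows: each pixel is compared with the mean of its own neighborhood instead of one global value, which copes better with gradient backgrounds. This is only one simple variant; the window radius, the offset, and all names here are assumptions chosen for illustration.

```csharp
using System;

static class AdaptiveThresholdSketch
{
    // A pixel is foreground when it is darker than the mean of its
    // (2*radius+1)-square neighborhood by more than 'offset'.
    public static bool[,] Apply(byte[,] gray, int radius, int offset)
    {
        int h = gray.GetLength(0), w = gray.GetLength(1);
        var result = new bool[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                int sum = 0, count = 0;
                for (int dy = -radius; dy <= radius; dy++)
                    for (int dx = -radius; dx <= radius; dx++)
                    {
                        int ny = y + dy, nx = x + dx;
                        // Skip neighbors that fall outside the image.
                        if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                        sum += gray[ny, nx];
                        count++;
                    }
                double mean = (double)sum / count;
                result[y, x] = gray[y, x] < mean - offset;
            }
        return result;
    }

    static void Main()
    {
        // A background that darkens from left to right, with one dark stroke.
        byte[,] gray = { { 200, 190, 100, 170, 160 } };
        bool[,] bw = Apply(gray, 1, 10);
        for (int x = 0; x < bw.GetLength(1); x++)
            Console.Write(bw[0, x] ? '1' : '0');
        Console.WriteLine();
    }
}
```

In the sample row only the pixel with value 100 is marked as foreground, even though the background itself changes from 200 to 160.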
Machine vision: the use of computers to reproduce human vision, for example object detection, localization, and recognition. Depending on the level at which the image is understood, it is divided into high-level and low-level vision.
Pattern recognition: processing and analyzing the information that represents a thing or phenomenon (numerical values or text, which is mainly what we discuss here) in order to describe, classify, understand, and explain it.
Artificial intelligence: this concept is relatively broad, and all of the above fall under it. Put simply, without being overly academic, it means simulating human "intelligence", especially on computers, to help people solve problems.

Verification code recognition principle and code demonstration
Demonstrating how to crack the verification code of a bank website would normally be risky, but we found that ICBC's algorithm has since been changed. So we take the ICBC WAP bank as the example, and we also hope that ICBC's protection keeps getting better.
In fact, ICBC's WAP verification code was very simple: just a picture of the plain characters without any interference. For this kind of verification code we use the dot-matrix library method, and we describe it in the order in which the program is written and run.
First, we need the dot matrix of every character we want to recognize. ICBC's WAP bank verification code uses only the 10 digits 0 to 9, so we first download verification code images to the local machine. We must download images covering all the characters in order to build the basic dot-matrix library.
After obtaining these image files, we use a program to extract their dot matrices. Load a downloaded image file from the local disk. First, let the program "recognize" the images for 0 to 9, so that it "remembers" their dot matrices, as shown in Figure 1. The verification code in the image is 0123 from left to right. The image is converted to grayscale pixel by pixel, that is, the color is removed first, and the image is then turned into a black-and-white image for further processing.

Figure 1

{
    for (int i = 0; i < bmpobj.Height; i++) // traverse the rows (height)
    {
        for (int j = 0; j < bmpobj.Width; j++)
        // traverse the columns (width); the nested loops visit every pixel of the image
        {
            int tmpvalue = GetGrayNumColor(bmpobj.GetPixel(j, i));
            bmpobj.SetPixel(j, i, Color.FromArgb(tmpvalue, tmpvalue, tmpvalue));
        }
    }
}
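The grayscale helper called in the loop above is not shown in this article. A plausible stand-in, assuming the common luminance weights 0.299, 0.587, and 0.114 for red, green, and blue, might look like the sketch below. It takes the three channels as integers instead of a System.Drawing.Color so that it runs without any imaging library; the name GetGrayNum is hypothetical.

```csharp
using System;

static class GraySketch
{
    // Hypothetical stand-in for the grayscale helper used in the article.
    // Integer form of 0.299*R + 0.587*G + 0.114*B (assumed weights).
    public static int GetGrayNum(int r, int g, int b)
    {
        return (r * 299 + g * 587 + b * 114) / 1000;
    }

    static void Main()
    {
        Console.WriteLine(GetGrayNum(255, 255, 255)); // white
        Console.WriteLine(GetGrayNum(0, 0, 0));       // black
        Console.WriteLine(GetGrayNum(255, 0, 0));     // pure red
    }
}
```

With these weights white maps to 255, black to 0, and pure red to 76, so the result always stays within the 0 to 255 range that Color.FromArgb expects.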

After grayscale conversion, the R, G, and B values of each pixel are equal, and the brightness ranges from 0 to 255 (0xFF). For the recognition program, however, gray values alone do not separate the background from the foreground, especially with gradient backgrounds. The image therefore needs further processing, namely binarization of the grayscale image; related algorithms include the watershed algorithm. Because the verification code in this article is relatively simple, we can binarize directly with a fixed threshold, find the valid area, and treat it as a monochrome black-and-white image.
{
    int dggrayvalue = 128; // gray threshold separating foreground from background
    int charscount = 4;    // number of valid characters, known in advance
    int posx1 = bmpobj.Width; int posy1 = bmpobj.Height;
    int posx2 = 0; int posy2 = 0;
    for (int i = 0; i < bmpobj.Height; i++) // find the valid region
    {
        for (int j = 0; j < bmpobj.Width; j++)
        {
            int pixelvalue = bmpobj.GetPixel(j, i).R;
            // Read the red value R: after grayscale conversion the red, green, and blue values are equal, so any channel works.
            if (pixelvalue < dggrayvalue) // darker than the threshold: foreground
            {
                if (posx1 > j) posx1 = j;
                if (posy1 > i) posy1 = i;
                if (posx2 < j) posx2 = j;
                if (posy2 < i) posy2 = i;
            }
        }
    }
    // Make the width divisible by the character count
    int span = charscount - (posx2 - posx1 + 1) % charscount;
    // number of columns missing for an even split
    if (span < charscount)
    {
        int leftspan = span / 2;
        // Empty columns assigned to the left; if span is odd, the right side gets one more than the left.
        if (posx1 > leftspan)
            posx1 = posx1 - leftspan;
        if (posx2 + span - leftspan < bmpobj.Width)
            posx2 = posx2 + span - leftspan;
    }
    // Copy the valid region into a new bitmap
    Rectangle clonerect = new Rectangle(posx1, posy1, posx2 - posx1 + 1, posy2 - posy1 + 1);
    bmpobj = bmpobj.Clone(clonerect, bmpobj.PixelFormat);
}
Bitmap[] pics = GetSplitPics(4, 1); // split into 4 pieces; pics[0] is shown in Figure 2

Figure 2

To split the image evenly, let rownum be the number of pieces in the horizontal direction and colnum the number of pieces in the vertical direction; the function returns the array of split images. The program is written as follows:

public Bitmap[] GetSplitPics(int rownum, int colnum)
{
    if (rownum == 0 || colnum == 0)
        return null;
    int singw = bmpobj.Width / rownum;  // width of one piece
    int singh = bmpobj.Height / colnum; // height of one piece
    Bitmap[] picarray = new Bitmap[rownum * colnum];
    Rectangle clonerect;
    for (int i = 0; i < colnum; i++) // pieces in the vertical direction
    {
        for (int j = 0; j < rownum; j++) // pieces in the horizontal direction
        {
            clonerect = new Rectangle(j * singw, i * singh, singw, singh);
            picarray[i * rownum + j] = bmpobj.Clone(clonerect, bmpobj.PixelFormat); // copy one piece
        }
    }
    return picarray;
}

At this point the image has been split: pics should have length 4, and each element is the bitmap of one verification code character. Each piece is then corrected by trimming its edges and removing the useless blank background; for example, pics[0] then holds the trimmed bitmap of the character 0.
The function below returns the valid region of an image. The image is passed in as singlepic, and the gray threshold that separates foreground from background is passed as dggrayvalue. The program is written as follows:

public Bitmap GetPicValidByValue(Bitmap singlepic, int dggrayvalue)
{
    int posx1 = singlepic.Width; int posy1 = singlepic.Height;
    int posx2 = 0; int posy2 = 0;
    for (int i = 0; i < singlepic.Height; i++) // find the valid region
    {
        for (int j = 0; j < singlepic.Width; j++)
        {
            int pixelvalue = singlepic.GetPixel(j, i).R;
            if (pixelvalue < dggrayvalue) // darker than the threshold: foreground
            {
                if (posx1 > j) posx1 = j;
                if (posy1 > i) posy1 = i;
                if (posx2 < j) posx2 = j;
                if (posy2 < i) posy2 = i;
            }
        }
    }
    // Copy the valid region into a new bitmap
    Rectangle clonerect = new Rectangle(posx1, posy1, posx2 - posx1 + 1, posy2 - posy1 + 1);
    return singlepic.Clone(clonerect, singlepic.PixelFormat);
}

At this point, every bitmap in pics is a valid, trimmed character image. Next, we convert each image in pics into a string that represents its dot matrix. The function returns the dot-matrix description string of a grayscale image, in which 1 marks a foreground (dark) dot and 0 marks the background. The grayscale image is passed as singlepic, and the gray threshold for the background as dggrayvalue.

string code = GetSingleBmpCode(pics[0], 128);
public string GetSingleBmpCode(Bitmap singlepic, int dggrayvalue)
{
    Color pixel;
    StringBuilder sbcode = new StringBuilder();
    for (int posy = 0; posy < singlepic.Height; posy++)
        for (int posx = 0; posx < singlepic.Width; posx++)
        {
            pixel = singlepic.GetPixel(posx, posy);
            if (pixel.R < dggrayvalue) // dark pixel: foreground
                sbcode.Append('1');
            else
                sbcode.Append('0');
        }
    return sbcode.ToString();
}

At this point, the string in code represents the dot matrix of the character 0 in the ICBC WAP bank verification code. Proceeding in the same way, we can build a complete dot-matrix table for all the characters, in the order 0 to 9, A to Z, a to z.
Now that the dot-matrix table has been obtained, let's see how to identify an image. Assume the following dot-matrix table is known:
string[] codearray = new string[] {
    "0011100011011011000111100011110101111010111100011110001101101100011100", // 0
    "001100011100111100001100001100001100001100001100001100111111", // 1
    "0111110110001100000110000110000110000110000110000110000011000111111111", // 2
    "0111110110001100000110000011001111000000110000011000001111000110111110", // 3
    "0000110000111000111100110110110011011111110000110000011000001100001111", // 4
    "00011111000110000001100000011000000111111000000010000000100000001001100000001111", // 5
    "001110011000110000110000111111110001110001110001110001011111", // 6
    "00011111000110000000000000000000000000011000001110000110100001101000011000000110", // 7
    "0111110110001111000111100011011111011000111100011110001111000110111110", // 8
    "0111110110001111000111100011011111100000110000011000001100001100111100"  // 9
};

Now perform the comparison:

StringBuilder sbresult = new StringBuilder();
{
    for (int i = 0; i < 4; i++)
    {
        string code = GetSingleBmpCode(pics[i], 128);
        // get the dot-matrix string of this character
        System.Collections.Generic.Dictionary<char, double> equalspercentlist = new Dictionary<char, double>(); // table of difference percentages
        for (int arrayindex = 0; arrayindex < codearray.Length; arrayindex++)
        {
            if (arrayindex < 10) // 0 to 9
            {
                equalspercentlist.Add((char)(48 + arrayindex), 100);
                // convert the index to its digit character ('0' has character code 48)
            }
        }
        for (int arrayindex = 0; arrayindex < codearray.Length; arrayindex++) // compare against each entry of the dot-matrix table
        {
            if (arrayindex < 10) // 0 to 9
            {
                equalspercentlist[(char)(48 + arrayindex)] = EqualsPercent(code, codearray[arrayindex]);
                // EqualsPercent returns how much the two dot matrices differ, as a percentage
            }
        }

        // Obtain the match
        {
            double perc = 20; // the difference must be below 20%; otherwise the match is rejected
            string selectkey = "";
            foreach (char key in equalspercentlist.Keys)
            // find the best match in the difference table
            {
                if (equalspercentlist[key] < perc)
                {
                    perc = equalspercentlist[key];
                    selectkey = key.ToString();
                }
            }
            sbresult.Append(selectkey);
        }
    }
}

At this point, sbresult holds the four digits shown in the image.
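The comparison function called in the loop above (EqualsPercent) is not shown in the article. The sketch below is one hypothetical way to compute a difference percentage between two dot-matrix strings. Because the entries in the table have different lengths, it assumes that positions present in only one of the two strings count as differences; the original program may have handled this differently.

```csharp
using System;

static class CompareSketch
{
    // Hypothetical version of EqualsPercent: percentage (0 to 100) of
    // positions at which two dot-matrix strings differ.
    // Assumption: a length mismatch counts as that many differing positions.
    public static double EqualsPercent(string a, string b)
    {
        int maxLen = Math.Max(a.Length, b.Length);
        if (maxLen == 0) return 0;
        int minLen = Math.Min(a.Length, b.Length);
        int diff = maxLen - minLen;
        for (int i = 0; i < minLen; i++)
            if (a[i] != b[i]) diff++;
        return 100.0 * diff / maxLen;
    }

    static void Main()
    {
        Console.WriteLine(EqualsPercent("0110", "0110")); // identical strings
        Console.WriteLine(EqualsPercent("0110", "0100")); // one of four positions differs
    }
}
```

Identical strings give 0 and one differing position out of four gives 25, so with the 20% cutoff used above only the first pair would be accepted as a match.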

Conclusion
Verification code recognition is certainly not always this simple, but many websites still use such undisguised verification codes, so our network security still has a long way to go.
The recognition method above is the most basic algorithm, but many extensions can be built on the same idea. For example, if a verification code is drawn with rotation, the program can rotate the image through 360 degrees during recognition, compute the dot-matrix sequence of each rotated image, and compare it with the stored sequences to find the closest match. Some verification codes add a border; in that case the border can be removed before cutting and matching.
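The rotation idea can be sketched as follows, on binary images stored as bool arrays. This is only an illustration under stated assumptions: nearest-neighbor sampling, rotation about the image center, an output of the same size as the input, and a fixed angle step. All names are hypothetical.

```csharp
using System;

static class RotationSketch
{
    // Rotates a binary image about its center by 'degrees' using
    // nearest-neighbor sampling; the output has the same size as the input.
    public static bool[,] Rotate(bool[,] src, double degrees)
    {
        int h = src.GetLength(0), w = src.GetLength(1);
        double rad = degrees * Math.PI / 180.0;
        double cos = Math.Cos(rad), sin = Math.Sin(rad);
        double cy = (h - 1) / 2.0, cx = (w - 1) / 2.0;
        var dst = new bool[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                // Map each destination pixel back into the source image.
                double dx = x - cx, dy = y - cy;
                int sx = (int)Math.Round(cx + dx * cos + dy * sin);
                int sy = (int)Math.Round(cy - dx * sin + dy * cos);
                if (sx >= 0 && sx < w && sy >= 0 && sy < h)
                    dst[y, x] = src[sy, sx];
            }
        return dst;
    }

    // Number of pixels at which two equally sized images differ.
    public static int CountDifferences(bool[,] a, bool[,] b)
    {
        int diff = 0;
        for (int y = 0; y < a.GetLength(0); y++)
            for (int x = 0; x < a.GetLength(1); x++)
                if (a[y, x] != b[y, x]) diff++;
        return diff;
    }

    // Tries every angle in steps of 'stepDegrees' and returns the angle whose
    // rotation of 'input' differs least from 'reference'.
    public static int BestAngle(bool[,] input, bool[,] reference, int stepDegrees)
    {
        int bestAngle = 0, bestDiff = int.MaxValue;
        for (int angle = 0; angle < 360; angle += stepDegrees)
        {
            int diff = CountDifferences(Rotate(input, angle), reference);
            if (diff < bestDiff) { bestDiff = diff; bestAngle = angle; }
        }
        return bestAngle;
    }

    static void Main()
    {
        var input = new bool[3, 3];
        var reference = new bool[3, 3];
        input[0, 0] = true;      // single dot in the top-left corner
        reference[2, 2] = true;  // same dot in the bottom-right corner
        Console.WriteLine(BestAngle(input, reference, 90));
    }
}
```

In practice the best angle would be searched separately for each character and each library entry, and a finer step makes the search proportionally slower.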
From the algorithm above, we can conclude that the following factors deserve attention when designing verification codes in the future:
1) When using noise and similar interference, make it as hard as possible to separate the characters from the foreground and background, and make the noise look as much like the letters as possible.
2) A well-designed verification code should play to what humans are good at and AI algorithms are not, for example separating touching characters and reading handwriting (print fonts with special deformation can also work), rather than simply adding complicated-looking noise or other fancy effects.
3) From a professional machine vision perspective, a verification code should force the attacker to go back and forth between low-level vision and high-level vision several times in order to recognize it. This greatly increases the difficulty of cracking and reduces its accuracy.
