PHP verification code recognition (elementary)

Source: Internet
Author: User
Implementing verification code recognition in PHP (preliminary). I recently studied some techniques for breaking through verification codes and am recording them here. On the one hand, this is a summary of what I have learned over the past few days, to deepen my own understanding; on the other hand, I hope it helps others who are studying this area, and also draws the attention of website administrators so that they take verification code design more seriously. As I have only just started with this subject, my understanding is shallow and mistakes are likely. Criticism is welcome.
A verification code is meant to stop an attacker from using a program to brute-force the login of a specific registered user by trying passwords continuously. In practice, modern verification codes are mostly used to prevent automated batch registration and automated batch posting and replying. Many websites now use verification codes to stop robots from registering, logging in, and bumping threads automatically.
A verification code is an image containing a randomly generated string of digits or symbols, with some interference pixels added to defeat OCR. The user reads the code with the naked eye, enters it into a form, and submits it to the website for verification. The protected function can be used only after verification succeeds.
The most common kinds of verification code are:
1. Four digits: a random numeric string, the most primitive kind of verification code. Its protective value is almost zero.
2. Random image verification code. The characters in the image are fairly regular; some versions add random interference pixels, and some use random character colors. It works better than the previous kind. People without basic knowledge of graphics and image processing cannot break it.
3. Random digits + random uppercase English letters + random interference pixels + random positions, in various image formats.
4. Chinese characters, currently the newest kind of verification code used for registration. They are randomly generated, harder to type, and hurt the user experience, so they are rarely used.
For the sake of simplicity, this article focuses on type 2. Let's look at several images of such verification codes that are common on the Internet.

(CSDN no longer lets me upload images, so I have put these four images in the download package. You can download them and compare.)
These four styles basically cover the verification code type described in item 2. At first glance, the first image looks the easiest to crack, the second and third look harder, and the fourth looks the hardest.
What is the actual situation? In fact, these three types of images are equally difficult to crack.
The first image is the easiest. The background and the digits each use a fixed color, the characters are regular, and the character positions are uniform.
The second image looks harder, but a careful look reveals a rule: no matter how the background color and the interference pixels change, the verification characters are regular and all share one color. It is therefore very easy to exclude the interference: discard every pixel that does not have the character color.
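The rule for the second image can be sketched as follows. This is only an illustration, not the author's code: the function name `keepCharacterColor`, the pixel layout (rows of `[r, g, b]` arrays) and the `$tolerance` parameter are assumptions made for the example.

```php
<?php
/**
 * Hypothetical helper: mark pixels whose color matches the known character
 * color as 1 and everything else (background, interference) as 0.
 *
 * $pixels    rows of [r, g, b] arrays (layout assumed for this sketch)
 * $charColor the [r, g, b] color shared by all characters
 * $tolerance allowed per-channel difference (an assumption, e.g. for JPEG noise)
 */
function keepCharacterColor(array $pixels, array $charColor, int $tolerance = 0): array
{
    $mask = [];
    foreach ($pixels as $y => $row) {
        foreach ($row as $x => $rgb) {
            $match = abs($rgb[0] - $charColor[0]) <= $tolerance
                && abs($rgb[1] - $charColor[1]) <= $tolerance
                && abs($rgb[2] - $charColor[2]) <= $tolerance;
            $mask[$y][$x] = $match ? 1 : 0;
        }
    }
    return $mask;
}

// Tiny 2x3 example: black characters on a light, noisy background.
$pixels = [
    [[0, 0, 0], [200, 180, 90], [0, 0, 0]],
    [[255, 255, 255], [0, 0, 0], [30, 200, 30]],
];
foreach (keepCharacterColor($pixels, [0, 0, 0]) as $row) {
    echo implode('', $row), "\n";
}
```

With a real image, the pixel values would come from GD (for example `imagecolorat()`), and the character color would have to be read off a sample image first.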
The third image looks more complex. Besides the constantly changing background color and interference pixels, the color of the verification characters also changes, and each character has a different color. It seems that this verification code cannot be broken. This article takes this type of verification code as its example.
The fourth image: in addition to the features mentioned for the third image, it adds two interference lines across the text. It looks difficult, but the lines are actually easy to remove.
Verification code recognition is generally divided into the following steps:
1. Extract the character templates
2. Binarize
3. Compute features
4. Compare against the samples
1: Extract the character templates
Verification code recognition is not professional OCR, and the verification codes of different websites differ, so the most common approach is to build a feature library for the specific verification code. When collecting templates, we need to download several images so that together they contain all the characters. The images here contain only digits, so we only need to collect images covering 0-9.
2: Binarize
Binarization means representing every pixel that belongs to a digit in the image with 1 and every other pixel with 0. In this way we can compute the template of each digit, record these templates, and use them as keys.
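As a rough sketch of how one digit's template could be recorded as a key, the snippet below flattens one character cell of a 0/1 matrix into a string, row by row. The function name `cellToKey` and the tiny matrix are made up for illustration; the `run()` method in the PHP code later in this article builds its strings in the same row-major order.

```php
<?php
/**
 * Hypothetical helper: flatten one character cell of a 0/1 matrix into a
 * string, row by row. The string can then be stored as that digit's key.
 */
function cellToKey(array $matrix, int $x, int $y, int $width, int $height): string
{
    $key = '';
    for ($h = $y; $h < $y + $height; ++$h) {
        for ($w = $x; $w < $x + $width; ++$w) {
            // Pixels outside the image are treated as background (0).
            $key .= $matrix[$h][$w] ?? 0;
        }
    }
    return $key;
}

// A 5x3 binarized image containing a crude "1" in columns 1..3.
$matrix = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
];
$keys = ['1' => cellToKey($matrix, 1, 0, 3, 3)];
echo $keys['1'], "\n"; // prints 010110010
```

Repeating this for one clean sample of every digit yields the key table used in the comparison step.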
3: Compute features
Binarize the image to be recognized to obtain its features.
4: Compare against the samples
Compare the features obtained in step 3 with the stored digit templates to obtain the digits in the verification image.
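The comparison step can be sketched with PHP's built-in `similar_text()`, which the code later in this article also uses. The function name `bestMatch` and the 3x3 templates below are hypothetical and only serve to show the idea.

```php
<?php
/**
 * Hypothetical helper: return the label of the template most similar to
 * $feature, using the percentage reported by similar_text() as the score.
 */
function bestMatch(string $feature, array $templates): string
{
    $bestLabel = '';
    $bestScore = -1.0;
    foreach ($templates as $label => $template) {
        $percent = 0.0;
        similar_text($template, $feature, $percent);
        if ($percent > $bestScore) {
            $bestScore = $percent;
            $bestLabel = (string) $label;
        }
    }
    return $bestLabel;
}

// Made-up 3x3 templates, not taken from any real verification code.
$templates = [
    '1' => '010010010',
    '7' => '111001001',
];
// One pixel differs from the "1" template, which is still the closest match.
echo bestMatch('010010011', $templates), "\n";
```

Picking the highest score rather than demanding an exact match tolerates a few stray pixels left over after binarization.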
So far, this kind of verification code can be recognized 100% of the time with this method.
After reading the steps above, you may say that you have not seen how the interference pixels are removed. In fact, the method is very simple. An important property of interference pixels is that they must not affect how the verification code is displayed, so their RGB values stay above or below a specific value. For example, in the images I gave, 125 separates the two groups: the code below treats any pixel with a channel below 125 as part of a character and discards the rest as background or interference, so the interference is easy to remove.
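A minimal sketch of this threshold rule on plain arrays, without GD, is shown here. The function name `binarize` and the pixel layout (rows of `[r, g, b]` arrays) are assumptions for the example; the threshold 125 is the value the author reports for the sample images and would need adjusting for other verification codes.

```php
<?php
/**
 * Hypothetical helper: binarize rows of [r, g, b] pixels. A pixel becomes 1
 * when any channel is below $threshold (dark enough to be a character),
 * otherwise 0 (background or interference).
 */
function binarize(array $pixels, int $threshold = 125): array
{
    $data = [];
    foreach ($pixels as $y => $row) {
        foreach ($row as $x => $rgb) {
            $isChar = $rgb[0] < $threshold
                || $rgb[1] < $threshold
                || $rgb[2] < $threshold;
            $data[$y][$x] = $isChar ? 1 : 0;
        }
    }
    return $data;
}

// Light interference pixels (all channels >= 125) disappear,
// dark character pixels survive.
$pixels = [
    [[255, 255, 255], [20, 40, 200], [180, 190, 170]],
    [[130, 130, 130], [10, 10, 10], [250, 125, 125]],
];
foreach (binarize($pixels) as $row) {
    echo implode('', $row), "\n";
}
```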
PHP code

[php]
<?php
// Size and position of each character cell, in pixels.
define('WORD_WIDTH', 9);
define('WORD_HIGHT', 13);
define('OFFSET_X', 7);
define('OFFSET_Y', 3);
define('WORD_SPACING', 4);

class valite
{
    public function setImage($Image)
    {
        $this->ImagePath = $Image;
    }

    public function getData()
    {
        return $this->data;
    }

    public function getResult()
    {
        return $this->DataArray;
    }

    // Binarize the image: 1 for character pixels, 0 for everything else.
    public function getHec()
    {
        $res = imagecreatefromjpeg($this->ImagePath);
        $size = getimagesize($this->ImagePath);
        $data = array();
        for ($i = 0; $i < $size[1]; ++$i)
        {
            for ($j = 0; $j < $size[0]; ++$j)
            {
                $rgb = imagecolorat($res, $j, $i);
                $rgbarray = imagecolorsforindex($res, $rgb);
                // Any channel below 125 counts as part of a character.
                if ($rgbarray['red'] < 125 || $rgbarray['green'] < 125
                    || $rgbarray['blue'] < 125)
                {
                    $data[$i][$j] = 1;
                } else {
                    $data[$i][$j] = 0;
                }
            }
        }
        $this->DataArray = $data;
        $this->ImageSize = $size;
    }

    public function run()
    {
        $result = "";
        // Extract the 4 digits, one 0/1 string per character cell
        $data = array("", "", "", "");
        for ($i = 0; $i < 4; ++$i)
        {
            $x = ($i * (WORD_WIDTH + WORD_SPACING)) + OFFSET_X;
            $y = OFFSET_Y;
            for ($h = $y; $h < (OFFSET_Y + WORD_HIGHT); ++$h)
            {
                for ($w = $x; $w < ($x + WORD_WIDTH); ++$w)
                {
                    $data[$i] .= $this->DataArray[$h][$w];
                }
            }
        }
        // Match each string against the keys
        foreach ($data as $numKey => $numString)
        {
            $max = 0.0;
            $num = 0;
            foreach ($this->Keys as $key => $value)
            {
                $percent = 0.0;
                similar_text($value, $numString, $percent);
                if (intval($percent) > $max)
                {
                    $max = $percent;
                    $num = $key;
                    // Close enough: stop searching
                    if (intval($percent) > 95)
                        break;
                }
            }
            $result .= $num;
        }
        $this->data = $result;
        // The best matching digits
        return $result;
    }

    // Print the binarized image as rows of 0 and 1.
    public function Draw()
    {
        for ($i = 0; $i < $this->ImageSize[1]; ++$i)
        {
            for ($j = 0; $j < $this->ImageSize[0]; ++$j)
            {
                echo $this->DataArray[$i][$j];
            }
            echo "\n";
        }
    }

    public function __construct()
    {
        // The template strings did not survive in this copy of the article;
        // '123' is the placeholder left in the source. Each real key is a
        // 0/1 string of WORD_WIDTH * WORD_HIGHT characters, as built in run().
        $this->Keys = array(
            '0' => '123',
            '1' => '123',
            '2' => '123',
            '3' => '123',
            '4' => '123',
            '5' => '123',
            '6' => '123',
            '7' => '123',
            '8' => '123',
            '9' => '123',
        );
    }

    protected $ImagePath;
    protected $DataArray;
    protected $ImageSize;
    protected $data;
    protected $Keys;
    protected $NumStringArray;
}
?>

I made an example, which you can download here: http://www.BkJia.com/uploadfile/2012/0316/20120316110154186.rar

After cracking the verification code as described above, we can use Snoopy (it is lighter than curl, which is why I like it) to simulate a browser and access the website.


From the ugg column
