How to use PHP to crack website verification codes and php website verification codes. How to use PHP to crack the website verification code. The php website verification code cracking function is generally set to prevent malicious registration, brute force cracking, or batch posting by programs. How to use PHP to crack website verification codes and php website verification codes
The verification code function is generally set to prevent malicious registration, brute force cracking, or batch posting. The so-called verification code is to generate an image with a string of randomly generated numbers or symbols, and add some interference pixels to the image (preventing OCR). The user can identify the verification code information with the naked eye, enter a form to submit the website for verification. a function can be used only after the verification is successful. Learning the verification code cracking/recognition technology can not only understand the principles of the verification code, but also let you know how to prevent the verification code from being cracked.
The most common verification codes are as follows:
1. four digits, a random string, and the original verification code. the verification function is almost zero.
2. random image verification code. The characters in the image are relatively regular, some may be added with some random interferon, and some may be random character colors. the verification effect is better than the previous one. People who do not have the knowledge of basic graphics and images cannot break through!
3. random numbers in various image formats + random uppercase English letters + random interference pixels + random positions.
4. Chinese characters are currently the latest verification codes registered. they are randomly generated, making them more difficult to create and affecting user experience. Therefore, there are usually few applications.
For the sake of simplicity, the attack description mainly targets 2nd types. let's take a look at the common online verification code pictures:
- First, the easiest way is to use the same color, regular characters, and uniform character positions for the image background and numbers.
- Second, it seems not easy. In fact, we will carefully study the rules, the background color and interferon, no matter how they change, verify that the characters are regular and the colors are the same, so it is very easy to exclude interferon, exclude all non-character pigments.
- Third, it seems more complex. the color of the verification character is also changing, and the color of each character is also different, except that the background color and interferon are constantly changing.
- Fourth, in addition to the features mentioned in the third image, two linear interference rates are added to the text, which seems difficult to remove.
Verification code recognition is generally divided into the following steps:
1. extract the modelAfter all, the verification code is not a professional OCR recognition. because the verification codes of different websites are different, the most common method is to create a signature library for this verification code. When downloading the dashboard, we need to download several more images to make these images contain all the characters. The letters here are only images. Therefore, we only need to collect images including 0-9.
2. binarizationBinarization means that each pixel in the verification number on the image is represented by 1 in a number, and the other part is represented by 0. In this way, you can calculate each digital model, record these fonts, and use them as keys.
3. computing featuresBinarization the image to be recognized to obtain the image features.
4. control samplesCompare the image signature and verification code pattern in step 3 to obtain the number on the verification image.
Currently, the verification code can be identified as 100%.
After completing the above steps, you may have said that you have not discovered how to retrieve interferon! In fact, the method to retrieve interferon is very simple. an important feature of interferon is that it does not affect the display effect of the verification code. Therefore, the RGB value of interferon may be lower than or higher than a specific value, for example, in the image I gave, the RGB values of interferon will not exceed 125, so we can easily remove interferon.
A simple verification code consists of only numbers and letters. the format is uniform and the position is fixed each time. Next, we will continue to study the verification code in depth. the purpose of this recognition is: the verification code consists of characters and numbers. the verification code is rotated (either left or right) and its position is not fixed, there is a adhesion between characters, and the verification code has stronger interferon.
We will explain it as an example.
Step 1: binarization.The verification code part is represented by 1, and the background part is represented by 0. the recognition method is very simple. we can print the RGB color of the entire verification code image, and then analyze its pattern. through the RGB code, we can easily tell that the R value of the above image is greater than 120, and the G and B values are less than 80. Therefore, we can easily bind the above image according to this rule.
Let's take a look at the third verification code picture above.
It seems complicated. The background color of each verification code image is different, and the color of each verification code number is different. It seems difficult to binarization. In fact, we can easily print out the RGB values. Regardless of how the color of the verified digit changes, the RGB value of the digit is always less than 125, therefore, we can easily identify $ rgbarray ['red'] <125 | $ rgbarray ['green'] <125 | $ rgbarray ['blue'] <125 where is the number, where is the background.
The reason we can find these rules is that, in order to make the interferon of the verification code do not affect the display effect of digits, the RGB and RGB values of the interferon must be independent of each other and do not interfere with each other. As long as we understand this rule, we can easily achieve binarization.
The 120, 80,125, and other thresholds we found may differ from the actual RGB values. Therefore, sometimes, after binarization, 1 May appear in some places, and a number is displayed at a fixed position on the verification code, this interference is of little significance. However, for images with uncertain verification code positions, when we cut characters, it is likely to cause interference. Therefore, noise reduction is required after binarization.
Step 2: remove noise.The principle of noise removal is to remove isolated valid values. if the noise is high and the required efficiency is high, there is a lot of work to be done here. Fortunately, we do not need to be so advanced. we can use the simplest method, if the value of a vertex is 1, it determines whether the number in the top, bottom, top, top, bottom, and right of the vertex is 1. if the value is not 1, it is considered a dry point, set it to 1 directly.
As shown in, we use this method to easily find that 1 in the red box is dry, and set it to 1 directly. We used a technique to judge. sometimes the noise may be two consecutive ones, so we calculate the sum of the values in the eight directions of the point, finally, we determine whether their sum is smaller than the specific threshold.
Step 3: Cut characters.There are many ways to cut characters. here we use the simplest one. first we cut the vertical direction into characters, and then remove more than 0000 in the horizontal direction, as shown in
Step 1 Cut the red line and step 2 cut the blue line to get independent characters. But in the following case:
In the above method, the dw character is cut into one character. this is a wrong cut, so here we are involved in the cutting of the adhesion character.
Step 4: stick character cutting.When creating a verification code, the sticking of the rule characters is easy to split. if the characters themselves are scaled, deformation is very difficult to handle. after analysis, we can find that, the above character adhesion is a very simple method, but only the rule character adhesion, so we also use a very simple processing method to deal with this situation. After the split operation is completed, we cannot immediately determine that the split part is a character. the key factor for verification is whether the width of the cut characters exceeds the threshold, the trade-off criterion for this threshold value is that no matter how a character is rotated or deformed, it will not be greater than this threshold value. Therefore, if the cut block is greater than this threshold value, it can be considered as a sticking character; if it is greater than the sum of the two thresholds, it is considered to be three-character adhesion, and so on. After knowing this rule, it is easy to cut the adhesion characters. If we find that it is a sticking character block, we can directly divide this block into two or more new blocks. Of course, in order to better restore characters, I usually use the "equally divided" + 1,-1 to supplement the part of the character block.
Step 5: match characters.There are many ways to create a pattern for rotating characters, so we will not study it in depth here. The simplest method I use here is to create a matching library for all characters, so the study operation is added in the code I provide. The purpose is, first, someone manually identifies the image verification code, and then writes it to the signature library through the study method. In this way, the more image data is written, the higher the validation and recognition accuracy.
After the above steps, we can basically identify most of the verification codes on the internet. here we use the simplest method without any OCR knowledge.
Some suggestions for creating verification codes:
For verification code recognition programs, the most rare part is the cutting of verification characters and the establishment of signatures. many programmers in China always like to add a lot of interferon and interference lines to the verification code when they only do the verification code, not to mention the effect, but not to achieve good results; therefore, to make your own verification code difficult to identify, only the following two points are enough.
1. character Adhesion. it is recommended that all characters have the adhesion part;
2. do not use specification characters. each part of the verification code is scaled or rotated in different proportions.
As long as these two points are achieved, or the deformation of these two points, it is difficult for the recognition program to recognize them.
The above is all the content of this article: using PHP to crack the website verification code, I hope it will help you learn.
The token verification code is generally set to prevent malicious registration, brute force cracking, or batch posting. The so-called verification...