How to Crack Website Verification Codes with PHP
A verification code (CAPTCHA) is usually used to prevent malicious programs from registering accounts, brute-forcing passwords, or posting in bulk. A verification code is a randomly generated string of digits or symbols rendered as a picture, with some interference pixels added to defeat OCR. The user reads the code from the picture, types it into a form, and submits it to the site; only when verification succeeds can the user go on to use the protected function. Studying how verification codes are cracked and recognized not only shows how they work, it also shows you how to keep your own verification codes from being cracked.
The most common kinds of verification code are the following:
1. A four-digit random numeric string. This is the most primitive kind of verification code, and it provides almost no protection.
2. A random digit image. The characters in the picture are quite regular; some sites add random interference pixels and some randomize the character colors. It protects better than the previous kind, and people without basic knowledge of graphics and image processing cannot break it.
3. Images in various formats that combine random digits, random uppercase English letters, random interference pixels, and random positions.
4. Chinese characters, the newest kind of verification code used for registration. They are randomly generated and harder to attack, but they hurt the user experience, so relatively few sites use them.
For simplicity, the cracking instructions below focus on the second kind. Consider four verification code images of this kind that are common online:
- The first is the easiest: the background and the digits always use the same colors, the characters are regular, and the character positions are uniform.
- The second looks harder, but careful study reveals its pattern: however the background color and the interference change, the verification characters stay regular and keep the same color, so the interference is easy to eliminate by discarding every pixel that is not in the character color.
- The third looks more complex: in addition to the changing background color and interference mentioned above, the color of the verification characters changes as well, and each character has a different color.
- The fourth has the characteristics of the third image and also adds two interference lines across the text. It looks difficult, but the lines are actually easy to remove.
Verification code recognition is generally divided into the following steps:
1. Extract the dot matrices. Verification code recognition is not professional OCR, and every site's verification code is different, so the most common method is to build a signature library for the specific verification code. To extract the dot matrices, download enough sample pictures that together they contain every character. The pictures here contain only digits, so it is enough to collect images that cover 0-9.
2. Binarize. Represent every pixel that belongs to a digit in the picture with 1 and everything else with 0. Each digit's dot matrix can then be computed and recorded as a key.
3. Compute the features of the image to be recognized: binarize it to obtain its dot-matrix features.
4. Compare the features obtained in step 3 with the dot matrices in the signature library to obtain the digits in the picture.
With this method, recognition of this kind of verification code can reach essentially 100%.
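The four steps above can be sketched in PHP as follows. This is only a minimal illustration under stated assumptions, not the code the article refers to: it assumes the image has already been binarized into rows of 0/1, that every character occupies a fixed-width cell, and the names glyphKey, cutFixedCells and recognizeFixed as well as the tiny 3x3 glyphs are hypothetical.

```php
<?php
// Turn a binarized glyph (rows of 0/1) into a string key such as "010010010".
function glyphKey(array $glyph): string
{
    return implode('', array_map(fn(array $row) => implode('', $row), $glyph));
}

// Cut a binarized image into equal-width cells. This is valid only when every
// character has a fixed position and width, as in the simplest codes.
function cutFixedCells(array $bitmap, int $cellWidth): array
{
    $cells = [];
    $width = count($bitmap[0]);
    for ($x = 0; $x + $cellWidth <= $width; $x += $cellWidth) {
        $cells[] = array_map(
            fn(array $row) => array_slice($row, $x, $cellWidth),
            $bitmap
        );
    }
    return $cells;
}

// Look each cell up in the signature library; unknown glyphs become "?".
function recognizeFixed(array $bitmap, int $cellWidth, array $library): string
{
    $result = '';
    foreach (cutFixedCells($bitmap, $cellWidth) as $cell) {
        $result .= $library[glyphKey($cell)] ?? '?';
    }
    return $result;
}

// Toy 3x3 glyphs standing in for real collected samples (hypothetical data).
$one   = [[0, 1, 0], [0, 1, 0], [0, 1, 0]];
$seven = [[1, 1, 1], [0, 0, 1], [0, 0, 1]];
$library = [glyphKey($one) => '1', glyphKey($seven) => '7'];

// A 3x6 binarized "image" showing the digits 7 and 1 side by side.
$bitmap = [
    [1, 1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
];
echo recognizeFixed($bitmap, 3, $library), PHP_EOL;
```

An exact key lookup like this only works because the characters are regular and sit at fixed positions; a glyph that differs by a single pixel is reported as unknown.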
After reading the steps above you may say: none of them explains how to remove the interference! In fact, removing it is very simple. An important property of interference is that it must not affect how the verification code is displayed, so when the interference is generated its RGB values are kept below or above a certain value. In the example image I gave, the RGB values of the interference never exceed 125, so we can easily get rid of it.
A simple verification code consists only of digits and letters in a uniform format, each at a fixed position. Let us now study recognition further. This time the target is harder: the verification code contains both letters and digits, the characters are rotated (to the left or to the right), their positions are not fixed, adjacent characters stick together, and the interference is stronger.
We will work through an example image.
Step one: binarization. Represent the verification code part with 1 and the background with 0. The method is simple: print the RGB values of the whole picture and look for a pattern. From the RGB values we can easily see that, in the image above, the R value of the verification code pixels is greater than a threshold (presumably the 120 that appears among the thresholds listed below) while the G and B values are less than 80. With this rule we can easily binarize the picture.
Now take a look at the third verification code image above.
At first glance it seems complicated. The background color differs every time and is not a single color, and the color of each character differs every time as well. It looks hard to binarize, but once we print the RGB values the pattern is easy to find. However the character colors change, at least one of the RGB values of a character pixel is below 125, so the following test easily tells characters from background: $rgbarray['red'] < 125 || $rgbarray['green'] < 125 || $rgbarray['blue'] < 125
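As a minimal sketch of this test, the function below binarizes an image that is given as rows of arrays with 'red', 'green' and 'blue' keys. That layout is an assumption made here so the example runs without any extension; it mirrors the array that GD's imagecolorsforindex() returns for a color read with imagecolorat(). The function name binarize and the sample pixels are hypothetical.

```php
<?php
// Binarize an image given as rows of ['red' => ..., 'green' => ..., 'blue' => ...].
// A pixel counts as part of a character (1) when any channel is below the threshold.
function binarize(array $pixels, int $threshold = 125): array
{
    $bitmap = [];
    foreach ($pixels as $y => $row) {
        foreach ($row as $x => $rgbarray) {
            $isCharacter = $rgbarray['red'] < $threshold
                || $rgbarray['green'] < $threshold
                || $rgbarray['blue'] < $threshold;
            $bitmap[$y][$x] = $isCharacter ? 1 : 0;
        }
    }
    return $bitmap;
}

// Hypothetical 2x3 image: light background pixels and darker character pixels.
$light = ['red' => 230, 'green' => 210, 'blue' => 200];
$dark  = ['red' => 40,  'green' => 180, 'blue' => 160];
$pixels = [
    [$light, $dark, $light],
    [$dark,  $dark, $light],
];

// Print the bitmap one row per line so the character shape is visible.
foreach (binarize($pixels) as $row) {
    echo implode('', $row), PHP_EOL;
}
```

The threshold is a parameter because, as the article notes, each site's images need their own value, found by printing and inspecting the RGB data.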
We can find these patterns because, for the interference not to affect the display of the characters, the RGB values of the interference and of the characters must stay in separate ranges that do not overlap. Once we understand this, binarization is easy.
The thresholds we found (120, 80, 125, and so on) may not match the actual RGB values exactly, so after binarization some stray pixels may be set to 1. For verification codes that display characters at fixed positions, this noise matters little. For images where the position of the characters is uncertain, however, it is likely to interfere when we cut the characters apart. Denoising should therefore be done after binarization.
Step two: noise removal. The principle of denoising is simple: remove isolated 1 values. If there is a lot of noise, or efficiency requirements are high, there is considerably more work to do. Fortunately we do not need anything that advanced here, and the simplest method will do: if a point is 1, check whether any of its 8 neighbors (up, down, left, right, upper left, upper right, lower left, lower right) is 1. If none is, the point is considered noise and is set to 0.
In the figure, this method easily identifies the 1 values inside the red box as noise, and they are set to 0. We use one trick when judging: noise is sometimes two consecutive 1s, so we compute the sum of the values in the 8 directions around the point and then check whether that sum is less than a specific threshold.
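The neighbor-sum check can be sketched as below. This is an illustration under assumptions, not the author's code: the function name removeNoise, the default threshold of 2, and the choice to treat pixels outside the image as 0 are all mine.

```php
<?php
// Clear isolated 1-pixels: a pixel stays 1 only if the sum of its 8 neighbours
// is at least $minNeighbors. Neighbours outside the image count as 0.
function removeNoise(array $bitmap, int $minNeighbors = 2): array
{
    $height = count($bitmap);
    $width = count($bitmap[0]);
    $clean = $bitmap;
    for ($y = 0; $y < $height; $y++) {
        for ($x = 0; $x < $width; $x++) {
            if ($bitmap[$y][$x] !== 1) {
                continue;
            }
            $sum = 0;
            for ($dy = -1; $dy <= 1; $dy++) {
                for ($dx = -1; $dx <= 1; $dx++) {
                    if ($dy === 0 && $dx === 0) {
                        continue; // skip the pixel itself
                    }
                    // Always read from the original so scan order does not matter.
                    $sum += $bitmap[$y + $dy][$x + $dx] ?? 0;
                }
            }
            if ($sum < $minNeighbors) {
                $clean[$y][$x] = 0;
            }
        }
    }
    return $clean;
}

// A solid 2x2 block (kept), one isolated pixel and one two-pixel speck (both removed).
$bitmap = [
    [1, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
];
foreach (removeNoise($bitmap) as $row) {
    echo implode('', $row), PHP_EOL;
}
```

With a threshold of 2, the end pixels of a one-pixel-wide stroke are also cleared, so the threshold has to be tuned against the stroke thickness of the actual images.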
Step three: cut the characters. There are many ways to cut characters. The simplest is to cut the image into characters along the vertical direction and then trim the surplus all-zero rows horizontally.
In the figure, the first step cuts along the red lines and the second step cuts along the blue lines, which yields the independent characters.
When characters touch, however, this method cuts the characters D and W out as a single character, which is a wrong cut. This brings us to the cutting of sticky (touching) characters.
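The simple two-pass cut described above can be sketched like this. It assumes a binarized, denoised image stored as rows of 0/1; the function names cutCharacters and trimEmptyRows are hypothetical. As the text says, characters that touch come out of this routine as one block.

```php
<?php
// Remove the all-zero rows above and below a block, keeping any rows in between.
function trimEmptyRows(array $block): array
{
    $inked = array_keys(array_filter($block, fn(array $row) => array_sum($row) > 0));
    if ($inked === []) {
        return [];
    }
    $first = $inked[0];
    $last = $inked[count($inked) - 1];
    return array_slice($block, $first, $last - $first + 1);
}

// Split a binarized image into blocks at all-zero columns (the vertical cut),
// then trim the empty rows of each block (the horizontal cut).
function cutCharacters(array $bitmap): array
{
    $width = count($bitmap[0]);
    $blocks = [];
    $start = null;
    // Run one column past the edge so a block touching the right border is closed.
    for ($x = 0; $x <= $width; $x++) {
        $hasInk = $x < $width && array_sum(array_column($bitmap, $x)) > 0;
        if ($hasInk && $start === null) {
            $start = $x;
        } elseif (!$hasInk && $start !== null) {
            $block = array_map(
                fn(array $row) => array_slice($row, $start, $x - $start),
                $bitmap
            );
            $blocks[] = trimEmptyRows($block);
            $start = null;
        }
    }
    return $blocks;
}

// Two separate shapes: one in columns 1-2 and one in column 5.
$bitmap = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
];
foreach (cutCharacters($bitmap) as $i => $block) {
    echo "block $i: ", count($block[0]), ' wide, ', count($block), ' high', PHP_EOL;
}
```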
Step four: cutting sticky characters. When the characters in a verification code are regular, stuck-together characters are easy to split apart; if the characters themselves are scaled or deformed, they are hard to deal with. Analysis shows that the stickiness above is of a very simple kind, just regular characters touching, so we handle it in a very simple way. When we split, we cannot assume right away that a segment is a single character; it has to be verified. The key criterion is whether the width of the cut segment exceeds a threshold, chosen so that no single character, however it is rotated or deformed, is wider than it. If a block is wider than this threshold, we treat it as two sticky characters; if it is wider than twice the threshold, we treat it as three sticky characters; and so on. Once you know the rule, cutting sticky characters is easy: when we find a block of sticky characters, we divide it directly into two or more new blocks. To restore the characters better, I generally widen each split by one column on either side (+1, -1) so that each new block is padded appropriately.
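A minimal sketch of this width rule follows. The function name splitStickyBlock is hypothetical, splitting into equal-width pieces is an assumption (the article does not say where inside the block the cut falls), and the $pad parameter is one reading of the "+1, -1" padding.

```php
<?php
// Split a block that is wider than $maxCharWidth into equal-width pieces.
// Width in (T, 2T] gives 2 pieces, (2T, 3T] gives 3, and so on, where T is
// $maxCharWidth. Each piece is widened by $pad columns on both sides of a cut.
function splitStickyBlock(array $block, int $maxCharWidth, int $pad = 1): array
{
    $width = count($block[0]);
    $pieces = (int) ceil($width / $maxCharWidth);
    if ($pieces <= 1) {
        return [$block]; // narrow enough to be a single character
    }
    $result = [];
    for ($i = 0; $i < $pieces; $i++) {
        // Nominal cut positions, then padded and clamped to the block edges.
        $from = max(0, intdiv($i * $width, $pieces) - $pad);
        $to = min($width, intdiv(($i + 1) * $width, $pieces) + $pad);
        $result[] = array_map(
            fn(array $row) => array_slice($row, $from, $to - $from),
            $block
        );
    }
    return $result;
}

// A 10-column block with a maximum character width of 6 is treated as two characters.
$block = [
    [1, 1, 1, 0, 1, 1, 0, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 1, 1, 0, 1],
];
foreach (splitStickyBlock($block, 6) as $i => $piece) {
    echo "piece $i: ", count($piece[0]), ' columns', PHP_EOL;
}
```

Because of the padding, neighboring pieces overlap by a couple of columns near the cut, which keeps strokes that straddle the cut in both pieces.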
Step five: match the characters. There are many ways to build signatures for rotated characters, and I do not study them in depth here. I use the simplest approach, which is to build a matching library covering all of the characters. The code I provide therefore adds a study operation: you identify the verification code in an image manually, and the study method writes its signatures into the library. The more image data is written, the higher the recognition accuracy.
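The idea of a library that is taught by hand and then queried can be sketched as follows. This is not the code the article mentions: the class name SignatureLibrary, the recognize method, the resampling of every glyph to a fixed 4x4 grid, and the nearest-match rule (fewest differing pixels) are all assumptions chosen to keep the example short.

```php
<?php
// A tiny matching library: study() stores manually labelled glyphs, and
// recognize() returns the label of the stored glyph with the fewest differing pixels.
class SignatureLibrary
{
    private const SIZE = 4; // every glyph is resampled to SIZE x SIZE (assumed value)
    private array $samples = [];

    // Resample a 0/1 glyph to a fixed grid so glyphs of different sizes compare.
    private function signature(array $glyph): string
    {
        $height = count($glyph);
        $width = count($glyph[0]);
        $bits = '';
        for ($y = 0; $y < self::SIZE; $y++) {
            for ($x = 0; $x < self::SIZE; $x++) {
                $bits .= $glyph[intdiv($y * $height, self::SIZE)][intdiv($x * $width, self::SIZE)];
            }
        }
        return $bits;
    }

    // Record a glyph that a person has identified as $label.
    public function study(array $glyph, string $label): void
    {
        $this->samples[] = [$this->signature($glyph), $label];
    }

    // Return the closest label, or null if nothing has been studied yet.
    public function recognize(array $glyph): ?string
    {
        $target = str_split($this->signature($glyph));
        $best = null;
        $bestDistance = PHP_INT_MAX;
        foreach ($this->samples as [$signature, $label]) {
            // Number of positions where the two signatures differ.
            $distance = count(array_diff_assoc($target, str_split($signature)));
            if ($distance < $bestDistance) {
                $bestDistance = $distance;
                $best = $label;
            }
        }
        return $best;
    }
}

$library = new SignatureLibrary();
$library->study([[0, 1, 0], [0, 1, 0], [0, 1, 0]], '1');
$library->study([[1, 1, 1], [0, 0, 1], [0, 0, 1]], '7');

// A "7" with one extra pixel still matches the studied 7.
$noisySeven = [[1, 1, 1], [0, 0, 1], [0, 1, 1]];
echo $library->recognize($noisySeven), PHP_EOL;
```

Studying several samples of each character, taken at different rotations, is what lets a nearest-match lookup like this cope with rotated glyphs.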
With the steps above we can identify most of the verification codes currently on the Internet. Everything here uses the simplest methods and no OCR knowledge.
Finally, some suggestions for making verification codes:
For a recognition program, the hardest parts are cutting the characters apart and building the signatures. Many programmers in China, when they make a verification code, like to add a lot of interference pixels and interference lines. These make the picture look worse without making it much harder to crack. To make your verification code hard to recognize, the following two points are enough:
1. Make the characters stick together, ideally so that every character touches its neighbors.
2. Do not use regular characters: apply a different scale or rotation to each part of the verification code.
With these two points, or variations on them, a recognition program will have a hard time.
That is the whole of this article on using PHP to crack website verification codes. I hope it helps with your learning.