How to use PHP to crack the site verification code _php tips

Source: Internet
Author: User

The function of the verification code is generally to prevent the use of malicious registration, brute force cracking or bulk posting set. The so-called verification code, is a series of randomly generated numbers or symbols, generate a picture, the picture plus some interference pixels (prevent OCR), by the user naked eye identify the authentication code information, input form to submit Web site verification, verify the success of the use of a function. Learn to decipher the code of the crack/recognition technology, not only to know the principle of verification code, but also let you know how to prevent the verification code is cracked.

The most common types of verification code are as follows:

1, four digits, a random number of strings, the most original verification code, the validation effect is almost zero.
2, random digital picture verification code. The characters on the picture are more modest, some may add some random interferon, and some are random character color, the verification effect is better than the previous one. The person who does not have basic graphic image to learn knowledge, cannot break!
3, a variety of picture format random number + random capital letters + random interference pixels + random position.
4, the Chinese character is the newest authentication code, the random generation, the fight is more difficult, affects the user experience, therefore, the general application is relatively few.
For simplicity, the crack description is mainly for the 2nd type, first of all, to see the image of this verification code common on the Internet:

    • The first, easiest, picture background and numbers all use the same color, the character is regular, character position unification.
    • The second, seemingly not easy, in fact, careful study will find its rules, background color and interferon no matter how the changes, the verification character is regular, the same color, so the elimination of interferon is very easy, as long as the non-character pigments are all excluded.
    • The third, seemingly more complex, handles the above mentioned background color and interferon has been changing, verify the color of the characters are also changing, and the color of each character is also different.
    • Fourth, in addition to the third picture mentioned features, but also in the text added two linear interference rate, seemingly difficult in fact, it is easy to remove.

Verification code recognition is generally divided into the following steps:

1, remove the font recognition code, after all, is not a professional OCR recognition, and, because each site's verification code is different, so the most common method is to establish this code of the signature library. To go to the font, we need to download a few more pictures, so that these pictures, including all the characters, we have the letters here only pictures, so, as long as the collection of 0-9 of the picture can be.
2, binary two value is to the image of the verification number on each pixel in a number of 1, the other part of 0. So you can calculate each number of the font, record the font, as a key can be.
3, the calculation of the characteristics of the image to be recognized for the binary, to obtain picture characteristics.
4, the Control sample of Step 3 of the picture signature and verification Code of the type of the comparison, to verify the number of pictures.
Using the current method, the identification of the verification code can basically do 100%.

Through the steps above, you may have said, and did not find out how to remove interferon ah! In fact, the method of extracting interferon is very simple, an important feature of interferon is that it can not affect the display of the verification code, so the production of interferon may be less than or higher than a certain value, such as I gave the picture in the example, the RGB value of interferon is not more than 125, so, So we can easily get rid of interferon.

Simple verification codes consist of only numbers and letters, a uniform format and a fixed position each time it appears. Below continue to study the identification of authentication code, this time need to identify the target is: The verification code has characters and numbers, the verification code exists rotation (may rotate around), position is not fixed, there are characters and the adhesion between characters, and the verification code has stronger interferon.

Our illustration below is an example.

The first step: two value of the. the part of the verification code is expressed in 1, the background part uses 0 to express, the recognition method is very simple, we print out the verification code entire picture's RGB, then analyzes its law then, through the RGB code, we can easily distinguish above this picture's R value to be bigger than, the G and B's value is less than 80, the Based on this rule we can easily value the image above two.

Let's take a look at the third type of verification code picture above.

It just looks like it's complicated. Verification code of the picture each time the background color is not the same, and is not a monochrome, the number of different authentication codes are also different colors each time. Seemingly difficult to binary, in fact, we print out its RGB value is easy to find. Regardless of how the digital color changes, the RGB value of the number has a value of less than 125, so the following judgment $rgbarray [' Red '] < 125 | | $rgbarray [' Green ']<125| | $rgbarray [' Blue '] < 125 It's easy to tell where the numbers are and where the background is.

The reason why we can find these laws is that in the production of the beta-interferon, in order to make interferon do not affect the display of numbers, we must use the RGB and digital RGB of interferon independent of each other, without interference. As long as we understand this rule, we can easily achieve the two value.

We found the 120, 80, 125 and other thresholds, may be the actual RGB discrepancy, so, sometimes after the binary, there will be some place 1, for the verification code on the fixed position display number, this interference does not have much significance. But for a picture where the location of the verification code is uncertain, it is likely to cause interference when we cut the character. Therefore, in the binary after the elimination of noise processing.

The second step: denoising processing. the principle of denoising is very simple, is to remove the effective value of isolation, if the noise is relatively high, the requirements of the efficiency is also relatively high, there are a lot of work to do. It's a good thing we don't ask for such a sophisticated we use the simplest method can be, if a point of 1 to determine the point of the top left and right in the lower left 8 directions on the right and the number is 1, if not 1, it is considered a dry point, directly set to 1 can be.

As shown in the figure above, we can easily find that the red box part of the 1 is dry point, directly set to 1. We used a trick in judging that sometimes the noise might be two consecutive 1, so we calculated the sum of the values in the 8 directions of the point, and finally we judged whether their sum was less than the specific threshold.

Step three: Cut the character. There are many ways to cut characters, the simplest one is to cut vertically into characters, then remove more than 0000 in the horizontal direction, as shown below

The first step cuts the red line part, the second step cuts the blue wire section, thus can obtain the independent character. But like the following situation

The above method will be the DW character cut into a character, which is the wrong cut, so here we involve the cutting of adhesion characters.

Fourth step: adhesion character cutting. when making the verification code, rule character adhesion is easy to split open, if the character itself has scaling, deformation is difficult to deal with, after analysis, we can find that the above character adhesion belongs to a very simple way, but the rules of character adhesion, so deal with this situation, we also use a very simple way of handling. When the split operation is complete, we can not immediately determine the part of the split is a character, to verify that the key factor of verification is that the width of the cut character is greater than the threshold, the threshold of the selection criteria is that a character no matter how the rotation deformation will not be greater than this threshold, so, If we cut the block larger than this threshold, we can think that this is an adhesion character, if the sum of more than two thresholds is considered to be three characters of adhesion, and so on. Knowing this rule, cutting adhesion characters is also very simple. If we find that it is an adherent character block, it is possible to divide the block directly into two or more new blocks. Of course, in order to better restore the characters, I generally use the split +1,-1 for the character at descriptor part of the appropriate supplement.

Fifth step: Match character. There are a number of ways to build a character code for a rotating character, and there is no further research here. The simplest way I use this is to build a matching library for all of the characters, so I've added a study operation to the code that I provide, and the goal is to have someone manually identify the code for the picture and then write the signature library through the study method. The more pictures you write this way, the higher the exact line of validation recognition will be.

Through the above steps, we can basically identify the majority of the current Internet authentication code, here we are the simplest way to use, do not use any OCR knowledge.

In addition, some suggestions for the production of verification Code:

The most rare part of the procedure for identifying a validation code is to verify the character's cutting and signature creation, and many domestic programmers only do verification code, always like in the verification code plus a lot of interferon, interference line, affect the effect does not say, also does not achieve very good results; So, to make your own verification code difficult to identify, It's enough to do the bottom two.

1, character adhesion, the best of all characters have adhesion parts;
2, do not use the specification character, the verification code of each part of the use of different proportions of the scaling or rotation.
As long as the two points, or the two points of deformation, the identification program is difficult to identify.

The above is the entire content of this article: the use of PHP to the site verification code to crack, I hope to help you learn.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.