Article Walker link: http://www.cnblogs.com/rgbw/archive/2012/12/26/2834567.html
Verification Codes: Yesterday, Today, and Tomorrow
Why use a verification code
Without a verification code, an attacker can use automated programs to register large numbers of web service accounts. Those accounts can then be used to cause trouble for other users, for example by sending spam, or to slow the service down by logging in to many accounts repeatedly at the same time.
In most cases, however, automated registration programs are poor at recognizing characters in an image. Verification code technology therefore emerged to stop attackers from writing programs that register automatically or brute-force passwords through repeated login attempts.
Today many websites use verification codes to stop bots from automatically registering, logging in, or flooding forums with posts. A verification code is a randomly generated string of digits or symbols rendered as an image, with interfering pixels added to defeat OCR. The user reads the code, enters it in a form, and submits it to the site; a function becomes available only after the site verifies the answer.
What is a verification code
The English term for a verification code is CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). As the name implies, it is used to tell computers and humans apart. In a CAPTCHA test, a computer acting as the server automatically generates a question for the user to answer. The question can be generated and graded by a computer, but it is intended that only a human can answer it. Since a computer cannot answer the question, a user who answers it correctly can be considered human. Because a CAPTCHA is a computer testing a human, rather than a human testing a computer as in the standard Turing test, it is sometimes called a reverse Turing test.
Today, hundreds of millions of verification codes are solved by humans, so the demand for CAPTCHAs is very large, and they must be generated and graded automatically. Humans must also be able to read and enter the code quickly; otherwise users become annoyed and the site loses them. A CAPTCHA can be built on an open problem in artificial intelligence, so that existing technology cannot crack it in the short term. If the CAPTCHA is not cracked, there is a way to tell humans and computers apart. If it is cracked, an artificial intelligence problem has been solved.
Types of verification codes
Text Verification Code
Text verification codes are easy for a computer to generate automatically in large numbers and are the most widely used type at present. They rely mainly on image distortion and added noise.
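As a rough illustration of "added noise", the sketch below flips random pixels in a tiny character bitmap. The bitmap representation ('#' for ink, '.' for background), the function name `add_noise`, and the `flip_rate` parameter are all assumptions made for this example; a real generator would render and distort an actual image.

```python
import random
from typing import List


def add_noise(bitmap: List[str], flip_rate: float, rng: random.Random) -> List[str]:
    """Flip each pixel ('#' <-> '.') independently with probability flip_rate."""
    noisy = []
    for row in bitmap:
        noisy.append("".join(
            ("." if ch == "#" else "#") if rng.random() < flip_rate else ch
            for ch in row
        ))
    return noisy


# Hypothetical 3x5 glyph for the letter H.
glyph_h = ["#.#", "#.#", "###", "#.#", "#.#"]

# A seeded generator keeps the demonstration repeatable.
for row in add_noise(glyph_h, flip_rate=0.2, rng=random.Random(0)):
    print(row)
```

A higher `flip_rate` makes the glyph harder for a program to read, but also harder for a person, which is the trade-off the rest of the article keeps returning to.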
The main difficulty in cracking a text verification code lies in segmenting and recognizing the characters, and segmentation is the key. Cracking takes two steps: first split the image into characters, then recognize each character. Recognizing an individual character is easy with existing machine learning algorithms.
Therefore, the key to defending a text verification code against attack is to make character segmentation harder. Companies such as Google use verification codes whose characters are stuck together, which makes them difficult to segment.
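To see why touching characters matter, consider one simple segmentation technique, vertical projection, which cuts the image wherever a column contains no ink. This is a minimal sketch, not the method of any particular attacker; the bitmaps and the function name `segment_columns` are invented for the example.

```python
from typing import List, Tuple


def segment_columns(bitmap: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that contain ink.

    bitmap is a list of equal-length strings; '#' is ink, '.' is background.
    """
    width = len(bitmap[0])
    # Vertical projection: number of ink pixels in each column.
    projection = [sum(row[x] == "#" for row in bitmap) for x in range(width)]

    segments = []
    start = None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                    # a run of inked columns begins
        elif count == 0 and start is not None:
            segments.append((start, x))  # an empty column ends the run
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


# "H" and "I" separated by two blank columns.
separated = [
    "#.#..###",
    "#.#...#.",
    "###...#.",
    "#.#...#.",
    "#.#..###",
]
# The same two glyphs with no gap between them.
touching = [
    "#.####",
    "#.#.#.",
    "###.#.",
    "#.#.#.",
    "#.####",
]

print(segment_columns(separated))  # [(0, 3), (5, 8)]
print(segment_columns(touching))   # [(0, 6)]
```

With a gap, the projection finds two characters. Once the glyphs touch, every column has ink and the two characters come back as a single blob, so this simple approach no longer separates them.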
Image Verification Code
Image verification codes are based on problems such as image classification, object recognition, and scene understanding, and are generally harder to crack than text verification codes. However, existing image verification codes need a large image database and cannot be produced on a large scale. Worse, once the database is published, the scheme is defeated without further effort.
Sound Verification Code
A sound verification code plays digits or letters, chosen at random and read by one or more speakers, at random intervals, and then adds background noise. Sound verification codes are easily attacked by machine learning algorithms and are less user-friendly than visual ones. When letters are read aloud, some groups, such as minority groups in rural areas, may fail the test because they are unfamiliar with how the letters are pronounced.
Use of verification codes
The server randomly generates a CAPTCHA string, saves it in memory, renders it into an image, and sends the image to the browser for display. The user types the characters shown in the image and submits them to the server, which compares the submitted characters with the saved ones. If they match, the request continues; otherwise the server returns an error prompt. A bot written by an attacker has difficulty recognizing the characters and so cannot complete automatic registration or login, while a human user can read them, and this is how the attack is blocked. How hard the characters are to recognize depends on the strength of the interference in the image. In practice, a CAPTCHA only makes the attacker's job harder; it is unlikely to prevent attacks completely.
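The generate, save, and compare flow described above can be sketched as follows. This is a minimal in-memory illustration, not a production design: the class `CaptchaStore`, the session-id key, the five-character uppercase alphabet, and the single-use rule are all assumptions added for the example, and rendering the string into an image is left out.

```python
import hmac
import secrets
import string
from typing import Dict

ALPHABET = string.ascii_uppercase + string.digits


class CaptchaStore:
    """In-memory map from a session id to the expected CAPTCHA answer."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self._answers: Dict[str, str] = {}

    def issue(self, session_id: str) -> str:
        """Generate a random string and remember it for this session."""
        answer = "".join(secrets.choice(ALPHABET) for _ in range(self.length))
        self._answers[session_id] = answer
        return answer  # the server would render this into a noisy image

    def verify(self, session_id: str, submitted: str) -> bool:
        """Compare the submission with the saved answer (case-insensitive)."""
        # pop() makes each challenge single-use (an assumption of this sketch),
        # so a bot cannot keep guessing against the same image.
        expected = self._answers.pop(session_id, None)
        if expected is None:
            return False
        candidate = submitted.strip().upper()
        return hmac.compare_digest(expected.encode(), candidate.encode())


store = CaptchaStore()
text = store.issue("session-1")
print(store.verify("session-1", text))  # True: matches the saved string
print(store.verify("session-1", text))  # False: the challenge was already used
```

Removing the answer on every attempt forces the client to request a fresh image after a wrong guess, which keeps the cost of each guess on the attacker's side.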
The dilemma of verification codes
Computer programs can run around the clock, so even with a low recognition rate they can get through a CAPTCHA system in a relatively short time. The machine recognition rate of a CAPTCHA therefore needs to be below 0.01% to block automated malicious programs effectively.
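A quick calculation shows why even a low recognition rate matters. The expected number of successful passes is simply the number of attempts multiplied by the recognition rate. The rate of one attempt per second below is an assumed figure for illustration, not a number from the article.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def expected_successes_per_day(attempts_per_second: float,
                               recognition_rate: float) -> float:
    """Expected CAPTCHA passes per day for a bot that never stops."""
    return attempts_per_second * SECONDS_PER_DAY * recognition_rate


# Assumed attack speed: one attempt per second from a single bot.
for rate in (0.01, 0.001, 0.0001):
    passes = expected_successes_per_day(1, rate)
    print(f"recognition rate {rate:.2%}: about {passes:.2f} passes per day")
```

Under that assumption, a 1% recognition rate yields roughly 864 successful passes a day, and even 0.01% still lets about 8 or 9 through, which is why a low recognition rate alone does not stop a persistent attacker.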
Of course, limits based on IP address can also be used to restrict the number of attempts from a single machine.
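One way to limit attempts per IP address is a sliding-window counter. The sketch below is an illustration under assumed values: the class name `AttemptLimiter`, the default of 5 attempts per 60 seconds, and the in-memory storage are all choices made for the example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class AttemptLimiter:
    """Allow at most max_attempts per IP within a sliding time window."""

    def __init__(self, max_attempts: int = 5, window_seconds: float = 60.0) -> None:
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Return True and record the attempt if this IP is under its limit."""
        recent = self._attempts[ip]
        # Forget attempts that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_attempts:
            return False
        recent.append(now)
        return True


limiter = AttemptLimiter(max_attempts=3, window_seconds=10)
print([limiter.allow("203.0.113.7", t) for t in (0, 1, 2, 3)])
# [True, True, True, False]
```

Passing `now` explicitly keeps the sketch easy to test; a real service would typically read a monotonic clock instead. An attacker who controls many IP addresses can still spread attempts across them, so this only complements the CAPTCHA.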
How verification codes are cracked
Attackers and defenders keep trying to outsmart each other, and only by understanding how verification codes are cracked can one design a better verification code.
The main cracking process
1 Image capture: Fetch the HTML directly over HTTP, parse out the image URL, then download and save the image.
2 Preprocessing: Check that the image format is correct, convert it to a suitable format, compress it, crop the ROI, remove noise, convert to grayscale, convert the color space, and so on.
3 Detection: Locate the main region that contains the text.
4 Segmentation (pre-recognition processing): Cut the text into individual characters.
5 Training: Choose a suitably sized training set and train on it with pattern recognition and machine learning algorithms.
6 Recognition: Feed in the processed image to be recognized, convert it to the input format the classifier requires, and determine the most likely letter from the output class and confidence. Recognition is essentially classification.
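Two of the steps above, preprocessing and recognition, can be sketched on a toy scale. The example binarizes RGB pixels with a luminance threshold and then classifies a single glyph by nearest-template matching, returning a class and a confidence. Template matching stands in here for the trained machine learning classifier the steps describe; the 3x5 templates, the threshold of 128, and the function names are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]

# Hypothetical 3x5 reference glyphs; a real attack would learn these from data.
TEMPLATES: Dict[str, List[str]] = {
    "H": ["#.#", "#.#", "###", "#.#", "#.#"],
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
}


def binarize(pixels: List[List[Pixel]], threshold: float = 128.0) -> List[str]:
    """Convert RGB pixels to grayscale, then to '#' (dark) or '.' (light)."""
    rows = []
    for row in pixels:
        out = ""
        for r, g, b in row:
            luma = 0.299 * r + 0.587 * g + 0.114 * b  # standard luma weights
            out += "#" if luma < threshold else "."
        rows.append(out)
    return rows


def classify(glyph: List[str]) -> Tuple[str, float]:
    """Return (label, confidence): the template sharing the most pixels."""
    total = sum(len(row) for row in glyph)
    best_label, best_score = "", -1.0
    for label, template in TEMPLATES.items():
        matches = sum(
            g == t
            for glyph_row, template_row in zip(glyph, template)
            for g, t in zip(glyph_row, template_row)
        )
        score = matches / total  # fraction of agreeing pixels
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# An "H" with one corrupted pixel in the bottom-right corner.
dark, light = (20, 20, 20), (230, 230, 230)
pattern = ["#.#", "#.#", "###", "#.#", "#.."]
pixels = [[dark if ch == "#" else light for ch in row] for row in pattern]

label, confidence = classify(binarize(pixels))
print(label, round(confidence, 3))  # H 0.933
```

The glyph agrees with the "H" template on 14 of 15 pixels, so it is still classified correctly despite the noise, which matches the article's point that recognizing a single isolated character is the easy part.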
Some suggestions on the design of verification codes
1 When using noise and similar interference, try to make the characters hard to tell apart from the clutter, and the foreground hard to tell apart from the background; make the "bad guys" (noise) look as much like the "good guys" (letters) as possible.
2 A really good verification code design plays to what humans are good at and artificial intelligence algorithms are not, for example segmenting touching characters and reading handwriting (specially deformed print also works). Do not blindly pile on complex-looking noise or other fancy effects: however complex the result, a verification code that people find hard to read is useless.
3 From the perspective of professional machine vision, the design should force the cracker, during the recognition phase, to go back and forth several times between low-level and high-level vision before the code can be recognized. This greatly increases the difficulty of cracking and lowers its accuracy.
Reprint--Verification code of yesterday, today and tomorrow