Verification Code Technology

Source: Internet
Author: User

Verification Code yesterday, today, and tomorrow

Why use the verification code?

Without a verification code, attackers can use malicious programs to register large numbers of web service accounts automatically and then use those accounts to cause trouble for other users, for example by sending spam, or by logging on to many accounts at the same time to slow the service down.
In most cases, however, an automatic registration program cannot correctly recognize the characters in an image. Verification code technology emerged to stop attackers from writing programs that register automatically or that log on repeatedly to brute-force passwords.

Many websites now use verification codes to prevent automated registration, logon, and post flooding by bots. A verification code is an image that contains a randomly generated string of digits or symbols, with interference pixels added to hinder OCR. The user reads the code by eye, types it into a form, and submits it to the website for verification. The protected function can be used only after verification succeeds.
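The generation step described above can be sketched in a few lines. This is only an illustration using the Python standard library: the 3x5 digit font, the function names `random_code` and `render`, and the `noise` parameter are all assumptions made for the example. A real generator would draw distorted glyphs with an imaging library instead of a fixed bitmap font.

```python
import random
import secrets

# Hypothetical 3x5 bitmap font for digits, used only for this illustration.
FONT = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "2": ["###", "..#", "###", "#..", "###"],
    "3": ["###", "..#", "###", "..#", "###"],
    "4": ["#.#", "#.#", "###", "..#", "..#"],
    "5": ["###", "#..", "###", "..#", "###"],
    "6": ["###", "#..", "###", "#.#", "###"],
    "7": ["###", "..#", "..#", "..#", "..#"],
    "8": ["###", "#.#", "###", "#.#", "###"],
    "9": ["###", "#.#", "###", "..#", "###"],
}


def random_code(length=4):
    """Pick the challenge text with a cryptographically secure generator."""
    return "".join(secrets.choice("0123456789") for _ in range(length))


def render(code, noise=0.0, rng=None):
    """Render code as rows of 0/1 pixels, then flip a fraction of them as noise."""
    rng = rng or random.Random()
    rows = []
    for y in range(5):
        row = []
        for ch in code:
            row.extend(1 if c == "#" else 0 for c in FONT[ch][y])
            row.append(0)  # one blank column after each glyph
        rows.append(row)
    for row in rows:
        for x in range(len(row)):
            if rng.random() < noise:
                row[x] ^= 1  # interference pixel
    return rows


if __name__ == "__main__":
    code = random_code()
    for row in render(code, noise=0.08):
        print("".join("#" if p else "." for p in row))
```

Raising `noise` makes the output harder for a program to read, but also harder for a person, which is the trade-off the rest of this article keeps returning to.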

What is a verification code?

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is, as its name implies, a test used to distinguish computers from humans. In a CAPTCHA test, the server automatically generates a question for the user to answer. The question can be generated and graded by a computer, but only a human should be able to answer it. Because a computer cannot answer the question, whoever answers it can be considered human. In the standard Turing test a human judges a machine, whereas in CAPTCHA a computer tests a human. For this reason, CAPTCHA is sometimes called a reverse Turing test.

Hundreds of millions of verification codes are solved by humans every day. With demand this large, CAPTCHAs must be generated and graded automatically. Humans must also be able to read and enter the code quickly; otherwise users become annoyed and leave. A CAPTCHA can be built on an open problem in AI so that existing technology cannot crack it in the short term. If the CAPTCHA is not cracked, it remains a way to distinguish humans from computers. If it is cracked, an AI problem has been solved.

Verification code types

Text Verification Code

Text verification codes are the most widely used type because computers can easily generate them in large numbers. They rely mainly on image distortion and added noise.

The main difficulty in cracking a text verification code lies in character segmentation and recognition, and segmentation is the key. Cracking has two main steps: step 1, character segmentation; step 2, single-character recognition. Once a character has been isolated, existing machine learning algorithms can recognize it easily.

Therefore, the key to defending text verification codes is to make character segmentation harder. The verification codes used by companies such as Google have characters that stick together, which makes them difficult to separate.
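To see why touching characters matter, consider one of the simplest segmentation methods, vertical projection, which cuts the image wherever a column contains no ink. The sketch below assumes the image has already been reduced to rows of 0/1 pixels; the function name `split_columns` and the tiny sample bitmaps are made up for the example.

```python
def split_columns(bitmap):
    """Split a binary image (rows of 0/1) at columns that contain no ink.

    Returns a list of (start, end) column ranges, with end exclusive.
    """
    width = len(bitmap[0])
    # Vertical projection: number of ink pixels in each column.
    projection = [sum(row[x] for row in bitmap) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(projection):
        if ink and start is None:
            start = x                      # a glyph begins here
        elif not ink and start is not None:
            segments.append((start, x))    # an empty column ends the glyph
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


separated = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
]
touching = [
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],  # a single bridging pixel joins the two glyphs
    [1, 1, 0, 1, 1],
]

if __name__ == "__main__":
    print(split_columns(separated))  # [(0, 2), (3, 5)]
    print(split_columns(touching))   # [(0, 5)]
```

One bridging pixel is enough to make this method return a single segment, so the attacker has to fall back on more elaborate and less reliable segmentation.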

Image Verification Code

Image verification codes are based on image classification, object recognition, and scene understanding. They are generally harder to crack than text verification codes. However, existing image verification codes require a large image database and cannot be generated on a large scale. Worse, once the database is published, the scheme is defeated without any need to attack the algorithm.

Voice Verification Code

A voice verification code plays randomly selected digits and letters, read by one or more speakers at random intervals, with background noise added. Voice verification codes are vulnerable to machine learning attacks and are less user-friendly than visual verification codes. When letters are used, a small number of users, for example in rural areas, may not understand the English pronunciation and so cannot pass the test.

Verification code usage

The server generates a random verification code string, stores it in memory, renders it into an image, and sends the image to the browser for display. The user reads the characters in the image, types them in, and submits them to the server, which compares the submitted characters with the saved ones. If they match, processing continues; otherwise an error prompt is returned. A bot written by an attacker can hardly recognize the characters in the image, so it cannot complete automatic registration or logon, whereas a human user can read and enter them. This is how the verification code blocks attacks. How hard the characters are to recognize depends on the strength of the interference in the image. In practice, a verification code only raises the difficulty for attackers; it cannot stop them completely.
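The server-side flow above might look roughly like the following. This is a minimal sketch, not a production design: the in-memory dictionary `_pending` stands in for real session storage, and the names `issue` and `verify`, the alphabet, and the 120-second lifetime are assumptions chosen for the example. Rendering the image is left out.

```python
import hmac
import secrets
import time

# Assumed alphabet that skips look-alike characters such as O/0 and I/1.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
TTL_SECONDS = 120  # assumed lifetime of a challenge

# session_id -> (code, expiry time); stands in for server-side session storage.
_pending = {}


def issue(session_id, length=5, now=None):
    """Generate a code, remember it for this session, and return it for rendering."""
    now = time.time() if now is None else now
    code = "".join(secrets.choice(ALPHABET) for _ in range(length))
    _pending[session_id] = (code, now + TTL_SECONDS)
    return code  # the server renders this into an image; it is never sent as text


def verify(session_id, answer, now=None):
    """Compare the submitted answer with the stored code. Each code is single use."""
    now = time.time() if now is None else now
    entry = _pending.pop(session_id, None)  # removed on the first attempt
    if entry is None:
        return False
    code, expiry = entry
    if now > expiry:
        return False
    submitted = answer.strip().upper()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(code.encode(), submitted.encode())
```

Discarding the stored code on the first attempt, whether it succeeds or fails, prevents a bot from guessing repeatedly against the same image.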

Verification Code dilemma

Computer programs can run 24 hours a day without interruption. Even with a low recognition rate, they can get through a CAPTCHA system in a short time. Therefore, an attacker's recognition rate against the CAPTCHA must be kept below 0.01% to effectively block automated malicious programs.
Of course, you can also limit the number of attempts each machine can make, based on its IP address.
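A quick calculation shows why both measures matter. Suppose, purely as an assumption, that a bot makes 10 attempts per second: that is 864,000 attempts in 24 hours, so even a 0.01% recognition rate yields about 86 successful registrations a day. The sketch below pairs that arithmetic with a simple per-IP sliding-window limiter; the class name `AttemptLimiter` and its parameters are hypothetical.

```python
from collections import defaultdict, deque


def expected_successes(attempts, recognition_rate):
    """Expected number of CAPTCHAs a bot solves in the given number of attempts."""
    return attempts * recognition_rate


class AttemptLimiter:
    """Sliding-window cap on CAPTCHA attempts per IP address."""

    def __init__(self, max_attempts, window_seconds):
        self.max_attempts = max_attempts
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent attempts

    def allow(self, ip, now):
        hits = self._hits[ip]
        # Forget attempts that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_attempts:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    per_day = 10 * 60 * 60 * 24  # assumed 10 attempts per second for 24 hours
    print(expected_successes(per_day, 0.0001))
    limiter = AttemptLimiter(max_attempts=20, window_seconds=3600)
    print(limiter.allow("192.0.2.1", now=0))
```

With a cap of 20 attempts per hour, a single address gets at most 480 attempts a day, which at the same 0.01% rate is far less than one expected success. An attacker who controls many addresses is not stopped by this, so the limit complements a strong CAPTCHA and does not replace it.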

How to crack the verification code

Only by understanding how the verification code is cracked can we design a better verification code.

Main cracking process

1. Image collection: fetch the HTML directly over HTTP, extract the image URL, then download and save the image.

2. Pre-processing: check that the image format is valid, convert it to a suitable format, compress it, crop the region of interest (ROI), remove noise, convert to grayscale, and convert the color space.

3. Detection: locate the main area that contains the text.

4. Segmentation: cut the text into individual characters.

5. Training: choose a training set of suitable size and train a classifier on it using pattern recognition and machine learning algorithms.

6. Recognition: convert the image to be recognized into the input format the classifier requires, then use the output class and confidence level to decide which letter it most likely is. Recognition is, in essence, classification.
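Parts of steps 2 and 6 can be illustrated with a small standard-library sketch. It assumes the image is a list of rows of (r, g, b) tuples and that each character has already been cut out. Nearest-template matching is used here as a stand-in for the trained classifier mentioned in step 5; it is not what a real attack would use, and all function names are hypothetical.

```python
def to_gray(pixel):
    """Convert an (r, g, b) pixel to a 0-255 gray level (BT.601 luma weights)."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def binarize(rgb_image, threshold=128):
    """Grayscale then threshold: dark pixels become ink (1), light ones background (0)."""
    return [[1 if to_gray(p) < threshold else 0 for p in row] for row in rgb_image]


def remove_isolated(bitmap):
    """Drop ink pixels with no ink among their 8 neighbours (speckle noise)."""
    h, w = len(bitmap), len(bitmap[0])
    out = [row[:] for row in bitmap]
    for y in range(h):
        for x in range(w):
            if bitmap[y][x]:
                neighbours = sum(
                    bitmap[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))
                    if (ny, nx) != (y, x)
                )
                if neighbours == 0:
                    out[y][x] = 0
    return out


def classify(glyph, templates):
    """Return (label, confidence) for the template with the fewest differing pixels."""
    total = len(glyph) * len(glyph[0])
    best_label, best_diff = None, total + 1
    for label, tmpl in templates.items():
        diff = sum(a != b for ra, rb in zip(glyph, tmpl) for a, b in zip(ra, rb))
        if diff < best_diff:
            best_label, best_diff = label, diff
    return best_label, 1 - best_diff / total
```

The confidence returned by `classify` is simply the fraction of matching pixels. An attacker can use such a score to discard low-confidence guesses and request a fresh image instead of submitting a likely wrong answer.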

Suggestions on verification code design

1. When adding noise and similar interference, make the foreground characters as hard as possible to separate from the background, and make the "bad guys" (noise) look as much as possible like the "good guys" (letters).

2. A really good verification code should exploit what humans are good at and AI algorithms are not, such as separating touching characters and reading handwriting (printed fonts with special distortion can also work), rather than simply adding things that look complicated or fancy. However complex it is, a verification code is useless if humans find it hard to read.

3. From the perspective of professional machine vision, the design should force the attacker to spend extra effort at the recognition stage, which greatly increases the difficulty of cracking and reduces its accuracy.
