Common Verification Code Weaknesses and Verification Code Recognition, Explained with Examples (1)

Source: Internet
Author: User

I. Introduction

Verification codes, as an auxiliary security measure, hold a special position in Web security. Compared with many web application vulnerabilities, verification code security seems insignificant, but as the saying goes, a dike a thousand miles long can be destroyed by an ant nest. In some cases, bypassing the verification code lets you turn a manual process into an automated one, which is helpful for Web security testing.

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), commonly known as a verification code, is a public, fully automated program that determines whether the user is a computer or a human. In a CAPTCHA test, the server automatically generates a question for the user to answer. The question can be generated and graded by a computer, but only a human should be able to answer it. Because computers cannot answer CAPTCHA questions, a user who answers correctly can be considered human. (From Wikipedia)

Most verification code designers do not know why verification codes are used, or have no idea how to test their strength. Most implementations simply print text onto a picture with a slightly complex background; the programmers do not fundamentally understand the design concept behind verification codes.

Verification codes come in many forms. First, we introduce the simplest: plain text verification codes.

Plain text Verification Code

These are plain text questions. The output format is fixed and the number of questions is limited. For example:

• 1 + 1 =?

• What is the domain name of this forum?

• What is today?

• A slightly more complex mathematical operation

Strictly speaking, this kind of verification code does not conform to the definition, because a verification code must be generated automatically. These text questions are picked from a question bank and are limited in number. Cracking them is also very simple: refresh the page several times to build a question bank with the corresponding answers, use regular expressions to capture the question from the web page, look up the matching answer, and the code is cracked. Some use randomly generated arithmetic of the form [random number] [+-*/] [random number] = ?, which any programmer with primary-school math can solve just as easily.
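The attack described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the question bank entry, the answer `example.com`, and the function name `answer_question` are hypothetical, and the page text is assumed to have been fetched already.

```python
import operator
import re

# Hypothetical question bank, collected by refreshing the page a few times.
QUESTION_BANK = {
    "What is the domain name of this forum?": "example.com",
}

# Matches randomly generated arithmetic such as "12 + 7 = ?".
ARITHMETIC = re.compile(r"(\d+)\s*([-+*/])\s*(\d+)\s*=\s*\?")

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def answer_question(text):
    """Return the answer to a plain-text verification question, or None."""
    # Fixed questions: look them up in the collected bank.
    for question, answer in QUESTION_BANK.items():
        if question in text:
            return answer
    # Generated arithmetic: parse the operands and compute the result.
    match = ARITHMETIC.search(text)
    if match:
        left, symbol, right = match.groups()
        if symbol == "/" and int(right) == 0:
            return None
        result = OPERATORS[symbol](int(left), int(right))
        # Report whole numbers without a trailing ".0".
        if isinstance(result, float) and result.is_integer():
            result = int(result)
        return str(result)
    return None
```

Calling `answer_question` on the raw HTML of the form is enough here, because both the bank lookup and the regular expression search inside the text rather than requiring an exact match.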

This kind of verification code is not useless. Many spam bots fire at every form they come across, and it is simply not worth their effort to handle a single website specially. But for someone determined to flood your website with junk posts, this kind of verification code is as good as none.

The focus of this article is the graphic verification code.

Graphic Verification Code

Let's talk about the basics:

Recognizing graphic verification codes is an important topic in computer science. It touches on many advanced fields, such as computer graphics, machine learning, machine vision, and artificial intelligence.

Simply put, computer graphics studies the principles and algorithms for representing graphics in a computer and for using a computer to calculate, process, and display them. A graphic is usually composed of geometric elements such as points, lines, surfaces, and solids, plus non-geometric attributes such as gray scale, color, line type, and line width. Typical tasks include two-dimensional to n-dimensional graphics processing, boundary detection, area and volume calculation, distortion correction, color space calculation and conversion, coloring, shading, and color difference processing.

Cracking verification codes generally requires only the processing of basic 2D graphic elements such as pixels, lines, and surfaces, plus color difference analysis. Common tools:

• Support vector machine (SVM)

• OpenCV

• Image processing software (Photoshop, Gimp ...)

• Python Imaging Library (PIL)

An SVM is a classifier commonly used in machine learning. It can be used to distinguish boundaries in images, but the background knowledge it requires is quite advanced.

OpenCV is a very common library for image processing and machine vision. It is typically used for face recognition, tracking moving objects, and so on. If you are interested in this area, it is worth studying.

Photoshop and GIMP need no introduction; the more said about them, the more tears.

The Python Imaging Library is an image processing library for Python. It is powerful and is our first choice.

SVM Image Boundary Differentiation

The principle of an SVM is to map the data into a high-dimensional space and then search for a hyperplane that separates it.
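To make the mapping idea concrete, here is a small sketch that illustrates only the "map to a higher dimension, then split" step; it does not train an SVM. The toy data, the feature map x -> (x, x^2), and the threshold of 1.0 are all made up for illustration.

```python
# Toy 1-D data: class 1 sits between the class 0 points, so no single
# threshold on x can separate the two classes.
samples = [(-3.0, 0), (-2.0, 0), (-0.5, 1), (0.0, 1), (0.5, 1), (2.0, 0), (3.0, 0)]


def lift(x):
    """Map a 1-D point into 2-D space (an illustrative feature map)."""
    return (x, x * x)


def classify(x, threshold=1.0):
    """Classify using the straight line x2 = threshold in the lifted space."""
    _, x2 = lift(x)
    return 1 if x2 < threshold else 0


predictions = [classify(x) for x, _ in samples]
labels = [label for _, label in samples]
print(predictions == labels)
```

A real SVM would learn both the separating hyperplane and, through its kernel, the mapping; here they are fixed by hand so the geometry is easy to see.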

To recognize a verification code, you must make full use of the information in the image to separate the text from the background. For a typical JPEG image, each pixel can be placed in a 5-dimensional space whose dimensions are X, Y, R, G, and B, that is, the coordinates and color of the pixel. Computer graphics uses many color spaces; the most common are RGB and CMYK (used for printing), and the less common ones include HSL and HSV. Each color space has different dimensions, but they can be converted into one another through formulas.
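As a rough illustration of the 5-dimensional view, the sketch below flattens an image into (x, y, r, g, b) points. The function name `pixels_to_points` and the tiny 2x2 image are hypothetical; a real image would first have to be decoded into a grid of RGB tuples by an imaging library.

```python
def pixels_to_points(grid):
    """Flatten a row-major grid of (r, g, b) tuples into (x, y, r, g, b) points."""
    points = []
    for y, row in enumerate(grid):
        for x, (r, g, b) in enumerate(row):
            points.append((x, y, r, g, b))
    return points


# A made-up 2x2 image: white background with one dark "text" pixel.
tiny_image = [
    [(255, 255, 255), (255, 255, 255)],
    [(20, 20, 20), (255, 255, 255)],
]
points = pixels_to_points(tiny_image)
```

Each resulting tuple is one sample that a classifier or a simple color statistic can work on.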

The RGB color space forms a cube, with each dimension representing one primary color.

The HSL (hue, saturation, lightness) color space can be drawn as a cone. For details, refer to:

https://zh.wikipedia.org/wiki/HSL%E5%92%8CHSV%E8%89%B2%E5%BD%A9%E7%A9%BA%E9%97%B4

Once you understand how a color space works, you can use the formula appropriate to that space to measure the color difference between pixels. For example, to measure the color difference between two points in RGB space, treat the two colors as coordinates in three-dimensional space and apply the distance formula:

Distance = sqrt [(r1-r2) ^ 2 + (g1-g2) ^ 2 + (b1-b2) ^ 2]
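The distance formula translates directly into Python. This is a minimal sketch; the function name `color_distance` is an assumption made for illustration.

```python
import math


def color_distance(c1, c2):
    """Euclidean distance between two (r, g, b) colors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))
```

Two pixels can then be treated as the same color when their distance falls below some threshold, which has to be tuned for each kind of verification code.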

To get a more intuitive feel for this, you can map every pixel of an image into the RGB color space; the pixels then appear as points scattered inside the RGB cube.

By counting and grouping the pixel colors, you can obtain the color distribution of the image. In a verification code, the group of similar colors with the most pixels is generally the background, and the groups with the fewest pixels are the interference points, the interference lines, and the text to be recognized.
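The counting step could look roughly like the sketch below. It assumes that similar colors can be grouped by rounding each channel down to a multiple of 32; that step size, the function names, and the sample pixel list are all illustrative choices, not part of the original method.

```python
from collections import Counter


def quantize(color, step=32):
    """Round each channel down to a multiple of `step` so similar colors share a bucket."""
    return tuple(channel // step * step for channel in color)


def find_background(pixels, step=32):
    """Return the most frequent quantized color, assumed to be the background."""
    counts = Counter(quantize(p, step) for p in pixels)
    return counts.most_common(1)[0][0]


def foreground_pixels(pixels, step=32):
    """Keep pixels whose quantized color differs from the background bucket."""
    background = find_background(pixels, step)
    return [p for p in pixels if quantize(p, step) != background]


# Made-up pixel list: mostly near-white background, a few dark text pixels.
pixels = [(250, 250, 250)] * 6 + [(245, 248, 251)] * 3 + [(20, 20, 20)] * 2
```

Fixed buckets are crude: two nearly identical colors that straddle a bucket boundary end up in different groups, so a distance-based grouping may work better on noisy images.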

If colors are difficult to distinguish in RGB space, you can convert the color space to HSV or HSL.
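Python's standard `colorsys` module can perform this conversion. The sketch below wraps it in a hypothetical helper, `rgb_to_hsv_hsl`, which assumes 0-255 input channels and returns components scaled to the range 0 to 1.

```python
import colorsys


def rgb_to_hsv_hsl(r, g, b):
    """Convert 0-255 RGB channels to HSV and HSL tuples with components in [0, 1]."""
    scaled = (r / 255.0, g / 255.0, b / 255.0)
    hsv = colorsys.rgb_to_hsv(*scaled)
    # colorsys uses the order (hue, lightness, saturation); reorder to HSL.
    h, l, s = colorsys.rgb_to_hls(*scaled)
    return hsv, (h, s, l)
```

In HSV or HSL, text and background that differ mainly in hue or brightness can often be separated by thresholding a single component instead of comparing all three RGB channels.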

