Breaking a Website CAPTCHA in 15 Minutes

Source: Internet
Author: User

Overview

Many developers dislike website CAPTCHAs (verification codes), especially programmers who write web crawlers. Websites add CAPTCHAs to keep bots out and avoid unnecessary losses. With the progress of machine learning, however, recognizing CAPTCHAs by machine has become a much more tractable problem.

Sample Collection Tool

Here we use Really Simple CAPTCHA, a WordPress plug-in that generates CAPTCHA images. We chose it for two reasons: it has a very large number of installations, and it is open source, so we can use its code to generate a batch of CAPTCHA images.

Target Estimation

From the demo site we learned that Really Simple CAPTCHA generates an image containing 4 digits or letters. Reading the source shows that the plug-in also excludes easily confused characters such as O and I, which leaves 32 possible characters. That looks doable. So far, two minutes have passed.

Dependencies

We need to use the following tools and libraries.

    • Python3

    • OpenCV

    • Keras

    • TensorFlow

Creating the Sample Set

To do this, we first need to prepare a sample set.

Using the Really Simple CAPTCHA source code, we can easily batch-generate 10,000 CAPTCHA images together with their correct answers. Once generation is complete, the result looks roughly as follows:

You can modify the Really Simple CAPTCHA source code to suit your own situation and generate whatever sample set you need. If that is too much trouble, you can also download the set I generated.
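One simple way to keep each image tied to its answer is to encode the answer in the file name. The sketch below assumes that convention (for example `2A5Z.png`), which the article does not specify; `load_labels` and the folder layout are hypothetical names used only for illustration.

```python
from pathlib import Path


def load_labels(sample_dir):
    """Map each PNG in sample_dir to its answer, read from the file name.

    Assumption: every sample is stored as <ANSWER>.png, e.g. 2A5Z.png.
    """
    labels = {}
    for path in sorted(Path(sample_dir).glob("*.png")):
        answer = path.stem.upper()
        if len(answer) == 4:  # skip files that do not follow the convention
            labels[path.name] = answer
    return labels
```

Calling `load_labels("samples")` on such a folder would return a dictionary like `{"2A5Z.png": "2A5Z", ...}` that later steps can iterate over.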

So far, we have spent five minutes.

How to Train

Now that we have the sample set, we could train a neural network directly on the whole images and their answers.

With enough samples, that would eventually produce the result we want.

But there is a better approach that needs less sample data and gives much better results. As you may have guessed, the idea is to cut out the four characters in each image so that every image yields four single-character samples. This works because every CAPTCHA image contains exactly 4 characters.
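A quick back-of-the-envelope calculation shows why splitting helps. It uses only the figures already given in this article (32 possible characters, 4 characters per image, 10,000 images); the variable names are illustrative.

```python
NUM_CHARS = 32        # possible characters, per the estimate earlier in the article
CODE_LENGTH = 4       # characters per CAPTCHA image
NUM_IMAGES = 10_000   # size of the generated sample set

# Training on whole images: every distinct 4-character string is its own class.
whole_image_classes = NUM_CHARS ** CODE_LENGTH

# Training on single characters: one class per character,
# and every image contributes four training samples.
per_character_classes = NUM_CHARS
total_character_samples = NUM_IMAGES * CODE_LENGTH

print(whole_image_classes)       # 1048576
print(per_character_classes)     # 32
print(total_character_samples)   # 40000
```

With whole images, 10,000 samples cover less than 1% of the possible answers. With single characters, the same set gives on average more than a thousand examples per class.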

Cutting 10,000 images by hand in Photoshop is clearly unrealistic. And because the characters are not evenly spaced horizontally, simply slicing each image at fixed positions will not work either.


In fact, all we need is to draw a rectangle around each character so that it contains only that character, and then crop that rectangle out of the image to form a single-character sample. Fortunately, OpenCV already implements this for us: its findContours() function finds regions of connected pixels with the same color value, from which we can derive the rectangles we want.

- Prepare an image first:

- Convert the image to pure black and white. This makes the characters easy for OpenCV to separate from the background.

- Next, find the characters in the image with OpenCV's findContours function.

Next, we crop the characters from left to right and save each cropped image along with the character it represents. In practice, however, I ran into a problem: sometimes two characters sit so close together that OpenCV extracts them as a single region, for example:

The result of the cut is:

Without a fix, the sample set would be inaccurate and the model could not be trained correctly. My workaround is to set a maximum width, in pixels, for a single character. If a region is wider than that, it is assumed to contain two characters, and we split it down the middle into two. For example:
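A minimal sketch of that workaround is shown below. The function name `split_wide_boxes` and the default `max_width` of 20 pixels are hypothetical; the right limit depends on the font and image size and has to be found by inspecting real samples.

```python
def split_wide_boxes(boxes, max_width=20):
    """Split any box wider than max_width into a left and a right half.

    boxes is a list of (x, y, w, h) tuples; max_width is an assumed limit
    for the width of a single character, in pixels.
    """
    result = []
    for x, y, w, h in boxes:
        if w > max_width:
            half = w // 2
            result.append((x, y, half, h))             # left character
            result.append((x + half, y, w - half, h))  # right character
        else:
            result.append((x, y, w, h))
    return result
```

For instance, `split_wide_boxes([(0, 0, 10, 20), (12, 0, 31, 20)])` keeps the first box and splits the second into `(12, 0, 15, 20)` and `(27, 0, 16, 20)`. Cutting exactly in the middle is a rough heuristic, so a few samples will still be cut imperfectly.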


OK, every CAPTCHA image now yields 4 single-character images. Once all sample images have been cut, we put the images of the same character into one folder. The aim is to collect as many different renderings of each character as possible. The results are as follows:
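The bookkeeping for this step might look like the sketch below. It only plans where each crop should go and does not touch the disk. The helper name `plan_character_files`, the numbered file names, and the rule of skipping images whose box count does not match the answer are all assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path


def plan_character_files(answer, boxes, out_dir, counts):
    """Pair each character of the answer with its box and a target path.

    Returns a list of (box, path) tuples, or [] when the number of boxes
    does not match the answer length, so bad segmentations never end up
    in the training set with a wrong label.
    """
    if len(boxes) != len(answer):
        return []
    plan = []
    ordered = sorted(boxes, key=lambda box: box[0])  # left to right
    for char, box in zip(answer, ordered):
        counts[char] += 1  # running count per character gives unique file names
        plan.append((box, Path(out_dir) / char / f"{counts[char]:06d}.png"))
    return plan
```

Each planned entry can then be written by creating the parent folder, cropping the region `image[y:y + h, x:x + w]`, and saving it with `cv2.imwrite(str(path), crop)`.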

So far, I have spent 10 minutes.

Training Model

Because we only need to recognize which digit or letter an image shows, we do not need a particularly complex neural network. Recognizing characters is much simpler than telling kittens from puppies.


Here I use a convolutional neural network with two convolutional layers and a fully connected layer.
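A network of that shape could be defined in Keras roughly as follows. The article only states the layer types, so everything else here is an assumption: the 20x20 grayscale input, the filter counts, the pooling layers, the hidden layer size, and the final softmax layer that outputs one probability per possible character.

```python
from tensorflow.keras import layers, models


def build_model(num_classes=32, input_shape=(20, 20, 1)):
    """Small CNN: two convolutional blocks, one hidden dense layer, softmax output."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # First convolutional layer with max pooling
        layers.Conv2D(20, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        # Second convolutional layer with max pooling
        layers.Conv2D(50, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        # Fully connected layer
        layers.Flatten(),
        layers.Dense(500, activation="relu"),
        # Output layer: one probability per possible character
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
    )
    return model
```

Training would then be a call such as `model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10)`, where the character images are resized to the input shape and the labels are one-hot encoded. The epoch count is again only a placeholder.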

I will not go into the details of convolutional neural networks here. Interested readers can look them up online.


After training is complete, we need to test the model. By this point, 15 minutes have passed.
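Testing comes down to turning the model's per-character probabilities back into text and comparing it with the known answers. The sketch below assumes the model outputs one probability vector per character, in left-to-right order, and that the class order follows the `ALPHABET` string. That 32-character set is an assumption consistent with the count given earlier, not something taken from the article's code.

```python
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"  # assumed 32-character set


def decode_prediction(char_probabilities, alphabet=ALPHABET):
    """Turn one probability vector per character into the predicted string."""
    chars = []
    for probs in char_probabilities:
        best = max(range(len(probs)), key=lambda i: probs[i])  # argmax
        chars.append(alphabet[best])
    return "".join(chars)


def accuracy(predicted, expected):
    """Fraction of CAPTCHAs where the whole string was predicted correctly."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)
```

Measuring accuracy on whole strings rather than single characters is the stricter test, since one wrong character makes the entire answer wrong.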

Summary

The whole process looks simple:

- Download CAPTCHA images from a WordPress site that uses the plug-in mentioned above
- Cut each image into small images containing a single character
- Train the model with a neural network
- Predict the characters of a new CAPTCHA image

Here is my test:

Code

https://pan.baidu.com/s/1o94k2k6

You can get the complete code and sample images here. See the README for how to run the programs.
