Verification code recognition and verification code recognition platform
It has been a long time since my last blog post. I actually wrote this verification code recognition module a while ago but was too lazy to blog about it. Now that I am leaving my job, I am taking it out to share. This kind of verification code is fairly hard: the number of letters is not fixed, their positions are not fixed, and the letters are tilted, stuck together (adhesion), and covered with noise and interference lines. The recognition rate is therefore still low, about one in ten, but recognition does work, and it is done by software, not by a person. The module is tuned for this particular kind of verification code, so recognizing other kinds requires some small changes to the source code.
Source code
First look at the interface:
There are a lot of buttons on the interface. The order is actually the recognition order.
1. Obtain the verification code (dynamic retrieval from a website)
2. Remove the background
3. Remove interference
4. Binarize
5. Segmentation
6. Recognition
7. If a letter is recognized incorrectly and is worth learning, type the correct letter into the box that matches its position among the segmented letters and click "Learn". The next time a similar letter appears, it can be recognized correctly.
For example, if the letter h is recognized incorrectly, you can enter h in the corresponding box and click "Learn". In this case, however, the h has a black interference block in its upper left corner, so it is not worth learning.
Some of the source code follows. You can download the full source from the link above. Note that the Sample folder under Debug must not be deleted. The code is commented:
1. Background removal code:
Principle: treat the most frequent color in the image as the background and replace it with white, which is equivalent to removing the background.
/// <summary>
/// Remove the background:
/// take the most frequent color in the image as the background color and replace it with white.
/// </summary>
/// <param name="sender"></param>
/// <param name="e"></param>
private void btnDropBG_Click(object sender, EventArgs e)
{
    if (picbox.Image == null)
    {
        return;
    }
    Bitmap img = new Bitmap(picbox.Image);
    // key: color, value: number of pixels with that color
    Dictionary<Color, int> colorDic = new Dictionary<Color, int>();
    // count the occurrences of each color in the image
    for (var x = 0; x ...
    // ... (listing truncated here) ...
    ... o.Value).FirstOrDefault().Key;
    // the least frequent color in the image
    Color minColor = colorDic.OrderBy(o => o.Value).FirstOrDefault().Key;

    Dictionary<int[], double> maxColorDifDic = new Dictionary<int[], double>();
    // find the colors closest to maxColor
    for (var x = 0; x ...
    // ... (listing truncated here) ...
    ... o.Value).Where(o => o.Value < bjfz).ToArray();
    foreach (var kv in maxColorDifList)
    {
        img.SetPixel(kv.Key[0], kv.Key[1], Color.White);
    }
    picbox.Image = img;
    pbNormal.Image = picbox.Image;
}
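Because parts of the listing above are cut off, here is a minimal, self-contained sketch of the same idea. It is not the original code: it works on a plain array of packed 0xRRGGBB values instead of a Bitmap, and ColorDif is a hypothetical stand-in for GetColorDif (whose body is not shown in this post), assumed here to be the Euclidean distance in RGB space.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BackgroundSketch
{
    const int White = 0xFFFFFF;

    // Hypothetical stand-in for GetColorDif: Euclidean distance in RGB space.
    static double ColorDif(int a, int b)
    {
        int dr = ((a >> 16) & 0xFF) - ((b >> 16) & 0xFF);
        int dg = ((a >> 8) & 0xFF) - ((b >> 8) & 0xFF);
        int db = (a & 0xFF) - (b & 0xFF);
        return Math.Sqrt(dr * dr + dg * dg + db * db);
    }

    // Replaces the most frequent color, and every color closer to it
    // than `threshold`, with white.
    public static void DropBackground(int[,] pixels, double threshold)
    {
        // Count how often each color occurs.
        var counts = new Dictionary<int, int>();
        foreach (int c in pixels)
        {
            counts.TryGetValue(c, out int n);
            counts[c] = n + 1;
        }
        int maxColor = counts.OrderByDescending(kv => kv.Value).First().Key;

        for (int x = 0; x < pixels.GetLength(0); x++)
            for (int y = 0; y < pixels.GetLength(1); y++)
                if (ColorDif(pixels[x, y], maxColor) < threshold)
                    pixels[x, y] = White;
    }

    static void Main()
    {
        int[,] img =
        {
            { 0xE0E0E0, 0xE0E0E0, 0x000000 },
            { 0xE0E0E0, 0xE1E1E1, 0x000000 },
        };
        DropBackground(img, 10.0);
        Console.WriteLine(img[1, 1] == White); // near-background pixel was whitened
        Console.WriteLine(img[0, 2] == 0);     // dark stroke pixel was kept
    }
}
```

The threshold plays the role that bjfz appears to play in the listing; a suitable value depends on the images and has to be found by experiment.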
2. Interference removal code:
Principle: if a pixel differs significantly in color from its neighbors on at least three of its four sides (top, bottom, left, right), it is treated as noise or as part of an interference line. Material found online says the median filtering algorithm is useful here, but in my tests the effect was poor.
/// <summary>
/// Remove interference
/// </summary>
/// <param name="sender"></param>
/// <param name="e"></param>
private void btnDropDisturb_Click(object sender, EventArgs e)
{
    if (picbox.Image == null)
    {
        return;
    }
    Bitmap img = new Bitmap(picbox.Image);
    byte[] p = new byte[9]; // minimum processing window 3*3
    byte s;
    // rows
    for (var x = 0; x ...
    // ... (listing truncated here) ...
    ... 0 && y > 0 && x ...
    // ... (listing truncated here) ...
                // ... p[i])
                //        {
                //            s = p[j];
                //            p[j] = p[i];
                //            p[i] = s;
                //        }
                //    }
                //}
                //if (img.GetPixel(x, y).R < dgGrayValue)
                //    img.SetPixel(x, y, Color.FromArgb(p[4], p[4], p[4])); // assign the median to valid values
                #endregion

                // up: x, y + 1
                double upDif = GetColorDif(currentColor, img.GetPixel(x, y + 1));
                // down: x, y - 1
                double downDif = GetColorDif(currentColor, img.GetPixel(x, y - 1));
                // left: x - 1, y
                double leftDif = GetColorDif(currentColor, img.GetPixel(x - 1, y));
                // right: x + 1, y
                double rightDif = GetColorDif(currentColor, img.GetPixel(x + 1, y));
                // upper left
                double upLeftDif = GetColorDif(currentColor, img.GetPixel(x - 1, y + 1));
                // upper right
                double upRightDif = GetColorDif(currentColor, img.GetPixel(x + 1, y + 1));
                // lower left
                double downLeftDif = GetColorDif(currentColor, img.GetPixel(x - 1, y - 1));
                // lower right
                double downRightDif = GetColorDif(currentColor, img.GetPixel(x + 1, y - 1));

                //// large color difference on all four sides
                //if (upDif > threshold && downDif > threshold && leftDif > threshold && rightDif > threshold)
                //{
                //    img.SetPixel(x, y, Color.White);
                //}
                // large color difference on three sides
                if ((upDif > threshold && downDif > threshold && leftDif > threshold)
                    || (downDif > threshold && leftDif > threshold && rightDif > threshold)
                    || (upDif > threshold && leftDif > threshold && rightDif > threshold)
                    || (upDif > threshold && downDif > threshold && rightDif > threshold))
                {
                    img.SetPixel(x, y, Color.White);
                }

                List<int[]> xLine = new List<int[]>();
                // Remove horizontal interference lines: if many of the following points
                // have white pixels both above and below them, they are treated as interference
                for (var x1 = x + 1; x1 < x + 10; x1++)
                {
                    if (x1 >= img.Width)
                    {
                        break;
                    }

                    if (img.GetPixel(x1, y + 1).ToArgb() == Color.White.ToArgb()
                        && img.GetPixel(x1, y - 1).ToArgb() == Color.White.ToArgb())
                    {
                        xLine.Add(new int[] { x1, y });
                    }
                }
                if (xLine.Count() >= 4)
                {
                    foreach (var xpoint in xLine)
                    {
                        img.SetPixel(xpoint[0], xpoint[1], Color.White);
                    }
                }

                // remove vertical interference lines

            }
        }
    }
    picbox.Image = img;
    pbNormal.Image = picbox.Image;
}
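The three-sided test at the heart of the listing can be sketched on its own. This is a simplified illustration, not the original method: it assumes a grayscale byte array rather than a Bitmap, compares gray values with a plain absolute difference instead of GetColorDif, and writes into a copy so that earlier changes do not influence later pixels (the original modifies the image in place).

```csharp
using System;

static class NoiseSketch
{
    // Returns a copy of `gray` in which isolated pixels are set to white (255).
    // A pixel counts as isolated when it differs by more than `threshold`
    // from at least three of its four direct neighbors.
    public static byte[,] DropNoise(byte[,] gray, int threshold)
    {
        int w = gray.GetLength(0), h = gray.GetLength(1);
        var result = (byte[,])gray.Clone();
        // Border pixels are skipped because they lack a full set of neighbors.
        for (int x = 1; x < w - 1; x++)
        {
            for (int y = 1; y < h - 1; y++)
            {
                int c = gray[x, y];
                int differing = 0;
                if (Math.Abs(c - gray[x, y - 1]) > threshold) differing++;
                if (Math.Abs(c - gray[x, y + 1]) > threshold) differing++;
                if (Math.Abs(c - gray[x - 1, y]) > threshold) differing++;
                if (Math.Abs(c - gray[x + 1, y]) > threshold) differing++;
                if (differing >= 3) result[x, y] = 255;
            }
        }
        return result;
    }

    static void Main()
    {
        byte[,] img =
        {
            { 255, 255, 255 },
            { 255,   0, 255 },
            { 255, 255, 255 },
        };
        byte[,] cleaned = DropNoise(img, 100);
        Console.WriteLine(cleaned[1, 1]); // the lone dark pixel has been set to 255
    }
}
```

Counting the differing sides and testing for "at least three" is equivalent to the four explicit three-way combinations in the listing.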
3. Binarization code:
Principle: this one is simple. A pixel whose brightness is below a threshold is set to black; otherwise it is set to white.
/// <summary>
/// Binarization: traverse every point; if its brightness is below the threshold it is treated as black, otherwise as white.
/// </summary>
private void EZH()
{
    if (picbox.Image != null)
    {
        var img = new Bitmap(picbox.Image);
        for (var x = 0; x ...
        // ... (the rest of the listing is truncated) ...
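Since the listing breaks off before the loop body, here is a small sketch of thresholding as described in the principle. The brightness formula is an assumption: the post does not show how the original measures brightness, so this version uses the common 0.299/0.587/0.114 luma weights on packed 0xRRGGBB values.

```csharp
using System;

static class BinarizeSketch
{
    // Brightness on a 0-255 scale; the weights are an assumption, not taken from the post.
    static double Brightness(int rgb)
    {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return 0.299 * r + 0.587 * g + 0.114 * b;
    }

    // Returns true for black (foreground) and false for white (background).
    public static bool[,] Binarize(int[,] pixels, double threshold)
    {
        int w = pixels.GetLength(0), h = pixels.GetLength(1);
        var black = new bool[w, h];
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
                black[x, y] = Brightness(pixels[x, y]) < threshold;
        return black;
    }

    static void Main()
    {
        int[,] img = { { 0x202020, 0xF0F0F0 } };
        bool[,] black = Binarize(img, 128);
        // The dark pixel is classified as black, the light one as white.
        Console.WriteLine($"{black[0, 0]} {black[0, 1]}");
    }
}
```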
4. The segmentation and recognition code is too long to list here, so please read it in the source. Here I will only describe the principle:
Recognition works by segmenting out each letter and comparing it with the images in the Sample folder to find the one whose black area overlaps it the most. Where do the images in the Sample folder come from? Remember the learning step mentioned above? That is it: if a segmented letter looks fairly regular, with no interference points or lines left on it, a person can judge it to be a character worth learning. After you click "Learn", it is automatically saved to the directory for that character and used as a comparison image. A little bit of artificial intelligence.
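The comparison step might look roughly like the sketch below. Everything in it is an assumption made for illustration: glyphs are bool arrays of identical size (true means black), the samples are kept in an in-memory dictionary instead of the Sample folder, and "degree of overlap" is scored as shared black pixels divided by pixels that are black in either image, which is only one possible reading of the description.

```csharp
using System;
using System.Collections.Generic;

static class MatchSketch
{
    // Shared black pixels divided by pixels that are black in either image.
    // Both arrays are assumed to have the same dimensions.
    static double Overlap(bool[,] a, bool[,] b)
    {
        int both = 0, either = 0;
        for (int x = 0; x < a.GetLength(0); x++)
            for (int y = 0; y < a.GetLength(1); y++)
            {
                if (a[x, y] && b[x, y]) both++;
                if (a[x, y] || b[x, y]) either++;
            }
        return either == 0 ? 0.0 : (double)both / either;
    }

    // Returns the letter of the best-overlapping sample, or '?' if there are no samples.
    public static char Recognize(bool[,] glyph, Dictionary<char, List<bool[,]>> samples)
    {
        char best = '?';
        double bestScore = -1.0;
        foreach (var entry in samples)
            foreach (var sample in entry.Value)
            {
                double score = Overlap(glyph, sample);
                if (score > bestScore) { bestScore = score; best = entry.Key; }
            }
        return best;
    }

    // "Learning": store a cleanly segmented glyph under its correct letter.
    public static void Learn(char letter, bool[,] glyph, Dictionary<char, List<bool[,]>> samples)
    {
        if (!samples.TryGetValue(letter, out var list))
            samples[letter] = list = new List<bool[,]>();
        list.Add(glyph);
    }

    static void Main()
    {
        var samples = new Dictionary<char, List<bool[,]>>();
        Learn('I', new[,] { { false, true, false }, { false, true, false } }, samples);
        Learn('L', new[,] { { true, false, false }, { true, true, true } }, samples);

        // Closer to the 'I' sample (2 of 3 pixels shared) than to 'L' (2 of 5).
        var unknown = new[,] { { false, true, false }, { false, true, true } };
        Console.WriteLine(Recognize(unknown, samples));
    }
}
```

Real glyphs come out of segmentation in different sizes, so an actual implementation would need to align or scale them before comparing.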
Let me add a little more. Segmentation is actually quite troublesome. The main difficulty is that interference lines, noise, and adhesion all reduce segmentation accuracy, adhesion most of all. The code therefore treats the boundary between black pixels and white pixels as a split point. Of course, the pixels above and below also have to be checked, and if there are too few black pixels the boundary is moved. In addition, if the distance between the start point and the end point is clearly the width of two characters, the span is treated as adhesion and split down the middle. This is not a very good approach, and I hope someone can recommend a better one.
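A stripped-down sketch of this splitting rule is shown below. It is an illustration under stated assumptions, not the project's code: the image is a bool array where true means black, a column counts as part of a character if it contains any black pixel, and maxCharWidth is a hypothetical tuning value. The boundary adjustment for columns with too few black pixels is left out.

```csharp
using System;
using System.Collections.Generic;

static class SegmentSketch
{
    // Returns [Start, End) column ranges, one per character.
    public static List<(int Start, int End)> Segment(bool[,] black, int maxCharWidth)
    {
        int w = black.GetLength(0), h = black.GetLength(1);
        var ranges = new List<(int Start, int End)>();
        int start = -1;
        // x runs one past the last column so that a run touching the right edge is closed.
        for (int x = 0; x <= w; x++)
        {
            bool hasBlack = false;
            for (int y = 0; x < w && y < h; y++)
                if (black[x, y]) { hasBlack = true; break; }

            if (hasBlack && start < 0)
            {
                start = x; // white-to-black boundary: a character begins
            }
            else if (!hasBlack && start >= 0)
            {
                AddRange(ranges, start, x, maxCharWidth); // black-to-white boundary
                start = -1;
            }
        }
        return ranges;
    }

    // A run wider than one character is assumed to be two stuck letters
    // and is cut in the middle.
    static void AddRange(List<(int Start, int End)> ranges, int start, int end, int maxCharWidth)
    {
        if (end - start > maxCharWidth)
        {
            int mid = (start + end) / 2;
            ranges.Add((start, mid));
            ranges.Add((mid, end));
        }
        else
        {
            ranges.Add((start, end));
        }
    }

    static void Main()
    {
        // Nine columns, one row: black runs at columns 1-2 and 4-7.
        var black = new bool[9, 1];
        foreach (int x in new[] { 1, 2, 4, 5, 6, 7 }) black[x, 0] = true;

        // The first run fits in one character; the second is wider than 3 and is split.
        foreach (var r in Segment(black, 3))
            Console.WriteLine($"{r.Start}-{r.End}");
    }
}
```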
So if you understand how verification codes are recognized and want to produce one that is hard to recognize, noise and interference lines alone will not do. Google's verification code uses only adhesion and deformation, yet it defeats professional recognition software that claims to crack 90% of verification codes.
Finally, this verification code recognition is based on my own ideas plus some reference material. It does not involve many complicated algorithms, so most programmers should not find it difficult. I hope you will think it over and offer your valuable suggestions.
Verification Code Recognition Technology
A verification code can be generated by a PHP program (for example, one set up to draw digits and letters at random). When the code is generated, the program knows its content. After the user types in what they see, the program compares the input with the value it recorded earlier, and if the two match, verification passes. That is the logic as I understand it. Whatever form the verification code image takes, there is simple text information behind it. Recognizing images is hard, especially complex ones. However, if you could capture the value the program recorded (for example, from the database), you would avoid the image recognition problem altogether. This is just my idea, and it may still be difficult in practice.
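The generate, record, and compare logic described above can be sketched as follows. The paragraph talks about PHP; the sketch is in C# only to match the rest of this post. The class name, the character set, and the single-use rule are all assumptions made for illustration, and the field `expected` stands in for whatever session or database storage a real site would use.

```csharp
using System;

class VerificationCodeSketch
{
    // Hypothetical character set that avoids easily confused characters.
    const string Alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";
    readonly Random random = new Random(); // a real site would use a cryptographic generator
    string expected; // stands in for the value kept in the session or database

    // Generates a code and records it on the server side.
    public string Generate(int length)
    {
        var chars = new char[length];
        for (int i = 0; i < length; i++)
            chars[i] = Alphabet[random.Next(Alphabet.Length)];
        expected = new string(chars);
        return expected; // this text would be rendered into the image
    }

    // Compares the user's input with the recorded value.
    // The recorded value is cleared so that each code can be tried only once (an assumption).
    public bool Verify(string input)
    {
        bool ok = expected != null
            && string.Equals(input, expected, StringComparison.OrdinalIgnoreCase);
        expected = null;
        return ok;
    }

    static void Main()
    {
        var captcha = new VerificationCodeSketch();
        string code = captcha.Generate(4);
        Console.WriteLine(captcha.Verify(code.ToLowerInvariant())); // matching input passes
        Console.WriteLine(captcha.Verify(code));                    // a second attempt fails
    }
}
```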