C# basic CAPTCHA recognition: approach and source code

Source: Internet
Author: User
First, some background on why this was written.

A friend has been building a tool recently and already had it working well. To make it more complete, it was suggested that its CAPTCHA be knocked out (K.O.) as well, so that is what I did. Recognizing a single image takes under 200 ms, and a manual check of 500 samples put the accuracy at 95%. I had no prior experience in this area, so I was crossing the river by feeling for the stones. In the spirit of sharing experience, here is the overall line of analysis; please excuse the rough edges.

Let's look at some of the recognition results first.

[Images: sample recognition results]

Do they look familiar?

Step 1: Remove background noise and binarize

I considered several approaches for this part.

Method 1: Compute the image's color distribution and treat colors with a low share as background noise. Because the background noise is not clearly distinct from the foreground color, I tried many variations to filter the noise out and eventually gave up on this approach.

Method 2: I then did a little searching online. The currently popular approach is to convert the image to grayscale and then binarize it against a threshold. The weights used in grayscale conversion, however, are based on the human eye's sensitivity to each color, and those weights mean nothing to a computer. On reflection, the two operations can be merged completely, so I do background-noise removal and binarization in a single step: a pixel counts as foreground when the sum of its R, G, and B values is below 500. The results were very satisfying.
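A minimal sketch of the merged noise-removal and binarization step, assuming the pixels are supplied as packed 0xRRGGBB integers. The `Binarizer` class, its method names, and the input format are hypothetical; only the threshold of 500 on R + G + B comes from the article.

```csharp
public static class Binarizer
{
    // Threshold on R + G + B taken from the article.
    public const int Threshold = 500;

    // Dark pixels (small channel sum) are foreground; everything else is background or noise.
    public static bool IsForeground(byte r, byte g, byte b)
    {
        return r + g + b < Threshold;
    }

    // pixels[x, y] holds packed 0xRRGGBB values (hypothetical input format).
    public static bool[,] ToTable(int[,] pixels)
    {
        int width = pixels.GetLength(0);
        int height = pixels.GetLength(1);
        var table = new bool[width, height];
        for (int x = 0; x < width; x++)
        {
            for (int y = 0; y < height; y++)
            {
                int rgb = pixels[x, y];
                byte r = (byte)((rgb >> 16) & 0xFF);
                byte g = (byte)((rgb >> 8) & 0xFF);
                byte b = (byte)(rgb & 0xFF);
                table[x, y] = IsForeground(r, g, b);
            }
        }
        return table;
    }
}
```

Because no grayscale weights are involved, the three channels contribute equally, and a single comparison both discards light noise and produces the black-and-white map.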
 


 

Step 2: Create character samples

Samples matter a great deal to a computer, because it is hard to give a computer logical reasoning, and even where it has some, it needs long training before the results satisfy you. So recognition works by comparing against samples prepared in advance. If you look closely at these CAPTCHAs you will spot a weakness: almost all of them use the same font, so a set of font samples can be built by hand. The background noise can be stripped with the previous step. Building the samples is simple and tedious, and it demands care: one careless sample can leave a particular character with a low recognition rate. Only 31 distinct characters turned up in the 500 samples. To be fair, someone in the department responsible did take easily confused characters into account, such as 1 and I or 0 and O; otherwise that department would be cursed even more.
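The article does not show how the finished samples were stored. The sketch below is one possible way to pack them, assuming the record layout that the constructor in the source code section reads back: a character, a width byte, a height byte, then width * height booleans column by column, with a '\0' character ending the list, all gzip-compressed. The `SamplePacker` class and its `Pack` method are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class SamplePacker
{
    // Packs (character, bitmap) pairs into a gzip-compressed byte array.
    public static byte[] Pack(IEnumerable<KeyValuePair<char, bool[,]>> samples)
    {
        using (var stream = new MemoryStream())
        {
            using (var gzip = new GZipStream(stream, CompressionMode.Compress))
            using (var writer = new BinaryWriter(gzip))
            {
                foreach (var sample in samples)
                {
                    bool[,] map = sample.Value;
                    int width = map.GetLength(0);
                    int height = map.GetLength(1);
                    writer.Write(sample.Key);     // the character itself
                    writer.Write((byte)width);    // assumes width and height fit in one byte
                    writer.Write((byte)height);
                    for (int i = 0; i < width; i++)
                        for (int j = 0; j < height; j++)
                            writer.Write(map[i, j]);
                }
                writer.Write('\0');               // end-of-list marker
            }
            // ToArray still works after the MemoryStream has been closed by the gzip stream.
            return stream.ToArray();
        }
    }
}
```

The bytes returned by `Pack` could then be pasted into the program as an array literal, which keeps the recognizer in a single file with no external sample data.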

Step 3: Matching

An individual match uses the simplest, most primitive binary comparison, but the match rate is not just the number of matching pixels. I defined a scoring rule, which in outline is: a pixel that should be set and is set earns a point; one that should be set but is not loses a point; one that should not be set but is loses a moderate amount; and pixels that fall outside the region are not scored.
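The scoring rule can be sketched as a standalone function. The penalty of 0.55 for an unexpected pixel is the value used in the source code section; the `MatchScore` class name and the guard against an empty template are additions for illustration.

```csharp
public static class MatchScore
{
    // "Moderate" deduction for a source pixel that the template does not expect.
    public const double ExtraPixelPenalty = 0.55;

    // Scores template `target` placed on `source` with its top-left corner at (x0, y0).
    public static double Score(bool[,] source, bool[,] target, int x0, int y0)
    {
        double total = 0;  // template pixels that were actually compared
        double score = 0;
        for (int i = 0; i < target.GetLength(0); i++)
        {
            int x = i + x0;
            if (x < 0 || x >= source.GetLength(0))
                continue;                       // outside the image: not scored
            for (int j = 0; j < target.GetLength(1); j++)
            {
                int y = j + y0;
                if (y < 0 || y >= source.GetLength(1))
                    continue;                   // outside the image: not scored
                if (target[i, j])
                {
                    total++;
                    score += source[x, y] ? 1 : -1;   // expected pixel present or missing
                }
                else if (source[x, y])
                {
                    score -= ExtraPixelPenalty;       // unexpected pixel
                }
            }
        }
        // Guard added in this sketch: an empty or fully out-of-range template scores 0.
        return total == 0 ? 0 : score / total;
    }
}
```

A perfect overlap scores 1, a template laid over blank background scores -1, and stray pixels around a correct character lower the score without wiping it out.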

Because some characters score similarly to other characters in certain regions, a single fixed-position match is not enough: each match has to be searched over an expanded region. The best match found within that range is taken as the character at the current position.
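A minimal sketch of the expanded-region search, assuming the per-position score is supplied as a delegate. The `WindowSearch` class, the `Hit` struct, and the parameter names are hypothetical; the source code section uses horizontal offsets from -2 up to (but not including) 6 around the start column.

```csharp
using System;

public static class WindowSearch
{
    public struct Hit
    {
        public int X;
        public int Y;
        public double Rate;
    }

    // Tries every offset in [start + dxMin, start + dxMax) x [yMin, yMax)
    // and keeps the position with the highest score above zero.
    public static Hit Best(Func<int, int, double> scoreAt, int start,
                           int dxMin, int dxMax, int yMin, int yMax)
    {
        var best = new Hit();   // Rate 0 means "nothing useful found"
        for (int x = start + dxMin; x < start + dxMax; x++)
        {
            for (int y = yMin; y < yMax; y++)
            {
                double rate = scoreAt(x, y);
                if (rate > best.Rate)
                {
                    best.X = x;
                    best.Y = y;
                    best.Rate = rate;
                }
            }
        }
        return best;
    }
}
```

Running this once per character template and keeping the template with the highest rate gives the character for the current position.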

Once a best match is accepted, the matching position jumps to the right by a large step. If no suitable match is found, it moves to the right by a small step.
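The stepping logic can be sketched on its own, with the matcher passed in as a delegate. The acceptance rate of 0.6, the large step of 10 columns, and the right margin of 7 are the values used in the source code section; the `Scanner` class, the `Candidate` struct, and the forward-progress guard are additions for illustration.

```csharp
using System;
using System.Text;

public static class Scanner
{
    public struct Candidate
    {
        public char Char;
        public int X;
        public double Rate;
    }

    // Walks left to right; matchAt(position) returns the best candidate near that column.
    public static string Scan(int width, Func<int, Candidate> matchAt,
                              double acceptRate = 0.6, int bigStep = 10, int margin = 7)
    {
        var result = new StringBuilder();
        int next = 0;
        while (next < width - margin)
        {
            Candidate best = matchAt(next);
            if (best.Rate > acceptRate)
            {
                result.Append(best.Char);
                // Large step past the matched character; Math.Max is a guard
                // added in this sketch so the loop always moves forward.
                next = Math.Max(next + 1, best.X + bigStep);
            }
            else
            {
                next += 1;   // small step: nothing convincing here
            }
        }
        return result.ToString();
    }
}
```

The large step avoids matching the same character twice, while the small step lets the scan slide across gaps and leftover noise one column at a time.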

Step 4: Optimize and tune

Any algorithm needs optimizing and tuning. At this point the job is to find the best parameter settings and the best way to organize the code. This step usually takes the most time and effort.

Step 5: Verify the results

In this step the accuracy is computed from manually verified results.
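The accuracy calculation itself is simple. The sketch below assumes each sample is recorded as a pair of manually verified text and recognized text, and that the comparison is exact and case-sensitive; the `Accuracy` class and both assumptions are illustrative, not taken from the article.

```csharp
using System;
using System.Collections.Generic;

public static class Accuracy
{
    // Each pair is (manually verified text, recognized text).
    public static double Rate(IReadOnlyList<KeyValuePair<string, string>> results)
    {
        if (results.Count == 0)
            return 0;
        int correct = 0;
        foreach (var pair in results)
        {
            // A sample counts only if the whole string matches.
            if (string.Equals(pair.Key, pair.Value, StringComparison.Ordinal))
                correct++;
        }
        return (double)correct / results.Count;
    }
}
```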

Thoughts

In the end there is not much code, and the results are satisfying. People in this line of work always aim for generality, and whether something generalizes depends largely on the level of abstraction. This approach is plain template matching and does not generalize as-is, but the method and the line of thinking do: analyze each case on its own terms. Distorted text and hollow (outlined) text are much harder to handle. There are approaches online that use third-party image libraries, and those may be more general. If you are interested, keep digging into the topic.

Source code

I hesitated for a while over whether to publish the source code. Similar commercial services already exist online, and the recognition itself is not very difficult. Add to that the system's own inherent flaws, and this CAPTCHA amounts to having no protection at all. So the code is published here, for learning and discussion only.
 

using System.Collections.Generic;
using System.Drawing;
using System.IO;
using System.IO.Compression;

namespace Crack12306Captcha
{
    public class Cracker
    {
        // Character templates decoded from the embedded sample data.
        List<CharInfo> words_ = new List<CharInfo>();

        public Cracker()
        {
            // Gzip-compressed sample set, one record per character.
            var bytes = new byte[] {
0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00, 0xc5, 0x58, 0xd9, 0x92, 0x13, 0x31,
0x0c, 0x94, 0x9e, 0x93, 0x0c, 0x61, 0x97, 0x2f, 0xe1, 0x58, 0xe0, 0x91, 0x9b, 0x82, 0x62, 0x0b,
0x58, 0xEE, 0xff, 0xff, 0x10, 0xd8, 0xcc, 0xc8, 0xea, 0x96, 0x6c, 0x8f, 0x13, 0x48, 0xe1, 0xaa,
0x4d, 0x46, 0x96, 0x6d, 0xb5, 0x8e, 0x96, 0x67, 0x73, 0x7f, 0x3b, 0x09, 0x0e, 0x25, 0x41, 0x49,
0xa3, 0xae, 0xd7, 0x5b, 0xa9, 0xa8, 0xd5, 0xb4, 0x76, 0x02, 0x6a, 0x5c, 0x52, 0x94, 0x54, 0xed,
0x18, 0x5a, 0x7f, 0x18, 0x00, 0x00, 0x84, 0x07, 0x1b, 0x80, 0x4a, 0x9a, 0x08, 0x35, 0xb8, 0x81,
0x50, 0xe7, 0xad, 0xbe, 0xc4, 0x8e, 0xb1, 0x4f, 0x2d, 0x5f, 0xba, 0x80, 0xbb, 0xfd, 0x9a, 0xad,
0x19, 0x36, 0xe5, 0xad, 0x87, 0xf1, 0x10, 0xc0, 0x8d, 0xc6, 0x50, 0x40, 0x52, 0xf8, 0xb3, 0x98,
0x2c, 0xd6, 0xec, 0x59, 0xe7, 0x0d, 0x3e, 0x0f, 0x93, 0x3e, 0x1d, 0x02, 0x7a, 0x18, 0x8f, 0xb6,
0xc7, 0x46, 0x4e, 0x01, 0xa3, 0x96, 0xdc, 0x3a, 0x20, 0x77, 0xbf, 0x2c, 0x24, 0xe4, 0x80, 0xa9,
0x20, 0x14, 0xe5, 0x2d, 0xb5, 0x68, 0xc9, 0x55, 0x89, 0x23, 0x96, 0x82, 0xaa, 0xba, 0x58, 0xa6,
0x03, 0x38, 0x71, 0x4b, 0x29, 0xd2, 0x47, 0x80, 0xe3, 0x84, 0x91, 0xf4, 0x78, 0x43, 0x64, 0x41,
0x7b, 0x73, 0x99, 0x80, 0x42, 0x48, 0x00, 0xde, 0x00, 0x12, 0x88, 0x80, 0xdb, 0x51, 0x4a, 0x49,
0x84, 0x43, 0xf6, 0x51, 0x90, 0x27, 0x21, 0xc9, 0xf8, 0xac, 0x00, 0x4d, 0xcd, 0x46, 0x09, 0x9d,
0x15, 0x78, 0xe0, 0x00, 0x1e, 0x44, 0x2a, 0x51, 0x8c, 0xbc, 0xd3, 0xa3, 0x68, 0x8a, 0xd5, 0x3a,
0x20, 0x79, 0xba, 0x4d, 0x71, 0x4c, 0x0b, 0x91, 0x98, 0x90, 0x7b, 0x2a, 0x42, 0xc5, 0x78, 0x7a,
0xfc, 0xd5, 0x1b, 0x4b, 0x09, 0xa7, 0x27, 0x99, 0x38, 0x05, 0x01, 0xc2, 0x80, 0x39, 0x9c, 0x67,
0xbb, 0x4e, 0x7f, 0x6c, 0x33, 0xdd, 0xed, 0x87, 0x55, 0xda, 0x5d, 0xb5, 0x56, 0x33, 0xc6, 0xf9,
0xea, 0x60, 0x64, 0xcf, 0xa7, 0x41, 0xe0, 0x5c, 0x1c, 0xc4, 0xb2, 0x25, 0xa3, 0x89, 0x88, 0x8d,
0x16, 0x00, 0xb5, 0xed, 0xa5, 0x22, 0x9d, 0x52, 0x41, 0x53, 0x8d, 0x92, 0x7f, 0x31, 0x51, 0x3f,
0xa8, 0x00, 0x85, 0x8a, 0x71, 0x10, 0x92, 0x78, 0xc4, 0x59, 0x08, 0x39, 0x69, 0xa9, 0x38, 0x41,
0x48, 0xf7, 0x40, 0x5a, 0x03, 0xd5, 0x3a, 0xf5, 0xe5, 0x9d, 0x33, 0x66, 0xc3, 0xd7, 0x1f, 0xef,
0x94, 0xa0, 0x53, 0xea, 0xf4, 0x15, 0xb2, 0x1c, 0x40, 0x2d, 0xcf, 0xaf, 0xce, 0xe9, 0xd4, 0x7a,
0x89, 0x09, 0xe6, 0xdd, 0xdb, 0x0e, 0xb8, 0x58, 0xa7, 0x60, 0x37, 0xfd, 0xf2, 0xfa, 0x2c, 0x4e, 0x51, 0x87, 0x0d, 0xfc, 0x16, 0x72, 0x2a, 0x5f, 0xc0, 0x80, 0xf0, 0x54, 0xa7, 0xde, 0xfc, 0x15,
0x8b, 0x9a, 0x36, 0x3a, 0x2c, 0x62, 0xfc, 0xd4, 0x8c, 0x31, 0xb7, 0xea, 0xd7, 0x26, 0xc4, 0xaf,
0x75, 0xea, 0xdb, 0x8b, 0xff, 0x9b, 0x9b, 0x50, 0x7e, 0xfe, 0x15, 0xab, 0x17, 0x2f, 0x96, 0x96,
0xbd, 0xaa, 0x87, 0xdd, 0x77, 0xa3, 0x77, 0xd3, 0x85, 0xf0, 0xe0, 0x58, 0xd5, 0xf6, 0x8c, 0xcd,
0xc4, 0x63, 0x52, 0x12, 0x48, 0x46, 0x0f, 0x93, 0x5a, 0xe3, 0xea, 0x24, 0x67, 0x73, 0x63, 0xa0,
0xdf, 0xdf, 0x3d, 0x67, 0xf6, 0xa9, 0xfc, 0xed, 0x08, 0xe3, 0x82, 0x57, 0x08, 0x35, 0x47, 0x68,
0x9c, 0x01, 0x40, 0x87, 0x8b, 0xbd, 0x0c, 0xb3, 0xf4, 0xe1, 0x72, 0xd7, 0x54, 0x62, 0xfd, 0x40,
0xed, 0x99, 0xa6, 0x7e, 0x2b, 0xe4, 0xb4, 0xc4, 0x62, 0x0d, 0x79, 0xae, 0x1b, 0xd7, 0xf4, 0x09,
0xb7, 0xe1, 0x7c, 0x44, 0x09, 0x9a, 0xda, 0xff, 0x52, 0x6a, 0x3c, 0xe1, 0xc8, 0xd7, 0xbd, 0xbb,
0xbe, 0x37, 0xfc, 0xd6, 0xd5, 0x4e, 0x3c, 0x40, 0x2a, 0x4b, 0x39, 0x1a, 0xbd, 0x2a, 0xcd, 0xc1,
0x18, 0x59, 0x40, 0x62, 0x78, 0xec, 0x63, 0x19, 0x72, 0xf0, 0xcf, 0xf8, 0x38, 0xfa, 0x42, 0x3a,
0xc8, 0x02, 0xec, 0x5b, 0xeb, 0x8d, 0xae, 0xf1, 0x45, 0xdd, 0x32, 0x98, 0x35, 0x3c, 0x9f, 0xa6,
0x3d, 0xce, 0x13, 0xce, 0x94, 0x38, 0x87, 0x00, 0x8d, 0x85, 0xc4, 0x70, 0x17, 0x26, 0x0e, 0xa6,
0x1e, 0x16, 0xcb, 0xbf, 0x52, 0xdf, 0x29, 0x63, 0xc4, 0xf6, 0x8c, 0x35, 0xba, 0xf2, 0xf9, 0x1f,
0xbf, 0x73, 0x1f, 0x91, 0x1b, 0x9e, 0x24, 0x5e, 0x63, 0x22, 0x82, 0x23, 0x05, 0x19, 0xb9, 0x71,
0x73, 0xdc, 0xcf, 0x05, 0x88, 0x94, 0x71, 0xdb, 0xdd, 0x48, 0x10, 0xd5, 0x55, 0xb3, 0x52, 0xc3,
0x1b, 0x01, 0x94, 0x13, 0x74, 0x94, 0x3a, 0x80, 0x2f, 0x39, 0xe2, 0x75, 0x0e, 0xf2, 0xc6, 0x18,
0xdc, 0x46, 0xfc, 0xf3, 0xea, 0x14, 0x80, 0xc1, 0xce, 0x24, 0xEE, 0x72, 0xed, 0x94, 0xaf, 0xfb,
0xa9, 0xaa, 0x4a, 0xe0, 0xd4, 0x22, 0xc6, 0xf0, 0x57, 0x1d, 0x8e, 0xd2, 0x90, 0xc6, 0x0c, 0xd3,
0x9a, 0x53, 0xfb, 0xd6, 0xb7, 0xdd, 0x14, 0xd4, 0xbd, 0x41, 0xa7, 0x80, 0x7b, 0x23, 0xfe, 0x34,
0x56, 0x0d, 0x96, 0x46, 0x02, 0xfe, 0xfd, 0xb2, 0x00, 0x5f, 0x01, 0x9c, 0xa0, 0x32, 0x39, 0xd7,
0x90, 0xc2, 0x6c, 0xc7, 0x4e, 0x68, 0x88, 0x7d, 0x9f, 0x9b, 0xcf, 0xa7, 0xbe, 0xa0, 0xfc, 0x18,
0x7d, 0x07, 0x5b, 0xa9, 0xbe, 0x56, 0x1f, 0x67, 0x1a, 0x4a, 0x91, 0x9c, 0x04, 0x38, 0x53, 0x6b,
0x70, 0x68, 0x8f, 0xea, 0xf4, 0x34, 0x87, 0x7f, 0x6e, 0x82, 0xc3, 0xc1, 0xab, 0x40, 0xc4, 0x50,
0x13, 0x0e, 0x33, 0x5d, 0x67, 0x7d, 0x01, 0x1f, 0xdb, 0xc0, 0x7f, 0xed, 0x87, 0x7f, 0xbc, 0x0f,
0x75, 0xe0, 0xa5, 0xba, 0xc0, 0x84, 0x3d, 0x24, 0x04, 0xe0, 0xf1, 0x16, 0x41, 0x3b, 0x74, 0xd2,
0x52, 0xc5, 0xf8, 0x7c, 0x12, 0xfb, 0xe4, 0x37, 0x5b, 0xfb, 0x57, 0x11, 0xa1, 0x18, 0x00, 0x00,
            };
            using (var stream = new MemoryStream(bytes))
            using (var gzip = new GZipStream(stream, CompressionMode.Decompress))
            using (var reader = new BinaryReader(gzip))
            {
                while (true)
                {
                    // Record layout: char, width, height, then width * height booleans.
                    char ch = reader.ReadChar();
                    if (ch == '\0') // '\0' terminates the sample list
                        break;
                    int width = reader.ReadByte();
                    int height = reader.ReadByte();
                    bool[,] map = new bool[width, height];
                    for (int i = 0; i < width; i++)
                        for (int j = 0; j < height; j++)
                            map[i, j] = reader.ReadBoolean();
                    words_.Add(new CharInfo(ch, map));
                }
            }
        }

        // Recognizes the text in a CAPTCHA image, scanning left to right.
        public string Read(Bitmap bmp)
        {
            var result = string.Empty;
            var width = bmp.Width;
            var table = ToTable(bmp);
            var next = SearchNext(table, -1);

            while (next < width - 7)
            {
                var matched = Match(table, next);
                if (matched.Rate > 0.6)
                {
                    // Accepted: jump a large step past the matched character.
                    result += matched.Char;
                    next = matched.X + 10;
                }
                else
                {
                    // Nothing convincing here: move one column to the right.
                    next += 1;
                }
            }
            return result;
        }

        // Removes background noise and binarizes in one pass: R + G + B < 500 is foreground.
        private bool[,] ToTable(Bitmap bmp)
        {
            var table = new bool[bmp.Width, bmp.Height];
            for (int i = 0; i < bmp.Width; i++)
                for (int j = 0; j < bmp.Height; j++)
                {
                    var color = bmp.GetPixel(i, j);
                    table[i, j] = (color.R + color.G + color.B < 500);
                }
            return table;
        }

        // Returns the first column after `start` that contains a foreground pixel.
        private int SearchNext(bool[,] table, int start)
        {
            var width = table.GetLength(0);
            var height = table.GetLength(1);
            for (start++; start < width; start++)
                for (int j = 0; j < height; j++)
                    if (table[start, j])
                        return start;
            return start;
        }

        // Scores template `target` placed on `source` with its top-left corner at (x0, y0).
        private double FixedMatch(bool[,] source, bool[,] target, int x0, int y0)
        {
            double total = 0;
            double count = 0;
            int targetWidth = target.GetLength(0);
            int targetHeight = target.GetLength(1);
            int sourceWidth = source.GetLength(0);
            int sourceHeight = source.GetLength(1);
            int x, y;
            for (int i = 0; i < targetWidth; i++)
            {
                x = i + x0;
                if (x < 0 || x >= sourceWidth)
                    continue; // outside the image: not scored
                for (int j = 0; j < targetHeight; j++)
                {
                    y = j + y0;
                    if (y < 0 || y >= sourceHeight)
                        continue; // outside the image: not scored
                    if (target[i, j])
                    {
                        total++;
                        if (source[x, y])
                            count++;   // expected pixel is present
                        else
                            count--;   // expected pixel is missing
                    }
                    else if (source[x, y])
                        count -= 0.55; // unexpected pixel: moderate penalty
                }
            }
            return count / total;
        }

        // Searches a small window around `start` for the best position of one template.
        private MatchedChar ScopeMatch(bool[,] source, bool[,] target, int start)
        {
            int targetHeight = target.GetLength(1);
            int sourceHeight = source.GetLength(1);
            double max = 0;
            var matched = new MatchedChar();
            for (int i = -2; i < 6; i++)
                for (int j = -3; j < sourceHeight - targetHeight + 5; j++)
                {
                    double rate = FixedMatch(source, target, i + start, j);
                    if (rate > max)
                    {
                        max = rate;
                        matched.X = i + start;
                        matched.Y = j;
                        matched.Rate = rate;
                    }
                }
            return matched;
        }

        // Tries every character template and returns the best-scoring one.
        private MatchedChar Match(bool[,] source, int start)
        {
            MatchedChar best = null;
            foreach (var info in words_)
            {
                var matched = ScopeMatch(source, info.Table, start);
                matched.Char = info.Char;
                if (best == null || best.Rate < matched.Rate)
                    best = matched;
            }
            return best;
        }

        // One character template: the character and its binary bitmap.
        private class CharInfo
        {
            public char Char { get; private set; }
            public bool[,] Table { get; private set; }

            public CharInfo(char ch, bool[,] table)
            {
                Char = ch;
                Table = table;
            }
        }

        // Result of matching one template: position, character, and score.
        private class MatchedChar
        {
            public int X { get; set; }
            public int Y { get; set; }
            public char Char { get; set; }
            public double Rate { get; set; }
        }
    }
}

Usage:

var cracker = new Cracker();
var result = cracker.Read(img); // img is the Bitmap of the CAPTCHA image
