Http://www.pin5i.com/showtopic-22246.html
This article describes the ideas and algorithms involved.
The full text is divided into two parts. The first part covers the common method for recognizing license plates and ordinary verification codes. The second part covers methods and ideas for recognizing unusually difficult verification codes, such as those used by QQ and Gmail.
Part 1: The common method for license plate / verification code recognition
The common approach to recognizing license plates and verification codes is:
(1) Convert the image to grayscale and binarize it
(2) Remove noise, then cut the image into single characters
(3) Extract the features of each character, producing a feature vector or feature matrix
(4) Classification and learning. Compare the feature vector or feature matrix against a sample library, select the most similar samples, and use the value assigned to those samples as the output.
The code below walks through this process. After an SVN server update I could no longer access the code I had stored in the old BDB-backed repository, so some of the code here was decompiled with Reflector. Please bear with it.
(1) Grayscale conversion and binarization
The goal is to turn every pixel of the image into either 0 or 255 so that it is easy to compute on. This step also removes part of the noise.
Grayscale conversion and binarization assume a BMP image; if the image is in another format, convert it to BMP first.
Let the code speak. My grayscale code (the algorithm was found on the Internet):
    protected static Color Gray(Color c)
    {
        // Weighted sum of the red, green and blue channels; the weights add up to 1.
        int rgb = Convert.ToInt32((double) (((0.3 * c.R) + (0.59 * c.G)) + (0.11 * c.B)));
        return Color.FromArgb(rgb, rgb, rgb);
    }
After grayscale conversion, each pixel holds a gray value between 0 and 255.
Binarization then maps each gray value to 0 or 255. The usual approach is to choose a range, say [a, b]: gray values inside [a, b] become 255 and all others become 0. I use an adaptive binarization algorithm that is popular on the Internet.
    public static void Binarizate(Bitmap map)
    {
        int tv = ComputeThresholdValue(map);
        int x = map.Width;
        int y = map.Height;
        for (int i = 0; i < x; i++)
        {
            for (int j = 0; j < y; j++)
            {
                // The image is already gray, so the red channel is the gray value.
                if (map.GetPixel(i, j).R >= tv)
                {
                    map.SetPixel(i, j, Color.FromArgb(0xff, 0xff, 0xff));
                }
                else
                {
                    map.SetPixel(i, j, Color.FromArgb(0, 0, 0));
                }
            }
        }
    }

    private static int ComputeThresholdValue(Bitmap img)
    {
        int i;
        int k;
        double csum;
        int thresholdValue = 1;
        // Histogram of gray values (the one-pixel border is skipped).
        int[] ihist = new int[0x100];
        for (i = 0; i < 0x100; i++)
        {
            ihist[i] = 0;
        }
        int gmin = 0xff;
        int gmax = 0;
        for (i = 1; i < (img.Width - 1); i++)
        {
            for (int j = 1; j < (img.Height - 1); j++)
            {
                int cn = img.GetPixel(i, j).R;
                ihist[cn]++;
                if (cn > gmax)
                {
                    gmax = cn;
                }
                if (cn < gmin)
                {
                    gmin = cn;
                }
            }
        }
        // sum: total of all gray values; n: number of pixels counted.
        double sum = csum = 0.0;
        int n = 0;
        for (k = 0; k <= 0xff; k++)
        {
            sum += k * ihist[k];
            n += ihist[k];
        }
        if (n == 0)
        {
            // No pixels were counted; fall back to a fixed threshold.
            return 60;
        }
        double fmax = -1.0;
        int n1 = 0;
        for (k = 0; k < 0xff; k++)
        {
            n1 += ihist[k];
            if (n1 != 0)
            {
                int n2 = n - n1;
                if (n2 == 0)
                {
                    return thresholdValue;
                }
                csum += k * ihist[k];
                double m1 = csum / ((double) n1);
                double m2 = (sum - csum) / ((double) n2);
                // Between-class variance up to a constant factor, as in Otsu's method.
                // The cast to double avoids int overflow on large images.
                double sb = (((double) n1 * n2) * (m1 - m2)) * (m1 - m2);
                if (sb > fmax)
                {
                    fmax = sb;
                    thresholdValue = k;
                }
            }
        }
        return thresholdValue;
    }
The image before grayscale conversion and binarization:
The image after grayscale conversion and binarization:
Note: this algorithm works well for license plate recognition. For verification codes, you may need to design a dedicated binarization algorithm for each specific website in order to filter out its noise.
(2) Noise removal, then cutting into single characters
Cutting the license plate above is fairly simple: scan from left to right and cut wherever there is a large gap. But some license plates look like this one:
A simple scan will not work here, so a more general denoising and cutting algorithm is needed. I use a relatively simple method:
Treat the image above as a plane and project it onto the horizontal axis. Columns that contain character strokes get a high projection value, and columns without strokes get a low one. The result is a curve that looks like one hill after another. Here is my hand drawing:
Then a horizontal scan line (s in the drawing) sweeps from the bottom up. Its intersections with the curve divide the hills into separate regions. A license plate generally has 7 characters, so the sweep stops as soon as the scan line divides the hills into seven regions. The seven characters can then be cut out according to the coordinates that these seven regions project onto the horizontal axis.
Reality is more complicated, though. Take the character 川 ("Chuan"): its horizontal projection consists of three hills, so the scan described above would cut it apart. A constraint is therefore needed: each hill has a center line, and the distance between the center lines of neighboring hills must exceed a certain value; otherwise the two hills must be merged. With this constraint the cut works effectively.
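The center-line constraint can be sketched as follows. This is only an illustration under stated assumptions, not the code from the DLL: `HillMerger`, `Hill` and `MergeCloseHills` are hypothetical names, and each hill is assumed to be given as an inclusive column range sorted from left to right.

```csharp
using System;
using System.Collections.Generic;

public static class HillMerger
{
    // Hypothetical type: one hill is the inclusive column range [Start, End].
    public struct Hill
    {
        public int Start;
        public int End;
        public Hill(int start, int end) { Start = start; End = end; }
        // Center line of the hill, in column units.
        public double Center { get { return (Start + End) / 2.0; } }
    }

    // Merges neighboring hills whose center lines are closer than
    // minCenterDistance. The input must be sorted from left to right.
    public static List<Hill> MergeCloseHills(IList<Hill> hills, double minCenterDistance)
    {
        var merged = new List<Hill>();
        foreach (Hill hill in hills)
        {
            if (merged.Count > 0 &&
                hill.Center - merged[merged.Count - 1].Center < minCenterDistance)
            {
                // Too close to the previous hill: widen that hill instead of adding a new one.
                Hill previous = merged[merged.Count - 1];
                merged[merged.Count - 1] =
                    new Hill(previous.Start, Math.Max(previous.End, hill.End));
            }
            else
            {
                merged.Add(hill);
            }
        }
        return merged;
    }
}
```

Note that the center of a merged hill is recomputed from its widened range, so a third narrow hill is compared against the center of the first two combined. Whether that is the right comparison for a given plate format is a judgment call.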
That covers the horizontal projection. Next comes vertical projection and cutting. Here the vertical projection has only a single hill to cut, so it is easier to handle.
The result of the cut:
Code for horizontal projection and cutting:
    public static IList<Bitmap> Split(Bitmap map, int count)
    {
        if (count <= 0)
        {
            throw new ArgumentOutOfRangeException("count", "Count must be greater than 0.");
        }
        IList<Bitmap> resultList = new List<Bitmap>();
        int x = map.Width;
        int y = map.Height;
        int splitBitmapMinWidth = 4;
        // Horizontal projection: number of character pixels in each column.
        // CharGrayValue is a class-level constant that is not shown here.
        int[] xNormal = new int[x];
        for (int i = 0; i < x; i++)
        {
            for (int j = 0; j < y; j++)
            {
                if (map.GetPixel(i, j).R == CharGrayValue)
                {
                    xNormal[i]++;
                }
            }
        }
        Pair pair = new Pair();
        // Raise the scan line (height i) from the bottom up.
        for (int i = 0; i < y; i++)
        {
            IList<Pair> pairList = new List<Pair>(count + 1);
            for (int j = 0; j < x; j++)
            {
                if (xNormal[j] >= i)
                {
                    if ((j == (x - 1)) && (pair.Status == PairStatus.Start))
                    {
                        // The last column closes a region that is still open.
                        pair.End = j;
                        pair.Status = PairStatus.End;
                        if ((pair.End - pair.Start) >= splitBitmapMinWidth)
                        {
                            pairList.Add(pair);
                        }
                        pair = new Pair();
                    }
                    else if (pair.Status == PairStatus.JustCreated)
                    {
                        pair.Start = j;
                        pair.Status = PairStatus.Start;
                    }
                }
                else if (pair.Status == PairStatus.Start)
                {
                    pair.End = j;
                    pair.Status = PairStatus.End;
                    if ((pair.End - pair.Start) >= splitBitmapMinWidth)
                    {
                        pairList.Add(pair);
                    }
                    pair = new Pair();
                }
                if (pairList.Count > count)
                {
                    break;
                }
            }
            if (pairList.Count == count)
            {
                // Widen regions narrower than one tenth of the image width.
                foreach (Pair p in pairList)
                {
                    if (p.Width < (map.Width / 10))
                    {
                        int width = (map.Width / 10) - p.Width;
                        p.Start = Math.Max(0, p.Start - (width / 2));
                        p.End = Math.Min(p.End + (width / 2), map.Width - 1);
                    }
                }
                // Copy each region into its own bitmap.
                foreach (Pair p in pairList)
                {
                    int newMapWidth = (p.End - p.Start) + 1;
                    Bitmap newMap = new Bitmap(newMapWidth, y);
                    for (int ni = p.Start; ni <= p.End; ni++)
                    {
                        for (int nj = 0; nj < y; nj++)
                        {
                            newMap.SetPixel(ni - p.Start, nj, map.GetPixel(ni, nj));
                        }
                    }
                    resultList.Add(newMap);
                }
                return resultList;
            }
        }
        return resultList;
    }
The Pair in the code represents a pair of intersections between the scan line and the curve:
    // Member signatures only, as recovered with Reflector; the bodies are not shown.
    private class Pair
    {
        public Pair();
        public int CharPixelCount { get; set; }
        public int CharPixelXDensity { get; }
        public int End { get; set; }
        public int Start { get; set; }
        public BitmapConverter.PairStatus Status { get; set; }
        public int Width { get; }
    }
PairStatus represents the state of a Pair. I have forgotten exactly what each state means.
    private enum PairStatus
    {
        JustCreated,
        Start,
        End
    }
The code above was hard to write because there are many special cases to handle. PairStatus was introduced to deal with those special cases.
The code for vertical projection and cutting is simple and is not listed here; see the BitmapConverter.TrimHeight method in the attached DLL.
The above uses a simple method of denoising and cutting. Some images, especially CAPTCHA images, require special noise removal. One way to work out what is needed is to open the image in CxImage (http://www.codeproject.com/KB/graphics/cximage.aspx) or Paint.NET and try their image processing operations to see which ones remove the noise effectively. Write down the steps you took, then go through their source code and extract the algorithms. Operations such as thinning and filtering can also improve image quality. See the ITK code or an image processing textbook for details.
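Since the vertical projection step is not listed, here is a minimal sketch of what such a step might look like. It is not the TrimHeight method from the DLL: `VerticalTrimSketch` and `FindRowRange` are hypothetical names, and the binarized character is assumed to be given as a `bool[,]` grid indexed `[row, column]` instead of a Bitmap.

```csharp
using System;

public static class VerticalTrimSketch
{
    // Returns the first and last row whose count of character pixels is at
    // least minPixelsPerRow, or (-1, -1) if no row qualifies.
    // pixels[row, column] is true where the binarized image has a character pixel.
    public static Tuple<int, int> FindRowRange(bool[,] pixels, int minPixelsPerRow)
    {
        int rows = pixels.GetLength(0);
        int columns = pixels.GetLength(1);
        int top = -1;
        int bottom = -1;
        for (int r = 0; r < rows; r++)
        {
            // Projection of this row onto the vertical axis.
            int count = 0;
            for (int c = 0; c < columns; c++)
            {
                if (pixels[r, c]) count++;
            }
            if (count >= minPixelsPerRow)
            {
                if (top < 0) top = r;
                bottom = r;
            }
        }
        return Tuple.Create(top, bottom);
    }
}
```

The caller would then keep only the rows between the two returned indices. A threshold above 1 ignores rows that contain only a few stray noise pixels.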
(3) Extract the features of each character, producing a feature vector or feature matrix
Split each cut-out character into small blocks, for example a 3x3, 5x5, 3x5 or 10x8 grid, then count the pixels with value 255 in each block. This yields a matrix M, which can also be flattened into a vector V.
With the three steps above, each character on a license plate is converted into a numeric vector.
The concrete code flow for steps (1)-(3) is as follows:
    BitmapConverter.ToGrayBmp(bitmap);   // Grayscale conversion
    BitmapConverter.Binarizate(bitmap);  // Binarization
    IList<Bitmap> mapList = BitmapConverter.Split(bitmap, DefaultCharsCount);  // Horizontal projection and cutting
    Bitmap map0 = BitmapConverter.TrimHeight(mapList[0], DefaultHeightTrimThresholdValue);  // Vertical projection and cutting
    ImageSpliter spliter = new ImageSpliter(map0);
    spliter.WidthSplitCount = DefaultWidthSplitCount;
    spliter.HeightSplitCount = DefaultHeightSplitCount;
    spliter.Init();
The vector representation of the bitmap map0 is then available through spliter.ValueList.
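The source of ImageSpliter is not shown, so the following is only a sketch of the block counting described in step (3), under stated assumptions: `BlockFeatureSketch` and `Extract` are hypothetical names, the character is given as a `bool[,]` grid indexed `[row, column]`, and a true cell stands for a pixel with value 255.

```csharp
using System;

public static class BlockFeatureSketch
{
    // Splits the image into heightSplitCount rows of widthSplitCount blocks
    // and counts the set pixels in each block. The result is the block
    // matrix flattened row by row.
    public static int[] Extract(bool[,] pixels, int widthSplitCount, int heightSplitCount)
    {
        if (widthSplitCount <= 0)
            throw new ArgumentOutOfRangeException("widthSplitCount", "Split count must be positive.");
        if (heightSplitCount <= 0)
            throw new ArgumentOutOfRangeException("heightSplitCount", "Split count must be positive.");
        int rows = pixels.GetLength(0);
        int columns = pixels.GetLength(1);
        int[] features = new int[widthSplitCount * heightSplitCount];
        for (int r = 0; r < rows; r++)
        {
            // Index of the block row that this pixel row falls into.
            int blockRow = r * heightSplitCount / rows;
            for (int c = 0; c < columns; c++)
            {
                if (!pixels[r, c]) continue;
                int blockColumn = c * widthSplitCount / columns;
                features[blockRow * widthSplitCount + blockColumn]++;
            }
        }
        return features;
    }
}
```

Integer division assigns every pixel to exactly one block even when the image size is not a multiple of the split count, at the cost of blocks that differ in size by one pixel.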
(4) Classification
The principle of classification is simple. Let (Vij, Ci) denote a sample, where Vij is the vector obtained by converting the sample image to numeric form and Ci is the result a person assigned after looking at the image. The index j in Vij indicates that there are multiple samples whose vectors differ but whose result is the same Ci. Suppose the image to be recognized has been converted into a vector V'.
Intuitively, whichever image in the sample library the unknown image resembles most, we treat it as that image and recognize it as the character assigned in advance to that sample.
To the human eye it is easy to tell whether two pictures are similar, but for a computer it is hard. We have already converted the images into vectors of the same dimension, so how does a computer decide that one vector is similar to another?
This requires computing the distance between the two vectors. The shorter the distance, the more similar the two vectors are considered to be.
I use SampleVector<T> to represent a vector:
    public class SampleVector<T>
    {
        protected T[] Vector { get; set; }
        public Int32 Dimension { get { return Vector.Length; } }
        ...
    }
T is the element type. It can be Int32, or a more precise type such as double.
The common interface for measuring distance is IMetric:
    public interface IMetric<TElement, TReturn>
    {
        TReturn Compute(SampleVector<TElement> v1, SampleVector<TElement> v2);
    }
The most commonly used one is MinkowskiMetric.
    /// <summary>
    /// Minkowski metric.
    /// </summary>
    public class MinkowskiMetric<TElement> : IMetric<TElement, double>
    {
        public Int32 Scale { get; private set; }
        public MinkowskiMetric(Int32 scale)
        { Scale = scale; }

        public double Compute(SampleVector<TElement> v1, SampleVector<TElement> v2)
        {
            if (v1 == null || v2 == null) throw new ArgumentNullException();
            if (v1.Dimension != v2.Dimension) throw new ArgumentException("v1 and v2 dimensions vary.");
            double result = 0;
            for (int i = 0; i < v1.Dimension; i++)
            {
                // Sum of |v1[i] - v2[i]| raised to the power Scale.
                result += Math.Pow(Math.Abs(Convert.ToDouble(v1[i]) - Convert.ToDouble(v2[i])), Scale);
            }
            return Math.Pow(result, 1.0 / Scale);
        }
    }
MetricFactory is responsible for producing MinkowskiMetric instances of various orders:
    public class MetricFactory
    {
        public static IMetric<TElement, double> CreateMinkowskiMetric<TElement>(Int32 scale)
        {
            return new MinkowskiMetric<TElement>(scale);
        }

        public static IMetric<TElement, double> CreateEuclideanMetric<TElement>()
        {
            return CreateMinkowskiMetric<TElement>(2);
        }
    }
MinkowskiMetric is a widely used metric, but not necessarily the most effective one, because it treats every component of the vector V equally. In image recognition the components are not equally important. For example, Q and O differ only in the lower half, so the lower half should carry more weight than the upper half. Such easily confused characters call for specially designed metrics. Other easily confused pairs in license plate recognition are D and 0, 0 and O, and I and 1. The Minkowski metric distinguishes these characters poorly, so they need special handling. Because time was tight, I used only the Minkowski metric.
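One possible way to give some components more weight is a weighted Minkowski distance. The sketch below is not part of the original code: `WeightedMetricSketch`, `Distance` and the `weights` parameter are hypothetical, plain `double[]` arrays stand in for SampleVector<T>, and choosing good weights for a pair such as Q and O is left open.

```csharp
using System;

public static class WeightedMetricSketch
{
    // Weighted Minkowski distance: (sum of weights[i] * |a[i] - b[i]|^p)^(1/p).
    // With all weights equal to 1 this is the ordinary Minkowski distance.
    public static double Distance(double[] a, double[] b, double[] weights, int p)
    {
        if (a == null || b == null || weights == null) throw new ArgumentNullException();
        if (a.Length != b.Length || a.Length != weights.Length)
            throw new ArgumentException("a, b and weights must have the same length.");
        if (p < 1) throw new ArgumentOutOfRangeException("p", "p must be at least 1.");
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            sum += weights[i] * Math.Pow(Math.Abs(a[i] - b[i]), p);
        }
        return Math.Pow(sum, 1.0 / p);
    }
}
```

For Q versus O, the weights of the blocks in the lower half of the character would be set higher than those in the upper half, so that a difference in the tail of the Q dominates the distance.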
My code implements only the simplest rule: whichever sample is nearest is the one selected. A better solution is a K-nearest-neighbor classifier or a neural network classifier. The K-nearest-neighbor principle is to find the K samples closest to the image (vector) to be recognized and let those K samples decide, by some rule (a vote), which class C the new image belongs to. A neural network parameterizes both the distance measurement and the voting decision so that they can change as samples are added, which makes it a kind of learning machine. If you are interested, see chapters 3 and 4 of the book "Pattern Classification".
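A minimal sketch of the K-nearest-neighbor vote follows. It is an illustration, not the author's classifier: `KnnSketch` and `Classify` are hypothetical names, samples are assumed to be (vector, character) pairs of `double[]` and `char`, the distance is Euclidean, and the voting rule is a plain majority with ties going to the class whose nearest sample is closest.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KnnSketch
{
    // Euclidean distance between two vectors of equal length.
    private static double Distance(double[] a, double[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.Sqrt(sum);
    }

    // Returns the label that wins the vote among the k samples nearest to query.
    public static char Classify(IList<KeyValuePair<double[], char>> samples, double[] query, int k)
    {
        if (samples == null || samples.Count == 0)
            throw new ArgumentException("samples must not be empty.");
        if (k <= 0) throw new ArgumentOutOfRangeException("k", "k must be positive.");
        return samples
            .Select(s => new { Label = s.Value, Dist = Distance(s.Key, query) })
            .OrderBy(s => s.Dist)                 // nearest samples first
            .Take(k)                              // keep the k nearest
            .GroupBy(s => s.Label)                // one group per candidate class
            .OrderByDescending(g => g.Count())    // most votes first
            .ThenBy(g => g.Min(s => s.Dist))      // tie: class with the closest sample
            .First()
            .Key;
    }
}
```

With k = 1 this reduces to the nearest-sample rule described above. A larger k makes the result less sensitive to a single mislabeled or unusual sample.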
The general idea of license plate recognition and Verification code recognition