My graduation project only scratched the surface of the CAPTCHA ("verification code") recognition problem; there was no in-depth research. Graphics and image processing is a deep field, and neural networks and machine learning are hard topics, so this time the problem was analyzed only with traditional methods.
I never tidied up this year's work. A few days ago a friend asked to see this demo, so I packed up the pile of material I had collected and sent it to him. While I was at it, I decided to clean things up and write this record. This was my last assignment at university, and it carries a lot of memories and nostalgia.
The purpose of this demo is not CAPTCHA recognition itself, but to validate image-processing methods that can be applied elsewhere: train tickets, invoices, and so on.
It concludes with a case study of invoice number recognition:
Address: http://v.youku.com/v_show/id_XMTI1MzUxNDY3Ng==.html
The demo contains a program that demonstrates the CAPTCHA recognition process, an automatic-recognition class library, and an invoice-recognition demo program.
The graphical CAPTCHAs of 7 websites are used as test cases; of course, they were deliberately chosen to avoid heavy character adhesion and distortion:
The final recognition rate:
- CAPTCHA image processing involves the following steps: CAPTCHA acquisition, grayscale conversion, background color removal, noise removal, binarization, character segmentation, image normalization, and image feature code generation.
There are three main grayscale conversion methods:
- Maximum-value method: for each pixel, take the maximum of the three RGB components and use that value as the gray level.
- Averaging method: the gray value of each pixel is the average of its R, G, and B components.
- Weighted average: the human eye does not perceive the three RGB channels equally, so the conversion assigns each channel a different weight.
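The three conversions can be sketched as follows. The original demo is a C#/.NET class library; these Python snippets are illustrative only, and the 0.299/0.587/0.114 weights are the common ITU-R BT.601 choice, not necessarily the exact weights the demo uses:

```python
def gray_max(r, g, b):
    """Maximum-value method: the gray level is the largest RGB component."""
    return max(r, g, b)

def gray_mean(r, g, b):
    """Averaging method: the gray level is the mean of the RGB components."""
    return (r + g + b) // 3

def gray_weighted(r, g, b):
    """Weighted average: weight the channels by perceived brightness.
    0.299/0.587/0.114 are the common ITU-R BT.601 weights (an assumption,
    not taken from the original demo)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)
```

After conversion, every pixel's R, G, and B values are set to the computed gray level, which is why later stages only need to test the R value.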
Background removal
The goal of this stage is to turn the background pure white, i.e. to turn as much of the color outside the target characters as possible into white. The hardest part is determining the value that separates the image's background from its foreground, that is, the threshold. Every pixel whose value is greater than the threshold (after grayscale conversion R, G, and B are equal, so the R value suffices) has its RGB set to 255 (white). The threshold stays constant throughout the process.

A good threshold sits where foreground and background can be distinguished most clearly, like a pane of glass dividing a river into two layers: cloudy sediment below, clear water above. The two layers differ sharply, and the position of that pane of glass is the key.
Common algorithms for determining the threshold:
- The bimodal (two-peak) method. This algorithm is very simple: assume the picture consists only of foreground and background. In the grayscale histogram each part then forms a peak, and the valley between the two peaks is the threshold separating them. The algorithm has limitations: if the picture has three or more dominant colors it forms several peaks and the target valley is hard to pick. This is especially true for CAPTCHAs, whose many colors produce many distinct gray levels after conversion, so this program does not use the algorithm.
- The iterative method. Compute the maximum and minimum gray levels of the image and take their average as an initial threshold; use it to split the picture into foreground and background, compute the average gray level of each part, and take the average of those two as the next threshold. Iterate until the threshold stops changing; that is the target threshold.
- The maximum between-class variance method, known as Otsu's method, an adaptive thresholding technique. It divides the image into background and target according to its grayscale characteristics. The larger the between-class variance between background and target, the greater the difference between the two parts; when target pixels are misclassified as background (or vice versa), that difference shrinks. Maximizing the between-class variance therefore minimizes the probability of misclassification. The goal is to find the threshold that best separates background from target. This program uses this algorithm for background separation.
- The gray-stretching algorithm, an improvement on Otsu. When the difference between foreground and background is small, Otsu's separation quality drops and the difference needs to be enlarged first; this is done simply by multiplying every original gray level by a factor, stretching the range of gray levels.
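Otsu's method as just described can be sketched in Python (an illustration of the algorithm, not the demo's actual C# code):

```python
def otsu_threshold(gray_levels):
    """Return the threshold that maximizes the between-class variance
    over a flat list of 0-255 gray values (Otsu's method)."""
    hist = [0] * 256
    for g in gray_levels:
        hist[g] += 1
    total = len(gray_levels)
    sum_all = sum(i * hist[i] for i in range(256))

    sum_bg, w_bg, best_t, best_var = 0.0, 0, 0, -1.0
    for t in range(256):
        w_bg += hist[t]              # pixels at or below t -> background class
        if w_bg == 0:
            continue
        w_fg = total - w_bg          # pixels above t -> foreground class
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:   # keep the threshold with maximal variance
            best_var, best_t = var_between, t
    return best_t
```

Pixels with a gray level above the returned threshold are then whitened, as described in the background-removal stage.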
Noise judgment and removal
First remove the border: some CAPTCHAs draw a black frame around the picture, which background removal by itself does not eliminate. Removing it is very simple: load the image into a two-dimensional array and test each pixel. If its x coordinate equals 0 or the picture width minus one, or its y coordinate equals 0 or the picture height minus one, it lies on the border; setting such pixels to white removes the border.
For non-border points, whether a pixel belongs to the target is judged not by the point itself but by the points around it. Take the point as the center of a nine-square grid, i.e. consider the 8 pixels surrounding it, and count how many of those 8 points are not background (not white). If that count is greater than a defined value, the target point is most likely a pixel of a character and is not treated as noise; otherwise it is treated as a noise point. The defined value depends on the amount of noise and character adhesion in the CAPTCHA; it cannot be derived dynamically and has to be tuned from results. Assuming the defined value is 2, then:
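The nine-grid test can be sketched in Python as follows (illustrative only; the real demo is C#, and the neighbor threshold of 2 is the assumed "defined value"):

```python
def remove_noise(pixels, min_neighbors=2):
    """8-neighbor noise filter: a black (0) pixel is kept only if more than
    `min_neighbors` of its 8 surrounding pixels are also black; otherwise it
    is treated as noise and turned white (255).
    `pixels` is a list of rows of gray values (0 = ink, 255 = background)."""
    h, w = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for y in range(h):
        for x in range(w):
            if pixels[y][x] != 0:
                continue
            black = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and pixels[ny][nx] == 0:
                        black += 1
            if black <= min_neighbors:
                out[y][x] = 255  # isolated point: treat as noise
    return out
```

An isolated dot has no black neighbors and is whitened, while a pixel inside a character stroke has several and survives.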
Binarization
Binarization differs from grayscale conversion: in a grayscale image each pixel's R, G, and B values are equal and may lie anywhere between 0 and 255, while binarization requires each pixel's value to be either 0 or 255, making the picture purely black and white.
The binarization step processes every pixel of the de-noised CAPTCHA image: if a point's R value is not 255, its RGB value is changed to 0 (pure black). After the whole pass, the picture is a true black-and-white image.
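As a sketch (Python; the demo itself is C#), the rule above is a single test per pixel:

```python
def binarize(image):
    """Binarization rule described above: any pixel that is not already
    pure white (255) becomes pure black (0)."""
    return [[255 if px == 255 else 0 for px in row] for row in image]
```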
Main image segmentation algorithms
Image segmentation occupies a very important place in graphics and image processing. A picture is a complex medium of information, and not all the information in a picture is wanted; often only a target region matters, and extracting that region is what segmentation technology is for.
Character segmentation is one of the hardest steps in CAPTCHA recognition, and the step that determines the recognition result: no matter how complex a CAPTCHA is, if its characters can be cut out accurately, it can be recognized. There are many segmentation approaches, and the fine-grained post-processing after segmentation is also complex and varied.
Here are a few mature segmentation algorithms:
- Threshold-based segmentation. This method was already used in background processing: find the boundary value between the target region and the non-target region in some way, then separate the two regions. It cannot distinguish individual characters from each other, so it is not used in the segmentation stage.
- Projection segmentation, also called region-based segmentation. This algorithm is also very simple: project the binarized picture's pixel distribution onto the x axis; the changes between peaks and valleys locate each target region, and projecting a single region onto the y axis then pins down its vertical extent. The method copes with slight adhesion to some degree, but noise causes over-segmentation, and characters such as '7', 'T', and 'L' produce segmentation errors, so this program does not use it.
Fig. 3-7 Projection method
- Edge-detection segmentation, also called the point-scanning method. This method meets the program's requirements to a reasonable extent, so the program uses it; it is described in detail later.
- Clustering. Clustering-based segmentation maps the image's pixels to points in a corresponding feature space, segments the feature space according to how the points aggregate there, then maps the result back into the original image space to obtain the segmentation. This approach is complex but works well on difficult images with adhesion and deformation. Owing to limited time, this project does not analyze it in depth.
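For comparison, the x-axis projection method described above can be sketched like this (Python illustration, not the demo's code; 0 = ink, 255 = background):

```python
def vertical_projection_segments(binary):
    """Count ink pixels per column of a binarized image and return the
    (start, end) column ranges where the count is non-zero, i.e. the
    candidate character regions between projection valleys."""
    h, w = len(binary), len(binary[0])
    counts = [sum(1 for y in range(h) if binary[y][x] == 0) for x in range(w)]
    segments, start = [], None
    for x, c in enumerate(counts):
        if c > 0 and start is None:        # entering a region of ink
            start = x
        elif c == 0 and start is not None: # fell into a valley: close region
            segments.append((start, x - 1))
            start = None
    if start is not None:                  # region runs to the right edge
        segments.append((start, w - 1))
    return segments
```

A single noise pixel in an otherwise blank column creates a spurious segment, which is exactly the over-segmentation problem noted above.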
3.6.2 Edge Detection Segmentation algorithm
The program uses edge detection to determine the boundary of each character. The algorithm proceeds as follows:
Figure 3-8 Image Segmentation
As the figure shows, to determine the boundary of the character "6" the program:
- Starts a scan pointer at the leftmost column of the picture (x = 0) and scans downward in a 1px-wide strip. If it hits a pixel whose R value is 0, it notes the x coordinate as a; if it reaches the bottom without hitting one, it moves the pointer one column to the right and continues the vertical scan until a is found.
- Starting from a+1, scans each column of pixels vertically; whenever it meets a pixel whose R value is 255, a counter k (initial value 0) is incremented. At the bottom of the column it checks k: if k equals the picture height, the column is blank, so it stops the scan and notes the x coordinate as b; otherwise it moves the pointer one column to the right and repeats.
- Within the x interval (a, b-1), scans horizontally starting from y = 0, testing the R value of each point. If an R value equals 0, it stops the scan and notes the y coordinate as c; otherwise the pointer moves down one row and the horizontal scan continues.
- Within the x interval (a, b-1), scans horizontally starting from c+1, testing the R value of each pixel; whenever an R value equals 255, a counter n (initial value 0) is incremented. If n equals b-a, the row is blank, so it stops the scan and notes the y coordinate as d; otherwise the pointer moves down a row and the horizontal scan continues until d is found.
The boundary of the "4" is found the same way; only the starting x coordinate in step one changes from 0 to b+1.
Each time b-a is computed, if its value is less than the narrowest character width in your CAPTCHA (assume 4 here), the region is discarded: stop looking for its boundary and do not add its coordinates to the result set.
For example, in the school's CAPTCHA the narrowest character is "1", but even its width is greater than 4, so this setting causes no problem. Adjust it to the situation; in general, a CAPTCHA with characters narrower than 4px would be too small for people to read anyway.
After this process we have the four boundary coordinates of the character on the horizontal and vertical axes, i.e. (a, b-1, c, d-1). The area of the original CAPTCHA bounded by these four values is drawn into a small picture, the small picture is normalized to the configured width and height, and the processed pictures are returned in a collection, waiting for the next step.
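The four scanning steps above can be sketched in Python (an illustration of the algorithm, not the demo's C# implementation; 0 = ink, 255 = background):

```python
def find_char_bounds(binary, x_start=0):
    """Edge-detection (point-scan) boundary search: returns (a, b-1, c, d-1)
    for the first character at or after column `x_start`, or None if no ink
    is found."""
    h, w = len(binary), len(binary[0])
    # step 1: scan columns left-to-right for the first column containing ink -> a
    a = next((x for x in range(x_start, w)
              if any(binary[y][x] == 0 for y in range(h))), None)
    if a is None:
        return None
    # step 2: from a+1, find the first all-white column -> b
    b = next((x for x in range(a + 1, w)
              if all(binary[y][x] == 255 for y in range(h))), w)
    # step 3: within columns [a, b-1], scan rows top-down for the first row with ink -> c
    c = next(y for y in range(h)
             if any(binary[y][x] == 0 for x in range(a, b)))
    # step 4: from c+1, find the first all-white row -> d
    d = next((y for y in range(c + 1, h)
              if all(binary[y][x] == 255 for x in range(a, b))), h)
    return (a, b - 1, c, d - 1)
```

The minimum-width check described above would then discard any result whose b-a falls below the configured narrowest character width, and the next character's search starts at b+1.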
Before splitting, and the four results after splitting:
Special processing after splitting
Because of partial adhesion in the image, segmentation often fails to achieve the desired result, and the resulting small pictures can look strange. However, most websites' CAPTCHAs currently have 4 characters, which means the split ought to yield four small pictures. For that situation I added further processing. First, look at what can appear after cutting:
This CAPTCHA has been binarized, and obviously the first and second characters are stuck together, so the program cuts it into 3 small pictures, similar to this:
Obviously ① is not what the program wants. In this case, where the first cut yields 3 parts, find the widest one and chop it open down the middle to get 4 pictures.
Likewise, when there are 2 parts:
This is not the ideal situation either, but the same logic applies: chop each of the two parts open in the middle to get 4 small pictures.
And when the first cut leaves a single whole piece:
We simply divide it into 4 equal parts.
Of course, this processing introduces some error, but as long as the sample size is large enough, the cutting results are still acceptable.
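The fix-ups for 3, 2, or 1 pieces all reduce to the same rule, sketched here in Python (illustrative; each piece is an (x_start, x_end) column range, and `expected=4` is the assumed character count):

```python
def fix_split_count(pieces, expected=4):
    """If adhesion left fewer than `expected` pieces, repeatedly take the
    widest piece and cut it in half until the count is right."""
    pieces = sorted(pieces)
    while len(pieces) < expected:
        widest = max(pieces, key=lambda p: p[1] - p[0])
        pieces.remove(widest)
        mid = (widest[0] + widest[1]) // 2
        pieces += [(widest[0], mid), (mid + 1, widest[1])]
        pieces.sort()
    return pieces
```

Starting from a single piece, repeated halving of the widest piece produces four equal quarters, matching the "divide it into 4" case above.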
This time only the 4-character case received special treatment; other character counts were not handled. The concrete practice is introduced in the summary.
Font library production
This stage transforms the cut-out images into feature codes: the collection of small pictures returned by the cutting step is processed to obtain their eigenvalues. During cutting, the program already normalized each small picture, so they all have the same width and height. Traverse each pixel: if the point's R value is 255, record a 0; otherwise record a 1. The 0s and 1s, concatenated in order, form a string, and that string is the image's feature code. Then append the character the image represents, joined with '--'. Thus each picture yields a feature string paired with its character; written to a text file or database, these records form the basic font library. Since the pictures are normalized to 20*30, each font record is 20*30 + 3 + 2 (carriage return and line feed) = 605 characters.
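Feature-code generation can be sketched as follows (Python illustration; in the demo a 20×30 picture yields a 600-character code, plus '--', the character, and CRLF for 605 per record):

```python
def feature_string(image):
    """Walk the normalized binarized picture in order, recording '1' for an
    ink pixel (0) and '0' for a background pixel (255)."""
    return ''.join('1' if px == 0 else '0' for row in image for px in row)

def font_entry(image, char):
    """One font-library record: '<feature code>--<character>'."""
    return feature_string(image) + '--' + char
```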
The larger the font library, the higher the recognition accuracy, but bigger is not always better: the more font records there are, the more time matching consumes and the lower the efficiency. Here is part of a font library:
Verification Code Identification
To recognize a CAPTCHA you must first have a well-built font library, then perform the following steps each time:
- CAPTCHA acquisition. The source may be a network stream, or the picture may be loaded from disk.
- Image processing: grayscale conversion, noise removal, background removal, binarization, character segmentation, picture normalization, and feature code extraction.
- Similarity calculation: read the records from the font library, compare the feature code of each normalized small picture against the font data, compute the similarity, and record the character c corresponding to the most similar record.
- Recognition result: the resulting characters c are concatenated, and the resulting string is the CAPTCHA's recognition result.
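The similarity and lookup steps can be sketched like this (Python illustration; the font library is simplified here to a dict mapping feature strings to characters, standing in for the text/database file described above):

```python
def similarity(code_a, code_b):
    """Fraction of positions where two equal-length feature strings agree."""
    matches = sum(1 for a, b in zip(code_a, code_b) if a == b)
    return matches / len(code_a)

def recognize(code, font_library):
    """Pick the character whose font record is most similar to `code`."""
    best = max(font_library, key=lambda k: similarity(code, k))
    return font_library[best]
```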
Here is the concrete flow of CAPTCHA recognition:
Font Library Maintenance
The CAPTCHA recognition process has now been analyzed in detail; the keys to recognition are, first, the cutting and, second, the quality of the font library. The font library involves two issues: duplicate records and incorrect records. This phase mainly implements:
- Filtering out duplicate font records.
- Correcting font records that were inserted wrongly.
- Deleting unwanted font records.
Design of image processing class
The image-processing class is designed along object-oriented lines: it encapsulates the methods used in the image-processing pipeline and provides default parameter values, variable-parameter settings, and method overloads for common parameter values. It is a static class, convenient for developers to call. Boundry is the class that stores a small picture's border information, with four boundary-value properties.
Developers can call the Getyzmcode() method directly to recognize a CAPTCHA; it is an overloaded method. The remaining methods are described with the specific implementation of each design step below. The following class diagram shows the main processing methods of the Imageprocess class and the relationships between them:
Invoice number Identification
This part is implemented on top of AForge.NET, with reference to card-recognition code from abroad.
The process: determine the position of the invoice, locate the invoice number, cut the number out, call the automatic-recognition class library to recognize the digits, and write the recognized data to the screen. Of course, training the font library also had to be implemented.
Making this demo was quite interesting. Thanks to the predecessors active in communities such as Cnblogs, CSDN, GitHub, OSChina, and Stack Overflow; their sharing and dedication in the open-source community let more developers benefit. Standing on their shoulders, rookies like us can go farther.
Finally, here is the source code:
https://github.com/ccccccmd/ReCapcha
The specific cases are in the source code.