Text detection learning notes--boundary clustering, stroke segmentation, string fragment classification

Source: Internet
Author: User
Tags: svm

Time: 2014-02-28 10:24:39

2.21 ~ 2.27: This week I read a fairly long paper:

Localizing Text in Scene Images by Boundary Clustering, Stroke Segmentation, and String Fragment Classification

by Chucai Yi

1. Overview

The title, "localizing text," makes the paper's main task clear: text localization, also called text detection, a subfield of computer vision and the main topic of my graduation project.

Speaking of text detection, first consider text processing in general: using computer programs to handle the text that appears everywhere in daily life, for some specific purpose. Examples abound, such as Sogou's popular photo-translation feature, which lets you point a camera at text, automatically recognizes it, and translates it; license plate recognition systems are another example. How is text processing done? The data a computer typically works with is an image, so text processing means finding the text in a picture, localizing it, extracting it, and then recognizing it. Text detection is obviously the first step of text processing: the process of finding where the text is.

Text detection itself comes in many varieties. A simple case is detecting text in images of printed documents; it is simple because printed fonts are standardized and the text is black on a white background, so a simple color criterion suffices to extract the text for recognition. Harder cases include detecting text in video, or in images with complex backgrounds, where no simple criterion directly locates the text and a combination of algorithms is needed. My project is text detection in complex-background images, such as the following.

The picture contains windows, leaves, poles, and other clutter in a wide variety of colors, so no single color or other simple parameter can serve as a criterion for extracting the text. Many researchers work on localization in this setting and many algorithms exist, but most cannot solve the problem with perfect accuracy, so better algorithms still need to be developed; that is the significance of this research.

2. Article subject

The article could be restated as "Text detection in natural scene images based on (1) boundary clustering, (2) stroke segmentation, and (3) string fragment classification." A natural scene image is exactly a complex-background image. The terms (1), (2), and (3) may sound opaque at first, but they can be understood simply as three processing stages that, step by step, remove the non-text parts of a natural scene image until the final text positions remain. Let's go through the three steps:

(1) Boundary clustering

The term boundary clustering splits into two parts: boundary and clustering. Clustering means grouping basic elements with the same properties together; here the basic elements are image pixels, and the relevant properties concern boundaries. A boundary is an edge region: the contour of an object, which from the pixel's point of view means the pixels where the image color changes abruptly, i.e. where the gray level changes sharply and the gradient is high. In the picture above, the corners of the house are boundaries, and so are the outlines of the text.

How does the article classify pixels according to boundaries? There is in fact a clear distinction between text boundaries and the boundaries of other objects. A text boundary separates a stroke from its attachment surface (the background region the character sits on); the stroke and the attachment surface each have their own color, and each color is essentially uniform across a contiguous region. Other objects, such as windows and leaves with irregular shapes and colors, do not have this property. So we classify the boundary pixels of the whole picture by the pair of colors on the two sides of the boundary, which stays consistent along a text boundary. In addition, text at different positions in the picture generally does not belong to the same string, so the approximate position of a boundary in the picture is another basis for classification. The text in the test pictures is generally horizontal, so boundaries at different horizontal positions may still belong to the same string, but the vertical coordinate is used as a position feature. For each boundary pixel, the features used for classification are therefore the color pair (color1, color2) on the two sides of the boundary and the vertical coordinate y, giving the feature vector x = {color1, color2, y}.

The classification proceeds in two steps: first find all the boundary (edge) pixels, then classify them.

First step: boundary detection

The method used in the article is the Canny edge detector. Fully understanding the Canny detector takes some work, and I may cover it separately when I get the chance; for now we only need to know how it behaves. Its input is an RGB image and its output is a binary image in which the foreground pixels are the boundary pixels (call this set P) and everything else is background; in other words, it gives the locations of all boundary pixels.
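The paper treats Canny as a black box, so as a rough illustration of the input/output contract (a binary map of high-gradient pixels), here is a toy gradient-threshold edge detector in NumPy. This is only a stand-in, not Canny itself, which also smooths the image, thins edges by non-maximum suppression, and applies hysteresis thresholding.

```python
import numpy as np

def simple_edge_map(gray, threshold=50.0):
    """Toy stand-in for the Canny detector: mark pixels whose
    gradient magnitude exceeds a threshold as boundary pixels."""
    gy, gx = np.gradient(gray.astype(float))
    magnitude = np.hypot(gx, gy)
    return magnitude > threshold  # binary map: True = boundary pixel

# a dark square on a light background
img = np.full((10, 10), 200.0)
img[3:7, 3:7] = 20.0
edges = simple_edge_map(img)
```

The output plays the role of the set P: the `True` pixels are the boundary pixels passed on to the clustering step.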

Second step: classification

First, the K-means clustering algorithm performs an initial classification of the boundary pixels. K-means places each boundary pixel in Euclidean space according to its feature vector x = {color1, color2, y}, then divides the points into K classes by their Euclidean distances. Within each class the feature vectors are close together, so the features are similar, and similar features suggest the pixels belong to the boundary of the same text (or object); this completes the initial classification.
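A minimal sketch of this step, using Lloyd's iterations on hypothetical boundary-pixel features. For brevity each color is reduced to a single intensity, so x has three components here; the paper's feature uses full color pairs.

```python
import numpy as np

def kmeans(features, k, iters=20):
    """Minimal Lloyd's k-means: assign each feature vector to its
    nearest centroid by Euclidean distance, then recompute centroids.
    Seeded with the first k rows for brevity; real implementations
    use k-means++ or random restarts."""
    centroids = features[:k].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return labels, centroids

# hypothetical boundary-pixel features x = (color1, color2, y);
# the two side colors are reduced to single intensities for brevity
feats = np.array([
    [10.0, 200.0, 5.0],    # boundary of one text region
    [150.0, 30.0, 50.0],   # boundary of another object
    [12.0, 198.0, 6.0],
    [148.0, 33.0, 52.0],
])
labels, _ = kmeans(feats, k=2)
```

Pixels whose color pairs and vertical positions are close end up in the same class, which is exactly the grouping the paper needs before the GMM refinement.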

With K-means alone the classification is in principle complete, but to improve its accuracy and robustness, a Gaussian mixture model (GMM) is built to refine it. A picture generally divides into many parts (boundary pixel classes), and the previous step has roughly produced them, but a specific boundary pixel does not necessarily belong to the class K-means assigned it to: K-means uses Euclidean distance alone and decides membership absolutely, so it is more appropriate to define membership with a probabilistic model. For each part (boundary pixel class), the farther a pixel's feature vector x is from the part's center point (mean), the lower the probability that it belongs to that part, and the nearer, the higher, which matches a Gaussian distribution. So every pixel p has a feature x(p), and whichever component distribution gives x(p) the higher probability, we take p to belong to that boundary pixel class. For the whole picture, the combination of components is modeled as a Gaussian mixture model (GMM).

Based on the Gaussian mixture model, the EM algorithm iteratively solves for the most accurate parameters, and once they are found, the boundary pixels are classified according to the resulting probabilities.
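A minimal 1-D illustration of EM for a Gaussian mixture; the paper's features are multidimensional, and this sketch only shows the E-step/M-step alternation on made-up intensity values.

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=50):
    """Minimal EM for a 1-D Gaussian mixture.
    E-step: compute each point's responsibility under each component.
    M-step: re-estimate weights, means, and variances from those
    soft assignments."""
    n = len(x)
    mu = np.linspace(x.min(), x.max(), k)      # crude initialisation
    var = np.full(k, x.var() / k + 1e-6)
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(component j | x[i])
        dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means, variances
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return r.argmax(axis=1), mu

# two well-separated intensity populations, e.g. stroke vs. background colors
x = np.array([10.0, 11.0, 9.0, 12.0, 200.0, 198.0, 202.0, 201.0])
labels, mu = em_gmm_1d(x)
```

The final labels assign each point to the component under which it is most probable, which is the soft, probabilistic version of the hard K-means assignment.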

So the main task of boundary clustering is to divide the whole picture into a few parts, each text string or object being one part, represented by its boundary pixels. Non-text objects have not yet been removed, as shown.

(2) Stroke segmentation

After the first part produces the boundary pixel layers, we are still far from the text: a boundary pixel layer is just a set of boundary pixels, a contour. It may belong to a text string, but with only an outline we cannot do the follow-up work; we want to find the strokes from this contour and fill them with color, so that regions of the same color connect into connected domains.

The process of assigning a color is called color assignment: the pixels around the boundary pixels (within a certain range, not on the edge but close to it, so that the strokes get filled) are assigned colors, and other areas are ignored. The color values are taken from color1 and color2 in the mean feature vector x of the region; each pixel is assigned either color1 or color2. As noted earlier, the pixels around a text boundary split cleanly: the outer attachment surface is one color and the inside of the stroke is another, so in this assignment the pixels around a text boundary divide into two classes, attachment surface and stroke, and the interior of the characters gets filled with color. However, non-text objects go through the same process, which causes problems. The article offers two ways to handle this.

The first is DCA, direct color assignment: color assignment is applied to everything, text or not. Non-text objects also form connected domains and become candidate text, but since they do not fit the characteristics of a text string, they are excluded in the subsequent stage.

The second is ICA, which first finds the boundary layers that exhibit stroke characteristics: only layers with stroke characteristics get color assignment, and non-text layers without stroke characteristics are dropped from consideration immediately. ICA uses a horizontal-vertical stroke detector to detect stroke characteristics, exploiting the fact that pixels on the two sides of a stroke have the same gradient magnitude but opposite gradient directions.

So in this stroke segmentation part, the pixels around the boundary pixels of each boundary pixel layer are assigned a color, either the attachment surface color or the text color; the pixels of each color are connected into connected domains, which become candidate text strings to be filtered in the subsequent stage. If the ICA algorithm is used, the interference of some non-text objects has already been removed in advance.
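The merging of same-colored pixels into connected domains can be sketched with a simple 4-connected labeling pass; this is a stand-in for whatever connected-component routine the paper actually uses.

```python
from collections import deque

def connected_components(mask):
    """Label 4-connected regions of truthy cells in a binary grid,
    mimicking how same-colored pixels merge into candidate
    character regions after color assignment."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                current += 1          # start a new region, flood-fill it
                q = deque([(r, c)])
                labels[r][c] = current
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = current
                            q.append((ny, nx))
    return labels, current

# two separate "strokes" in a tiny binary mask
mask = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
labels, count = connected_components(mask)
```

Each labeled region is one candidate character; layout analysis then decides which regions group into strings.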

Given the candidate text connected domains, the next task is to make further exclusions using the positional relationships between characters within a string.

(3) String fragment classification

First step: layout analysis

For the text candidate connected domains from the previous step, we first try to merge multiple candidates into string fragments. One merging method is adjacent character grouping, described in detail in another article. It takes the characteristics of a string to include horizontal alignment of adjacent characters, appropriate spacing between characters, similar character sizes, similar character colors, and so on: candidates that conform to these characteristics are retained, and the rest are deleted. The article also proposes a two-component grouping algorithm specifically for two-letter strings (such as "is"), whose constraints include, as I read them, the following:

    1. The vertical centers of the two components should be aligned to within height/6 of each other.

    2. The height ratio should be between 0.83 and 1.2, and the width ratio between 0.5 and 2.

    3. The gap between the two components should be no more than twice the width of the wider CC (connected component), and the height no less than 12 pixels; a pair of CCs meeting these requirements is also treated as a string fragment.

Both algorithms merge large characters into strings, but neither handles small characters well (the size limit being a pixel height of 12), and small characters are often already joined into a single connected domain. So the article directly treats connected domains with height less than 12 and aspect ratio greater than 4 as string fragments.
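The two-component grouping constraints above can be sketched as a predicate on two bounding boxes. The thresholds follow my reading of the paper and may differ from the original in detail; the boxes are hypothetical.

```python
def plausible_two_char_pair(a, b):
    """Sketch of the two-component grouping test. Each box is
    (x, y, w, h) with y the top edge. Thresholds are my reading of
    the paper's constraints, not a verified reimplementation."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # 1. vertical centers roughly aligned (within height/6)
    if abs((ay + ah / 2) - (by + bh / 2)) > max(ah, bh) / 6:
        return False
    # 2. similar height (ratio in [0.83, 1.2]) and width (ratio in [0.5, 2])
    if not (0.83 <= ah / bh <= 1.2) or not (0.5 <= aw / bw <= 2):
        return False
    # 3. gap at most twice the wider box; height at least 12 px
    gap = max(bx - (ax + aw), ax - (bx + bw))
    if gap > 2 * max(aw, bw) or min(ah, bh) < 12:
        return False
    return True

ok = plausible_two_char_pair((0, 0, 20, 30), (25, 1, 18, 28))     # "is"-like pair
bad = plausible_two_char_pair((0, 0, 20, 30), (200, 1, 18, 28))   # too far apart
```

Pairs that pass all three checks would be kept as string fragments alongside the output of adjacent character grouping.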

Layout analysis, then, is the process of combining these text candidates into strings; most background clutter is removed in it because it does not satisfy the characteristics of a string. But the resulting strings are only rough estimates, with no guarantee of correctness, so finally a machine learning process produces a classifier to pass judgment on the remaining candidate strings. The decision mechanism here is an SVM-based classifier, which first requires a training process.

Second step: building the training set

After the previous processing of the images in the database, the fragments that really are text become positive samples and the excluded fragments become negative samples, and some additional positive and negative samples are gathered manually for the SVM classifier to learn from. Since this is learning, the input samples should have a uniform form: the string fragment image patches are normalized to a uniform size, and the positive samples are unified to a white foreground on a black background.

Third step: computing the training feature vectors

Besides samples, training an SVM classifier needs a multidimensional feature vector; each sample has its own feature vector, and similar feature vectors can be classified into the same class. To obtain these features, a Gabor filter is used to find feature points, and feature vectors are then computed from those points. How the Gabor filter obtains feature points deserves a separate article; see here.
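While the keypoint extraction is left to a separate write-up, the Gabor filter itself can be sketched directly: a sinusoidal carrier under a Gaussian envelope, oriented at an angle theta. The parameter values below are illustrative defaults, not the paper's.

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lambd=6.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter. Convolving an image patch
    with a bank of these at several orientations gives strong
    responses where stroke-like texture matches the orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / lambd + psi)

# a small bank at four orientations, as one might use for keypoint detection
bank = [gabor_kernel(theta=t) for t in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
```

Locations where some kernel in the bank responds strongly are candidate feature points on the strokes.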

With the feature points in hand, the article uses three maps: the gradient map, the stroke width map, and the stroke distribution map. The feature vector is computed from the values of the feature points in these three maps, under different partition templates.

Specifically:

1. First partition the picture. Because different locations contain different numbers of feature points, to enrich the feature vector one picture is divided in 10 different partitioning ways. Each template corresponds to one partitioning; under template pi, the picture is divided into regions x1, x2, x3, ...

2. For all the feature points of region xij under template pi, look up the values in the gradient map, compute the gradient magnitude and the gradient direction, and take their sum and mean: sum(pi gradient magnitude), mean(pi gradient magnitude), sum(pi gradient direction), mean(pi gradient direction). Weighted by region size, this gives 4*10 = 40 values.

3. For all the feature points of region xij under template pi, look up the stroke widths in the stroke width map and take their sum and mean: sum(pi stroke width), mean(pi stroke width), giving 2*10 = 20 values after weighting by region size. Across all regions, the variance of the feature points' stroke widths is delta, and the consistency of the stroke width is expressed by delta/mean; the smaller this ratio, the more uniform the stroke width. These delta terms expand the stroke-width features by another 20 dimensions, i.e. 2*10 + 20 = 40 in total.

4. For all pixels of region xi, compute the ratio of foreground pixels to total pixels, ratio(pi), for a total of 18 regions.
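Steps 2 and 3 aggregate per-region statistics of the feature points; here is a minimal sketch of one region's four gradient features, with a hypothetical area weight and made-up per-point values.

```python
import numpy as np

def region_gradient_features(gmag, gdir, region_mask, area_weight):
    """Sum and mean of gradient magnitude and direction over one
    region's feature points, scaled by a region-size weight, as in
    step 2. Inputs are per-feature-point value arrays and a boolean
    mask selecting this region's points."""
    m = gmag[region_mask]
    d = gdir[region_mask]
    return np.array([m.sum(), m.mean(), d.sum(), d.mean()]) * area_weight

# hypothetical per-feature-point gradient magnitudes and directions
gmag = np.array([1.0, 2.0, 3.0, 4.0])
gdir = np.array([0.0, 0.5, 1.0, 1.5])
mask = np.array([True, True, False, False])   # the first two points fall in this region
feat = region_gradient_features(gmag, gdir, mask, area_weight=0.5)
```

Repeating this for every region of every template, and doing the analogous aggregation on the stroke width map, yields the 40 + 40 values described above.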

Finally, 98-dimensional vectors (40 + 40 + 18) serve as the input feature vectors for the SVM. This yields the classifier we want: string fragments it classifies as text are taken to be the final strings.
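The final stage can be sketched with scikit-learn's SVC on synthetic stand-ins for the 98-dimensional vectors; the real features would come from the three maps above, and the vectors here are random placeholders.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-ins for the 98-dimensional feature vectors
# (40 gradient + 40 stroke-width + 18 foreground-ratio features).
rng = np.random.default_rng(0)
text = rng.normal(loc=1.0, scale=0.3, size=(40, 98))      # "text" fragments
clutter = rng.normal(loc=-1.0, scale=0.3, size=(40, 98))  # background clutter
X = np.vstack([text, clutter])
y = np.array([1] * 40 + [0] * 40)

clf = SVC(kernel="rbf").fit(X, y)                 # train the string classifier
pred = clf.predict(rng.normal(1.0, 0.3, size=(5, 98)))   # new "text" samples
```

In the paper's pipeline, fragments predicted as class 1 would be reported as the final detected text strings.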

At this point, this article concludes.

Summary

Overall, the article uses the double color consistency of text strokes and attachment surfaces, the geometric characteristics of strokes and characters (size, area, horizontal alignment, and so on), and the correlations between characters; finally it uses Gabor filters to capture stroke characteristics and the learning process of an SVM classifier to ensure the robustness of the algorithm. The use of the EM algorithm within the GMM model, and of Gabor filters to detect texture features, are both worth learning from.
