First, a summary of natural scene text localization
Text usually occupies only a small portion of a scene image, while large areas are non-text. Scene text localization as an independent step has therefore received more and more attention. Early work located text on CD and magazine covers, license plates in intelligent transportation systems, and captions in video; more recent work locates scene text in complex backgrounds under far fewer restrictive conditions. At the same time, localization algorithms have become increasingly robust and applicable to a wider range of scenes. Text localization methods can generally be divided into three kinds: connected-domain-based, learning-based, and combinations of the two. Connected-domain methods first extract candidate text regions, then filter out non-text regions using prior knowledge, and finally group the candidate characters into words according to the relationships between them. Learning-based methods hinge on two choices: first, the feature extraction method, such as textures, wavelets, strokes, and so on; second, the classifier, such as the support vector machine (SVM), AdaBoost, and so on. Combined methods generally use connected-domain analysis in the extraction stage, while the filtering stage rejects non-text regions with a classifier trained on labeled samples.
1. Connected-domain-based approach
Connected-domain analysis is carried out after the scene text image has been binarized, so early research focused on the preprocessing, enhancement, and binarization of scene text images. Connected-domain methods seldom need to operate at multiple scales, so they run quickly, but they have the disadvantage of requiring a great deal of prior knowledge to filter out non-text regions.
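To make this concrete, here is a minimal sketch of such a pipeline (my own illustration; the image name and all thresholds are placeholders, not values from any particular method):

% Minimal connected-domain sketch: binarize, then filter components by geometric priors.
img = rgb2gray(imread('scene.jpg'));          % 'scene.jpg' is a placeholder
bw  = im2bw(img, graythresh(img));            % Otsu binarization (imbinarize in newer releases)
cc  = bwconncomp(bw);
stats = regionprops(cc, 'Area', 'BoundingBox');
for i = 1:cc.NumObjects
    box   = stats(i).BoundingBox;             % [x y width height]
    ratio = box(3) / box(4);                  % aspect ratio
    % Prior-knowledge filtering: drop regions too small or oddly shaped (illustrative thresholds).
    if stats(i).Area < 50 || ratio < 0.1 || ratio > 10
        bw(cc.PixelIdxList{i}) = 0;
    end
end
imshow(bw);   % the surviving components are the candidate text regions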
2. Learning-based approach
The general learning-based process is to first split the image into windows, extract features from each window, use a trained classifier to label each window as text or non-text, and finally connect the text windows into text lines. Learning-based methods are computationally expensive, generally have to process the image at multiple scales, and require training data to be prepared for the classifier.
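For illustration only, the sliding-window classification step might look like the sketch below; svmModel is assumed to be a classifier already trained on text/non-text patches, and HOG stands in for the texture/wavelet/stroke features mentioned above:

% Sliding-window text/non-text classification (illustrative; svmModel assumed pre-trained).
img = rgb2gray(imread('scene.jpg'));          % placeholder image
win = 32; step = 16;                          % window size and stride (example values)
[h, w] = size(img);
mask = false(h, w);
for r = 1:step:h-win+1
    for c = 1:step:w-win+1
        patch = img(r:r+win-1, c:c+win-1);
        feat  = extractHOGFeatures(patch);    % HOG as an example feature vector
        if predict(svmModel, feat) == 1       % label 1 = text (convention assumed)
            mask(r:r+win-1, c:c+win-1) = true;
        end
    end
end
% Adjacent text windows can then be merged into lines, e.g. via bwconncomp(mask).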
3. Combining connected domains and learning
This approach generally has two stages. Stage one extracts candidate connected regions, usually by connected-domain analysis. Stage two classifies regions as text or non-text, usually by means of a classifier. Although the combined approach requires few parameter settings, it still cannot escape the limitations of its training data.
Second, text region localization using MSER
The maximally stable extremal region (MSER) is an affine-covariant feature region extraction algorithm proposed by Matas et al. The gray level inside an extracted region is almost constant, while its contrast with the background is strong, and the region keeps its shape across multiple thresholds. Text generally has small internal gray-scale variation and large gray-scale contrast with the background, which matches the characteristics of maximally stable extremal regions, so this feature can recover connected domains that color clustering fails to extract.
MSER extraction first converts the image to grayscale, then thresholds it at a series of gray levels to produce a sequence of binary images. As the brightness threshold increases or decreases, connected regions appear, grow, and merge. A region whose area changes by less than a given amount between two thresholds can be considered stable. The mathematical definition of the maximally stable extremal region is as follows. Define the image as a mapping $I: D \subset \mathbb{Z}^2 \to S$ from pixel positions to gray values, where $S$ satisfies a total order, and define an adjacency relation $A \subset D \times D$ between pixels. A region $Q$ in the image is a connected subset of $D$ under this adjacency, that is, any two points of $Q$ are joined by a path of the following form:

$$\forall p, q \in Q,\ \exists\, a_1, a_2, \ldots, a_n \in Q:\ p = a_1,\ a_i A a_{i+1},\ a_n = q \qquad (3.6)$$

The boundary of $Q$ is defined as

$$\partial Q = \{\, q \in D \setminus Q : \exists\, p \in Q \ \text{such that}\ pAq \,\} \qquad (3.7)$$

If $I(p) > I(q)$ for all $p \in Q$ and $q \in \partial Q$, then $Q$ is called a maximal extremal region; with the inequality reversed, it is a minimal extremal region. For a sequence of mutually nested extremal regions $Q_1 \subset \cdots \subset Q_i \subset Q_{i+1} \subset \cdots$, if the area change rate

$$q(i) = \frac{|Q_{i+\Delta} \setminus Q_{i-\Delta}|}{|Q_i|} \qquad (3.8)$$

attains a local minimum at $i^*$, then $Q_{i^*}$ is called a maximally stable extremal region.
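As a quick numeric illustration of criterion (3.8) (the areas below are invented for the example), the index where the area change rate reaches a local minimum marks the maximally stable region:

% Toy evaluation of the stability criterion (3.8); areas are made up for the example.
areas = [120 125 128 130 133 136 210 480];   % |Q_i| at successive thresholds
delta = 1;                                   % threshold step parameter
q = (areas(1+2*delta:end) - areas(1:end-2*delta)) ./ areas(1+delta:end-delta);
[~, k] = min(q);                             % minimal change rate
fprintf('Most stable region: i = %d, q = %.3f\n', k + delta, q(k));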
After the stable extremal regions are obtained, pixels inside stable regions are assigned the value 1 and all remaining pixels the value 0, yielding a binary mask of the maximally stable extremal regions. Connected-domain analysis of this binary mask then gives the candidate connected domains.
Third, simple text localization using MATLAB's detectMSERFeatures
1. Processing pipeline
First an image is read in and the necessary preprocessing, such as grayscale conversion, is applied. The MSER regions are then extracted (here directly with the MATLAB function detectMSERFeatures) and converted into a binary image, mainly using the pixel coordinates of the detected regions. Connected-domain analysis is applied to this MSER binary image to filter out regions whose area clearly cannot belong to a character, after which the filtered image is morphologically closed. The result of the closing is filtered again, finally producing the bounding boxes that enclose the text regions. In the example results, the left side is the original image and the right side the localized result, with the detected regions outlined in green, which are easy to see. The test images are relatively simple, which shows that this method is still fairly basic and relies on quite a few empirical thresholds.
2. Source code
The code is what matters most; after all this discussion, code that actually runs is the focus. Note that it requires MATLAB R2014a or later.
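Before running it, an optional guard (my addition, not in the original code) can make the version requirement explicit; R2014a corresponds to MATLAB version 8.3:

% Optional version guard (illustrative); the demo also needs the Computer Vision toolbox.
if verLessThan('matlab', '8.3')    % 8.3 == R2014a
    error('This demo assumes MATLAB R2014a or later.');
end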
The complete main script:
%% Read image
[filename, pathname] = uigetfile('*.*', 'Choose a picture');
path = [pathname filename];
colorImage = imread(path);
figure; imshow(colorImage);

%% MSER region extraction
grayImage = rgb2gray(colorImage);
mserRegions = detectMSERFeatures(grayImage);
mserRegionsPixels = vertcat(cell2mat(mserRegions.PixelList));

%% Convert the MSER pixel coordinates to linear indices and set those
%% positions to true, producing a binary mask of the MSER regions
mserMask = false(size(grayImage));
ind = sub2ind(size(mserMask), mserRegionsPixels(:,2), mserRegionsPixels(:,1));
mserMask(ind) = true;
figure; imshow(mserMask);

%% Coarse filtering
[p_image, cwidth] = concomp_analysis(mserMask);
figure; imshow(colorImage);
wi = median(cwidth(:)) / 2;
se1 = strel('line', wi, 0);
p_image_dilate = imclose(p_image, se1);

%% Fine filtering
[rec_word, img_color, img_bw] = f_concomp_analysis(p_image_dilate, colorImage, p_image);
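As a side note, detectMSERFeatures also accepts tuning parameters that can cut down the number of empirical thresholds needed later; the values below are only illustrative, not the ones used above:

% Optional tuning of the MSER detector (illustrative parameter values).
mserRegions = detectMSERFeatures(grayImage, ...
    'RegionAreaRange',  [80 8000], ...   % discard very small/large regions up front
    'ThresholdDelta',   2, ...           % step between intensity thresholds
    'MaxAreaVariation', 0.25);           % stricter stability requirement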
The concomp_analysis function returns a binary image with non-text filtered out, ready for the closing operation. It analyzes the input mserMask and filters out some non-text regions based on the area and aspect ratio of each connected domain. At the same time, to avoid too many fixed thresholds and keep the later step adaptive, the width of every surviving connected region is recorded in cwidth.
function [p_image, cwidth] = concomp_analysis(bwimg)
[x, y] = size(bwimg);
cwidth = [];
whole = x * y;
conncomp = bwconncomp(bwimg);                      % find connected components
threefeature = regionprops(conncomp, 'Area', 'BoundingBox', 'Centroid');
broder = [threefeature.BoundingBox];               % region bounding boxes, [x y width height]
area   = [threefeature.Area];                      % region areas
centre = [threefeature.Centroid];                  % region centroids
for i = 1:conncomp.NumObjects
    leftx  = broder((i-1)*4 + 1);
    lefty  = broder((i-1)*4 + 2);
    width  = broder((i-1)*4 + 3);
    height = broder((i-1)*4 + 4);
    cenx = floor(centre((i-1)*2 + 1));
    ceny = floor(centre((i-1)*2 + 2));
    if area(i) < 80 || area(i) > 0.3*whole         % remove regions too small or too large
        bwimg(conncomp.PixelIdxList{i}) = 0;
    elseif width/height < 0.1 || width/height > 2  % remove implausible aspect ratios
        bwimg(conncomp.PixelIdxList{i}) = 0;
    else
        cwidth = [cwidth, width];                  % record widths of surviving regions
        rectangle('Position', [leftx, lefty, width, height], 'EdgeColor', 'g');
    end
end
p_image = bwimg;
The f_concomp_analysis function is similar to the above and is not described in detail. It additionally preserves the color and grayscale image of each text word, in preparation for a subsequent recognition stage.
function [rec, seg_img_color, seg_img_bw] = f_concomp_analysis(p_image, colorimg, p_img)
[x, y] = size(p_image);
whole = x * y;
j = 1;
rec = [];
conncomp = bwconncomp(p_image);                    % find connected components
threefeature = regionprops(conncomp, 'Area', 'BoundingBox');
broder = [threefeature.BoundingBox];               % region bounding boxes, [x y width height]
area   = [threefeature.Area];                      % region areas
for i = 1:conncomp.NumObjects
    leftx  = floor(broder((i-1)*4 + 1));
    lefty  = floor(broder((i-1)*4 + 2));
    width  = broder((i-1)*4 + 3);
    height = broder((i-1)*4 + 4);
    % data = grayimg_reserve(lefty:lefty+height-1, leftx:leftx+width-1);
    % stda(i,:) = statxture(data);
    if area(i) < 500 || area(i) > whole*0.4        % remove regions too small or too large
        p_image(conncomp.PixelIdxList{i}) = 0;
    elseif width/height < 2                        % text lines are wider than they are tall
        p_image(conncomp.PixelIdxList{i}) = 0;
    % elseif stda(i,4) < 0
    %     p_image(conncomp.PixelIdxList{i}) = 0;
    else
        rect = [leftx, lefty, width, height];
        rec = [rec; rect];
        rectangle('Position', [leftx, lefty, width, height], 'EdgeColor', 'g');
        seg_img_color{j} = colorimg(lefty+1:lefty+height, leftx+1:leftx+width, :); % +1 avoids an index of 0
        seg_img_bw{j}    = p_img(lefty+1:lefty+height, leftx+1:leftx+width);
        j = j + 1;
        % zone{1,j}.data = grayimg_reserve(lefty:lefty+height-1, leftx:leftx+width-1);
        % zone{1,j}.location = [leftx, lefty, width, height];
        % zone{1,j}.label = j;
        % j = j + 1;
    end
end
pp_image = p_image;
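As a small usage sketch (my addition), the color crops returned in the second output can be inspected directly; this is the natural handoff point to a recognition stage:

% Display the candidate word crops produced by the fine-filtering step (illustrative).
[rec_word, img_color, img_bw] = f_concomp_analysis(p_image_dilate, colorImage, p_image);
for k = 1:numel(img_color)
    figure; imshow(img_color{k});   % k-th candidate text word (color crop)
end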