Paper Reading (baixiang--"CVPR2016" Multi-Oriented Text Detection with Fully Convolutional Networks)


Contents
    • Author and Related Links
    • Method Summary
    • Method Details
    • Innovation Points and Contributions
    • Experimental Results
    • Questions for Discussion
    • Summary and Takeaways

    • Author and Related Links

      • Paper download
    • Method Summary
      • Step 1--Text block detection: a Text-Block FCN first produces a salient map, and connected component analysis on the salient map then yields the text blocks;
      • Step 2--Text line formation: within each text block, MSER extracts candidate character components; these components are used to estimate the orientation of the whole block, and their bounding boxes are then combined to generate each text line;
      • Step 3--Text line filtering: a Character-Centroid FCN predicts the centroids of the characters in each text line, and these centroids are used to filter out non-text lines.

The following figure illustrates each step of the pipeline:

Figure 1. The procedure of the proposed method. (a) an input image; (b) The salient map of the text regions predicted by the TextBlock FCN; (c) Text block generation; (d) candidate character component extraction; (e) Orientation estimation by component projection; (f) Text line candidates extraction; (g) The detection results of the proposed method.

    • Method Details
      • Text block detection
        • Network structure of the Text-Block FCN

The first five convolutional stages of VGG16 are kept and the fully connected layers are removed. After each convolutional stage, a deconvolution module (a 1×1 convolution followed by upsampling) is attached. The five resulting maps are fused by a 1×1 convolution, and the salient map is obtained through a sigmoid layer.
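As a rough illustration of the fusion step, the sketch below assumes each of the five stages has already been reduced (by its 1×1 convolution) to a single-channel side map at strides 1, 2, 4, 8 and 16, as in VGG16 before each pooling layer. For brevity it replaces the learned deconvolution with nearest-neighbour upsampling, and the fusion weights are hypothetical; it is not the authors' implementation.

```python
import numpy as np


def upsample_nearest(fmap: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (H, W) map to (out_h, out_w)."""
    h, w = fmap.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return fmap[rows[:, None], cols[None, :]]


def fuse_side_outputs(side_maps, fuse_weights, fuse_bias, out_h, out_w):
    """Upsample each single-channel side map to input size, fuse with a 1x1
    convolution (a per-pixel weighted sum over the stacked maps), then apply
    a sigmoid to obtain a text/non-text salient map in (0, 1)."""
    stacked = np.stack([upsample_nearest(m, out_h, out_w) for m in side_maps])  # (5, H, W)
    fused = np.tensordot(fuse_weights, stacked, axes=(0, 0)) + fuse_bias      # (H, W)
    return 1.0 / (1.0 + np.exp(-fused))


# Usage with random side maps for a 32x32 input (strides 1..16).
rng = np.random.default_rng(0)
side = [rng.standard_normal((32 // s, 32 // s)) for s in (1, 2, 4, 8, 16)]
salient = fuse_side_outputs(side, np.full(5, 0.2), 0.0, 32, 32)
print(salient.shape)  # (32, 32)
```

In the real network both the upsampling kernels and the fusion weights are learned end to end.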

  

        • Example salient maps generated by the Text-Block FCN

From stage 1 to stage 5, the maps retain less and less local detail, while global (contextual) information becomes increasingly strong.

        • Comparison between salient maps from conventional methods and from the Text-Block FCN

        • Ground-truth maps used to train the Text-Block FCN
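Turning the salient map into text blocks can be sketched as thresholding followed by 4-connected component labelling. The threshold `thresh` and the minimum block size `min_area` are hypothetical values, not taken from the paper; in practice a library routine such as `scipy.ndimage.label` would do the labelling.

```python
from collections import deque

import numpy as np


def text_blocks_from_salient(salient: np.ndarray, thresh: float = 0.5, min_area: int = 10):
    """Return a list of text blocks, each a list of (row, col) pixels, found by
    thresholding the salient map and labelling 4-connected components."""
    mask = salient > thresh
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    blocks = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(y, x)])
            seen[y, x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:  # discard tiny blobs (hypothetical rule)
                blocks.append(pixels)
    return blocks
```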

      • Text line generation
        • Extracting character candidates with MSER
          1. MSER is run inside each text block (it does not need to extract every character; missed characters and noise are tolerated);
          2. Candidates are filtered by area and aspect ratio to remove most of the noise (because MSER runs only inside text blocks, there is little noise and it is relatively homogeneous);
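The filtering step can be sketched on candidate bounding boxes. In OpenCV such boxes could come from `cv2.MSER_create()` and its `detectRegions` method; the sketch below only takes `(x, y, width, height)` tuples, and all three thresholds are hypothetical, not the paper's values.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), e.g. an MSER bounding box


def filter_candidates(boxes: List[Box], block_area: int,
                      min_area: int = 30,          # hypothetical: drop tiny specks
                      max_area_frac: float = 0.2,  # hypothetical: drop regions covering much of the block
                      max_aspect: float = 5.0      # hypothetical: drop very elongated regions
                      ) -> List[Box]:
    """Keep candidate boxes whose area and aspect ratio look character-like."""
    kept = []
    for x, y, w, h in boxes:
        if w <= 0 or h <= 0:
            continue
        area = w * h
        aspect = max(w / h, h / w)  # >= 1 whether the box is wide or tall
        if min_area <= area <= max_area_frac * block_area and aspect <= max_aspect:
            kept.append((x, y, w, h))
    return kept
```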
        • Estimating line orientation by projection
          1. Search the text block for the optimal h and θ (which together determine a straight line) such that the line passes through the largest number of components;
          2. The method rests on two assumptions: first, all text lines within the same block share the same orientation; second, text lines are approximately straight.
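A brute-force reading of this search, written as a sketch: Φ(h, θ) counts the components hit by the line with normal (−sin θ, cos θ) and offset h, and the search maximizes it. The angle grid size `num_angles`, the use of lines through each component centre as candidate offsets, and the "hit if within half the component height" rule are all assumptions, not details confirmed by the paper.

```python
import math
from typing import List, Tuple

Component = Tuple[float, float, float]  # (center_x, center_y, height)


def estimate_orientation(components: List[Component], num_angles: int = 180):
    """Return (theta, h) maximizing the number of components crossed by the line
    {p : n . p = h}, where n = (-sin theta, cos theta) and theta is in [0, pi)."""
    best = (-1, 0.0, 0.0)  # (count, theta, h)
    for i in range(num_angles):
        theta = math.pi * i / num_angles
        nx, ny = -math.sin(theta), math.cos(theta)
        offsets = [nx * cx + ny * cy for cx, cy, _ in components]
        for h in offsets:  # candidate lines through each component centre
            count = sum(1 for off, (_, _, ch) in zip(offsets, components)
                        if abs(off - h) <= ch / 2)  # assumed "line hits component" test
            if count > best[0]:
                best = (count, theta, h)
    return best[1], best[2]
```

The cost is O(num_angles · N²) for N components, which is acceptable because MSER yields few components per block.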

      • Candidate text line generation
        1. All components in each block are clustered; the clustering conditions are as follows:
          • [Clustering conditions: shown as an equation image in the original post; not recovered]

        2. A bounding box is generated for each group:
          • Draw a line L along the orientation θr(α) of block α, passing through the center of the components in the group (the average of all component centers, or the center of the middle component?).
          • Let P be the set of intersections between line L and the edge points of α (α is the set of all white pixels in figure (b)), i.e., the leftmost and rightmost red dots.

          • Generate the bounding boxes of all text lines (the candidate set).
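The two steps above can be sketched as follows. Because the original clustering conditions were not recovered, `group_components` uses two hypothetical stand-ins (similar heights, and nearly the same perpendicular offset with respect to θ) and merges components with union-find. `line_box` takes the average of the component centers as the point on L, keeps the block pixels that L crosses, uses the extreme ones as the two endpoints, and sets the box height to the largest component height; these choices are assumptions, not the paper's formulas.

```python
import math
from typing import Dict, List, Tuple

Component = Tuple[float, float, float]  # (center_x, center_y, height)
Point = Tuple[float, float]             # (x, y)


def group_components(comps: List[Component], theta: float,
                     max_height_ratio: float = 1.5,   # hypothetical condition
                     max_offset_frac: float = 0.5     # hypothetical condition
                     ) -> List[List[int]]:
    """Cluster component indices into text-line groups with union-find."""
    parent = list(range(len(comps)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    nx, ny = -math.sin(theta), math.cos(theta)  # normal to the block orientation
    for i, (xi, yi, hi) in enumerate(comps):
        for j in range(i + 1, len(comps)):
            xj, yj, hj = comps[j]
            similar_height = max(hi, hj) / min(hi, hj) <= max_height_ratio
            same_line = abs(nx * (xi - xj) + ny * (yi - yj)) <= max_offset_frac * min(hi, hj)
            if similar_height and same_line:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(comps)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def line_box(group: List[Component], block_pixels: List[Point], theta: float) -> List[Point]:
    """Four corners of an oriented box for one group of components."""
    cx = sum(c[0] for c in group) / len(group)
    cy = sum(c[1] for c in group) / len(group)
    dx, dy = math.cos(theta), math.sin(theta)  # direction of line L
    nx, ny = -dy, dx                           # normal of line L
    # Positions along L of the block pixels that L passes through.
    ts = [dx * (px - cx) + dy * (py - cy) for px, py in block_pixels
          if abs(nx * (px - cx) + ny * (py - cy)) <= 0.5]
    if not ts:
        raise ValueError("line L does not cross the block")
    t0, t1 = min(ts), max(ts)  # the two endpoints (the set P)
    half_h = max(c[2] for c in group) / 2
    return [(cx + t * dx + s * half_h * nx, cy + t * dy + s * half_h * ny)
            for t, s in ((t0, -1), (t1, -1), (t1, 1), (t0, 1))]
```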

               

    • Text line noise filtering
      • A Character-Centroid FCN predicts the centroids of all possible characters in each text line
        • Model structure: the five convolutional stages of the Text-Block FCN are reduced to three (a smaller version of the Text-Block FCN)
        • Training samples: points whose distance to a character center (the character center in the ground truth?) is less than 15% of the character height are positive samples; all other points are negative samples.
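The labelling rule can be sketched as below. It assumes each ground-truth character is given by a center `(cx, cy)` and a height, which is an assumption about the annotation format; the 15% radius is the rule stated above.

```python
import numpy as np


def centroid_labels(shape, char_centers, char_heights, radius_frac=0.15):
    """Binary training map: 1 where a pixel lies closer than radius_frac * height
    to some character center, 0 elsewhere."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]  # row and column index of every pixel
    labels = np.zeros((h, w), dtype=np.uint8)
    for (cx, cy), ch in zip(char_centers, char_heights):
        dist = np.hypot(xs - cx, ys - cy)
        labels[dist < radius_frac * ch] = 1  # positive samples
    return labels
```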
      • Non-text line noise filtering
        • Filtering by the mean probability of the detected centroids
        • Geometric filtering by angle (the centroids should lie nearly on a straight line)
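The two filters can be sketched together. The sketch assumes each centroid comes as `(x, y, probability)` and that the line orientation θ from the projection step is available; the thresholds `min_mean_prob` and `max_angle_dev`, and the "consecutive segments must follow θ" test, are hypothetical stand-ins for the paper's criteria.

```python
import math
from typing import List, Tuple

Centroid = Tuple[float, float, float]  # (x, y, probability)


def keep_text_line(centroids: List[Centroid], theta: float,
                   min_mean_prob: float = 0.5,            # hypothetical threshold
                   max_angle_dev: float = math.pi / 12    # hypothetical threshold
                   ) -> bool:
    """Keep a candidate line only if its centroids are confident on average
    and roughly collinear along orientation theta."""
    if len(centroids) < 2:
        return False  # assumption: a single centroid is not enough evidence
    mean_prob = sum(p for _, _, p in centroids) / len(centroids)
    if mean_prob < min_mean_prob:
        return False
    dx, dy = math.cos(theta), math.sin(theta)
    pts = sorted(centroids, key=lambda c: c[0] * dx + c[1] * dy)  # order along the line
    for (x0, y0, _), (x1, y1, _) in zip(pts, pts[1:]):
        seg = math.atan2(y1 - y0, x1 - x0)
        dev = abs((seg - theta + math.pi) % (2 * math.pi) - math.pi)  # wrapped angle difference
        if dev > max_angle_dev:
            return False
    return True
```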

    • Innovation Points and Contributions
      • Contributions
        1. Use an FCN to generate a text/non-text salient map;
        2. Generate text lines using both local information (components) and global information (the context of the text block);
        3. Use an FCN to predict character centroids;
      • Motivation
        1. Since an FCN makes pixel-level predictions, it can give the probability that each pixel belongs to text (the salient map) as well as the probability that each pixel is a character centroid (the centroid map);
        2. A single character is easily disturbed by the background and prone to missed or false detections, whereas a text block is both more distinctive than a character (easier to separate from the background) and more stable (usually more complete). Therefore, combining information from individual characters (local detail) with contextual information (text lines) makes detection more robust;

    • Experimental Results
      • MSRA-TD500
      • ICDAR2015
      • ICDAR2013
      • Example results
      • Failure cases

    • Questions for Discussion
      • Why can an FCN be used for text detection, and what are the benefits?
        • An FCN can fuse local (character) and global (text line) information;
        • An FCN is trained end to end;
        • With the fully connected layers removed, pixel-level prediction with an FCN is fast, and pixel-level training labels are easy to obtain for text;
      • Why can this method handle multi-oriented text detection?
        • The method determines text orientation as follows: within a text block (denoted α), MSER detects candidate characters, and then a line θr(α) is found across the whole block that passes through the largest number of candidate characters. Since the two parameters h and θ of the line θr(α) are unrestricted, text in any orientation can be detected;

      • What does the 1×1 convolution in the deconvolution module do?
        • A 1×1 convolution preserves the spatial size of its input map, and it generally serves three purposes: first, to combine information across channels; second, to increase or decrease the number of channels relative to the previous layer; and third, in some cases, to perform pixel-wise prediction.
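All three roles follow from the fact that a 1×1 convolution applies the same linear map across channels at every pixel. The minimal NumPy sketch below (array layout `(channels, height, width)` is an assumption) shows this: the spatial size is unchanged, the channel count goes from `C_in` to `C_out`, and with `C_out = 1` the result is a per-pixel score.

```python
import numpy as np


def conv1x1(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1x1 convolution.
    x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


rng = np.random.default_rng(0)
feat = rng.standard_normal((512, 7, 9))   # e.g. a 512-channel feature map
score = conv1x1(feat, rng.standard_normal((1, 512)), np.zeros(1))
print(score.shape)  # (1, 7, 9): same spatial size, one channel
```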
      • What is the purpose of the upsampling in the deconvolution module?
        • The map produced by each convolutional stage has a different size, all smaller than the input image, so each map is upsampled to the input size before fusion. In an FCN the upsampling kernel (its parameters) can be learned, and it can be initialized with bilinear interpolation;
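The bilinear initialization can be sketched as follows. The kernel formula is the one commonly used with FCN-style deconvolution layers, with kernel size 2f − f mod 2 for upsampling factor f; the single-channel transposed convolution is a plain NumPy illustration without the padding and cropping a real layer would apply.

```python
import numpy as np


def bilinear_kernel(size: int) -> np.ndarray:
    """2-D bilinear interpolation kernel of shape (size, size)."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return (1 - abs(og[0] - center) / factor) * (1 - abs(og[1] - center) / factor)


def transposed_conv2d(x: np.ndarray, kernel: np.ndarray, stride: int) -> np.ndarray:
    """Single-channel transposed convolution (no padding): each input pixel
    stamps a scaled copy of the kernel onto the output grid."""
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out


# 2x upsampling: factor f = 2 gives kernel size 2 * 2 - 0 = 4.
up = transposed_conv2d(np.ones((3, 3)), bilinear_kernel(4), stride=2)
print(up.shape)  # (8, 8); interior values equal the constant input
```

Because the overlapping bilinear weights sum to 1 in the interior, a constant map stays constant after upsampling, which is why this makes a sensible starting point before the kernel is fine-tuned.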
      • Why not use the resulting text blocks directly as text lines? Why is a separate text line generation step needed?
        • First, when several text lines are close together, they are easily merged into one block; second, text blocks are too coarse to give precise text positions;
      • How exactly does the projection algorithm for text line orientation estimation work? (By traversing h and θ?)
      • The centroid map obtained during text line filtering is a probability map, so each character appears as a white region rather than a single point. How is each character ultimately reduced to a unique centroid? (Perhaps by finding local maxima with something like mean shift or NMS?)
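One possible answer, offered only as a guess and not as the paper's procedure, is a simple NMS over the probability map: keep pixels that exceed a threshold and are not smaller than any of their eight neighbours. The threshold `thresh` is hypothetical.

```python
import numpy as np


def local_maxima(prob: np.ndarray, thresh: float = 0.5):
    """Return (x, y) positions of 8-neighbourhood local maxima above thresh."""
    h, w = prob.shape
    padded = np.pad(prob.astype(float), 1, mode="constant", constant_values=-np.inf)
    is_peak = prob > thresh
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            # Neighbour at offset (dy, dx) for every pixel, via the padded array.
            neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            is_peak &= prob >= neighbour
    ys, xs = np.nonzero(is_peak)
    return list(zip(xs.tolist(), ys.tolist()))
```

Flat plateaus can still yield several adjacent peaks; mean shift or merging nearby peaks would be ways to reduce them to one point per character.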

    • Summary and Takeaways
      • The idea of combining text block (global) and character (local) features is very good; when combining Faster R-CNN with component-based methods, some of the ideas in this paper are worth borrowing.
      • This is the first paper to use an FCN for text detection. Although its accuracy is high, pixel-wise processing is still relatively slow. Text differs from other detection targets in that it is very thin, so another way of detecting text blocks is needed.

