Paper reading: Xiang Bai, TIP 2014, "A Unified Framework for Multi-Oriented Text Detection and Recognition"



Contents
    • Authors and related links
    • Method Summary
    • Innovations and contributions
    • Method details
    • Experimental results
    • Discussion questions
    • Summary and takeaways
    • References

    • Authors and related links
      • Author

      • Paper download
      • Xiang Bai's homepage; Liu's homepage
    • Method Summary
      • Method overview
        • This paper extends the authors' CVPR 2012 method (Ref. 1), which addresses detection only (see my previous blog post), to the end-to-end problem (detection + recognition).
        • The framework follows the traditional pipeline: the Stroke Width Transform (SWT) detects candidate character regions, a character-level classifier (a random forest) filters out non-character noise, the remaining characters are grouped into text lines, and the lines are then split into words (the grouping and splitting algorithm follows Ref. 2).
        • The paper's improvements center on three points. First, the random forest is modified so that, through "feature and classifier sharing", detection and recognition use the same features and the same classifier (the same trees). Second, character recognition results are corrected by searching a dictionary built from Bing search-engine rankings. Third, text in all orientations is handled (upside-down, vertical, and right-to-left text).
      • Overview of the framework

    • Innovations and contributions
      • Contributions
        • Addresses text recognition in arbitrary orientations (curved, vertical, upside-down, and right-to-left text).
        • Shows that detection and recognition can share the same features and classifier.
        • Applies dictionary-search-based error correction to character recognition.
        • Introduces a new dataset, HUST-TR400.
        • Proposes a complete end-to-end text recognition algorithm.
      • Motivation for "feature and classifier sharing"
        • Earlier "feature sharing" work mostly shared features across categories within a multi-class classification problem; this paper transfers the idea to tasks at different levels. The two-class problem (text vs. non-text) relies on coarse-level features, while the multi-class problem (which character) relies on finer-level features. The two tasks can still share features, because the intrinsic properties of text stay the same whether they feed a two-class or a multi-class classifier.
        • Node splitting in a random forest tree acts somewhat like "clustering": similar characters end up at the same node. For example, 'I', 'j', and 'l' may fall into the same positive leaf node. As a result, different positive leaves have different character probability distributions; in other words, each leaf carries a "character recognition" ability (estimated from the histogram of character labels of all training samples that reach it). Detection and recognition can therefore share the same classifier (see Fig. 3).

Fig. 3. Illustration of character distribution histograms. The trees are grown exhaustively, so each leaf node is either positive (red) or negative (blue).

For each positive leaf node, a character distribution histogram is computed from the examples falling into it and stored for future use.

      • Changes relative to the CVPR 2012 method (Ref. 1)
        • Modified: two-class RF → two-class RF + multi-class RF (one shared forest)
        • Extended: dictionary-search-based error correction for character recognition + multi-orientation support
    • Method details
      • Modification of the random forest classifier
        • Basic idea: only the two-class labels (text vs. non-text) are used to build the trees during RF training. At recognition time, the character label (one of 62 classes) of each leaf node is inferred from the label distribution of the training samples that fall into it. (In fact, the trees are never trained with multi-class labels at all!)
        • The meaning of each symbol
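The shared-forest idea described above can be sketched in a few lines, assuming the forest has already been grown on text/non-text labels and that we know which leaf each sample reaches in each tree (with scikit-learn, for instance, `RandomForestClassifier.apply` returns such leaf indices). The class `SharedForestLeaves` is a hypothetical helper, not the authors' code: each leaf stores the fraction of text samples (detection) and, for text samples, a histogram of character labels (recognition). Averaging the normalized histograms of positive leaves across trees is an assumption about how the per-tree distributions are combined.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence

LeafIds = Sequence[int]  # one leaf index per tree, for a single sample


class SharedForestLeaves:
    """Hypothetical leaf statistics for a forest grown on binary labels only."""

    def __init__(self, n_trees: int):
        self.n_trees = n_trees
        # stats[t][leaf] -> [number of text samples, number of all samples]
        self.stats = [defaultdict(lambda: [0, 0]) for _ in range(n_trees)]
        # hist[t][leaf] -> Counter of character labels of the text samples
        self.hist = [defaultdict(Counter) for _ in range(n_trees)]

    def fit(self, leaves: List[LeafIds], is_text: List[bool],
            chars: List[Optional[str]]) -> "SharedForestLeaves":
        # Character labels only fill the histograms; they never shape the trees.
        for leaf_ids, text, ch in zip(leaves, is_text, chars):
            for t, leaf in enumerate(leaf_ids):
                s = self.stats[t][leaf]
                s[1] += 1
                if text:
                    s[0] += 1
                    self.hist[t][leaf][ch] += 1
        return self

    def text_score(self, leaf_ids: LeafIds) -> float:
        """Detection: mean fraction of text samples in the leaves reached."""
        total = 0.0
        for t, leaf in enumerate(leaf_ids):
            n_text, n = self.stats[t].get(leaf, (0, 0))
            total += n_text / n if n else 0.0
        return total / self.n_trees

    def char_distribution(self, leaf_ids: LeafIds) -> Dict[str, float]:
        """Recognition: average the normalized histograms of positive leaves."""
        acc: Counter = Counter()
        used = 0
        for t, leaf in enumerate(leaf_ids):
            n_text, n = self.stats[t].get(leaf, (0, 0))
            if n and n_text / n > 0.5:  # positive (majority-text) leaf
                h = self.hist[t][leaf]
                size = sum(h.values())
                for ch, count in h.items():
                    acc[ch] += count / size
                used += 1
        return {ch: v / used for ch, v in acc.items()} if used else {}
```

With exhaustively grown trees every leaf is pure, so the majority test simply separates positive from negative leaves; a sample that only reaches negative leaves gets a text score of 0 and an empty character distribution.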

      • Error correction for character recognition
        • Motivation: some characters look almost identical ('I' and 'l'), or are inseparable for a character classifier ('s' and 'S', 'c' and 'C'), as shown in the figure, so context (whether the characters form a word) is needed to correct them.

        • Approach: given a dictionary, compare the recognized string with every word in the dictionary and take the most similar word as the corrected result.
          • Dictionary selection: instead of a traditional dictionary, the authors use one ordered by Bing search-engine ranking. Text in real images tends to consist of words used frequently in everyday life, so taking actual usage frequency into account works better in practice than matching words alphabetically against a supposedly "complete" dictionary (such dictionaries are not actually complete; many place names and personal names are missing). Because the dictionary is general-purpose, it can be used with any dataset.
          • Edit distance: Levenshtein edit distance (substitution, deletion, insertion).
            • Substitutions are weighted differently from insertions and deletions, and different character pairs also get different substitution weights. The cost of substituting θ with v depends on the classifier's probability that sample x is v, relative to its probability that x is θ. For example, if the classifier gives a test sample probability 0.3 for 'l', 0.28 for 'j', and 0.01 for 'z', then replacing 'l' with 'j' is cheap and replacing 'l' with 'z' is expensive. More similar scores → smaller substitution cost → smaller edit distance → greater similarity. (My question: do similar classifier scores for 'l' and 'j' really mean that 'l' and 'j' look alike? It could happen that 'l' scores 0.1 and 'k' also scores 0.1, even though 'l' and 'k' look quite different.)

          • Similarity measure: combines the normalized edit distance (Ref. 3) with the word's rank in the dictionary.
          • Multiple orientations: first, characters must be read in order, either left to right or right to left. Second, when computing the similarity, both directions (left to right and right to left) are tried, and the direction with the higher similarity is chosen as the final reading order O(L).
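The correction step can be sketched as follows, under stated assumptions rather than the paper's exact formulation. The recognized string is given as one classifier probability dictionary per character position; the substitution cost `1 - p(v) / p(top)` is a hypothetical form that only follows the intuition above (cheap when the classifier scored v almost as high as its top choice); insertions and deletions cost 1; the weighted distance d is normalized with the formula of Ref. 3, 2d / (α(|X| + |Y|) + d) with α = 1; and the rank penalty `beta`, which favors more frequent dictionary words, is an assumed combination rule.

```python
from typing import Dict, List, Sequence, Tuple

CharProbs = Dict[str, float]  # classifier output for one position (non-empty)


def sub_cost(probs: CharProbs, v: str) -> float:
    """Hypothetical substitution cost: 0 for the top choice, near 0 for a
    character scored almost as high, near 1 for an unlikely character."""
    top = max(probs.values())
    return 1.0 - probs.get(v, 0.0) / top


def weighted_levenshtein(seq: Sequence[CharProbs], word: str, indel: float = 1.0) -> float:
    n, m = len(seq), len(word)
    # d[i][j]: cost of turning the first i recognized positions into word[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,  # delete a recognized character
                d[i][j - 1] + indel,  # insert a dictionary character
                d[i - 1][j - 1] + sub_cost(seq[i - 1], word[j - 1]),  # substitute
            )
    return d[n][m]


def normalized_distance(d: float, n: int, m: int, alpha: float = 1.0) -> float:
    """Normalization of Ref. 3 (Li and Liu): 2d / (alpha * (n + m) + d)."""
    denom = alpha * (n + m) + d
    return 2.0 * d / denom if denom else 0.0


def correct(seq: Sequence[CharProbs], ranked_words: List[str],
            beta: float = 0.05) -> Tuple[str, float]:
    """Return the most similar dictionary word and its score.

    ranked_words is ordered by frequency (rank 0 = most frequent), standing
    in for the Bing-ranked dictionary; beta is an assumed rank penalty.
    """
    best, best_score = "", float("-inf")
    for rank, word in enumerate(ranked_words):
        d = weighted_levenshtein(seq, word)
        sim = 1.0 - normalized_distance(d, len(seq), len(word))
        score = sim - beta * rank / max(1, len(ranked_words))
        if score > best_score:
            best, best_score = word, score
    return best, best_score
```

For example, if the per-position top choices spell "woop" but the last position scores 'd' at 0.35 against 'p' at 0.4, `correct` prefers "wood" over the more frequent "good" and over "word", because the 'p'→'d' substitution costs only about 0.125.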

Fig. 5. Probabilities of character classes (only the top choices are shown). The word in the image is "Wood".

Certain characters can be very confusing. For example, after rotation the letter 'd' looks very similar to 'p'.
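The two-direction rule (read the chain as given and reversed, keep the direction whose best dictionary match scores higher) can be sketched as below. In this minimal sketch, Python's `difflib.SequenceMatcher.ratio()` stands in for the paper's similarity measure purely for brevity, and `best_reading` is a hypothetical helper; the 'd'/'p' confusion of rotated characters is left to the classifier probabilities and edit-distance weights and is not handled here.

```python
from difflib import SequenceMatcher
from typing import Iterable, Tuple


def similarity(a: str, b: str) -> float:
    # Stand-in similarity (assumption); the paper uses a weighted,
    # normalized edit distance combined with dictionary rank.
    return SequenceMatcher(None, a, b).ratio()


def best_reading(chars: str, dictionary: Iterable[str]) -> Tuple[str, str, float]:
    """Match a character chain in both reading directions.

    `chars` is the chain in image order (e.g. left to right along the line).
    Returns (direction, matched word, score) for the better direction.
    """
    words = list(dictionary)
    best = ("", "", -1.0)
    for direction, text in (("forward", chars), ("reversed", chars[::-1])):
        for word in words:
            s = similarity(text.lower(), word.lower())
            if s > best[2]:
                best = (direction, word, s)
    return best
```

For instance, a chain recognized as "dooW" (the word "Wood" read in the wrong direction) only partially matches "door" in the forward direction but matches "wood" exactly once reversed, so the reversed reading wins.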

      • Resolving upper/lower-case ambiguity
        • "All" presumably refers to cases where a large proportion of the letters are capitals.
        • The definition of "proximity" is hard to pin down; for example, a leading 'g', 'f', or 'd' may differ in case from the following "oor".
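One possible reading of the "all" rule above is a majority vote: letters whose upper- and lower-case forms differ mainly in size take the case held by most of the unambiguous letters in the word. The sketch implements only that hypothetical interpretation; the set `AMBIGUOUS` and the 0.5 threshold are assumptions, and the harder "proximity" rule is not modeled.

```python
# Assumed set of letters whose upper- and lower-case shapes differ mainly in
# size; the paper's exact set (if any) may differ.
AMBIGUOUS = set("cosuvwxzCOSUVWXZ")


def resolve_case(word: str, upper_ratio: float = 0.5) -> str:
    """Hypothetical majority rule for case-ambiguous letters."""
    clear = [ch for ch in word if ch.isalpha() and ch not in AMBIGUOUS]
    if not clear:
        return word  # no unambiguous letters to vote with; keep as recognized
    ratio = sum(ch.isupper() for ch in clear) / len(clear)
    to_upper = ratio > upper_ratio
    return "".join(
        (ch.upper() if to_upper else ch.lower()) if ch in AMBIGUOUS else ch
        for ch in word
    )
```

With this rule "sTOP" becomes "STOP" (the unambiguous 'T' and 'P' are capitals) and "CaSe" becomes "case"; a word made only of ambiguous letters is returned unchanged.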

      • Training data
        • Positive samples: a synthetic set of 100k images generated with the synthesis method of Wang et al. (Ref. 4); besides random translations, Gaussian noise, and blur, rotations in all directions are also applied.
        • Negative samples: 30k real natural-scene images that contain no text, drawn from six datasets: the Berkeley Segmentation Data Set and Benchmarks (BSDS500), the Zurich Building Image Database, the Oxford Buildings Dataset, the MIT-CBCL StreetScenes Dataset, the CASIA Tampered Image Detection Evaluation Database (CASIA TIDE) v2.0, and PASCAL VOC.
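The sampling side of positive-sample synthesis can be sketched as below; rendering the text into an image (fonts, compositing, the actual filters) is out of scope, so the sketch only draws per-sample parameters. Characters are drawn uniformly and independently, matching the point that strings are random rather than dictionary words. The `SampleSpec` fields, `sample_specs`, and every numeric range are illustrative assumptions, not the paper's settings.

```python
import random
import string
from dataclasses import dataclass
from typing import List

ALPHABET = string.ascii_letters + string.digits  # the 62 character classes


@dataclass
class SampleSpec:
    text: str           # random string, not a dictionary word
    dx: float           # random translation (pixels)
    dy: float
    noise_sigma: float  # std. dev. of additive Gaussian noise
    blur_sigma: float   # std. dev. of Gaussian blur
    angle_deg: float    # rotation, any direction


def sample_specs(n: int, rng: random.Random, max_len: int = 8) -> List[SampleSpec]:
    """Draw hypothetical rendering parameters for synthetic positive samples.

    Uniform, independent characters give frequent bigrams such as "ea" no
    advantage over rare ones such as "zj". All ranges are assumptions.
    """
    specs = []
    for _ in range(n):
        length = rng.randint(1, max_len)
        specs.append(SampleSpec(
            text="".join(rng.choice(ALPHABET) for _ in range(length)),
            dx=rng.uniform(-2.0, 2.0),
            dy=rng.uniform(-2.0, 2.0),
            noise_sigma=rng.uniform(0.0, 10.0),
            blur_sigma=rng.uniform(0.0, 1.5),
            angle_deg=rng.uniform(0.0, 360.0),
        ))
    return specs
```

Passing an explicit `random.Random` instance keeps the synthetic set reproducible from a single seed.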

    • Experimental results
      • Detection
        • ICDAR 2011

        • MSRA-TD500

      • Character recognition
        • Chars74K

      • End-to-end
        • ICDAR 2011

        • HUST-TR400

    • Discussion questions
      • Problems with existing end-to-end approaches, and the key improvements in this approach
        • Modified: two-class RF → two-class RF + multi-class RF (one shared forest)
        • Extended: dictionary-search-based error correction for character recognition + multi-orientation support
      • For upside-down, vertical, and otherwise rotated text, how does the method ensure that the classifier does not discard it as noise? (Are the chosen features rotation-invariant, or are rotated samples added to training?)
      • For the substitution weights in the edit distance, the authors assume that if the classifier gives two characters similar scores (e.g., 'l' and 'v' both score 0.3), the two characters are similar, so the substitution weight should be smaller. But is it sound to measure character similarity by classifier scores?
    • Summary and takeaways
      • What I admire about Prof. Xiang Bai's group is that the problems they choose, and the way they solve them, are tied to real application needs. Two simple examples: 1. While everyone was chasing benchmark numbers on the ICDAR 2003/2011 datasets, they pointed out that the text in those datasets is mostly (near-)horizontal, whereas real-world text appears in all orientations; they then built their own dataset, and multi-oriented text detection has since become an increasingly popular topic. 2. The dictionary choice in this paper is also interesting: instead of a traditional dictionary, they built one from Bing search-engine rankings, because text in real images more often consists of frequently used everyday words; matching words alphabetically against a "complete" dictionary (which in practice is not complete, since many place names and personal names are missing) ignores how often words are actually used.
      • The paper attends to many details, which shows that a problem should be approached carefully, thought through, and optimized step by step. For example, positive samples are rendered from randomly sampled strings rather than dictionary words, to avoid injecting human prior knowledge into the character classifier: some letters tend to occur together (e.g., "ea" occurs far more often than "zj"). Likewise, because man-made structures (bricks, windows) and vegetation (grass, leaves) are easily mistaken for text, the negative set deliberately includes as many such samples as possible.

    • References
      1. C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu, "Detecting texts of arbitrary orientations in natural images," in Proc. IEEE CVPR, Jun. 2012, pp. 1083–1090.
      2. X. C. Yin, X. Yin, K. Huang, and H. Hao, "Robust text detection in natural scene images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 5, pp. 970–983, May 2014.
      3. Y. Li and B. Liu, "A normalized Levenshtein distance metric," IEEE Trans. Pattern Anal. Mach. Intell., vol. 6, pp. 1091–1095, Jun. 2007.
      4. K. Wang, B. Babenko, and S. Belongie, "End-to-end scene text recognition," in Proc. IEEE ICCV, Nov., pp. 1457–1464.
