Paper reading (Xiang Bai, TIP 2014: "A Unified Framework for Multi-Oriented Text Detection and Recognition")

Last Update:2016-12-12 Source: Internet

Author: User


Directory

Author and related links
Method Summary
Innovation points and contributions
Method details
Experimental results
Question Discussion
Summary and Harvest Point
Reference documents

Author and related links
- Author

- Paper download
- Homepages: Xiang Bai, Liu
Method Summary
- Method Brief
  - This paper extends the authors' CVPR 2012 method (ref. 1, which covers detection only; see my previous blog post) to the end-to-end problem (detection + recognition).
  - The framework is the traditional pipeline: SWT (stroke width transform) detects candidate character regions, a character-level classifier (random forest) filters out non-character noise, the remaining characters are merged into text lines, and the lines are then cut into words (the merging and segmentation algorithm follows ref. 2).
  - The paper improves on three main points. First, it modifies the random forest: through "feature and classifier sharing", detection and recognition use the same features and the same classifier (the same trees). Second, character recognition adds an error correction method based on dictionary search (the dictionary is built from the search ranking of the Bing search engine). Third, text in various directions is handled (upside-down, vertical, right-to-left).
- Simple framework of the method

Innovation points and contributions
- Contribution
  - Solves the problem of recognizing text in any direction (curved, vertical, upside-down, right-to-left).
  - Shows that detection and recognition can use the same features and the same classifier.
  - Uses an error correction method based on dictionary search for character recognition.
  - Introduces a new dataset, HUST-TR400.
  - Proposes a complete end-to-end text recognition algorithm.
- The starting point for "feature and classifier sharing"
  - Previously, "feature sharing" was mostly used across different categories (multi-class classification problems). This paper transfers the idea to tasks at different levels: the two-class problem (detection) uses coarse-level features, while the multi-class problem (recognition) uses finer-level features. The two tasks can share features because the intrinsic characteristics of text stay the same whether they are used for two-class or multi-class classification.
  - The node splits of a random-forest tree act somewhat like clustering, placing similar characters on the same node; for example, 'I', 'j', 'l' may fall on the same positive node. The character probability distribution therefore differs from one positive node to another, so each node carries something like a "character recognition" capability (estimated from the histogram of the character labels of all samples on that node). Detection and recognition can thus share the classifier (see Fig. 3).

Fig. 3. Illustration of character distribution histograms. Since the trees are exhaustively grown, each leaf node is either positive (red) or negative (blue). For each positive leaf node, a character distribution histogram is computed using the examples falling into it and stored for future use.
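The leaf-histogram idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the leaf ids are made up, the helper names `build_leaf_histograms` and `recognise` are hypothetical, and averaging the histograms over the trees is an assumed way of combining them.

```python
from collections import Counter, defaultdict


def build_leaf_histograms(leaf_ids, char_labels):
    """For one tree: group the character labels of positive training samples
    by the leaf they fall into and normalise each group to a distribution."""
    counts = defaultdict(Counter)
    for leaf, ch in zip(leaf_ids, char_labels):
        counts[leaf][ch] += 1
    histograms = {}
    for leaf, counter in counts.items():
        total = sum(counter.values())
        histograms[leaf] = {ch: n / total for ch, n in counter.items()}
    return histograms


def recognise(reached_leaves, forest_histograms):
    """Average the stored histograms of the leaves a new sample reaches,
    one leaf per tree (averaging is an assumption of this sketch)."""
    scores = Counter()
    for leaf, histograms in zip(reached_leaves, forest_histograms):
        for ch, p in histograms.get(leaf, {}).items():
            scores[ch] += p / len(reached_leaves)
    return dict(scores)


# Toy data: four positive training characters and the (made-up) leaf each
# one reaches in two trees that were grown with two-class labels only.
labels = ["I", "l", "l", "o"]
forest_histograms = [
    build_leaf_histograms([3, 3, 3, 5], labels),  # tree 1
    build_leaf_histograms([7, 7, 8, 8], labels),  # tree 2
]

# A new sample that reaches leaf 3 in tree 1 and leaf 7 in tree 2.
probs = recognise([3, 7], forest_histograms)
best = max(probs, key=probs.get)
print(best, probs)
```

In this toy example the sample lands in leaves dominated by 'I' and 'l', so `best` is 'l'. The character labels are only consulted after the trees exist, which is the point of sharing one classifier between detection and recognition.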

- Modification points for CVPR2012 (ref. 1)
  - Modified: two-class RF → two-class RF + multi-class RF
  - Extended: character recognition with dictionary-search error correction + multiple text directions
Method details
- Modification of random forest classifier
  - Basic idea: when training the RF, only the two-class labels are used to build the trees. At recognition time, the label (62 classes) of each leaf node is determined by the label distribution of the samples that fall on that node (in fact, the multi-class labels are never used for training!).
  - The meaning of each symbol

- Error correction method for character recognition
  - Motivation for correction: some characters look very much alike ('I' and 'l'), or cannot be separated by the character classifier alone ('s' and 'S', 'c' and 'C'); see Fig. 5. Context (whether the characters form a word) is needed to correct them.
  - Similarity measurement: considers the edit distance (ref. 3) and the rank in the dictionary.
    - Multiple directions: first, the characters must be read in order, either left to right or right to left. Second, at search time both directions (left to right and right to left) are tried, and the direction with the higher similarity is chosen as the final word direction O(L).
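A minimal sketch of dictionary-search correction in both reading directions, assuming a plain (unweighted) Levenshtein distance and a dictionary list ordered from most to least frequent. The function names and the tie-breaking rule (edit distance first, then dictionary rank) are assumptions for illustration, not the exact similarity measure of the paper.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def correct(chars, dictionary):
    """Return (word, direction) for the best dictionary match.

    `dictionary` is assumed to be sorted by frequency, most frequent first,
    so the list index serves as the rank used to break ties.
    """
    best = None
    candidates = (("left-to-right", chars), ("right-to-left", chars[::-1]))
    for direction, text in candidates:
        for rank, word in enumerate(dictionary):
            key = (edit_distance(text.lower(), word.lower()), rank)
            if best is None or key < best[0]:
                best = (key, word, direction)
    return best[1], best[2]


dictionary = ["food", "wood", "word"]  # toy frequency-ordered dictionary
print(correct("w0od", dictionary))     # one misread character
print(correct("doow", dictionary))     # same word read right to left
```

Comparing lowercase strings sidesteps the case ambiguity here; the case itself would have to be decided in a separate step.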

Fig. 5. Probabilities of character classes (only top choices are shown). The word in the image is "Wood". Certain characters can be very confusing. For example, after rotation the letter 'd' is very similar to 'p'.

- How to solve the ambiguity of the case
  - "All" should presumably be read as "a large proportion of the characters are uppercase".
  - The definition of "proximity" is hard to pin down; for example, 'g', 'f', 'd' may differ from the 'oor' that follows.

- Training data
  - Positive samples: a synthetic set of 100k images, generated with Wang's synthesis method (ref. 4). Besides random translation, Gaussian noise, and blur, variations in orientation are also added.
  - Negative samples: 30k real natural-scene images, drawn from text-free images in 6 datasets: Berkeley Segmentation Data Set and Benchmarks (BSDS500), Zurich Building Image Database, Oxford Buildings Dataset, MIT-CBCL StreetScenes Dataset, CASIA Tampered Image Detection Evaluation Database (CASIA TIDE) v2.0, and PASCAL VOC.

Experimental results
- Detection
  - ICDAR 2011
  - MSRA-TD500
- Character recognition
  - Chars74K
- End-to-end
  - ICDAR 2011
  - HUST-TR400

Question Discussion
- Issues with existing end-to-end approaches, and the important improvements in this approach
  - Modified: two-class RF → two-class RF + multi-class RF
  - Extended: character recognition with dictionary-search error correction + multiple text directions
- For upside-down, vertical, and otherwise rotated text, how is it ensured that the classifier does not filter it out as noise? (Are the selected features rotation invariant, or are rotated samples added to the training set?)
- For the "substitute" weight in the edit distance, the authors argue that the closer the scores the classifier gives to two characters (for example, 'l' and 'V' both score 0.3), the more similar the two characters are and the smaller the substitution weight should be. But is it sound to measure similarity by classifier scores?
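To make the last question concrete, here is one hypothetical way a score-dependent substitution weight could look. The cost formula (one minus the ratio of the target character's score to the top score at that position) is an assumption made up for this sketch and is not taken from the paper; insertions and deletions keep unit cost.

```python
def substitution_cost(position_scores, target):
    """Hypothetical cost of reading this position as `target`:
    0 if `target` scores as high as the top candidate, 1 if it has no score.
    Assumes at least one candidate has a positive score."""
    top = max(position_scores.values())
    return 1.0 - position_scores.get(target, 0.0) / top


def weighted_edit_distance(scores, word):
    """Edit distance between a detected character sequence and `word`.

    `scores` holds one dict per detected character, mapping candidate
    characters to classifier scores.
    """
    n, m = len(scores), len(word)
    prev = [float(j) for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [float(i)]
        for j in range(1, m + 1):
            sub = prev[j - 1] + substitution_cost(scores[i - 1], word[j - 1])
            cur.append(min(prev[j] + 1.0, cur[j - 1] + 1.0, sub))
        prev = cur
    return prev[m]


# Toy scores: the last character is read as 'p' (0.4) with 'd' close behind (0.3).
scores = [{"w": 0.9}, {"o": 0.8}, {"o": 0.7}, {"p": 0.4, "d": 0.3}]
d_wood = weighted_edit_distance(scores, "wood")
d_word = weighted_edit_distance(scores, "word")
print(d_wood, d_word)
```

Under this cost, replacing 'p' with the nearly tied 'd' is cheap, while replacing a confident 'o' with 'r' costs a full unit. Whether classifier scores are a reliable proxy for visual similarity is exactly what the question above leaves open.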
Summary and Harvest Point
- What I admire about the text work of Prof. Xiang Bai's group is that the angles they choose and the problems they solve are tied to real application needs. Two simple examples show this. 1. While everyone was pushing up the numbers on the ICDAR 2003/2011 datasets, they pointed out that the text in those datasets is mostly (near-)horizontal, whereas text in real life appears in all directions. They then started building their own dataset, and multi-oriented text detection has increasingly become a trend. 2. The choice of dictionary in this paper is also interesting. Instead of a traditional dictionary, the dictionary is built from the search ranking of the Bing search engine, because the text that appears in real images tends to be words used frequently in daily life. A "complete" dictionary (in practice not complete: many place names and personal names are missing) is searched for matching words in alphabetical order, whereas this approach also takes actual usage frequency into account.
- The paper mentions many details, which shows that a problem has to be worked on very carefully, thought through, and optimized step by step to get good results. For example, when generating positive samples, the strings are sampled at random rather than taken directly from dictionary words, so that no human prior about character co-occurrence leaks in: some letters tend to appear together, for example "EA" occurs together more often than "ZJ". As another example, because some man-made structures (bricks, windows) and vegetation (grass, leaves) are easily detected as false positives, such images are added to the negative samples as much as possible.

Reference documents
1. C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu, "Detecting texts of arbitrary orientations in natural images," in Proc. IEEE CVPR, June 2012, pp. 1083–1090.
2. X. C. Yin, X. Yin, K. Huang, and H. Hao, "Robust text detection in natural scene images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 5, pp. 970–983, May 2014.
3. Y. Li and B. Liu, "A normalized Levenshtein distance metric," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 6, pp. 1091–1095, June 2007.
4. K. Wang, B. Babenko, and S. Belongie, "End-to-end scene text recognition," in Proc. IEEE ICCV, Nov. 2011, pp. 1457–1464.
