"AAAI2017" TextBoxes: A Fast Text Detector with a Single Deep Neural Network


This article is reproduced from:

Http://www.cnblogs.com/lillylin/p/6204099.html


Xiang Bai: "AAAI2017" TextBoxes: A Fast Text Detector with a Single Deep Neural Network

Contents
Authors and related links
Method overview
Innovations and contributions
Method details
Experimental results
Summary and takeaways

Authors and related links

Paper download | Code download
Authors: Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu

Method overview

Core idea: an improved version of SSD tailored to word detection.

Pipeline:
Step 1: the input image goes through the modified SSD network followed by non-maximum suppression (NMS), which outputs candidate detections.
Step 2: the candidate detections are passed to CRNN for word recognition, which yields refined detections plus recognition results.

Performance (text localization):
Multi-scale version: ICDAR 2011 F = 0.85, ICDAR 2013 F = 0.85, 0.73 s per image.
Single-scale version: ICDAR 2011 F = 0.80, ICDAR 2013 F = 0.80, 0.09 s per image.

How SSD is modified:
The aspect ratios of the default boxes are changed to elongated shapes, which suit words better.
The convolution filters that act as classifiers are changed from 3×3 to 1×5, which also suits long text regions better.
SSD was designed for multi-class detection; here it is reduced to a single-class (text) detection problem.
The input is changed from a single scale to multiple scales.
Recognition results are used to refine the detections (text spotting).

Innovations and contributions

Innovation: SSD is modified to make it suitable for word detection (plain SSD is not robust on small objects).
Contributions:
An end-to-end trainable, very concise text detection framework (SSD is a single-stage detector, unlike common text detection methods that require multiple steps).
A complete end-to-end text detection + recognition framework.
Experimental results that are both accurate and fast.

Method details

Background: text recognition tasks
Text detection
Text/word recognition
End-to-end text recognition = text detection + text recognition
Text spotting: unlike plain text detection, it can run word recognition against a lexicon (dictionary) to adjust the detection results, and the recognized words are ultimately used to decide which detections to keep.

Background: SSD
SSD network structure
SSD default boxes

Fig. 1: SSD framework. (a) SSD only needs an input image and ground truth boxes for each object during training. In a convolutional fashion, we evaluate a small set (e.g. 4) of default boxes of different aspect ratios at each location in several feature maps with different scales (e.g. 8×8 and 4×4 in (b) and (c)). For each default box, we predict both the shape offsets and the confidences for all object categories ((c1, c2, ..., cp)). At training time, we first match these default boxes to the ground truth boxes. For example, we have matched two default boxes with the cat and one with the dog, which are treated as positives and the rest as negatives. The model loss is a weighted sum between localization loss (e.g. Smooth L1 [6]) and confidence loss (e.g. Softmax).

Background: CRNN
CRNN network structure

Comparison of the TextBoxes and SSD network structures
TextBoxes network structure
SSD network structure
Output of the text-box layers (same as in SSD)
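As in SSD, each text-box layer outputs, for every default box, two class scores (text / background) and four offsets. The sketch below shows how such outputs could be turned into a box, assuming the standard SSD center-size parameterization x = x0 + w0·Δx, y = y0 + h0·Δy, w = w0·exp(Δw), h = h0·exp(Δh) and a two-way softmax over the scores. The function names are hypothetical, and the "variance" scaling that SSD implementations usually apply to the offsets is left out for clarity.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def decode(default_box: Box, offsets: Box) -> Box:
    """Apply predicted offsets (dx, dy, dw, dh) to a default box (x0, y0, w0, h0)."""
    x0, y0, w0, h0 = default_box
    dx, dy, dw, dh = offsets
    x = x0 + w0 * dx       # shift the center by a fraction of the default-box width
    y = y0 + h0 * dy       # shift the center by a fraction of the default-box height
    w = w0 * math.exp(dw)  # width and height are regressed in log space
    h = h0 * math.exp(dh)
    return (x, y, w, h)


def text_probability(score_text: float, score_background: float) -> float:
    """Numerically stable two-class softmax: probability that the box is text."""
    m = max(score_text, score_background)
    e_text = math.exp(score_text - m)
    e_back = math.exp(score_background - m)
    return e_text / (e_text + e_back)
```

For example, `decode((10.0, 20.0, 4.0, 2.0), (0.5, 0.25, 0.0, 0.0))` moves the center to (12.0, 20.5) and leaves the size unchanged.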

Details of how TextBoxes differs from SSD

Default box aspect ratios
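A minimal sketch of the default-box layout described in the Figure 2 caption: six aspect ratios (1, 2, 3, 5, 7 and 10), each placed once at the cell center and once with a vertical offset of half a cell, giving 12 default boxes per feature-map location. The `scale` value, the use of normalized coordinates, and the choice to shift the second box downward are assumptions for illustration; width and height follow the SSD convention w = s·√a, h = s/√a so that w/h = a.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height), normalized to [0, 1]

ASPECT_RATIOS = (1, 2, 3, 5, 7, 10)  # width / height


def default_boxes_for_map(map_size: int, scale: float,
                          aspect_ratios: Sequence[float] = ASPECT_RATIOS) -> List[Box]:
    """Generate default boxes for a square map_size x map_size feature map."""
    boxes: List[Box] = []
    cell = 1.0 / map_size  # side of one cell in normalized image coordinates
    for row in range(map_size):
        for col in range(map_size):
            cx = (col + 0.5) * cell
            cy = (row + 0.5) * cell
            for a in aspect_ratios:
                w = scale * math.sqrt(a)
                h = scale / math.sqrt(a)
                boxes.append((cx, cy, w, h))               # centered in the cell
                boxes.append((cx, cy + 0.5 * cell, w, h))  # shifted by half a cell vertically (assumed downward)
    return boxes
```

With a 4×4 grid this yields 4 × 4 × 12 = 192 boxes; the vertical offsets help cover text lines that sit between cell centers.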

Figure 2: Illustration of default boxes for a 4*4 grid. For better visualization, only one column of default boxes, with aspect ratios 1 and 5, is plotted. The remaining aspect ratios are 2, 3, 7 and 10, which are placed similarly. The black (aspect ratio: 5) and blue (ar: 1) default boxes are centered in their cells. The green (ar: 5) and red (ar: 1) boxes have the same aspect ratios, respectively, and a vertical offset (half of the height of the cell) from the grid center.

Convolution filter size

Loss function
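TextBoxes keeps SSD's loss form: a confidence loss (here a two-class softmax, text vs. background) plus a Smooth L1 localization loss on matched boxes, L = (L_conf + α·L_loc) / N, where N is the number of matched (positive) default boxes and the loss is 0 when N = 0. The sketch below is a simplified, hypothetical version: it sums the confidence loss over every box passed in and omits SSD's hard negative mining; `alpha=1.0` is an assumed default.

```python
import math
from typing import List, Sequence


def smooth_l1(x: float) -> float:
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def softmax_ce(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of a softmax over `logits` for the given class index."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]


def multibox_loss(logits: List[List[float]], labels: List[int],
                  pred_offsets: List[List[float]], target_offsets: List[List[float]],
                  alpha: float = 1.0) -> float:
    """logits: [background, text] scores per default box; labels: 1 = matched, 0 = negative.
    Offsets: 4 values per box; targets only matter for positive boxes."""
    num_pos = sum(1 for y in labels if y == 1)
    if num_pos == 0:
        return 0.0
    conf = sum(softmax_ce(z, y) for z, y in zip(logits, labels))
    loc = sum(
        smooth_l1(p - t)
        for pred, tgt, y in zip(pred_offsets, target_offsets, labels) if y == 1
        for p, t in zip(pred, tgt)
    )
    return (conf + alpha * loc) / num_pos
```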

Multi-scale input
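One plausible way to combine multi-scale detections, sketched under assumptions: the image is resized to several sizes, each resized copy is run through the detector, every box is mapped back to original-image coordinates, and all boxes are merged with greedy NMS. The helper names and the 0.45 IoU threshold are hypothetical choices, not values taken from the paper.

```python
from typing import List, Sequence, Tuple

Det = Tuple[float, float, float, float, float]  # (xmin, ymin, xmax, ymax, score)


def iou(a: Det, b: Det) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Det], iou_threshold: float = 0.45) -> List[Det]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    kept: List[Det] = []
    for d in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(iou(d, k) <= iou_threshold for k in kept):
            kept.append(d)
    return kept


def merge_multiscale(per_scale: Sequence[Tuple[float, float, List[Det]]],
                     iou_threshold: float = 0.45) -> List[Det]:
    """per_scale: (sx, sy, detections) with sx, sy = resized size / original size."""
    merged: List[Det] = []
    for sx, sy, dets in per_scale:
        for x1, y1, x2, y2, score in dets:
            merged.append((x1 / sx, y1 / sy, x2 / sx, y2 / sy, score))  # back to original coords
    return nms(merged, iou_threshold)
```

For example, a word found at (10, 10, 50, 30) in the original image and at (20, 20, 100, 60) in a 2× upscaled copy maps to the same box, so only the higher-scoring copy survives.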

TextBoxes + CRNN for recognition
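The overview notes that, in text spotting, lexicon-based word recognition can be used to adjust the detections. The exact refinement rule used with CRNN in the paper is not reproduced here; this sketch uses a simple, hypothetical rule instead: keep a detected box only if its recognized string is within a small edit distance of some lexicon word, and replace the string with that word. `spot_words`, `match_lexicon` and `max_dist` are illustrative names.

```python
from typing import List, Optional, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a single-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (0 if chars match)
        prev = cur
    return prev[-1]


def match_lexicon(word: str, lexicon: Sequence[str], max_dist: int = 1) -> Optional[str]:
    """Closest lexicon word (case-insensitive), or None if it is too far away."""
    best = min(lexicon, key=lambda w: edit_distance(word.lower(), w.lower()), default=None)
    if best is None or edit_distance(word.lower(), best.lower()) > max_dist:
        return None
    return best


def spot_words(detections: List[Tuple[tuple, str]], lexicon: Sequence[str],
               max_dist: int = 1) -> List[Tuple[tuple, str]]:
    """detections: (box, recognized_text) pairs, e.g. CRNN output for each detected box."""
    spotted = []
    for box, text in detections:
        word = match_lexicon(text, lexicon, max_dist)
        if word is not None:  # boxes with no plausible lexicon match are treated as false positives
            spotted.append((box, word))
    return spotted
```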

Experimental results

Text localization

Text spotting and end-to-end recognition
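The localization figures quoted in the overview (0.85 and 0.80) are F-measures, i.e. the harmonic mean of precision and recall. A minimal sketch, assuming simple one-to-one match counts; the official ICDAR 2013 protocol (DetEval) also handles one-to-many and many-to-one matches, which this does not model.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f_from_counts(num_matched: int, num_detections: int, num_ground_truth: int) -> float:
    """F-measure from counts of matched detections, all detections and ground-truth boxes."""
    precision = num_matched / num_detections if num_detections else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    return f_measure(precision, recall)
```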

Qualitative results

Summary and takeaways

The original SSD cannot be applied directly to text; it needs substantial modification to perform well, as the authors' experiments also show. Generic object detectors such as Faster R-CNN, SSD and YOLO are increasingly adapted to specific targets (e.g. text, pedestrians), because these methods are not only fast but also highly robust. An important trend is the move toward end-to-end training, since single-stage methods have several advantages over traditional step-by-step pipelines: the whole model is simpler to train, no performance is lost at the hand-offs between stages, and errors do not accumulate from one stage to the next.
