ICDAR2015 Robust Reading TASK4 entry impressions

Source: Internet
Author: User

Robust Reading's Task 4 was launched this year. Compared with Task 1 and Task 2, which have been public for many years, the Task 4 samples are live scenes shot casually with a mobile phone, not deliberately framed around text (incidental scene text).



Sample image of Task1


Sample image of Task2


Sample image of Task4

The competition only requires detecting text that the human eye can see, which is intuitive enough, so I will not repeat the details here.


Here are some of my thoughts.


Code section

1. Recommended languages for efficient tool development

For one-off tools where performance is not a concern, I recommend writing in high-level managed languages such as C#, Java, or MATLAB. C# and Java are very efficient for building interactive tools, since many classes come ready-made, and C#'s IntelliSense is currently the best such experience. MATLAB scripts can implement small jobs without creating a project, such as renaming files, changing extensions, and moving or copying files.

2. Some useful MATLAB functions. I will not go into the specifics of each; look them up as needed.

dir() lists all files in a folder, or files matching a specified pattern

regexp(str_line, '\s+', 'split') splits str_line on whitespace; C# has the same functionality in the Split method of the String class

regexp(str_line, ',', 'split') splits str_line on commas

inpolygon() determines whether a point lies inside a polygon

cp2tform() infers a spatial transformation (e.g. projective/perspective) from control point pairs

fp = fopen('test.txt', 'r', 'n', 'utf-8') opens a file as UTF-8; note that a UTF-8 file may begin with a byte order mark (BOM), which shows up as a stray '?' character

movefile() moves files

copyfile() copies files
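For readers without MATLAB, here is a rough Python sketch of the same utilities: glob for dir(), re.split for regexp(..., 'split'), a simple ray-casting test standing in for inpolygon(), and shutil for movefile()/copyfile(). The file names are invented for illustration.

```python
import glob
import re
import shutil

# dir('*.jpg') -> glob.glob
jpgs = glob.glob("*.jpg")                    # all .jpg files in the current folder

# regexp(str_line, '\s+', 'split') -> re.split on whitespace
fields = re.split(r"\s+", "a  b\tc")         # ['a', 'b', 'c']
# regexp(str_line, ',', 'split') -> plain split on commas
cols = "x,y,w,h".split(",")                  # ['x', 'y', 'w', 'h']

# inpolygon(x, y, xv, yv) -> simple ray-casting point-in-polygon test
def in_polygon(x, y, verts):
    """Return True if point (x, y) is inside the polygon given by verts."""
    inside = False
    n = len(verts)
    for i in range(n):
        x1, y1 = verts[i]
        x2, y2 = verts[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal ray from (x, y)
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(in_polygon(2, 2, square))   # True
print(in_polygon(5, 2, square))   # False

# fopen(..., 'utf-8') -> open with an explicit encoding; 'utf-8-sig' strips a BOM
# with open("test.txt", "r", encoding="utf-8-sig") as fp: ...

# movefile / copyfile -> shutil.move / shutil.copy
# shutil.move("a.txt", "b.txt"); shutil.copy("b.txt", "c.txt")
```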


CNN section


A CNN extracts features layer by layer: as depth increases, the extracted features become higher-level, and the shallow layers are generally considered to extract edge or color features.

I treat the convolution layers before the first pooling layer as "shallow", which lets me split CNNs into two categories by shallow kernel size: kernel size = 5 and kernel size = 3. The former is very common in the Caffe examples; visually, a shallow 5x5 kernel can express richer edge information. The latter, small-kernel style I have only seen in Ben Graham's paper [1]. Graham's network design is very regular: he maps the input into a large number of filters, so the whole structure reads more like an encoding, and his shallow 3x3 kernels are not really doing edge extraction, because the edge information a 3x3 kernel can express is too limited. So, by the size of the shallow convolution kernels, I roughly divide CNNs into two categories: layer-by-layer feature extractors and layer-wise encoders. This is only my own reading, with no experiments to support it.
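One way to make the 5x5-versus-3x3 distinction concrete is the receptive field: two stacked 3x3 convolutions cover the same 5x5 input window as a single 5x5 convolution, with fewer weights and an extra nonlinearity in between. A minimal sketch, assuming stride-1 convolutions and no pooling:

```python
def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1   # each stride-1 conv grows the field by k - 1
    return rf

print(receptive_field([5]))      # 5: one 5x5 layer
print(receptive_field([3, 3]))   # 5: two 3x3 layers see the same window
# weights per in/out channel pair: 5*5 = 25 vs 2 * 3*3 = 18
```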

From my experiments, Ben Graham's structure is small and elegant, trains faster, and is very effective for small-scale image classification problems.

Among the smaller details: using leaky ReLU instead of the original ReLU as the neuron activation function. In my comparison experiments leaky ReLU beat ReLU, though not in every case; it is still a good choice when fine-tuning network details.
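The difference between the two activations is a single line. A minimal sketch, with the commonly used negative slope of 0.01 chosen as an assumption here (the value used in any particular experiment may differ):

```python
def relu(x):
    # ReLU: negative inputs are zeroed, so their gradient is exactly zero
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # Leaky ReLU: negative inputs keep a small slope, so the gradient
    # never vanishes completely (mitigates "dying" units)
    return x if x > 0 else slope * x

print(relu(-2.0))        # 0.0
print(leaky_relu(-2.0))  # -0.02
print(leaky_relu(3.0))   # 3.0
```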


Sample section


The design of the classifier structure is not especially important for classifier training: the papers are there, the open-source code is there, everyone will try them, and the final structures end up similar. What data you train on matters more.

Relying too heavily on data is indeed a weakness of the classifier, but which current discriminative model does not rely on data?


Other

1. Have a clear system framework: know what must be done first and what comes next. A big mistake in this competition was not recognizing early enough the importance of connected-component extraction; it was not treated as a separate work item at first, which left us very passive later on.


2. Try to finish the whole system as early as possible, and only then pursue the details. I was not unaware of this, but I kept trusting to luck, thinking that some improvement might change the classifier's performance and save a lot of rework later; there was no such luck. It was not a total loss, though: after so many experiments, one fatal problem of CNNs became clear, namely their high demands on sample quantity and quality; CNNs are not as magical as they look. So in deep learning, blindly piling on data to chase results is not a long-term strategy; following the field's rhythm and solving the more valuable problems in deep learning is the longer-term choice.

3. Keep a good mindset and choose the sprint period well. With a deadline, do not panic: no matter how big the problem, first think about the most efficient way to solve it and what it will cost in time and labor. In a competition like this, without enough prior accumulation, the last few days will demand all-nighters; do not stay up late early on, and save your energy for the final push.

4. Study the complementarity between the different modules of the system. For example, a trained CNN model will always have "incurable" hard samples, but those same samples may be handled very well by MSER; cases that are hard for both may be easy for text-line aggregation. A good system cannot be perfected in one step; complementary modules also produce good results.


5. Estimate the time each piece of work takes. For example, sending 1,000,000 large patch samples to the server for training: at a network speed of 10 MB/s, 1,000,000 samples are almost 25 GB, which takes about 43 minutes to upload. If you also want sample perturbation (augmentation), at least 100 GB must be transmitted, taking about 2.9 hours, and converting to the LevelDB format that Caffe uses takes at least 1 hour. The whole process is conservatively 4 hours; one mistake in the middle and preparing the samples can take a whole day. Doing the perturbation on the server side instead can greatly reduce this time.
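The arithmetic behind those estimates can be checked directly (assuming a 10 MB/s link and 1 GB = 1024 MB):

```python
def transfer_hours(gigabytes, mb_per_s=10):
    """Hours needed to move `gigabytes` over a link of `mb_per_s` MB/s."""
    seconds = gigabytes * 1024 / mb_per_s
    return seconds / 3600

raw = transfer_hours(25)         # ~0.71 h, i.e. ~43 min for the raw samples
augmented = transfer_hours(100)  # ~2.84 h, the ~2.9 h quoted in the text
print(f"raw upload: {raw * 60:.0f} min, augmented: {augmented:.1f} h")
# add at least 1 h for the LevelDB conversion: a conservative total of ~4 h
```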


6. Set standards up front: including file-name formats and whether data files are binary or text (ANSI). I recommend binary data files: the file size can be used to check the file's integrity, and they are easy to read and write.
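The size check works because fixed-width records make the total file size an exact multiple of the record size. A minimal sketch, with a made-up record layout (two 32-bit ints plus one 32x32 grayscale patch); the field names are illustrative only:

```python
import os
import struct

RECORD = struct.Struct("<ii")      # header: label, patch_id (little-endian int32)
PATCH_BYTES = 32 * 32              # one 32x32 uint8 grayscale patch
RECORD_SIZE = RECORD.size + PATCH_BYTES

def write_record(fp, label, patch_id, patch):
    fp.write(RECORD.pack(label, patch_id))
    fp.write(patch)                # exactly PATCH_BYTES of raw pixel data

def check_integrity(path):
    """A truncated write leaves a size that is not a multiple of RECORD_SIZE."""
    return os.path.getsize(path) % RECORD_SIZE == 0

with open("samples.bin", "wb") as fp:
    for i in range(3):
        write_record(fp, label=i % 2, patch_id=i, patch=bytes(PATCH_BYTES))

print(check_integrity("samples.bin"))  # True: three complete records
```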


7. Inspect both the correctly and the incorrectly classified samples, to analyze the classifier's defects and find ways to improve it.
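A minimal sketch of that workflow: bucket predictions into "correct" and per-error groups so the failures can be inspected together. The sample names and labels here are invented for illustration:

```python
from collections import defaultdict

# (sample_name, true_label, predicted_label) -- invented illustration data
predictions = [
    ("patch_001.png", "text", "text"),
    ("patch_002.png", "background", "text"),   # false positive
    ("patch_003.png", "text", "background"),   # missed text
]

buckets = defaultdict(list)
for name, truth, pred in predictions:
    # grouping errors by (true, predicted) pair exposes systematic defects
    key = "correct" if truth == pred else f"wrong_{truth}_as_{pred}"
    buckets[key].append(name)

for key, names in sorted(buckets.items()):
    print(key, names)
```

In practice the bucketed names would then be copied into per-bucket folders (e.g. with shutil.copy) for visual inspection.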

