Andrew Ng's Machine Learning, Lesson 18: Application Example: Photo OCR (Image Text Recognition)

Source: Internet
Author: User

18.1 Problem description and flowchart

18.2 Sliding windows

18.3 Getting large amounts of data and artificial data

18.4 Upper limit (ceiling) analysis: which part of the pipeline to work on next

18.1 Problem description and flowchart

Image text recognition (photo OCR) means recognizing the text in a given picture. This is harder than recognizing text in a scanned document.

Doing this takes the following steps:

1. Text detection -- separate the text regions in the picture from the rest of the scene

2. Character segmentation -- split each text region into individual characters

3. Character classification -- determine what each character is

A flowchart of these tasks (a pipeline) can express the problem, and each task can be handled by a separate team: image, then text detection, then character segmentation, then character classification.
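The three stages can be sketched as a chain of functions, which also shows why separate teams can work on them independently. This is only an illustration: run_pipeline and the toy stand-in stages are hypothetical names, not part of the course material.

```python
def run_pipeline(image, detect_text, segment_characters, classify_character):
    """Chain the three stages; each stage is passed in as a plain function."""
    words = []
    for region in detect_text(image):            # stage 1: find text regions
        characters = segment_characters(region)  # stage 2: split into characters
        # stage 3: classify each character and join the labels into a string
        words.append("".join(classify_character(ch) for ch in characters))
    return words


# Toy stand-ins (hypothetical): the "image" is a string, regions are words,
# and each character patch is already its own label.
toy_result = run_pipeline(
    "OPEN 24 HOURS",
    detect_text=lambda img: img.split(),
    segment_characters=list,
    classify_character=lambda ch: ch,
)
print(toy_result)
```

Because each stage only sees the output of the previous one, any stage can be swapped for a better model without touching the others.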

18.2 Sliding windows

A sliding window is a technique for locating objects in an image. Suppose we need to find pedestrians in a picture. First we train a model that accurately recognizes pedestrians, using many fixed-size image patches. Then we crop a patch of that same size from the picture to be searched, pass it to the model, and let the model decide whether it contains a pedestrian. We slide the crop region across the picture by a fixed step, crop again, and pass the new patch to the model. This repeats until the whole picture has been covered.
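A minimal sketch of this loop, assuming the image is a 2D list of pixel values and the trained model is any function that returns True or False for a patch. The names sliding_windows, detect, and is_object are made up for the example, and the "model" is a toy rule standing in for a real classifier.

```python
def sliding_windows(image, win_h, win_w, step):
    """Yield (top, left, patch) for every window position, row by row."""
    rows, cols = len(image), len(image[0])
    for top in range(0, rows - win_h + 1, step):
        for left in range(0, cols - win_w + 1, step):
            patch = [row[left:left + win_w] for row in image[top:top + win_h]]
            yield top, left, patch


def detect(image, classifier, win_h, win_w, step):
    """Return the top-left corners of all windows the classifier accepts."""
    return [
        (top, left)
        for top, left, patch in sliding_windows(image, win_h, win_w, step)
        if classifier(patch)
    ]


# Toy 6x6 image with a 2x2 "object" of ones at rows 2-3, columns 2-3.
image = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = 1

# Stand-in for a trained model: accept a patch only if every pixel is set.
is_object = lambda patch: all(all(row) for row in patch)

hits = detect(image, is_object, win_h=2, win_w=2, step=2)
print(hits)
```

A smaller step checks more positions and misses fewer objects, at the cost of more calls to the model.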

Once this is done, we enlarge the crop region proportionally and crop the picture at the new size, shrink each new crop down to the size the model expects, and pass it to the model. This is repeated for each window size.
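The multi-scale pass can be sketched as follows. The nearest-neighbour resize, the list of scales, and the step of half a window are all assumptions chosen for the example, and the function names are hypothetical.

```python
def resize_nearest(patch, out_h, out_w):
    """Shrink (or enlarge) a 2D list to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(patch), len(patch[0])
    return [
        [patch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def multiscale_windows(image, base_h, base_w, scales, step_fraction=0.5):
    """Yield (top, left, win_h, win_w, patch), with patch resized to base_h x base_w."""
    rows, cols = len(image), len(image[0])
    for scale in scales:
        win_h, win_w = int(base_h * scale), int(base_w * scale)
        if win_h > rows or win_w > cols:
            continue  # this window size does not fit in the image
        step = max(1, int(win_h * step_fraction))
        for top in range(0, rows - win_h + 1, step):
            for left in range(0, cols - win_w + 1, step):
                crop = [row[left:left + win_w] for row in image[top:top + win_h]]
                yield top, left, win_h, win_w, resize_nearest(crop, base_h, base_w)


# Toy 8x8 image; the model is assumed to expect 2x2 patches.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
windows = list(multiscale_windows(image, base_h=2, base_w=2, scales=[1, 2, 4]))
print(len(windows))
```

Every patch reaches the model at the same size, so a single trained model can find objects of different sizes.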

Sliding windows are also used for text detection. First we train a model to distinguish character patches from non-character patches, then we run it over the picture with a sliding window. Once the windows containing characters are identified, we expand those regions somewhat and merge the ones that overlap. We then filter by aspect ratio, discarding regions that are taller than they are wide (a word is usually wider than it is tall). In the illustration, the green regions are those kept as text after these steps and the red regions are discarded.
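The expand, merge, and filter steps can be sketched on bounding boxes. This is one possible reading of the procedure, not the course's implementation: boxes are (top, left, bottom, right) tuples with exclusive bottom and right edges, the margin and the aspect-ratio threshold are arbitrary, and clipping to the image bounds is left out for brevity.

```python
def expand(box, margin):
    """Grow a (top, left, bottom, right) box by margin pixels on every side."""
    top, left, bottom, right = box
    return (top - margin, left - margin, bottom + margin, right + margin)


def overlaps(a, b):
    """True if two boxes share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_boxes(boxes):
    """Repeatedly merge overlapping boxes until no two boxes overlap."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        result = []
        for box in boxes:
            for i, other in enumerate(result):
                if overlaps(box, other):
                    # Replace the stored box with the union of the two.
                    result[i] = (min(box[0], other[0]), min(box[1], other[1]),
                                 max(box[2], other[2]), max(box[3], other[3]))
                    merged = True
                    break
            else:
                result.append(box)
        boxes = result
    return boxes


def text_regions(hits, margin, min_aspect=1.0):
    """Keep merged regions whose width exceeds min_aspect times their height."""
    merged = merge_boxes(expand(b, margin) for b in hits)
    return [b for b in merged if (b[3] - b[1]) > min_aspect * (b[2] - b[0])]


# Three character windows in a row (a word) and one isolated false positive.
hits = [(10, 10, 20, 20), (10, 22, 20, 32), (10, 34, 20, 44), (50, 5, 60, 15)]
regions = text_regions(hits, margin=2)
print(regions)
```

The three neighbouring windows fuse into one wide region that passes the filter, while the isolated square window is dropped.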

The above is the text detection stage.

The next step is to train a model that splits the text into characters. Its training set consists of pictures of a single character and pictures of two connected characters, so that the model learns to tell whether a window contains the boundary between two characters.

After the model has been trained, we again use a sliding window, this time moving along the text region, to find where to split the characters.
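A minimal sketch of this one-dimensional sliding window, assuming the text region is a 2D list and the trained model is a function that says whether a window is centred on a gap between characters. The toy rule used here (the centre column is empty) and the names find_split_columns and cut_at are assumptions for illustration.

```python
def find_split_columns(region, win_w, is_split):
    """Slide a window left to right and collect the centre columns flagged as splits."""
    cols = len(region[0])
    splits = []
    for left in range(0, cols - win_w + 1):
        window = [row[left:left + win_w] for row in region]
        if is_split(window):
            splits.append(left + win_w // 2)
    return splits


def cut_at(region, splits):
    """Cut the region into character patches, dropping the split columns themselves."""
    pieces, start = [], 0
    for s in splits + [len(region[0])]:
        if s > start:
            pieces.append([row[start:s] for row in region])
        start = s + 1
    return pieces


# Toy region: two 2-pixel-wide "characters" separated by one empty column.
region = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]

# Stand-in for the trained model: the centre column of the window is empty.
centre_is_empty = lambda window: all(row[len(row) // 2] == 0 for row in window)

splits = find_split_columns(region, win_w=3, is_split=centre_is_empty)
characters = cut_at(region, splits)
print(splits, len(characters))
```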

The above is the character segmentation stage.

The last stage is character classification, for which a classifier can be trained using a neural network, a support vector machine, or logistic regression.

18.3 Getting large amounts of data and artificial data

If our model has low bias (that is, it suffers from high variance rather than high bias), training it on more data will improve it. The question is how to get that data. Data is not always directly available, and we may need to create some artificially.

For text recognition, we can download a variety of fonts from a font website and render characters in these fonts against various random background images to create training examples. This gives us an effectively unlimited training set. This is creating examples from scratch.
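Creating examples from scratch can be sketched as pasting a glyph onto a random background. Rendering a real font needs an imaging library, so this sketch substitutes a small hand-written bitmap, GLYPH_T, for a rendered character; the grey-level ranges and the function name synthesize are likewise assumptions.

```python
import random

# Hypothetical 5x3 bitmap standing in for a character rendered from a downloaded font.
GLYPH_T = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]


def synthesize(glyph, size, rng):
    """Paste a dark glyph at a random offset onto a random light background."""
    h, w = len(glyph), len(glyph[0])
    # Light, noisy background (grey levels 128-255).
    patch = [[rng.randint(128, 255) for _ in range(size)] for _ in range(size)]
    top = rng.randint(0, size - h)
    left = rng.randint(0, size - w)
    ink = rng.randint(0, 60)  # one dark grey level for the whole glyph
    for r in range(h):
        for c in range(w):
            if glyph[r][c]:
                patch[top + r][left + c] = ink
    return patch


rng = random.Random(0)  # fixed seed so the run is repeatable
examples = [synthesize(GLYPH_T, 8, rng) for _ in range(3)]
print(len(examples), len(examples[0]), len(examples[0][0]))
```

Each call yields a new labeled example of the same character, so the size of the training set is limited only by how many samples we choose to draw.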

Another approach is to take existing data and modify it, for example by distorting, rotating, or blurring an existing character image. As long as the real data is likely to resemble the modified data, this method can produce a lot of additional data.
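Two of these modifications, rotation and blurring, can be sketched in plain Python. The nearest-neighbour rotation and the 3x3 box blur are simple stand-ins chosen for the example; the angles and the function names are assumptions.

```python
import math


def rotate(image, degrees, fill=0):
    """Rotate about the image centre using nearest-neighbour inverse mapping.

    The sign convention of the angle is arbitrary here; for augmentation
    both directions are usually sampled anyway.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    cos_a = math.cos(math.radians(degrees))
    sin_a = math.sin(math.radians(degrees))
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            y, x = r - cy, c - cx
            # Find which source pixel lands on output pixel (r, c).
            src_r = round(cy + cos_a * y + sin_a * x)
            src_c = round(cx - sin_a * y + cos_a * x)
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = image[src_r][src_c]
    return out


def box_blur(image):
    """Replace each pixel by the mean of its 3x3 neighbourhood (clipped at the edges)."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            values = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            row.append(sum(values) / len(values))
        out.append(row)
    return out


# Toy character image: a vertical stroke in a 5x5 patch.
stroke = [[1 if c == 2 else 0 for c in range(5)] for _ in range(5)]
augmented = [rotate(stroke, angle) for angle in (-10, 10)] + [box_blur(stroke)]
print(len(augmented))
```

Each variant keeps the label of the original image, which is what makes this a cheap way to multiply labeled data.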

There are several ways to get more data:

1. Artificial data synthesis

2. Manually collect and label data

3. Crowdsourcing

18.4 Upper limit (ceiling) analysis: which part of the pipeline to work on next

A machine learning application usually takes several steps to make its final prediction. How do we know which part is most worth our time and effort to improve? Upper limit analysis, also called ceiling analysis, answers this question.

Back to the text recognition application, whose flowchart is: image, text detection, character segmentation, character classification.

The output of each part of the flowchart is the input to the next. In upper limit analysis, we pick a step, manually supply it with 100% correct output, and see how much the overall performance of the application improves. In our example, the overall accuracy of the system is 72%.

If we make the output of the text detection part 100% correct, the overall accuracy rises from 72% to 89%. This means text detection is probably worth a good deal of our time and effort.

Next we also manually supply 100% correct character segmentation output and find that the overall accuracy rises by only 1%, which means the character segmentation part may already be good enough.

Finally, we also manually supply 100% correct character classification output, and the overall accuracy rises by 10%, which means character classification is also worth investing time and effort in.
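The arithmetic can be sketched as below, assuming the figures are cumulative: each stage is made perfect in addition to the stages before it, so 72% becomes 89%, then 90%, then 100%. The function name ceiling_analysis is made up for the example.

```python
def ceiling_analysis(baseline, stages):
    """Return (stage, gain) pairs sorted by gain, largest first.

    stages lists (name, overall accuracy once this stage and all earlier
    stages are given perfect output), in pipeline order.
    """
    gains, previous = [], baseline
    for name, accuracy in stages:
        gains.append((name, accuracy - previous))
        previous = accuracy
    return sorted(gains, key=lambda g: g[1], reverse=True)


# Accuracies in percent, from the example: 72 -> 89 -> 90 -> 100.
stages = [
    ("text detection", 89),
    ("character segmentation", 90),
    ("character classification", 100),
]
ranking = ceiling_analysis(72, stages)
for name, gain in ranking:
    print(f"{name}: +{gain} points")
```

The ranking puts text detection first (17 points), character classification second (10 points), and character segmentation last (1 point), which matches the conclusions above.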
