Stanford Machine Learning Open Course Notes (15): Application Example, Photo OCR

Source: Internet
Author: User

Public course address: https://class.coursera.org/ml-003/class/index
Instructor: Andrew Ng

1. Problem description and pipeline

OCR stands for Optical Character Recognition. Photo OCR is the problem of recognizing the characters that appear in a photograph, as in the figure below.


The process involves the following steps. First, text detection finds the regions of the image where text appears; next, character segmentation splits that text into individual characters; finally, character recognition identifies each character. An optional extra step assembles the recognized characters into words and checks and corrects their spelling. These steps can be organized as a pipeline.
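The pipeline idea can be sketched in Python as a list of stage functions, where each stage's output becomes the next stage's input. Everything below is a hypothetical toy: the "image" is a list of `(region_pixels, contains_text)` pairs, and `detect_text`, `segment_characters`, `recognize_characters`, and `correct_spelling` are illustrative stand-ins, not real OCR components.

```python
from typing import Any, Callable, List, Sequence, Tuple

Stage = Callable[[Any], Any]
Region = Tuple[str, bool]  # hypothetical toy region: (pixels, contains_text)

def run_pipeline(stages: Sequence[Stage], data: Any) -> Any:
    """Feed the output of each stage into the next stage, in order."""
    for stage in stages:
        data = stage(data)
    return data

def detect_text(image: List[Region]) -> List[str]:
    # Text detection: keep only the regions that contain text.
    return [pixels for pixels, has_text in image if has_text]

def segment_characters(regions: List[str]) -> List[List[str]]:
    # Character segmentation: split each text region into single glyphs.
    return [list(region) for region in regions]

def recognize_characters(glyph_lists: List[List[str]]) -> List[str]:
    # Character recognition: in this toy each glyph already "is" its character,
    # so recognition just joins the glyphs back into words.
    return ["".join(glyphs) for glyphs in glyph_lists]

VOCABULARY = {"OPEN", "CLOSED"}

def correct_spelling(words: List[str]) -> List[str]:
    # Optional spelling correction: if an unknown word becomes a known word
    # after reading the digit 0 as the letter O, use the corrected form.
    fixed = []
    for word in words:
        candidate = word.replace("0", "O")
        if word not in VOCABULARY and candidate in VOCABULARY:
            fixed.append(candidate)
        else:
            fixed.append(word)
    return fixed

PHOTO_OCR = [detect_text, segment_characters, recognize_characters, correct_spelling]

print(run_pipeline(PHOTO_OCR, [("CL0SED", True), ("bricks", False)]))  # ['CLOSED']
```

Keeping each stage a separate function is what makes it possible to improve, replace, or analyze one stage at a time.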

2. Sliding windows

A sliding window is a technique for locating a particular kind of object in an image. Text detection in photo OCR uses this technique; here we first illustrate it with a different example, detecting pedestrians in an image.

To detect pedestrians, we first need to train a model that decides whether an image patch shows a pedestrian. To do so, we collect patches labeled as pedestrians (positive examples) and patches without pedestrians (negative examples) and train on them together.
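As a minimal sketch of this training step, assuming logistic regression as the classifier (the text only says "a model"), the code below trains on tiny hypothetical 2x2 patches flattened into feature vectors. Real pedestrian patches are much larger images, and a practical detector would likely need far more examples and richer features.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear score b + w . x."""
    return b + sum(wj * xj for wj, xj in zip(w, x))

def train_logistic_regression(X: List[List[float]], y: List[int],
                              lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit logistic regression with batch gradient descent (no regularization)."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # prediction minus label
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient over the m examples and take one step.
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def is_pedestrian(w: List[float], b: float, patch: Sequence[float]) -> bool:
    return sigmoid(score(w, b, patch)) >= 0.5

# Hypothetical 2x2 patches flattened row by row: positives (y = 1) have a
# filled left column, negatives (y = 0) do not.
X = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0], [0, 1, 0, 1]]
y = [1, 1, 0, 0]
w, b = train_logistic_regression(X, y)
print([is_pedestrian(w, b, x) for x in X])  # [True, True, False, False]
```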

Once the model is trained, we slide a window across the image and use the model to decide whether each window contains a pedestrian.

Before scanning, we must choose a window size. Typically we start with a small window and scan it across the whole image, then repeat the scan with larger windows. In the end, pedestrians of different sizes can all be found.
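The scan described above can be sketched as follows. `sliding_windows` enumerates the window positions for one window size, and `multi_scale_scan` repeats the scan for each size from small to large. The step size, the window sizes, and the `all_ones` toy classifier are assumptions for illustration. A real detector would also resize each patch to the fixed input size its classifier was trained on, which this sketch leaves out.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Image = List[List[int]]
Window = Tuple[int, int, int, int]  # (top, left, height, width)

def sliding_windows(img_h: int, img_w: int, win_h: int, win_w: int,
                    step: int) -> Iterator[Window]:
    """Yield every window position of the given size, moving `step` pixels at a time."""
    for top in range(0, img_h - win_h + 1, step):
        for left in range(0, img_w - win_w + 1, step):
            yield (top, left, win_h, win_w)

def multi_scale_scan(image: Image, classify: Callable[[Image], bool],
                     sizes: Sequence[Tuple[int, int]], step: int = 1) -> List[Window]:
    """Scan with each window size in turn and collect the windows the classifier accepts."""
    img_h, img_w = len(image), len(image[0])
    hits: List[Window] = []
    for win_h, win_w in sizes:
        for top, left, h, w in sliding_windows(img_h, img_w, win_h, win_w, step):
            patch = [row[left:left + w] for row in image[top:top + h]]
            # A real detector would resize `patch` to the classifier's input size here.
            if classify(patch):
                hits.append((top, left, h, w))
    return hits

def all_ones(patch: Image) -> bool:
    # Toy classifier: a window is "positive" when every pixel in it is 1.
    return all(all(p == 1 for p in row) for row in patch)

image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
print(multi_scale_scan(image, all_ones, sizes=[(2, 2), (3, 3)]))  # [(0, 0, 2, 2)]
```

A larger step makes the scan cheaper but can skip over objects that fall between window positions.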

Now return to the text detection problem posed at the beginning. After scanning the photo with a sliding window, we obtain the result shown in the figure.

In the figure, white regions are those the classifier judges to contain text. Once these regions are found, the text inside them is split into individual characters. A sliding window is used again here, but the positive examples are no longer letters; instead, a window is a positive example when it is centered on the split between two characters.
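The segmentation step uses a one-dimensional sliding window that moves along a detected text line. In this sketch, under stated assumptions, the line is reduced to a hypothetical list of ink counts per pixel column, and `toy_split_classifier` is a hand-written rule standing in for the trained classifier that would recognize the gap between two characters.

```python
from typing import Callable, List, Sequence

def toy_split_classifier(window: Sequence[int]) -> bool:
    """Hypothetical stand-in for a trained classifier: positive when the
    centre column is blank and there is ink on both sides of it."""
    mid = len(window) // 2
    return window[mid] == 0 and any(window[:mid]) and any(window[mid + 1:])

def find_character_splits(column_ink: Sequence[int], width: int = 3,
                          classify: Callable[[Sequence[int]], bool] = toy_split_classifier) -> List[int]:
    """Slide a 1D window (odd `width`) along a text line and return the column
    at the centre of each window the classifier marks as a split point."""
    half = width // 2
    splits = []
    for start in range(len(column_ink) - width + 1):
        if classify(column_ink[start:start + width]):
            splits.append(start + half)
    return splits

# Ink per pixel column of a toy text line: three characters separated by blank columns.
line = [2, 3, 0, 4, 1, 0, 5]
print(find_character_splits(line))  # [2, 5]
```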

3. Getting lots of data: artificial data synthesis

In the previous lecture we saw that if a model has low bias, the more training examples it has, the better it performs. Obtaining data is not always easy, however, especially in large quantities. When there is not enough training data, we can create some artificially. For example, we can distort the original images to get new examples, or add background noise to them.
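A minimal sketch of synthesizing new examples from one labeled image, assuming binary images stored as nested lists. `shift` (a translation of at most one pixel) stands in for more realistic distortions such as warping, and `add_background_noise` turns on a small random fraction of background pixels; the shift range and noise rate are arbitrary illustrative choices. Each synthetic example keeps the original label.

```python
import random
from typing import List, Tuple

Image = List[List[int]]

def shift(img: Image, dy: int, dx: int) -> Image:
    """Translate a binary image by (dy, dx); uncovered pixels become background (0)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = img[src_y][src_x]
    return out

def add_background_noise(img: Image, rate: float, rng: random.Random) -> Image:
    """Turn each background pixel on with probability `rate`; ink pixels are kept."""
    return [[1 if p == 0 and rng.random() < rate else p for p in row] for row in img]

def synthesize(img: Image, label: str, n: int, rng: random.Random,
               max_shift: int = 1, noise_rate: float = 0.05) -> List[Tuple[Image, str]]:
    """Create n distorted, noisy copies of one labeled example; the label is unchanged."""
    examples = []
    for _ in range(n):
        dy = rng.randint(-max_shift, max_shift)
        dx = rng.randint(-max_shift, max_shift)
        examples.append((add_background_noise(shift(img, dy, dx), noise_rate, rng), label))
    return examples

# A 5x5 glyph of the letter "T".
T = [[1, 1, 1, 1, 1],
     [0, 0, 1, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 1, 0, 0]]
new_examples = synthesize(T, "T", n=5, rng=random.Random(0))
print(len(new_examples))  # 5
```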

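Note, however, that adding purely random, meaningless noise or transformations usually does not help; the distortions should resemble the kinds of variation the model will see in real test data.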

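The lecture also gives some advice on getting more data. Before spending effort on it, make sure your model has low bias, since only then will more data help. Then ask yourself how much work it would take to get 10 times as much data as you currently have; answering this question can open up options you had not considered, because the work is often less than expected.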
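The "how much work for 10 times the data" question often comes down to simple arithmetic. The helper below estimates the hours of manual labeling needed; `labeling_hours` is a hypothetical name, and the 1,000 examples and 10 seconds per label are hypothetical numbers, not figures from the course.

```python
def labeling_hours(current_examples: int, factor: int, seconds_per_example: float) -> float:
    """Hours of manual labeling needed to grow a training set to `factor` times its current size."""
    new_examples = current_examples * (factor - 1)  # examples still to be labeled
    return new_examples * seconds_per_example / 3600

# Hypothetical numbers: 1,000 labeled examples, 10 seconds to label each one.
print(labeling_hours(1000, 10, 10))  # 25.0
```

Under these assumptions, 9,000 new labels take about 25 hours, which is a few days of work rather than an impossible task.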


4. Ceiling analysis: what part of the pipeline to work on next

A machine learning system is often made up of several components arranged in a pipeline, and each component introduces some error. To improve overall accuracy, we need to know which component is most worth further work, and ceiling analysis answers this question. It assumes that one component of the pipeline works perfectly (for example, by manually supplying the correct output for that component), so the next component receives error-free input, and then measures the accuracy of the entire system. Repeating this for each component in turn, while keeping the earlier components perfect, shows the most that improving each component could add to overall accuracy.

As shown in the table, four accuracies are listed: the overall system, and the system after each of its three components (text detection, character segmentation, character recognition) is made perfect in turn. The overall system accuracy is 72%. Making text detection perfect raises it to 89%; additionally making character segmentation perfect raises it only from 89% to 90%; additionally making character recognition perfect raises it from 90% to 100%. Improving text detection therefore has the largest potential payoff (17 percentage points), followed by character recognition (10 points), while character segmentation offers almost nothing (1 point). The conclusion of this analysis is that effort should go into text detection and character recognition. Here is another example, from face recognition:


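In this example, making the preprocessing step (background removal) perfect improves overall accuracy only marginally, while making face detection perfect yields the largest gain, so face detection is where further effort would pay off most.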
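Returning to the photo OCR table, the arithmetic of ceiling analysis is just the difference between consecutive accuracies. This sketch uses the accuracies quoted in the text (72%, 89%, 90%, 100%); `ceiling_gains` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

def ceiling_gains(overall: float, steps: List[Tuple[str, float]]) -> Dict[str, float]:
    """`steps` lists each component with the whole-system accuracy measured after
    that component (and every component before it) was made perfect.
    Returns the accuracy gain attributable to each component."""
    gains: Dict[str, float] = {}
    previous = overall
    for component, accuracy in steps:
        gains[component] = accuracy - previous
        previous = accuracy
    return gains

gains = ceiling_gains(72, [("Text detection", 89),
                           ("Character segmentation", 90),
                           ("Character recognition", 100)])
print(gains)  # {'Text detection': 17, 'Character segmentation': 1, 'Character recognition': 10}
print(max(gains, key=gains.get))  # Text detection
```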

------------------------------------------------------------------------

This lecture introduced an everyday application of machine learning: OCR. The technology is now very common; for example, many systems that read vehicle license plates at intersections use it. As the last lecture of the ten-week machine learning course, it fittingly returns to a concrete application. I think this is the charm of machine learning, and I look forward to its future. Thanks to Professor Andrew Ng, and thanks to Coursera.
