Stanford CS229 Machine Learning course note seven: Algorithm diagnostics, error analysis, and how to start a machine learning problem

Last Update:2015-08-26 Source: Internet

Author: User


This section covers Andrew Ng's advice on applying machine learning. It contains no mathematical formulas, but it is a very important lesson.

Debugging Learning Algorithms

Suppose we are building a spam classifier, and a small subset of words (100 words) has been selected from a large vocabulary as features. Bayesian logistic regression is implemented with gradient ascent, but the test-set error rate is 20%, which is clearly too high.

How can we fix this?

Collect more training samples
Further reduce the number of features
Increase the number of features
Change the features (consider email header/body)
Run gradient ascent for more iterations
Try Newton's method
Use a different value of λ
Use an SVM instead

"The people in industry and in research that I see that are really good would not go and try to change a learning algorithm randomly." These are Andrew's exact words. There are many possible fixes, and picking one at random wastes time and leaves the outcome to luck. A better approach is to first diagnose where the problem lies and then choose the fix that addresses it.

1. High variance vs. high bias

Typical learning curve for high variance:

As the sample size increases, the error rate of the test set will continue to decrease (increasing the sample size will contribute to improved performance)
There is a large gap between the training-set error rate and the test-set error rate (training error is much lower)

Typical learning curve for high bias:

Even the training-set error rate is unacceptably high
There is little difference between the training-set and test-set error rates

Therefore, among the fixes listed above:

Collect more training samples -- fixes high variance
Reduce the number of features further -- fixes high variance
Increase the number of features -- fixes high bias
Change the features (consider email header/body) -- fixes high bias
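
The two learning-curve signatures above can be turned into a rough check. The following is only a minimal sketch: the function name and the thresholds `target_err` and `gap_tol` are hypothetical and would have to be chosen for the problem at hand.

```python
def diagnose_bias_variance(train_err, test_err, target_err, gap_tol=0.05):
    """Classify a (train, test) error pair by its learning-curve signature.

    target_err: the error rate we would accept (hypothetical threshold).
    gap_tol:    how far apart train and test error may be before we call
                it a "large gap" (hypothetical threshold).
    """
    gap = test_err - train_err
    # High bias: training error itself is too high, and test error is close to it.
    if train_err > target_err and gap <= gap_tol:
        return "high bias"
    # High variance: test error is much worse than training error.
    if gap > gap_tol:
        return "high variance"
    return "ok"


# Fixes from the list above, grouped by the problem they address.
SUGGESTED_FIXES = {
    "high variance": ["collect more training samples",
                      "reduce the number of features"],
    "high bias": ["increase the number of features",
                  "change the features (email header/body)"],
    "ok": [],
}

if __name__ == "__main__":
    # Illustrative numbers only: 2% training error against 20% test error.
    problem = diagnose_bias_variance(train_err=0.02, test_err=0.20, target_err=0.05)
    print(problem, "->", SUGGESTED_FIXES[problem])
```

A single pair of error rates is a coarse signal. Plotting both errors against training-set size, as the learning curves describe, also shows whether test error is still falling as more data is added.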

2. Optimization algorithm vs. optimization objective

Continuing with the spam classification example, suppose Bayesian logistic regression has a 2% error rate on spam and a 2% error rate on non-spam (too high, since we do not want many legitimate messages filtered out), while an SVM with a linear kernel has a 10% error rate on spam but only a 0.01% error rate on non-spam. For computational efficiency you still want to use logistic regression. How should you tune it? Two questions matter here:

Is gradient ascent for logistic regression converging?
Are we optimizing the right function?

In this problem, the function we care about is the weighted accuracy a(θ), in which non-spam examples carry a higher weight than spam examples.
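
As a sketch, assuming the definition used in the CS229 lecture material, the weighted accuracy can be written as below. The per-example weights w^(i) (larger for non-spam) and the hypothesis h_θ are notation assumed here, not taken from the text.

```latex
a(\theta) = \sum_{i} w^{(i)} \, 1\{\, h_\theta(x^{(i)}) = y^{(i)} \,\}
```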

Bayesian logistic regression and the SVM each optimize their own objective function (J(θ) in the case of logistic regression), and for each we also need to ask whether suitable parameters, such as λ, were chosen.
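
For reference, the two objectives can be sketched in their standard forms. This assumes the usual regularized log-likelihood for Bayesian logistic regression and the usual soft-margin formulation for the SVM. The symbols m, C, ξ_i, w and b are notation assumed here.

```latex
\text{BLR:}\quad \max_{\theta}\; J(\theta) = \sum_{i=1}^{m} \log p\big(y^{(i)} \mid x^{(i)}, \theta\big) - \lambda \lVert \theta \rVert^{2}

\text{SVM:}\quad \min_{w,\,b,\,\xi}\; \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.}\quad y^{(i)}\big(w^{\top} x^{(i)} + b\big) \ge 1 - \xi_i,\;\; \xi_i \ge 0
```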

From the problem setup we already know that a(θ_SVM) > a(θ_BLR). The diagnostic is to check whether J(θ_SVM) > J(θ_BLR). If J(θ_SVM) > J(θ_BLR), then θ_BLR fails to maximize J(θ): the algorithm has not converged, and the optimization algorithm needs to be improved. If J(θ_SVM) ≤ J(θ_BLR), then J(θ) is the wrong optimization objective: logistic regression does at least as well as the SVM on J(θ), yet it does worse on the function we actually care about, so the objective function needs to be changed. Therefore, among the fixes listed above:

Run gradient ascent for more iterations -- fixes an optimization algorithm problem
Try Newton's method -- fixes an optimization algorithm problem
Use a different λ -- fixes an optimization objective problem
Use an SVM -- fixes an optimization objective problem
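
The comparison of J(θ_SVM) with J(θ_BLR) can be sketched in code. This is a minimal illustration under stated assumptions: J(θ) is taken to be the regularized log-likelihood with labels in {0, 1}, the SVM's linear parameters are assumed to be expressed in the same parameterization as θ, and all function names and the toy data are hypothetical.

```python
import math


def log_sigmoid(s):
    """Numerically stable log(1 / (1 + exp(-s)))."""
    if s >= 0:
        return -math.log1p(math.exp(-s))
    return s - math.log1p(math.exp(s))


def blr_objective(theta, X, y, lam):
    """J(theta) = sum_i log p(y_i | x_i, theta) - lam * ||theta||^2 (assumed form)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(t * x for t, x in zip(theta, x_i))
        # log p(y=1|x) = log sigmoid(z); log p(y=0|x) = log sigmoid(-z)
        total += log_sigmoid(z if y_i == 1 else -z)
    return total - lam * sum(t * t for t in theta)


def diagnose_objective(j_svm, j_blr):
    """Apply the diagnostic, given that a(theta_SVM) > a(theta_BLR) is already known."""
    if j_svm > j_blr:
        # theta_BLR does not maximize J: the optimizer has not converged.
        return "optimization algorithm"
    # BLR wins on J but loses on a(theta): J is the wrong thing to maximize.
    return "optimization objective"


if __name__ == "__main__":
    # Toy data and parameters, made up purely for illustration.
    X = [[1.0, 2.0], [1.0, -1.0]]
    y = [1, 0]
    theta_blr = [0.0, 0.0]   # e.g. gradient ascent stopped too early
    theta_svm = [0.0, 1.0]
    j_blr = blr_objective(theta_blr, X, y, lam=0.1)
    j_svm = blr_objective(theta_svm, X, y, lam=0.1)
    print(diagnose_objective(j_svm, j_blr))
```

Both parameter vectors are scored with the same J(θ), the logistic regression objective. The SVM is used only as a source of a better-performing θ to compare against.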

As the two diagnostics above show, tuning at random without first diagnosing the root cause is likely to cost a lot of time and bring no improvement. In addition, you often need to devise your own diagnostics to determine what is going wrong in the algorithm. "Solving a really important problem using learning algorithms, one of the most valuable things is just your own personal intuitive understanding of the problem." Using diagnostics to pin down the problem in a machine learning application is a good way to build this "gut understanding of the problem."

Error Analysis

Suppose there is a machine learning application for face recognition that consists of a number of different components.

The accuracy of the current system is 85%. Error analysis can be used to determine which component to improve in order to raise overall accuracy the most:

The approach is to replace a machine-learning component with a result produced manually or by other means, and to record the system accuracy obtained after the replacement. From the table above we can see that improving face detection would significantly improve system accuracy (so it should be the focus of our next work), while improving preprocessing (background removal) would raise overall accuracy only marginally.
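
The bookkeeping behind this kind of error analysis can be sketched as follows. The 85% baseline comes from the text. The per-component accuracies are hypothetical numbers chosen only to illustrate the ranking and are not the values from the article's table. The function name is also an assumption.

```python
def rank_components(baseline_acc, acc_when_replaced):
    """Rank components by how much accuracy rises when each one is
    replaced with a manually produced (ground-truth) result."""
    gains = {name: acc - baseline_acc for name, acc in acc_when_replaced.items()}
    # Largest potential gain first: that component is the best place to invest effort.
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


BASELINE = 0.85  # current system accuracy, from the text

# Hypothetical measurements, for illustration only.
HYPOTHETICAL_ACC = {
    "preprocess (remove background)": 0.851,
    "face detection": 0.91,
}

ranked = rank_components(BASELINE, HYPOTHETICAL_ACC)

if __name__ == "__main__":
    for name, gain in ranked:
        print(f"{name}: +{gain * 100:.1f} points")
```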

Ablative analysis

Suppose we add the following components to a logistic regression spam classifier:

Spelling correction
Sender host features
Email header features
Email text parser features
JavaScript parser
Features from embedded images

Together these raised the spam classifier's accuracy from only 94% to 99.9%. We would now like to know how much each component contributed to the overall improvement. Ablative analysis answers this question by removing one component from the system at a time and seeing how much the accuracy drops:

From the table above, we can see that the email text parser features contribute the most to the accuracy improvement. If you are considering removing some components to improve efficiency, this is the last one that should be removed.
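
The removal procedure can be sketched as a small loop. This is only an illustration: `evaluate_stub` is a hypothetical stand-in for retraining the classifier and measuring its accuracy, and the per-component contributions are made-up numbers, not the values from the article's table. Only the 94% and 99.9% endpoints come from the text.

```python
def ablative_analysis(components, evaluate):
    """Remove one component at a time and record the accuracy drop."""
    full_acc = evaluate(components)
    drops = {}
    for removed in components:
        remaining = [c for c in components if c != removed]
        drops[removed] = full_acc - evaluate(remaining)
    # Largest drop first: that component helps accuracy the most.
    return full_acc, sorted(drops.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical contributions in tenths of a percentage point (sum = 59,
# so the full system reaches 99.9% from a 94.0% baseline).
HYPOTHETICAL_GAIN = {
    "spelling correction": 1,
    "sender host features": 2,
    "email header features": 5,
    "email text parser features": 40,
    "JavaScript parser": 6,
    "features from embedded images": 5,
}


def evaluate_stub(components):
    """Stand-in for 'retrain with these components and measure accuracy (%)'."""
    return (940 + sum(HYPOTHETICAL_GAIN[c] for c in components)) / 10.0


if __name__ == "__main__":
    full, ranked = ablative_analysis(list(HYPOTHETICAL_GAIN), evaluate_stub)
    print(f"full system: {full:.1f}%")
    for name, drop in ranked:
        print(f"without {name}: -{drop:.1f} points")
```

In a real system the contributions are rarely additive, so the measured drops depend on which other components are present. The stub ignores this for simplicity.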

Getting started on a learning problem

Approach #1: Careful design

Spend a long time designing exactly the right features, collecting the right dataset, and designing the right algorithmic architecture.
Implement it and hope it works.

Benefit: Nicer, perhaps more scalable algorithms. May come up with new, elegant learning algorithms; contribute to basic research in machine learning.

Approach #2: Build-and-fix

Implement something quick-and-dirty.
Run error analyses and diagnostics to see what's wrong with it, and fix its errors.

Benefit: Would often get your application problem working more quickly. Faster time to market.

The first approach is suited to theoretical research. In day-to-day work we should use the second approach and avoid premature optimization (learning a lot that may never be used, or spending a great deal of effort where the payoff is small), which makes our work more efficient. Finally, Andrew says he often spends one third or more of his time designing diagnostics to find out what is working and where the problems are, and that this time is well spent.

Summary

From the Chinese New Year period (the first note was posted on February 24) to the Qingming holiday (today, April 5), after about 40 days, I have finally worked through the supervised learning part of CS229. Along the way I cleared up countless questions I had about machine learning, and I feel thoroughly refreshed. Thanks to NetEase Open Courses for the translation and for thoughtfully providing the handouts for download, and of course thanks to Andrew for his wonderful teaching. The next step is to master a good toolbox (perhaps Mahout?), then start practicing: implement something quick-and-dirty, and then optimize the key parts through error analysis. Fight like an ML expert!


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
