Stanford CS229 Machine Learning course note seven: Algorithm diagnostics, error analysis, and how to start a machine learning problem

Source: Internet
Author: User

This section covers Andrew's advice on applying machine learning. It contains no mathematical formulas, but it is a very important lesson.

Debugging Learning Algorithms

Suppose we are building a spam classifier, using a small subset of words (100 words) selected from a large vocabulary as features. Bayesian logistic regression is implemented with gradient ascent, but the test set error rate reaches 20%, which is unacceptably high.

How can we fix this?

    • Collect more training samples
    • Further reduce the number of features
    • Increase the number of features
    • Change the features (consider email header/body features)
    • Run gradient ascent for more iterations
    • Try Newton's method
    • Use a different value of λ
    • Use an SVM instead

"The people in industry and in research that I see that are really good would not go and try to change a learning algorithm randomly." These are Andrew's exact words. With so many possible fixes, picking one at random wastes time and leaves success to chance. A better approach is to first diagnose where the problem lies and then choose the fix that matches it.

1. High variance vs. high bias

Typical learning curve for high variance:

    • As the training set size increases, the test set error rate continues to decrease (so collecting more samples will improve performance)
    • The training set error rate is much lower than the test set error rate

Typical learning curve for high bias:

    • Even the training set error rate is unacceptably high
    • The gap between the training set and test set error rates is small
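
The two patterns above can be turned into a rough check. The sketch below is only a heuristic under stated assumptions: the function name `diagnose_bias_variance`, the `target_error` argument, and the `gap_tolerance` threshold are illustrative choices, not something given in the lecture.

```python
def diagnose_bias_variance(train_error, test_error, target_error,
                           gap_tolerance=0.02):
    """Classify a (train, test) error pair as high bias and/or high variance.

    Assumptions (not from the lecture): errors are fractions in [0, 1],
    `target_error` is the error rate we would consider acceptable, and
    `gap_tolerance` is the largest train/test gap we are willing to ignore.
    """
    for name, value in (("train_error", train_error),
                        ("test_error", test_error),
                        ("target_error", target_error)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    # High bias: even the training error is unacceptable.
    high_bias = train_error > target_error
    # High variance: training error is much lower than test error.
    high_variance = (test_error - train_error) > gap_tolerance

    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias"
    if high_variance:
        return "high variance"
    return "no clear diagnosis"


# Hypothetical error rates, for illustration only.
print(diagnose_bias_variance(train_error=0.02, test_error=0.20, target_error=0.05))
print(diagnose_bias_variance(train_error=0.19, test_error=0.20, target_error=0.05))
```

In practice the same check is more informative when repeated at several training set sizes, since a learning curve shows whether the test error is still falling.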

Therefore, in the solutions listed above:

    • Collect more training samples -- fixes high variance
    • Further reduce the number of features -- fixes high variance
    • Increase the number of features -- fixes high bias
    • Change the features (consider email header/body features) -- fixes high bias

2. Optimization algorithm vs. optimization objective

Continuing with the spam classification example, suppose Bayesian logistic regression has a 2% error rate on spam and a 2% error rate on non-spam (unacceptably high, since we do not want many legitimate messages to be filtered out), while an SVM with a linear kernel has a 10% error rate on spam but only a 0.01% error rate on non-spam. For computational efficiency you would still prefer to use logistic regression. How should you tune it? At this point two questions matter:

    1. Has gradient ascent for logistic regression converged?
    2. Are we optimizing the right function?

In this problem, the function we care about is the weighted accuracy a(θ), in which non-spam examples carry a higher weight than spam examples.
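
A minimal sketch of such a weighted accuracy, assuming it has the form a(θ) = Σ_i w_i · 1{prediction_i = label_i}. The function name, the division by the total weight (which turns the sum into a rate without changing any comparison between models), and the 10x weight on non-spam are illustrative assumptions, not values from the lecture.

```python
def weighted_accuracy(y_true, y_pred, weights):
    """Weighted fraction of correct predictions.

    Computes sum_i w_i * 1{y_pred_i == y_true_i} / sum_i w_i.
    """
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("all inputs must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
    return correct / total


# Hypothetical toy data: label 1 = spam, 0 = non-spam.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 0]
# Assumed weighting: a non-spam email counts 10x as much as a spam email.
weights = [10.0 if y == 0 else 1.0 for y in y_true]
print(weighted_accuracy(y_true, y_pred, weights))  # 11 / 22 = 0.5
```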

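Bayesian logistic regression and the SVM do not optimize a(θ) directly. Each optimizes its own objective function (J(θ) in the case of Bayesian logistic regression), so we also need to ask whether that objective and its parameters are appropriate.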
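
For reference, the two objectives are commonly written as below. This is a sketch that assumes the standard L2-regularized (MAP) form of Bayesian logistic regression and the standard soft-margin linear SVM with labels in {-1, +1}; the symbols m, C, w, b, and ξ_i do not appear in the text above and are introduced here as assumptions.

```latex
% Bayesian logistic regression: maximize the regularized log-likelihood
J(\theta) = \sum_{i=1}^{m} \log p\left(y^{(i)} \mid x^{(i)}; \theta\right)
            - \lambda \lVert \theta \rVert^{2}

% Linear soft-margin SVM: minimize
\min_{w,\, b,\, \xi} \; \lVert w \rVert^{2} + C \sum_{i=1}^{m} \xi_{i}
\quad \text{s.t.} \quad
y^{(i)} \left( w^{\top} x^{(i)} + b \right) \ge 1 - \xi_{i},
\quad \xi_{i} \ge 0
```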

From the problem setup we already know that a(θ_SVM) > a(θ_BLR), where θ_SVM and θ_BLR are the parameters learned by the SVM and by Bayesian logistic regression, and J(θ) is the objective that Bayesian logistic regression tries to maximize. The diagnostic is to check whether J(θ_SVM) > J(θ_BLR).

If J(θ_SVM) > J(θ_BLR), then θ_BLR fails to maximize J(θ): the algorithm has not converged, and the optimization algorithm needs to be improved.

If J(θ_SVM) ≤ J(θ_BLR), then J(θ) is the wrong optimization objective: Bayesian logistic regression maximizes J(θ) at least as well as the SVM does, yet it does worse on the function a(θ) that we actually care about, so the objective function needs to be changed.

Therefore, in the solutions listed above:

    • Run gradient ascent for more iterations -- fixes an optimization algorithm problem
    • Try Newton's method -- fixes an optimization algorithm problem
    • Use a different value of λ -- fixes an optimization objective problem
    • Use an SVM -- fixes an optimization objective problem
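
The J(θ) comparison described above can be sketched as a small decision function. The function name, argument names, and the numbers in the usage lines are hypothetical; the inputs are assumed to have been computed elsewhere by evaluating a(θ) and J(θ) at each model's parameters.

```python
def diagnose_objective_vs_optimizer(a_svm, a_blr, j_svm, j_blr):
    """Decide whether BLR has an optimization or an objective problem.

    a_* : weighted accuracy a(theta) of each model (higher is better).
    j_* : the BLR objective J(theta) evaluated at each model's parameters
          (BLR tries to maximize J).
    """
    if a_svm <= a_blr:
        # The diagnostic presupposes that the SVM does better on a(theta).
        return "diagnostic does not apply"
    if j_svm > j_blr:
        # BLR was supposed to maximize J, yet the SVM parameters score higher.
        return "optimization algorithm problem"
    # BLR maximizes J at least as well as the SVM but still has lower a(theta).
    return "optimization objective problem"


# Hypothetical values, for illustration only.
print(diagnose_objective_vs_optimizer(a_svm=0.95, a_blr=0.90,
                                      j_svm=-120.0, j_blr=-150.0))
print(diagnose_objective_vs_optimizer(a_svm=0.95, a_blr=0.90,
                                      j_svm=-200.0, j_blr=-150.0))
```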

As the two diagnostics above show, tuning at random without first diagnosing the root cause can easily consume a lot of effort and yield no improvement. It is also often necessary to devise your own diagnostics to find out what is going wrong in the algorithm. "Solving a really important problem using learning algorithms, one of the most valuable things is just your own personal intuitive understanding of the problem." Diagnosing where the problem lies in a machine learning application is a good way to build this intuitive understanding.

Error Analysis

Suppose there is a machine learning application for face recognition that consists of a number of different components.

The current system has an accuracy of 85%. Error analysis can be used to determine which component, if improved, would raise the overall accuracy the most.

The approach is to replace a machine learning component with a manual (ground-truth) or otherwise known-good alternative and record the system accuracy obtained after the replacement. From the table above we can see that improving face detection raises system accuracy significantly (so it should be the focus of our next round of work), while preprocessing (background removal) improves overall accuracy only marginally.
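
The procedure can be sketched as follows. This is an illustration under assumptions: components are replaced cumulatively in pipeline order, `evaluate` is a hypothetical callable that runs the system with the given components swapped for ground truth, and every number in `MADE_UP_LOSS` is invented so that the baseline matches the 85% above; none of it is measured data.

```python
def error_analysis(components, evaluate):
    """Cumulatively replace pipeline components with ground truth.

    components: component names in pipeline order.
    evaluate:   callable taking a frozenset of component names whose outputs
                are replaced by ground truth, returning system accuracy.
    Returns a list of (component, accuracy, gain over the previous row).
    """
    replaced = set()
    previous = evaluate(frozenset(replaced))  # unmodified system
    rows = [("overall system", previous, 0.0)]
    for name in components:
        replaced.add(name)
        accuracy = evaluate(frozenset(replaced))
        rows.append((name, accuracy, accuracy - previous))
        previous = accuracy
    return rows


# Hypothetical stand-in for the real pipeline: each component that is NOT
# replaced by ground truth costs a made-up amount of accuracy.
MADE_UP_LOSS = {"preprocess": 0.01, "face detection": 0.09, "later stages": 0.05}


def toy_evaluate(replaced):
    lost = sum(loss for name, loss in MADE_UP_LOSS.items() if name not in replaced)
    return round(1.0 - lost, 4)


for name, accuracy, gain in error_analysis(list(MADE_UP_LOSS), toy_evaluate):
    print(f"{name:15s} accuracy={accuracy:.3f} gain={gain:+.3f}")
```

The component with the largest gain is the one whose imperfection costs the system the most, and so the one most worth improving.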

Ablative analysis

Suppose we added the following features to logistic regression:

    • Spelling correction
    • Sender host features
    • Email header features
    • Email text parser features
    • JavaScript parser
    • Features from embedded images

This raised the accuracy of the spam classifier from only 94% to 99.9%. We would now like to know how much each component contributed to that improvement. Ablative analysis answers this question by removing one component from the system at a time and observing how much the accuracy drops.

From the results, we can see that the email text parser features contribute the most to the accuracy improvement. If you are considering removing some components to improve efficiency, this is the last one that should be removed.
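
A minimal sketch of this procedure, assuming features are removed cumulatively, starting from the full system. The `evaluate` callable is hypothetical, and the per-feature gains in `MADE_UP_GAIN` are invented so that the totals match the 94% and 99.9% figures above; they are not measured results.

```python
def ablative_analysis(features, evaluate):
    """Remove feature groups one at a time (cumulatively) and record accuracy.

    features: feature-group names, in the order they will be removed.
    evaluate: callable taking a frozenset of active feature groups and
              returning accuracy.
    Returns a list of (removed feature, accuracy after removal, drop).
    """
    active = set(features)
    previous = evaluate(frozenset(active))  # full system
    rows = [("full system", previous, 0.0)]
    for name in features:
        active.discard(name)
        accuracy = evaluate(frozenset(active))
        rows.append((name, accuracy, previous - accuracy))
        previous = accuracy
    return rows


# Made-up contributions, chosen only so that 0.94 + total = 0.999.
MADE_UP_GAIN = {
    "spelling correction": 0.001,
    "sender host features": 0.003,
    "email header features": 0.005,
    "email text parser features": 0.040,
    "JavaScript parser": 0.002,
    "features from embedded images": 0.008,
}


def toy_evaluate(active):
    return round(0.94 + sum(MADE_UP_GAIN[name] for name in active), 4)


rows = ablative_analysis(list(MADE_UP_GAIN), toy_evaluate)
most_helpful = max(rows[1:], key=lambda row: row[2])[0]
print(most_helpful)  # email text parser features (by construction of the toy numbers)
```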

Getting started on a learning problem

Approach #1: Careful design

    • Spend a long time designing exactly the right features, collecting the right dataset, and designing the right algorithmic architecture.
    • Implement it and hope it works.

Benefit: Nicer, perhaps more scalable algorithms. May come up with new, elegant learning algorithms; contribute to basic research in machine learning.

Approach #2: Build-and-fix

    • Implement something quick-and-dirty.
    • Run error analyses and diagnostics to see what's wrong with it, and fix its errors.

Benefit: Would often get your application problem working more quickly. Faster time to market.

The first approach suits theoretical research. In day-to-day work we should use the second approach, which avoids premature optimization (learning a lot that may never be used, or spending a lot of effort where it yields only a small gain) and improves productivity. Finally, Andrew says he often spends a third of his time or more designing diagnostics to find out what is working and what is not, and that this is time well spent.

Summary

From the Chinese New Year period (the first note was posted on February 24) to the Qingming holiday (today, April 5), after about 40 days I have finally worked through the supervised learning part of CS229. Along the way I cleared up countless questions I had about machine learning, and I feel thoroughly refreshed. Thanks to NetEase for its open course translation and for thoughtfully providing the lecture notes for download, and of course thanks to Andrew for his wonderful teaching. The next step is to get comfortable with a good toolbox (perhaps Mahout?), then start practicing: implement something quick-and-dirty, and then improve the key components through error analysis. Fight like an ML expert!
