One Machine Learning Algorithm per Day: Machine Learning Practices


Knowing an algorithm and using an algorithm are two different things.

 

What should you do if, after training, you find that the model's error is large?

1) Get more data. It may help.

2) Reduce the feature dimension. You can select features by hand or use a mathematical method such as PCA.

3) Collect more features. This is time-consuming and not guaranteed to help.

4) Add polynomial features (for example x1², x2², x1·x2). Something of a last resort.

5) Engineer new and better features of your own. A little risky.

6) Adjust the regularization parameter λ (lambda).

Trying these remedies blindly is largely a matter of luck; a wrong guess can waste a lot of time. (A short sketch of remedies 2) and 4) follows below.)
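As an illustration, here is a minimal sketch of remedies 2) and 4) with scikit-learn. The synthetic data, the choice of 10 PCA components, and the degree-2 expansion are assumptions for demonstration, not values from the original.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # hypothetical 50-dimensional feature matrix

# Remedy 2: reduce the feature dimension with PCA (50 -> 10 components).
X_reduced = PCA(n_components=10).fit_transform(X)
print(X_reduced.shape)  # (200, 10)

# Remedy 4: add degree-2 polynomial features to enrich a too-simple model.
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_reduced)
print(X_poly.shape)  # (200, 65): 10 linear + 10 squared + 45 cross terms
```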

The alternative is machine learning diagnostics: checks that tell you what is or isn't working, so you improve the model efficiently and save debugging time.

I. Evaluating a Hypothesis

A smaller training loss does not automatically mean a better model; it may simply indicate overfitting.

The correct method is to split the data into a training set and a test set: fit the model on the training set, then measure the test-set error on data the model has never seen.
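A minimal sketch of this split, assuming synthetic regression data, a 70/30 split, and a plain linear model (all illustrative choices):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                     # hypothetical data
y = X @ rng.normal(size=5) + 0.5 * rng.normal(size=200)

# Hold out 30% of the data; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)
train_error = mean_squared_error(y_train, model.predict(X_train))
test_error = mean_squared_error(y_test, model.predict(X_test))  # generalization estimate
print(train_error, test_error)
```

A low training error together with a much higher test error is the overfitting signature the paragraph above warns about.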

 

II. Model Selection: the Training/Validation/Test Split

How do you select the regularization parameter and the polynomial degree (model selection)?

Trying different regularization parameters and polynomial degrees and picking the model with the minimum test-set loss seems feasible, but the test set has then been used to make a modeling choice, so its error can no longer verify generalization ability.

The solution is to split the data into three sets: a training set, a validation set, and a test set.

Use the validation set to select the best model and parameters, then estimate the generalization loss on the test set.
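One way this might look in code, assuming a 60/20/20 split, synthetic data, and the polynomial degree as the hyperparameter being selected (all assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + 0.3 * rng.normal(size=300)

# 60% train, 20% validation, 20% test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

best_degree, best_val, best_model = None, float("inf"), None
for d in range(1, 11):                            # candidate polynomial degrees
    model = make_pipeline(PolynomialFeatures(d), LinearRegression()).fit(X_train, y_train)
    val_error = mean_squared_error(y_val, model.predict(X_val))
    if val_error < best_val:
        best_degree, best_val, best_model = d, val_error, model

# Only the final, already-chosen model touches the test set.
test_error = mean_squared_error(y_test, best_model.predict(X_test))
print(best_degree, test_error)
```

Because the test set played no part in choosing the degree, its error remains an honest estimate of generalization.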

 

III. Model Diagnosis: Bias vs. Variance

How do you tell underfitting from overfitting?

Plot the training and validation losses against the polynomial degree d.

If d is too small, the model tends to underfit.

If d is too large, the model tends to overfit.

With underfitting, the training-set and validation-set losses are both high and close to each other.

With overfitting, the training-set loss is small while the validation-set loss is large.
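A sketch of that diagnostic curve, with synthetic data and a degree range of 1 to 10 as illustrative assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.3 * rng.normal(size=200)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

degrees = range(1, 11)
train_err, val_err = [], []
for d in degrees:
    m = make_pipeline(PolynomialFeatures(d), LinearRegression()).fit(X_train, y_train)
    train_err.append(mean_squared_error(y_train, m.predict(X_train)))
    val_err.append(mean_squared_error(y_val, m.predict(X_val)))

# Underfitting (small d): both curves high and close together; overfitting
# (large d): training error keeps falling while validation error climbs.
plt.plot(degrees, train_err, label="J_train")
plt.plot(degrees, val_err, label="J_cv")
plt.xlabel("polynomial degree d"); plt.ylabel("MSE"); plt.legend(); plt.show()
```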

 

IV. The Regularization Parameter: Balancing Underfitting and Overfitting

If λ is too large, the model tends to underfit; if it is too small, it tends to overfit.

How to choose?

Define a range of candidate values for the regularization parameter, compute the validation-set loss for each value, and select the λ with the smallest validation loss.
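A sketch of that search using ridge regression, where scikit-learn calls the regularization strength alpha; the synthetic data and the candidate grid below are assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X @ rng.normal(size=20) + rng.normal(size=200)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

lambdas = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30]    # hypothetical candidate range
best_lam, best_err = None, float("inf")
for lam in lambdas:
    model = Ridge(alpha=lam).fit(X_train, y_train)
    err = mean_squared_error(y_val, model.predict(X_val))  # loss without the penalty term
    if err < best_err:
        best_lam, best_err = lam, err
print(best_lam, best_err)
```

Note that the validation loss is computed without the regularization penalty; the penalty only shapes training.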

 

V. Learning Curves

High bias:

J_cv and J_train are both high, and stay high even as m grows.

In this case, adding more samples does not help, because the model itself is the problem: most likely it is too simple.

 

High variance:

There is a large gap between J_cv and J_train: the training loss is low while the validation loss stays high.

In this case, increasing the number of training samples may be very effective.
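scikit-learn's learning_curve helper can produce both curves; the synthetic data and the linear model below are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ rng.normal(size=5) + rng.normal(size=500)

sizes, train_scores, cv_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5, scoring="neg_mean_squared_error")

j_train = -train_scores.mean(axis=1)   # J_train as a function of m
j_cv = -cv_scores.mean(axis=1)         # J_cv as a function of m
# High bias: both curves plateau high and close together.
# High variance: a persistent gap, J_train low and J_cv noticeably higher.
print(np.column_stack([sizes, j_train, j_cv]))
```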

 

VI. Summary

1) Get more samples: fixes overfitting (high variance); otherwise it does not help.

2) Try a smaller set of features: same as above, fixes overfitting.

3) Add more features: fixes underfitting (high bias).

4) Add polynomial features: fixes underfitting.

5) Decrease λ: fixes underfitting.

6) Increase λ: fixes overfitting.
