Stanford University Machine Learning Notes: The Overfitting Problem and Regularization


When we apply the linear regression and logistic regression described in the previous post, an overfitting problem often arises. Overfitting is defined below:

Overfitting (over-fitting):
Overfitting occurs when we have very many features and the hypothesis learned from them fits the training set extremely well (the cost function is very small, almost 0), yet fails to generalize to new data (its predictions on new examples are poor; in other words, its generalization ability is weak).
The following examples illustrate the concepts of underfitting, overfitting, and a good fit:

All three plots above concern housing-price prediction. The first uses a linear model; the fit is clearly poor and the error remains large, a phenomenon called underfitting (sometimes referred to as high bias). The third uses a fourth-degree polynomial; it fits the training samples extremely well, with a loss of almost zero, but it is so focused on fitting the training data that it loses sight of the purpose of training a model, which is to predict new data, and its predictions on new data are poor. This is called overfitting (high variance). The quadratic model in the middle fits the training data well: although its loss is larger than the third model's, it captures the underlying character of the data and is robust to it, so even if some factor causes a few points to deviate from their original values, the model still fits well.
The phenomena of overfitting and underfitting are discussed here in terms of regression, but they apply equally to classification problems.
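To make this concrete, here is a minimal sketch in Python with NumPy (the data values are hypothetical stand-ins, since the original figures are not reproduced): it fits degree-1, degree-2, and degree-4 polynomials to a few noisy "house size vs. price" samples and compares the training cost with the cost on held-out points.

```python
import numpy as np

rng = np.random.default_rng(0)

def truth(s):
    return 2.0 * np.sqrt(s)            # assumed underlying price curve

x_train = np.linspace(1.0, 5.0, 5)
y_train = truth(x_train) + rng.normal(0.0, 0.3, x_train.shape)
x_test = np.linspace(1.2, 4.8, 50)     # unseen sizes from the same range
y_test = truth(x_test) + rng.normal(0.0, 0.3, x_test.shape)

def cost(theta, x, y):
    """Squared-error cost (1 / 2m) * sum((h(x) - y)^2)."""
    return ((np.polyval(theta, x) - y) ** 2).sum() / (2 * len(y))

for degree, label in [(1, "underfit / high bias"),
                      (2, "good fit"),
                      (4, "overfit / high variance")]:
    theta = np.polyfit(x_train, y_train, degree)   # least-squares fit
    print(f"degree {degree} ({label}): "
          f"train cost {cost(theta, x_train, y_train):.4f}, "
          f"test cost {cost(theta, x_test, y_test):.4f}")
# The degree-4 polynomial passes through all 5 training points, so its
# training cost is essentially zero, but that says nothing about how it
# behaves on the unseen test sizes, which is the point of the example.
```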
Methods for solving overfitting:
1. Remove features that do not help us predict correctly. Overfitting tends to appear when there are many features but relatively few training samples. We can manually select the relevant features and discard the irrelevant ones, or use an algorithm to choose them, such as PCA (explained in a follow-up post).
2. Regularization. This method keeps all the features and only reduces the magnitude of the parameters. It performs very well when we have many features and each feature contributes a little to the predicted result.

Let's use an example to understand what regularization is. Returning to the example above and looking at the third image, the fit clearly exhibits overfitting; its hypothesis is:
$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$

How regularization works in the cost function:
Comparing the three images, it is not hard to see that the overfitting is caused by the high-order terms, so we should reduce the weights $\theta_3$ and $\theta_4$ of those terms. We can start from the cost function: add a penalty on the parameters $\theta_3$ and $\theta_4$ so that they become small. The modified cost function can be, for example (1000 simply stands in for some very large coefficient):

$\min_\theta \; \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2\right]$

Let us analyze how this works. Our goal is to minimize the loss function, and the coefficients in front of $\theta_3$ and $\theta_4$ are very large, so minimizing the loss function forces $\theta_3$ and $\theta_4$ to take very small values, close to zero. The high-order terms then contribute almost nothing to the hypothesis.
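As a sanity check on this reasoning, the sketch below (hypothetical data; the coefficient 1000 is just the "very large" penalty from the text) evaluates the modified cost for one parameter vector that uses the high-order terms and another with $\theta_3 = \theta_4 = 0$:

```python
import numpy as np

def modified_cost(theta, x, y):
    """Squared-error cost plus large penalties on theta_3 and theta_4.

    theta holds [theta_0, ..., theta_4] for the 4th-degree hypothesis;
    the 1000s are the illustrative 'very large' coefficients, not tuned.
    """
    h = np.polyval(theta[::-1], x)               # h(x) = sum_j theta_j x^j
    fit_term = ((h - y) ** 2).sum() / (2 * len(x))
    penalty = 1000 * theta[3] ** 2 + 1000 * theta[4] ** 2
    return fit_term + penalty

# Hypothetical training data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.4, 4.1, 4.4])

wiggly = np.array([0.5, 1.0, -0.2, 0.3, -0.05])  # relies on x^3, x^4 terms
flat   = np.array([0.5, 1.0, -0.2, 0.0, 0.0])    # theta_3 = theta_4 = 0
print(modified_cost(wiggly, x, y))   # penalty term dominates the cost
print(modified_cost(flat, x, y))     # only the data-fit term remains
```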

The above shows the regularization process on a specific example. In general, however, when there are many features we do not know in advance which parameters should be penalized, so we penalize all of them and let the minimization of the cost function decide how strongly each is suppressed. The regularized loss function is therefore:

$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$

The first term measures how well the model fits the training data; the second term keeps the parameters small. $\lambda$ is called the regularization parameter, and it controls the balance between the two.
It is worth noting that we generally do not penalize $\theta_0$, which is why the sum in the penalty term starts at $j = 1$.
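A direct translation of this cost function into code might look as follows; this is a sketch that assumes a design matrix X whose first column is all ones, so that theta[0] plays the role of $\theta_0$ and can be skipped by the penalty:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta) from the formula above.

    X is the m x (n+1) design matrix whose first column is all ones, so
    theta[0] plays the role of theta_0 and is deliberately not penalized.
    """
    m = len(y)
    residual = X @ theta - y
    fit_term = (residual ** 2).sum()
    penalty = lam * (theta[1:] ** 2).sum()       # sum starts at j = 1
    return (fit_term + penalty) / (2 * m)
```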

For the example above, comparing the regularized fit with the unregularized one shows that regularization smooths out the wildly oscillating overfitted curve while still tracking the overall trend of the data.

Finally, consider the choice of $\lambda$. If $\lambda$ is too large, all the penalized parameters are driven close to zero and the hypothesis degenerates to roughly $h_\theta(x) = \theta_0$, a horizontal line that underfits the data; if $\lambda$ is too small, the penalty has little effect and the overfitting remains. $\lambda$ therefore has to be chosen to balance fitting the training data against keeping the parameters small.
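To see the effect of $\lambda$ numerically, one standard route is the regularized normal equation, the closed-form minimizer of the cost above, with the (0, 0) entry of the penalty matrix zeroed so that $\theta_0$ escapes the penalty. The data here are hypothetical:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * M) theta = X^T y, where M is the identity
    with its (0, 0) entry zeroed so that theta_0 is not penalized."""
    M = np.eye(X.shape[1])
    M[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)

# Hypothetical 4th-degree design matrix for the housing example.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.4, 4.1, 4.4])
X = np.vander(x, 5, increasing=True)      # columns: 1, x, x^2, x^3, x^4

for lam in (0.0, 1.0, 1e6):
    theta = ridge_fit(X, y, lam)
    print(f"lambda = {lam:g}: theta = {np.round(theta, 4)}")
# lambda = 0 reproduces the unregularized (overfit-prone) interpolation;
# a moderate lambda shrinks the high-order terms; a huge lambda pushes
# every penalized theta_j toward 0, leaving roughly h(x) = theta_0.
```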
