Over-fitting and regularization in machine learning


This article covers over-fitting and regularization in machine learning. I hope you find it helpful.

To fit a curve with linear regression, or to determine a classification boundary with logistic regression, there are many candidate curves to choose from.

Different curves express the sample data with different fidelity. Consider three candidates:

Curve 1, a first-order curve (a straight line), is too simple: many samples are misclassified and the error is large. The model under-fits.

Curve 2, a high-order curve, completes the fitting task almost perfectly. But such a rigid model is very likely to misjudge new samples that differ even slightly from the training samples. The model over-fits.

Curve 3, a relatively smooth curve, completes the fitting task adequately while staying insensitive to individual noisy points. It is the more desirable model.

How do we get curve 3?

From the shape of curve 2, it is clear that the high-order terms have too much influence.

Assume the equation of curve 2 is:

$
h_\theta(x) = \theta^{T}x = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \dots + \theta_n x^n
$
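As a small illustration (the degree and parameter values below are assumed, not from the article), such a hypothesis can be evaluated by expanding a single feature x into its powers and taking the inner product with theta:

x = 3;                       % a single scalar feature (assumed value)
n = 5;                       % polynomial degree (assumed)
theta = 0.1 * ones(n+1, 1);  % assumed parameters [theta_0; ...; theta_n]
x_poly = x .^ (0:n)';        % feature vector [1; x; x^2; ...; x^n]
h = theta' * x_poly;         % h_theta(x) = theta' * x_poly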

To weaken the effect of a high-order term $x^n$, we can reduce the value of its coefficient $\theta_n$.

That is, when solving for the parameter vector $\theta$, we want its element values to be as small as possible.

$\theta$ is obtained by minimizing the error function $J$, so we reform the $J$ function itself. This is regularization.

Regularized error function

On top of the original error function, add a regularization term, as follows:

$
J = J + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
$

The regularization term is the sum of the squares of all the $\theta$ parameters. $\lambda$ is the regularization parameter; its value can be determined by training the model with different values of $\lambda$ and comparing the final errors.
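For concreteness, written out in full for linear regression with the standard squared-error cost, the regularized error function is:

$
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
$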

With the regularization term in place, the optimal solution found by minimizing the error function must not only keep the sample error small, but also keep the $\theta$ values small.

This meets the requirement described above.

The corresponding gradient, obtained by differentiation, adds $\frac{\lambda}{m}\theta_j$ to each component of the original gradient, leaving the $j = 0$ component unchanged:

$
\text{grad}_0 = \text{grad}_0, \quad (j = 0)
$

$
\text{grad}_j = \text{grad}_j + \frac{\lambda}{m}\theta_j, \quad (j > 0)
$
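Written out in full (the same form holds for both linear and logistic regression, each with its own $h_\theta$):

$
\text{grad}_j = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j, \quad (j > 0)
$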

Note:

One thing to note about the regularization term: $\theta_0$ is not included in it.

This is because $\theta_0$ multiplies the constant feature 1 ($x_0 = 1$), the zeroth-order term. It only affects the position and height of the curve and has no effect on how much the model bends, so it needs no regularization.
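As a tiny sketch (all values assumed for illustration), excluding $\theta_0$ from the penalty looks like this; it is exactly what the [0; theta_tmp] trick in the code below implements:

theta = [4; 2; -1];          % assumed example parameters; theta_0 = 4
lambda = 1; m = 10;          % assumed regularization parameter and sample count
theta_tmp = theta(2:end);    % drop theta_0: it is not penalized
penalty = lambda/(2*m) * sum(theta_tmp.^2);   % = 1/20 * (2^2 + (-1)^2) = 0.25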

Is regularization the only way to resolve over-fitting?

Answer: no.

First, it should be explained that the root cause of over-fitting is that the model contains many variables, and enough training samples are needed to constrain that many variables.

That is, as the number of training samples grows, curve 2 will gradually turn into curve 3. But for a high-order model like curve 2 to reach that state, the required sample size is very large and the training computation correspondingly heavy, which is rarely worthwhile.

Implementing the regularized error function and its partial derivatives

Only the key sections of the code (Octave/MATLAB) are listed.

1. Linear regression

h = X*theta;                            % hypothesis values for all samples
theta_tmp = theta(2:length(theta), 1);  % exclude theta_0 from regularization
J = 1/(2*m) * (h-y)' * (h-y) + lambda/(2*m) * sum(theta_tmp.^2);
grad = 1/m * X' * (h-y) + (lambda/m) * [0; theta_tmp];
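A minimal usage sketch with made-up data (every value below is assumed for illustration): plain gradient descent repeating the cost and gradient computation above.

X = [ones(5,1), (1:5)'];     % design matrix; first column is x0 = 1
y = [2; 4; 6; 8; 10];        % training targets (assumed)
m = length(y);
theta = zeros(2,1);
lambda = 1;                  % regularization parameter (assumed)
alpha = 0.01;                % learning rate (assumed)
for iter = 1:400
    h = X*theta;
    theta_tmp = theta(2:length(theta), 1);
    grad = 1/m * X' * (h-y) + (lambda/m) * [0; theta_tmp];
    theta = theta - alpha * grad;
end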

2. Logistic regression

h = sigmoid(X*theta);                   % hypothesis: predicted probabilities
theta_tmp = theta(2:length(theta), 1);  % exclude theta_0 from regularization
J = 1/m * sum(-y.*log(h) - (1-y).*log(1-h)) + lambda/(2*m) * sum(theta_tmp.^2);
grad = 1/m * X' * (h-y) + (lambda/m) * [0; theta_tmp];
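The sigmoid helper called above is not shown in the excerpt; a standard definition (an assumption, not necessarily the author's original file) would be:

function g = sigmoid(z)
    % element-wise logistic function; works on scalars, vectors, and matrices
    g = 1 ./ (1 + exp(-z));
end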
