Over-fitting and regularization in machine learning


This article covers over-fitting and regularization in machine learning. I hope you find it helpful.

To fit a curve with linear regression, or to determine a classification boundary with logistic regression, there are many candidate curves to choose from.

Different curves express the sample data with different fidelity. Consider three candidates:

Curve 1, a first-order curve (a straight line), is too simple: many samples are misclassified and the error is large. The model under-fits.

Curve 2, a high-order curve, completes the fitting task almost perfectly. But such a rigid model is very likely to misjudge new samples that differ even slightly from the training samples. The model over-fits.

Curve 3, a relatively smooth curve, completes the fitting task adequately while staying insensitive to individual noisy points. It is the more desirable model.

How do we get curve 3?

From the shape of curve 2, it is clear that the high-order terms have too much influence.

Assume the equation of curve 2 is:

$
h_\theta(x) = \theta^{T}x = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \dots + \theta_n x^n
$
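As a small illustration (the degree and parameter values below are assumed, not from the article), such a hypothesis can be evaluated by expanding a single feature x into its powers and taking the inner product with theta:

x = 3;                       % a single scalar feature (assumed value)
n = 5;                       % polynomial degree (assumed)
theta = 0.1 * ones(n+1, 1);  % assumed parameters [theta_0; ...; theta_n]
x_poly = x .^ (0:n)';        % feature vector [1; x; x^2; ...; x^n]
h = theta' * x_poly;         % h_theta(x) = theta' * x_poly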

To weaken the effect of a high-order term $x^n$, we can reduce the value of its coefficient $\theta_n$.

That is, when solving for the parameter vector $\theta$, we want its element values to be as small as possible.

$\theta$ is obtained by minimizing the error function $J$, so we reform the $J$ function itself. This is regularization.

Regularized error function

On top of the original error function, add a regularization term, as follows:

$
J = J + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
$

The regularization term is the sum of the squares of all the $\theta$ parameters. $\lambda$ is the regularization parameter; its value can be determined by training the model with different values of $\lambda$ and comparing the final errors.
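For concreteness, written out in full for linear regression with the standard squared-error cost, the regularized error function is:

$
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
$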

With the regularization term in place, the optimal solution found by minimizing the error function must not only keep the sample error small, but also keep the $\theta$ values small.

This meets the requirement described above.

The corresponding gradient, obtained by differentiation, adds $\frac{\lambda}{m}\theta_j$ to each component of the original gradient, leaving the $j = 0$ component unchanged:

$
\text{grad}_0 = \text{grad}_0, \quad (j = 0)
$

$
\text{grad}_j = \text{grad}_j + \frac{\lambda}{m}\theta_j, \quad (j > 0)
$
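Written out in full (the same form holds for both linear and logistic regression, each with its own $h_\theta$):

$
\text{grad}_j = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j, \quad (j > 0)
$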

Note:

One thing to note about the regularization term: $\theta_0$ is not included in it.

This is because $\theta_0$ multiplies the constant feature 1 ($x_0 = 1$), the zeroth-order term. It only affects the position and height of the curve and has no effect on how much the model bends, so it needs no regularization.
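As a tiny sketch (all values assumed for illustration), excluding $\theta_0$ from the penalty looks like this; it is exactly what the [0; theta_tmp] trick in the code below implements:

theta = [4; 2; -1];          % assumed example parameters; theta_0 = 4
lambda = 1; m = 10;          % assumed regularization parameter and sample count
theta_tmp = theta(2:end);    % drop theta_0: it is not penalized
penalty = lambda/(2*m) * sum(theta_tmp.^2);   % = 1/20 * (2^2 + (-1)^2) = 0.25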

Is regularization the only way to resolve over-fitting?

Answer: no.

First, it should be explained that the root cause of over-fitting is that the model contains many variables, and enough training samples are needed to constrain that many variables.

That is, as the number of training samples grows, curve 2 will gradually turn into curve 3. But for a high-order model like curve 2 to reach that state, the required sample size is very large and the training computation correspondingly heavy, which is rarely worthwhile.

Implementing the regularized error function and its partial derivatives

Only the key sections of the code (Octave/MATLAB) are listed.

1. Linear regression

h = X*theta;                            % hypothesis values for all samples
theta_tmp = theta(2:length(theta), 1);  % exclude theta_0 from regularization
J = 1/(2*m) * (h-y)' * (h-y) + lambda/(2*m) * sum(theta_tmp.^2);
grad = 1/m * X' * (h-y) + (lambda/m) * [0; theta_tmp];
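A minimal usage sketch with made-up data (every value below is assumed for illustration): plain gradient descent repeating the cost and gradient computation above.

X = [ones(5,1), (1:5)'];     % design matrix; first column is x0 = 1
y = [2; 4; 6; 8; 10];        % training targets (assumed)
m = length(y);
theta = zeros(2,1);
lambda = 1;                  % regularization parameter (assumed)
alpha = 0.01;                % learning rate (assumed)
for iter = 1:400
    h = X*theta;
    theta_tmp = theta(2:length(theta), 1);
    grad = 1/m * X' * (h-y) + (lambda/m) * [0; theta_tmp];
    theta = theta - alpha * grad;
end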

2. Logistic regression

h = sigmoid(X*theta);                   % hypothesis: predicted probabilities
theta_tmp = theta(2:length(theta), 1);  % exclude theta_0 from regularization
J = 1/m * sum(-y.*log(h) - (1-y).*log(1-h)) + lambda/(2*m) * sum(theta_tmp.^2);
grad = 1/m * X' * (h-y) + (lambda/m) * [0; theta_tmp];
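The sigmoid helper called above is not shown in the excerpt; a standard definition (an assumption, not necessarily the author's original file) would be:

function g = sigmoid(z)
    % element-wise logistic function; works on scalars, vectors, and matrices
    g = 1 ./ (1 + exp(-z));
end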
