Avoiding Over-fitting with Regularization


"The less assumptions, the better the results"

Business Scenario:

Overfitting is a common problem when we fit a model to data. More general models tend to avoid overfitting, but in some cases it is necessary to manually reduce the model's complexity by removing some of its attributes (variables).

Consider the following example. There are 10 students in a classroom, 5 boys and 5 girls, and we are trying to predict their future scores from their past performance. The girls' average score is 60 and the boys' average score is 80. The average score of all students is 70.

There are several ways to make the prediction:

1. Use 70, the class average, as the predicted score for every student.

2. Predict 80 for the boys and 60 for the girls. This is still a simple model, but its predictions are better than those of the first.

3. We can keep refining the model, for example by using each student's score on the last exam as the prediction for the next one. At this level of granularity the model can make serious errors.

Statistically speaking, the first model underfits, the second probably gives the best predictions, and the third overfits.
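
As a concrete illustration, here is a minimal Python sketch of the three prediction rules. The individual scores are hypothetical, chosen only so that the group averages match the ones above (girls: 60, boys: 80, overall: 70):

```python
# Hypothetical scores whose averages match the example in the text.
girls = [55, 58, 60, 62, 65]   # mean = 60
boys = [75, 78, 80, 82, 85]    # mean = 80
overall_mean = (sum(girls) + sum(boys)) / (len(girls) + len(boys))  # 70

def predict_underfit(is_boy, last_score):
    return overall_mean           # model 1: one number for the whole class

def predict_balanced(is_boy, last_score):
    return 80 if is_boy else 60   # model 2: one number per group

def predict_overfit(is_boy, last_score):
    return last_score             # model 3: memorize each student's last exam

# Predictions for a boy whose last exam score happened to be 85:
print(predict_underfit(True, 85), predict_balanced(True, 85), predict_overfit(True, 85))
# -> 70.0 80 85
```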

Now let's look at a curve-fitting example.

The dependent variable y has a quadratic relationship with the independent variable x. Fitting the training set with a higher-order polynomial can produce a very accurate fit on the training data, but the predictions on the test set are poor. Next we will briefly introduce some methods to avoid overfitting, focusing mainly on regularization.
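
To make this concrete, here is a minimal sketch (assuming numpy is installed; the quadratic data and the noise level are made up for illustration) that fits polynomials of degree 2 and degree 9 to noisy quadratic data. The high-degree fit typically shows a much lower training error but a higher test error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y is quadratic in x, plus noise.
def f(x):
    return 1.0 + 2.0 * x + 0.5 * x**2

x_train = np.linspace(-3, 3, 12)
y_train = f(x_train) + rng.normal(0, 1.0, x_train.size)
x_test = np.linspace(-3, 3, 50)
y_test = f(x_test) + rng.normal(0, 1.0, x_test.size)

for degree in (2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.2f}, test MSE {test_mse:.2f}")
```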

Ways to avoid overfitting

1. Cross-validation: cross-validation is the simplest form of validation. We divide the sample into K parts, hold one part out as the test sample, and use the rest as the training sample. A model is learned on the training sample and used to predict the test sample. The steps are repeated so that each part serves once as the test set. To keep the bias of the resulting error estimate low, cross-validation with a larger K is preferred. (A minimal sketch follows this list.)

2. Early stopping: early stopping gives a rule for how many training iterations (cycles) a learner may run before it begins to over-fit.

3. Pruning: pruning is widely used in CART (decision tree) models. It removes nodes that add little predictive power.

4. Regularization: this is the approach we will cover in detail. It adds a penalty term on the model's coefficients to the objective function. In other words, regularization reduces the effective dimension of the model by driving the coefficients of many variables toward 0, which in turn reduces the generalization error.
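
As an illustration of the cross-validation idea from item 1, here is a minimal sketch (assuming numpy and scikit-learn are installed; the data is synthetic) that estimates a linear model's prediction error with K-fold cross-validation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                                  # synthetic features
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.5, 100)   # synthetic target

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_errors = []
for train_idx, test_idx in kfold.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    fold_errors.append(np.mean((pred - y[test_idx]) ** 2))     # per-fold test MSE

print("mean CV error:", np.mean(fold_errors))
```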

Regularization Basics

Given some independent variables x, we build a simple regression model between the dependent variable y and x. The regression equation looks like:

y = a1·x1 + a2·x2 + a3·x3 + a4·x4 + …

In the equation above, a1, a2, a3, … are the regression coefficients, while x1, x2, x3, … are the independent variables. Given the independent and dependent variables, we estimate the coefficients a1, a2, a3, … by minimizing an objective function. For the linear regression model, the objective function is the sum of squared errors:

Σ_i (y_i − ŷ_i)²

If there is a large number of independent variables x1, x2, x3, …, overfitting may occur. We therefore introduce a penalty term, forming a new objective function for estimating the regression coefficients. With this modification, the objective function becomes:

Σ_i (y_i − ŷ_i)² + λ Σ_j a_j²

The newly added term is the sum of the squares of the regression coefficients multiplied by a parameter λ. If λ = 0, we are back to the over-fitting scenario described above; as λ tends to infinity, the coefficients shrink to zero and the regression prediction degenerates to the mean of y. Tuning λ means finding a balance between predictive accuracy on the training sample and on the test sample.
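
The effect of λ can be checked empirically. Below is a minimal sketch (assuming scikit-learn is installed; `alpha` is scikit-learn's name for λ, and the data is synthetic) showing that a large λ shrinks the coefficients toward zero, so predictions collapse toward the intercept, which is roughly the mean of y:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([3.0, -1.0, 2.0, 0.5]) + rng.normal(0, 0.3, 50)

for lam in (0.0, 1.0, 1e6):                   # λ = 0, moderate, effectively infinite
    model = Ridge(alpha=lam).fit(X, y)
    print(f"lambda={lam:g}  coefficients={np.round(model.coef_, 3)}")
# With a huge λ, the coefficients approach 0 and predictions approach the
# fitted intercept, i.e. roughly the mean of y.
```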

Understanding the mathematical basis of regularization

There are various ways to compute the regression coefficients. A very common one is gradient descent. Gradient descent is an iterative method: starting from some initial value, it repeatedly moves the coefficients toward values that minimize the objective function, using the partial derivatives of the objective with respect to the coefficients. Without going through the full derivation, here is the final update equation:

θ_j := θ_j − α Σ_i (ŷ_i − y_i) x_ij        (1)

Here θ is the estimated regression coefficient and α is the learning rate. Now we introduce the penalty term; after taking the partial derivative with respect to the regression coefficients, it contributes a term that is linear in θ. The final update equation becomes:

θ_j := θ_j (1 − αλ) − α Σ_i (ŷ_i − y_i) x_ij        (2)

If you look closely at equation (2), you will see that at each iteration θ starts from a value slightly smaller than the previous iteration's result (it is multiplied by the factor 1 − αλ < 1). This is the only difference between the two update equations, and it is what drives equation (2) to converge toward θ values of smaller absolute magnitude.
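
To tie the two update equations together, here is a minimal numpy sketch (the data is synthetic, and the values of `alpha` and `lam` are arbitrary) implementing the gradient-descent loop; with `lam = 0` it performs update (1), otherwise update (2):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 100)

def gradient_descent(X, y, lam=0.0, alpha=0.01, iters=1000):
    theta = np.zeros(X.shape[1])               # initial value of the coefficients
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / len(y)  # Σ_i (ŷ_i − y_i) x_ij, averaged
        theta = theta * (1 - alpha * lam) - alpha * grad  # update (1) when lam=0, else (2)
    return theta

print("lam=0   :", np.round(gradient_descent(X, y, lam=0.0), 3))  # equation (1)
print("lam=0.5 :", np.round(gradient_descent(X, y, lam=0.5), 3))  # equation (2), shrunk
```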

Conclusion

In this article we briefly introduced the idea of regularization. Of course, the related concepts run far deeper than this introduction; in upcoming articles we will continue to explore regularization.

Original author: Tavish Srivastava

Translation: F.xy

Original link: http://www.analyticsvidhya.com/blog/2015/02/avoid-over-fitting-regularization/
