Linear regression, ridge regression, and lasso regression

Although some of the content is not yet fully understood, here are some excerpts for now.

1. The variable selection problem: from ordinary linear regression to the lasso

Ordinary linear regression fitted by least squares is the basic method of data modeling. A key assumption is that the error terms are independent and identically distributed with zero mean (often assumed normal). The t-test is then used to test the significance of individual fitted coefficients, and the F-test to test the significance of the model as a whole (analysis of variance). If normality does not hold, the t-test and F-test lose their justification.
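
As a concrete illustration (a minimal sketch using statsmodels on synthetic data; the variable names and coefficients are placeholders), an OLS fit reports exactly these diagnostics:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))                   # two predictor variables
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)  # i.i.d. zero-mean normal errors

X_design = sm.add_constant(X)                 # add the intercept column
model = sm.OLS(y, X_design).fit()             # least squares fit
print(model.summary())                        # per-coefficient t-tests and the model F-test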

When modeling more complex data, such as text categorization, image denoising, or genomic data, ordinary linear regression runs into problems:
(1) Prediction accuracy. If there is a genuine linear relationship between the response variable and the predictor variables, least squares regression has low bias; in particular, when the number of observations n is much larger than the number of predictors p, it also has low variance. But when n and p are close, the fit is prone to overfitting, and when n < p, least squares regression does not yield a meaningful (unique) result at all (a small numerical illustration follows this list).
(2) Model interpretability. A multiple linear regression model may include many variables that are actually unrelated to the response, and multicollinearity may arise, i.e. strong correlation among several predictor variables. Both increase the complexity of the model and weaken its interpretability, so variable selection (feature selection) is needed.
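
The n < p failure mode is easy to demonstrate (a minimal sketch with NumPy; the data is pure noise by construction):

import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 50                             # fewer observations than predictors
X = rng.normal(size=(n, p))
y = rng.normal(size=n)                    # pure noise: there is no signal to find

beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(rank)                               # rank 10 < 50: infinitely many exact solutions
print(np.abs(X @ beta - y).max())         # ~0: zero training error on noise, i.e. overfitting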

There are three classes of methods that extend OLS toward variable selection:

(1) Subset selection. This is the traditional approach, including stepwise regression and best-subset selection: fit a linear model on candidate subsets of the predictors and use a criterion (such as AIC, BIC, Mallows' Cp, or adjusted R²) to pick the optimal model.

(2) Shrinkage methods, also called regularization: mainly ridge regression and lasso regression. A penalty is added to the least squares criterion, shrinking the coefficient estimates, in some cases all the way to 0.

(3) Dimensionality reduction: principal component regression (PCR) and partial least squares (PLS). The p predictor variables are projected into an m-dimensional space (m < p), and a linear model is built on the uncorrelated projected combinations (a minimal PCR sketch follows this list).
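
As one concrete instance of approach (3), PCR is simply PCA followed by OLS on the projected scores (a minimal sketch assuming scikit-learn; the data, with ten predictors driven by two latent factors, and the choice m = 2 are illustrative assumptions):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
n = 100
latent = rng.normal(size=(n, 2))          # two underlying factors
X = np.repeat(latent, 5, axis=1) + 0.1 * rng.normal(size=(n, 10))  # 10 correlated predictors
y = latent[:, 0] + 0.1 * rng.normal(size=n)

# Project the 10 predictors onto m = 2 uncorrelated components, then fit OLS on them.
pcr = make_pipeline(PCA(n_components=2), LinearRegression()).fit(X, y)
print(pcr.score(X, y))                    # R^2 of the low-dimensional model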

2. Regularization: ridge regression and lasso regression

(1) Ridge regression
The least squares estimate minimizes the residual sum of squares (RSS):

$$\mathrm{RSS} = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2$$

Ridge regression adds a shrinkage penalty (an L2-norm regularization term) to the RSS minimization:

$$\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta} \left\{ \mathrm{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}$$

In this penalty, lambda ≥ 0 is a tuning parameter. The smaller the coefficients, the smaller the penalty, so adding it pushes the estimated parameters toward 0. The crucial step is choosing lambda, which can be done with cross-validation or the Cp criterion.
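
For example (a minimal sketch assuming scikit-learn, where lambda is called alpha and RidgeCV picks it by cross-validation; the data and the penalty grid are arbitrary):

import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 20))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)

# Try a grid of penalty strengths; RidgeCV selects one by 5-fold cross-validation.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
print(ridge.alpha_)                       # the selected lambda
print(ridge.coef_[:5])                    # shrunken, but generally nonzero, coefficients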

The reason ridge regression can outperform least squares regression is the bias-variance trade-off: as lambda increases, the variance of the model decreases while the bias increases (slightly).

One drawback of ridge regression: all p predictor variables stay in the model, because the penalty shrinks their estimated coefficients close to 0 but never exactly to 0 (unless lambda is infinite). This has little effect on prediction accuracy, but it makes the model hard to interpret. The lasso overcomes this shortcoming. (So ridge regression, although it reduces the complexity of the model, does not really solve the variable selection problem.)
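
This behavior is easy to check numerically (a minimal sketch assuming scikit-learn; the data is synthetic):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 20))
y = X[:, 0] + rng.normal(size=100)

for lam in (0.1, 10.0, 1000.0):
    coef = Ridge(alpha=lam).fit(X, y).coef_
    # Coefficients shrink toward 0 as lambda grows, but none becomes exactly 0.
    print(lam, np.abs(coef).max(), np.sum(coef == 0.0))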

(2) Lasso

The lasso is a relatively new method; see [1] and [2]. For the history of the lasso and some of the ideas behind it, see the well-known online article "Stories about Statistical Learning": http://cos.name/2011/12/stories-about-statistical-learning/.

The lasso adds an L1-norm penalty to the RSS minimization:

$$\hat{\beta}^{\mathrm{lasso}} = \arg\min_{\beta} \left\{ \mathrm{RSS} + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$$

The advantage of the L1 norm is that when lambda is sufficiently large, some of the estimated coefficients are shrunk to exactly 0.
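
In the same synthetic setup as the ridge sketch above (again assuming scikit-learn), a growing lambda drives more and more coefficients exactly to zero, i.e. performs variable selection:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 20))
y = X[:, 0] + rng.normal(size=100)

for lam in (0.01, 0.1, 1.0):
    coef = Lasso(alpha=lam).fit(X, y).coef_
    print(lam, np.sum(coef == 0.0))       # the count of exact zeros grows with lambda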

Regarding ridge regression and the lasso, [3] contains a figure that compares the two visually (Chapter 3 of [3] is a very good reference on the topic of this article):

[Figure from [3] comparing ridge regression and the lasso omitted here.]

Ridge regression and the lasso can of course also be viewed as constrained optimization problems, with the RSS as the objective function and the penalty as the constraint.
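
Spelled out (a standard equivalence; t is a budget parameter playing the role of lambda), the constrained forms are:

$$\min_{\beta}\ \mathrm{RSS} \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le t \qquad \text{(ridge regression)}$$

$$\min_{\beta}\ \mathrm{RSS} \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t \qquad \text{(lasso)}$$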

Original source: http://site.douban.com/182577/widget/notes/10567212/note/288551448/

Norm regularization in machine learning (I): L0, L1 and L2 norms: http://blog.csdn.net/zouxy09/article/details/24971995

On the lasso: http://cos.name/2011/12/stories-about-statistical-learning/

On ridge regression: http://www.cnblogs.com/zhangchaoyang/articles/2802806.html

Why does L1 yield sparse solutions more easily than L2?: http://www.zhihu.com/question/37096933/answer/70494622
