Linear regression, ridge regression, and lasso regression

Although some of the content is not yet fully understood, here are some excerpts for now.

1. The variable selection problem: from ordinary linear regression to the lasso

Ordinary linear regression fitted by least squares is the basic method of data modeling. A key assumption is that the error terms are independent and identically distributed with zero mean (often assumed normal). The t-test is then used to test the significance of individual fitted coefficients, and the F-test to test the significance of the model as a whole (analysis of variance). If normality does not hold, the t-test and F-test lose their justification.
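
As a concrete illustration (a minimal sketch using statsmodels on synthetic data; the variable names and coefficients are placeholders), an OLS fit reports exactly these diagnostics:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))                   # two predictor variables
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)  # i.i.d. zero-mean normal errors

X_design = sm.add_constant(X)                 # add the intercept column
model = sm.OLS(y, X_design).fit()             # least squares fit
print(model.summary())                        # per-coefficient t-tests and the model F-test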

When modeling more complex data, such as text categorization, image denoising, or genomic data, ordinary linear regression runs into problems:
(1) Prediction accuracy. If there is a genuine linear relationship between the response variable and the predictor variables, least squares regression has low bias; in particular, when the number of observations n is much larger than the number of predictors p, it also has low variance. But when n and p are close, the fit is prone to overfitting, and when n < p, least squares regression does not yield a meaningful (unique) result at all (a small numerical illustration follows this list).
(2) Model interpretability. A multiple linear regression model may include many variables that are actually unrelated to the response, and multicollinearity may arise, i.e. strong correlation among several predictor variables. Both increase the complexity of the model and weaken its interpretability, so variable selection (feature selection) is needed.
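
The n < p failure mode is easy to demonstrate (a minimal sketch with NumPy; the data is pure noise by construction):

import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 50                             # fewer observations than predictors
X = rng.normal(size=(n, p))
y = rng.normal(size=n)                    # pure noise: there is no signal to find

beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(rank)                               # rank 10 < 50: infinitely many exact solutions
print(np.abs(X @ beta - y).max())         # ~0: zero training error on noise, i.e. overfitting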

There are three classes of methods that extend OLS toward variable selection:

(1) Subset selection. This is the traditional approach, including stepwise regression and best-subset selection: fit a linear model on candidate subsets of the predictors and use a criterion (such as AIC, BIC, Mallows' Cp, or adjusted R²) to pick the optimal model.

(2) Shrinkage methods, also called regularization: mainly ridge regression and lasso regression. A penalty is added to the least squares criterion, shrinking the coefficient estimates, in some cases all the way to 0.

(3) Dimensionality reduction: principal component regression (PCR) and partial least squares (PLS). The p predictor variables are projected into an m-dimensional space (m < p), and a linear model is built on the uncorrelated projected combinations (a minimal PCR sketch follows this list).
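
As one concrete instance of approach (3), PCR is simply PCA followed by OLS on the projected scores (a minimal sketch assuming scikit-learn; the data, with ten predictors driven by two latent factors, and the choice m = 2 are illustrative assumptions):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
n = 100
latent = rng.normal(size=(n, 2))          # two underlying factors
X = np.repeat(latent, 5, axis=1) + 0.1 * rng.normal(size=(n, 10))  # 10 correlated predictors
y = latent[:, 0] + 0.1 * rng.normal(size=n)

# Project the 10 predictors onto m = 2 uncorrelated components, then fit OLS on them.
pcr = make_pipeline(PCA(n_components=2), LinearRegression()).fit(X, y)
print(pcr.score(X, y))                    # R^2 of the low-dimensional model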

2. Regularization: ridge regression and lasso regression

(1) Ridge regression
The least squares estimate minimizes the residual sum of squares (RSS):

$$\mathrm{RSS} = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2$$

Ridge regression adds a shrinkage penalty (an L2-norm regularization term) to the RSS minimization:

$$\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta} \left\{ \mathrm{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}$$

In this penalty, lambda ≥ 0 is a tuning parameter. The smaller the coefficients, the smaller the penalty, so adding it pushes the estimated parameters toward 0. The crucial step is choosing lambda, which can be done with cross-validation or the Cp criterion.
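
For example (a minimal sketch assuming scikit-learn, where lambda is called alpha and RidgeCV picks it by cross-validation; the data and the penalty grid are arbitrary):

import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 20))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)

# Try a grid of penalty strengths; RidgeCV selects one by 5-fold cross-validation.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
print(ridge.alpha_)                       # the selected lambda
print(ridge.coef_[:5])                    # shrunken, but generally nonzero, coefficients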

The reason ridge regression can outperform least squares regression is the bias-variance trade-off: as lambda increases, the variance of the model decreases while the bias increases (slightly).

One drawback of ridge regression: all p predictor variables stay in the model, because the penalty shrinks their estimated coefficients close to 0 but never exactly to 0 (unless lambda is infinite). This has little effect on prediction accuracy, but it makes the model hard to interpret. The lasso overcomes this shortcoming. (So ridge regression, although it reduces the complexity of the model, does not really solve the variable selection problem.)
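
This behavior is easy to check numerically (a minimal sketch assuming scikit-learn; the data is synthetic):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 20))
y = X[:, 0] + rng.normal(size=100)

for lam in (0.1, 10.0, 1000.0):
    coef = Ridge(alpha=lam).fit(X, y).coef_
    # Coefficients shrink toward 0 as lambda grows, but none becomes exactly 0.
    print(lam, np.abs(coef).max(), np.sum(coef == 0.0))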

(2) Lasso

The lasso is a relatively new method; see [1] and [2]. For the history of the lasso and some of the ideas behind it, see the well-known online article "Stories about Statistical Learning": http://cos.name/2011/12/stories-about-statistical-learning/.

The lasso adds an L1-norm penalty to the RSS minimization:

$$\hat{\beta}^{\mathrm{lasso}} = \arg\min_{\beta} \left\{ \mathrm{RSS} + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$$

The advantage of the L1 norm is that when lambda is sufficiently large, some of the estimated coefficients are shrunk to exactly 0.
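
In the same synthetic setup as the ridge sketch above (again assuming scikit-learn), a growing lambda drives more and more coefficients exactly to zero, i.e. performs variable selection:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 20))
y = X[:, 0] + rng.normal(size=100)

for lam in (0.01, 0.1, 1.0):
    coef = Lasso(alpha=lam).fit(X, y).coef_
    print(lam, np.sum(coef == 0.0))       # the count of exact zeros grows with lambda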

Regarding ridge regression and the lasso, [3] contains a figure that compares the two visually (Chapter 3 of [3] is a very good reference on the topic of this article):

[Figure from [3] comparing ridge regression and the lasso omitted here.]

Ridge regression and the lasso can of course also be viewed as constrained optimization problems, with the RSS as the objective function and the penalty as the constraint.
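
Spelled out (a standard equivalence; t is a budget parameter playing the role of lambda), the constrained forms are:

$$\min_{\beta}\ \mathrm{RSS} \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le t \qquad \text{(ridge regression)}$$

$$\min_{\beta}\ \mathrm{RSS} \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t \qquad \text{(lasso)}$$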

Original source: http://site.douban.com/182577/widget/notes/10567212/note/288551448/

Norm regularization in machine learning (I): L0, L1 and L2 norms: http://blog.csdn.net/zouxy09/article/details/24971995

On the lasso: http://cos.name/2011/12/stories-about-statistical-learning/

On ridge regression: http://www.cnblogs.com/zhangchaoyang/articles/2802806.html

Why does L1 yield sparse solutions more easily than L2?: http://www.zhihu.com/question/37096933/answer/70494622
