Ridge Regression (statistical model)

Source: Internet
Author: User

Ridge regression is used to deal with the following two types of problems:

1. The number of observations is less than the number of variables

2. There is collinearity among the variables

When there is collinearity among the variables, the least squares regression coefficients are unstable and have very large variance, because the cross-product of the design matrix X with its transpose (X'X) is nearly singular and cannot be reliably inverted. Ridge regression solves this problem by introducing a penalty parameter lambda. In R, the function lm.ridge() in the MASS package does this conveniently. Its input matrix X is always n*p, regardless of whether the model contains a constant term.
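To illustrate the idea (this is a minimal NumPy sketch of the closed-form ridge estimate, not the MASS implementation; the data and names are made up):

```python
import numpy as np

def ridge_coef(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam*I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Tiny example with two nearly collinear columns.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 1e-6 * rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=50)

beta = ridge_coef(X, y, lam=1.0)
```

With lam = 0 this reduces to ordinary least squares, which is numerically unusable here because the two columns are almost identical; with lam > 0 the system is well conditioned and the two coefficients share the joint effect.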

When a constant term is included, the function centers y, using the mean of y as the centering factor, and centers and scales X, using each variable's mean and standard deviation as the factors. After this processing the means of X and y are 0, which forces the regression plane through the origin; that is, the constant term is 0. Therefore, although a constant term was specified, the coefficients given by lm.ridge()$coef contain no intercept. When using the model for prediction, the new X and y must be centered and scaled in the same way, using the factors computed from the training data, and then multiplied by the coefficients to obtain the predictions. Note that if you print the fitted lm.ridge object directly at the command line, a different set of coefficients is shown, and these do include a constant term. They differ from the coefficients in lm.ridge()$coef because they are expressed on the original (un-centered, un-scaled) data scale, so they can be used for prediction directly, without centering or scaling the data.
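The relation between the two coefficient scales can be sketched as follows (a NumPy analogue of the centering/scaling described above; the exact scaling conventions inside MASS may differ in detail, and the data are made up). The point is that the standardized coefficients plus the training factors, and the back-transformed coefficients plus an intercept, give identical predictions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=2.0, size=(40, 2))
y = 3.0 + X @ np.array([1.5, -0.5]) + 0.1 * rng.normal(size=40)
lam = 0.5

# Center y, and center/scale X, with the *training* means and sds.
x_mean, x_sd = X.mean(axis=0), X.std(axis=0)
y_mean = y.mean()
Xs = (X - x_mean) / x_sd
ys = y - y_mean

# Coefficients on the standardized scale (analogue of fit$coef).
coef_scaled = np.linalg.solve(Xs.T @ Xs + lam * np.eye(2), Xs.T @ ys)

# Back-transform to the original scale: these include an intercept
# (analogue of the coefficients printed for the model object).
coef_orig = coef_scaled / x_sd
intercept = y_mean - x_mean @ coef_orig

# Prediction on new data: the two routes agree.
X_new = rng.normal(loc=5.0, scale=2.0, size=(5, 2))
pred_scaled = ((X_new - x_mean) / x_sd) @ coef_scaled + y_mean
pred_orig = intercept + X_new @ coef_orig
```

Forgetting to apply the training-time centering and scaling factors before multiplying by the standardized coefficients is the error the paragraph above warns against.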

When the specified model does not contain a constant term, the model assumes that the mean of each variable is 0, so X and y are not centered, because the fit is meant to pass through the origin. X is still scaled, and the scaling factor is again the standard deviation of each variable, computed under the assumption that its mean is 0. When making predictions, if the lm.ridge$coef coefficients are used, the new data must be scaled in the same way. If you use the coefficients printed for the model object directly, you simply multiply the raw data by them.

Choice of the ridge lambda: you can call select() on the fitted lm.ridge object for an automatic recommendation; the usual rule is to take the lambda with the smallest GCV (generalized cross-validation) score. The lambda range is values greater than 0.
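The GCV criterion can be sketched directly (a NumPy illustration of grid search over lambda, not the select() implementation; data, grid, and names are made up). GCV(lambda) is the mean residual sum of squares divided by (1 - edf/n)^2, where edf is the trace of the hat matrix:

```python
import numpy as np

def gcv(X, y, lam):
    """Generalized cross-validation score for one ridge penalty lam."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # hat matrix
    resid = y - H @ y
    edf = np.trace(H)                 # effective degrees of freedom
    return (resid @ resid / n) / (1.0 - edf / n) ** 2

rng = np.random.default_rng(2)
x1 = rng.normal(size=60)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=60)])
y = X @ np.array([1.0, 2.0]) + 0.5 * rng.normal(size=60)

grid = np.linspace(0.01, 5.0, 100)
scores = np.array([gcv(X, y, lam) for lam in grid])
best_lam = grid[scores.argmin()]
```

Larger lambda shrinks the coefficients and lowers edf; GCV balances that against the growing residuals, and the minimizer on the grid plays the role of the value select() reports.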

The principle of ridge regression

Ridge regression is a biased estimation regression method designed specifically for the analysis of collinear data. It is essentially an improved least squares estimation: by giving up the unbiasedness of least squares, at the cost of losing some information and reducing precision, it obtains regression coefficients that are more realistic and reliable. Its tolerance of ill-conditioned data is much stronger than that of least squares.

The principle of ridge regression is more involved. According to the Gauss-Markov theorem, multicollinearity does not affect the unbiasedness or minimum-variance property of the least squares estimator. However, although the least squares estimator has the smallest variance among all linear unbiased estimators, that variance is not necessarily small; in fact one can find a biased estimator that trades a small bias for precision much higher than that of the unbiased estimate. Ridge regression is based on this principle: a biased constant is introduced into the normal equations to obtain the regression estimator.
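The effect of the biased constant on the normal equations can be seen in their conditioning (an illustrative NumPy sketch with made-up data): adding lambda to the diagonal of X'X raises the smallest eigenvalue and drastically reduces the condition number, which is why the ridge coefficients are stable where the least squares ones are not.

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=30)
# Two almost identical columns -> X'X is nearly singular.
X = np.column_stack([x1, x1 + 1e-5 * rng.normal(size=30)])

XtX = X.T @ X
lam = 0.1
cond_ols = np.linalg.cond(XtX)                     # enormous for collinear data
cond_ridge = np.linalg.cond(XtX + lam * np.eye(2))  # modest after the ridge
```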

Disadvantage: the R-squared of a ridge regression equation is usually slightly lower than that of an ordinary regression analysis, but the significance of the regression coefficients is often markedly higher, so in studies involving collinearity and ill-conditioned data it has greater practical value.

Gauss-Markov theorem

In statistics, the Gauss-Markov theorem states that:

In a linear regression model in which the errors have mean zero, equal variances, and are uncorrelated, the best linear unbiased estimator (BLUE) of the regression coefficients is the least squares estimator: it has the minimum variance among all linear unbiased estimators.

More generally, the BLUE of any linear combination of the regression coefficients is its least squares estimate.

In this linear regression model, the errors need not be assumed normally distributed, nor independent (only the weaker condition of being uncorrelated is needed), nor identically distributed.

Specifically, suppose the model is y = Xb + e, where E(e) = 0 and Var(e) = sigma^2 * I. Then the least squares estimator b_hat = (X'X)^(-1) X'y has the minimum variance among all linear unbiased estimators of b.
