The solution of multiple collinearity: Ridge regression and Lasso
Multivariate linear regression model
The multivariate linear regression model is $y = X\beta + \varepsilon$, and the least squares estimate is $\hat{\beta} = (X'X)^{-1}X'y$, with $\operatorname{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$.
If there is strong collinearity, that is, strong correlation among the column vectors of $X$, the diagonal elements of $(X'X)^{-1}$ become very large, and a different sample can produce very different parameter estimates. In other words, the variance of the parameter estimators increases and the estimates become unreliable.
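To see this concretely, here is a small illustrative sketch (not part of the original post) using R's built-in longley data, which is analysed later in the post. It assumes GNP.deflator is treated as the response, as in the later examples; the diagonal of the inverse correlation matrix of the predictors equals their variance inflation factors.

```r
## Illustrative sketch: collinearity inflates the diagonal of the inverse matrix.
R <- cor(longley[, -1])      # correlation matrix of the predictors
round(diag(solve(R)), 1)     # diagonal of R^{-1} = variance inflation factors (huge here)
kappa(R, exact = TRUE)       # the condition number is also very large for longley
```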
So, is it possible to delete some of the strongly correlated variables? And if p variables are strongly correlated with one another, which of them is it better to remove?
This post uses two methods to decide how to eliminate variables from a model with multicollinearity: Ridge regression and Lasso (note: Lasso was developed on the basis of ridge regression).
Ridge regression
Thought:
Since collinearity causes the parameter estimates to become very large, a penalty term on $\beta$ is added to the least squares objective function.
When the new objective function is minimized, the size of $\hat{\beta}$ must also be taken into account, so it cannot become too large.
The penalty term is multiplied by a coefficient $k$.
As $k$ increases, the effect of collinearity becomes smaller. Plotting how the estimated parameters $\hat{\beta}(k)$ change as the penalty coefficient increases gives the ridge trace.
The shape of the ridge trace tells us whether a variable should be removed (for example, a ridge trace that fluctuates strongly indicates that the variable is involved in collinearity).
Steps:
- Standardize the data, so that the ridge estimates $\hat{\beta}(k)$ of different variables can be compared with one another; otherwise the coefficient sizes of different variables are not comparable.
- Construct the penalized objective function and draw the ridge trace plot for a range of values of $k$.
- Decide from the ridge trace plot which variables to remove.
Objective function of Ridge regression
The ridge objective can be written in penalized form, $\min_\beta \|y - X\beta\|^2 + k\sum_j \beta_j^2$, or in constrained form, $\min_\beta \|y - X\beta\|^2$ subject to $\sum_j \beta_j^2 \le t$. The two forms are equivalent: the larger the penalty coefficient $k$, the smaller the corresponding bound $t$.
Geometrically, the point where the elliptical contours of the least squares objective first touch the circular constraint region $\sum_j \beta_j^2 \le t$ is the ridge solution; this is the geometric meaning of ridge regression.
It can be seen that ridge regression controls the range within which $\hat{\beta}$ can vary, weakening the effect of collinearity on the size of the estimates.
The ridge regression estimate is $\hat{\beta}(k) = (X'X + kI)^{-1}X'y$.
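For intuition, the formula can be evaluated directly. This is only a sketch added here, not the post's code; it assumes GNP.deflator as the response (as in the example later), and lm.ridge's internal scaling differs slightly, so the numbers need not match that function exactly.

```r
## Compute (X'X + kI)^{-1} X'y by hand on centred, standardized longley data.
X <- scale(as.matrix(longley[, -1]))                      # standardized predictors
y <- longley$GNP.deflator - mean(longley$GNP.deflator)    # centred response
k <- 0.05                                                 # an arbitrary ridge parameter
beta_ridge <- solve(crossprod(X) + k * diag(ncol(X)), crossprod(X, y))
round(drop(beta_ridge), 4)
```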
Nature of the ridge regression
The objective function of ridge regression shows that the larger the penalty coefficient $k$, the heavier the penalty term weighs in the objective function and the smaller the estimated parameters become. The coefficient $k$ is called the ridge parameter. Because the ridge parameter is not unique, the ridge regression estimate we get is really a whole family of estimates, one for each value of $k$, as in the following table:
[Table: ridge estimates of the regression coefficients for a range of values of K.]
Ridge trace plot
Plotting the relationship between the estimated regression coefficients in the table above and the ridge parameter K gives the ridge trace plot.
When there is no collinearity (no near-singularity of $X'X$), the ridge trace should decline smoothly and steadily towards 0.
When collinearity is present, the ridge estimates behave differently: while K is still small the collinearity has not yet been suppressed, so the estimated coefficients oscillate strongly as K changes; once K is large enough, the effect of collinearity is gradually reduced and the estimated values stabilize.
General principles of Ridge parameter Selection
- The ridge estimates of all regression coefficients are basically stable.
- The signs of the ridge estimates are reasonable, with no obviously anomalous coefficients.
- No regression coefficient has an absolute value that is inconsistent with its practical meaning.
- The residual sum of squares does not increase too much.
Selecting variables with ridge regression
- Since ridge regression is run on standardized variables, the ridge coefficients of different variables can be compared with one another directly; variables whose standardized ridge coefficients are stable but very small in absolute value can be eliminated.
- Variables whose coefficients are unstable as K increases and whose ridge traces oscillate towards zero can also be eliminated.
So the question is: how do we decide, just by looking, that a trace tends to 0? Can the program judge this automatically? And if several regression coefficients are unstable, which one should be removed? That has to be judged by the quality of the regression after each candidate variable is removed, which brings us to the extension of ridge regression: the Lasso.
Before that, let us first run a ridge regression example in R.
The R package for ridge regression is MASS, and the function that runs the ridge regression is lm.ridge.
1. Load the MASS package and use R's built-in longley data set (macroeconomic data) as an example. (Note: macroeconomic data generally suffer from fairly serious collinearity.)
- Results of regression using the traditional OLS method
If a few variables turn out not to be significant, should those variables be removed? We use ridge regression to decide which variables to remove.
- Select the ridge regression parameter; the suggested values are given in the output.
- Observe the ridge trace plot and remove variables. (A code sketch of these steps follows below.)
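A minimal sketch of these steps might look as follows. It is not necessarily the post's exact code: the formula GNP.deflator ~ . follows the standard lm.ridge example for longley, and the lambda grid is an assumption.

```r
library(MASS)

## 1. Traditional OLS regression: several coefficients come out non-significant
ols <- lm(GNP.deflator ~ ., data = longley)
summary(ols)

## 2. Ridge regression over a grid of ridge parameters K (the `lambda` argument)
ridge <- lm.ridge(GNP.deflator ~ ., data = longley,
                  lambda = seq(0, 0.1, by = 0.001))

## 3. Ridge parameters suggested by different criteria (HKB, L-W, GCV)
MASS::select(ridge)

## 4. Ridge trace plot: one curve per variable, with a legend so the colours
##    can be matched to the variables
matplot(ridge$lambda, t(ridge$coef), type = "l", lty = 1, col = 1:6,
        xlab = "K (lambda)", ylab = "standardized coefficient")
legend("topright", legend = rownames(ridge$coef), lty = 1, col = 1:6, cex = 0.7)
```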
[Figure: ridge trace plot for the longley ridge regression.]
An unsolved annoyance: in the default plot it is not obvious which colour corresponds to which variable.
You can pick a value of K by eye from the ridge trace and pass it to the lambda argument (the lm.ridge function's lambda defaults to 0).
- Comparing the K values suggested by the different selection methods shows that the choice of the ridge parameter carries a great deal of uncertainty.
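For illustration only, refitting at a single hand-picked K could look like this (the value 0.03 is a placeholder, not a value from the post):

```r
## Refit the ridge regression at one chosen ridge parameter
lm.ridge(GNP.deflator ~ ., data = longley, lambda = 0.03)
```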
Lasso
- Tibshirani (1996) proposed the Lasso (Least Absolute Shrinkage and Selection Operator) algorithm.
- Unlike ridge regression, the Lasso uses a first-order (absolute-value) penalty, which drives the coefficients of some variables exactly to 0 (whereas the chance of a ridge regression coefficient being exactly 0 is very small).
- Like ridge regression, the Lasso estimator is also biased.
Model form comparison
Ridge regression: $\min_\beta \|y - X\beta\|^2 + k\sum_j \beta_j^2$
LASSO: $\min_\beta \|y - X\beta\|^2 + k\sum_j |\beta_j|$
As can be seen, the Lasso penalty is in absolute-value form, which compresses the coefficients more aggressively; the geometric interpretation makes this most intuitive.
[Figure: geometric representation of the Lasso (diamond-shaped constraint region) and of ridge regression (circular constraint region), together with the elliptical contours of the least squares objective.]
Red marks the contours around the least squares minimum, and blue marks the constraint region.
It can be seen that the Lasso region is "sharper" because it comes from the absolute value; the estimated regression coefficients therefore hit 0 much more easily.
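One standard way to see this (added here for clarity, not part of the original post): with an orthonormal design ($X'X = I$), both estimators have closed forms in terms of the OLS estimate.

```latex
% Orthonormal design X'X = I: ridge only rescales, the Lasso soft-thresholds.
\hat{\beta}_j^{\,ridge} = \frac{\hat{\beta}_j^{\,OLS}}{1+k},
\qquad
\hat{\beta}_j^{\,lasso} = \operatorname{sign}\!\bigl(\hat{\beta}_j^{\,OLS}\bigr)
    \Bigl(\bigl|\hat{\beta}_j^{\,OLS}\bigr| - \tfrac{k}{2}\Bigr)_{+}
```

Ridge only shrinks the OLS estimate towards 0 and essentially never reaches it, while the Lasso sets $\hat{\beta}_j$ exactly to 0 whenever $|\hat{\beta}_j^{\,OLS}| \le k/2$.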
[Figure: coefficient paths for ridge regression (left) and the Lasso (right).]
On the left is ridge regression, on the right is the Lasso. In each diagram, moving from right to left, the value of K gradually increases. As K grows, the Lasso frequently estimates some regression coefficients as exactly 0, and those variables can then be dropped.
We no longer have to pick the variables to remove by hand; the program can decide automatically, according to whether a coefficient is estimated as 0, whether to drop the variable.
The problem now is that, because its penalty is an absolute value, the Lasso has no closed-form expression for the estimated parameters. How can it be computed?
Statisticians found that the results of the Lasso and of least angle regression (LAR) are highly similar, so the Lasso can be estimated using the results of LAR. (The specific ideas and proofs are quite involved; a later blog post will elaborate on them.)
After a small modification of the LAR procedure, the LAR results can be made essentially identical to those of the LASSO. We therefore use the LAR algorithm to compute the LASSO.
Package: lars
library(lars)
Again we work with the longley data set.
The result is the lasso path for the linear regression, computed with the LAR algorithm.
You can see that Year and Employed are repeatedly added to and dropped from the model along the path, so these two should be removed.
In the summary output, Cp (Mallows' Cp) is used to judge the models along the path; it is smallest at step 8. Combining this with which variables are in the model at step 8 of the lars path, removing the two variables Year and Employed is the more appropriate choice.
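A minimal sketch of this lasso fit, with the same caveat as before (taking GNP.deflator as the response is an assumption; this is not necessarily the post's exact code):

```r
library(lars)

x <- as.matrix(longley[, -1])        # predictor matrix
y <- longley$GNP.deflator            # response

fit <- lars(x, y, type = "lasso")    # lasso path computed via the LAR algorithm
fit                                  # prints which variable enters or leaves at each step
summary(fit)                         # Df, RSS and Mallows' Cp for every step
plot(fit)                            # coefficient paths; curves that hit 0 are dropped variables
```

The step with the smallest Cp in summary(fit), together with the variables active at that step, indicates which variables to drop.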