Principles and Applications of Ridge Regression
Author: Ma Wenmin
Ridge regression is a biased-estimation regression method designed for the analysis of collinear data. It is essentially a modified least squares estimator: by giving up the unbiasedness of least squares, it obtains regression coefficients that are more realistic and more reliable, at the cost of losing some information and some accuracy of fit. Its tolerance of ill-conditioned data is much better than that of ordinary least squares.
Regression analysis: a statistical method for establishing the quantitative relationship between two or more variables. It is very widely used. By the number of variables involved, it is divided into univariate and multivariate regression; by the number of dependent variables, into simple regression and multiple regression; and by the type of relationship between the independent and dependent variables, into linear and nonlinear regression. If a regression analysis includes only one independent variable and one dependent variable, and their relationship can be approximated by a straight line, it is called simple (univariate) linear regression. If it includes two or more independent variables, and the dependent variable is linearly related to them, it is called multiple linear regression.
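As a small illustration of the simple linear case, the sketch below fits a straight line y = a + b*x by least squares. The function name fit_line and the data values are made up for this example and do not come from any data set discussed here.

```python
def fit_line(xs, ys):
    """Least squares fit of y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x and sum of cross products, both about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying roughly on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
print(f"y = {intercept:.3f} + {slope:.3f} * x")
```

With two or more independent variables the same idea applies, but the coefficients come from solving the normal equations rather than from a single ratio.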
The principle of ridge regression: the principle is somewhat involved. According to the Gauss-Markov theorem, multicollinearity does not affect the unbiasedness or the minimum-variance property of the least squares estimator. However, although the least squares estimator has the smallest variance among all linear unbiased estimators, that variance is not necessarily small. In fact, one can find a biased estimator that, despite a small bias, is much more precise than the unbiased one. Ridge regression is based on this idea: it introduces a biasing constant into the normal equations to obtain the regression estimates. For details, consult the literature.
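The idea can be sketched in a few lines. With a biasing constant k >= 0, the usual ridge estimate is b(k) = (X'X + kI)^(-1) X'y, which reduces to ordinary least squares at k = 0. The sketch below is only an illustration under stated assumptions: two centered predictors, no intercept, made-up data in which x2 is almost a copy of x1, and an arbitrarily chosen k = 0.1. The function name ridge_2d is hypothetical.

```python
def ridge_2d(x1, x2, y, k):
    """Ridge estimate for two centered predictors without an intercept.

    Solves (X'X + k*I) b = X'y using the closed-form inverse of a
    2x2 matrix. k = 0 gives the ordinary least squares estimate.
    """
    s11 = sum(a * a for a in x1) + k
    s22 = sum(b * b for b in x2) + k
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * t1 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2


# Made-up, centered data: x2 is nearly identical to x1 (strong collinearity),
# and y is roughly x1 + x2 plus a little noise.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.01, -0.99, 0.0, 1.01, 1.99]
y = [-4.1, -1.9, 0.1, 2.0, 3.9]

b_ols = ridge_2d(x1, x2, y, k=0.0)    # ordinary least squares
b_ridge = ridge_2d(x1, x2, y, k=0.1)  # k = 0.1 is an arbitrary choice

print("OLS coefficients:  ", b_ols)
print("Ridge coefficients:", b_ridge)
```

On this toy data the least squares coefficients are large and of opposite sign, while the ridge coefficients are both close to 1, which is what the data were built around. In practice k is chosen from the data, for example by inspecting a ridge trace; the value here is not a recommendation.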
For some matrices, a small change in a single element can cause a large error in the final result; such a matrix is called ill-conditioned. Sometimes an unsuitable computational method can also make a well-conditioned matrix behave as if it were ill-conditioned. In Gaussian elimination, for example, if the pivot elements on the main diagonal are very small, the computation shows ill-conditioned behavior.
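Both effects can be reproduced on 2x2 systems. The following is a minimal sketch with made-up numbers: the helper solve_2x2 is hypothetical, and the row swap it performs is ordinary partial pivoting, shown here as one common remedy for small pivots.

```python
def solve_2x2(a, b, pivot):
    """Solve a 2x2 system a*x = b by Gaussian elimination.

    If pivot is True, the rows are swapped when needed so that the
    larger first-column entry is used as the pivot (partial pivoting).
    """
    (a11, a12), (a21, a22) = a
    b1, b2 = b
    if pivot and abs(a21) > abs(a11):
        a11, a12, b1, a21, a22, b2 = a21, a22, b2, a11, a12, b1
    m = a21 / a11          # elimination multiplier
    a22 -= m * a12
    b2 -= m * b1
    x2 = b2 / a22          # back substitution
    x1 = (b1 - a12 * x2) / a11
    return x1, x2


# 1) A well-conditioned system made to look ill-conditioned by a tiny pivot.
#    The exact solution is very close to (1, 1).
a = [[1e-20, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]
naive = solve_2x2(a, b, pivot=False)   # the tiny pivot wipes out x1
pivoted = solve_2x2(a, b, pivot=True)  # the row swap avoids the problem
print("no pivoting:  ", naive)
print("with pivoting:", pivoted)

# 2) A genuinely ill-conditioned matrix: changing one right-hand-side
#    entry by 0.0001 moves the solution from about (1, 1) to about (0, 2).
c = [[1.0, 1.0], [1.0, 1.0001]]
original = solve_2x2(c, [2.0, 2.0001], pivot=True)
perturbed = solve_2x2(c, [2.0, 2.0002], pivot=True)
print("original rhs: ", original)
print("perturbed rhs:", perturbed)
```

In the first case the fault lies with the method, and a better pivot choice fixes it. In the second case the sensitivity belongs to the matrix itself, which is the situation ridge regression is meant to cope with when X'X is nearly singular.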
The R-squared value of a ridge regression equation is slightly lower than that of ordinary regression, but the significance of the regression coefficients is often higher than in ordinary regression. This makes ridge regression very useful in studies that involve collinearity problems or a large share of ill-conditioned data.
Application of ridge regression: application in poultry breeding: The paper discusses a method for estimating poultry breeding values in mixed linear model equations by ridge regression. In essence, the traditional mixed linear model equations are interpreted as a generalized ridge regression estimate, which provides a way to settle the estimation of genetic parameters. Taking the Muscovy duck as an example, and considering one trait and two fixed effects, generalized ridge regression was used to estimate the breeding values of male Muscovy ducks, and the results were compared with the best linear unbiased prediction (BLUP) method. The breeding values estimated by generalized ridge regression and by BLUP, and their rankings, were very similar: the correlation coefficient and the rank correlation coefficient reached 0.998 and 0.986, respectively. The prediction error rate of generalized ridge regression was also very low. This indicates that using generalized ridge regression to estimate animal breeding values in mixed linear model equations is feasible, and that the step of estimating genetic parameters can be omitted, which makes the application of BLUP in animal breeding more practical.
Simulation of satellite photographic data combining forward and inverse methods: satellite photographic data are usually simulated by either a forward or an inverse simulation method. The forward method is simple and requires no iterative computation, but the ground point coordinates show a large discrepancy in the y direction. The inverse method avoids the discrepancy in the y direction, but it must be based on existing DEM data, and the extent of the DEM data must be basically consistent with the extent covered by the exterior orientation elements. The simulated data are therefore constrained by the available data sources.
Research on statistical analysis techniques in the R language: principles and applications of ridge regression