Reprinted from: http://www.pinzhi.org/thread-7762-1-1.html
R-Sq and adjusted R-Sq (R-Sq(adj)) in Minitab: meaning, calculation formulas, and differences
When fitting a regression equation or running similar analyses in Minitab, you will often see the coefficient of multiple determination R-Sq and the adjusted coefficient of multiple determination R-Sq(adj). What do these two values mean? How exactly are they calculated, and how do they differ?
The overall quality of the fit is measured by the coefficient of multiple determination R² (the square of the multiple correlation coefficient, shown as R-Sq) and the adjusted coefficient of multiple determination R²adj (shown as R-Sq(adj)).
The sum-of-squares decomposition for a regression equation states that:
SStotal = SSmodel + SSerror
Considering the proportion of SStotal that is accounted for by SSmodel, define R-squared (abbreviated R-Sq):
R² = SSmodel/SStotal
Clearly, the closer this value is to 1, the better the fit. Equivalently, the smaller SSerror is, the better, because the formula above can also be written as:
R² = 1 - (SSerror/SStotal)
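As a rough illustration of the decomposition, the following sketch fits a straight line by ordinary least squares to a small made-up data set and computes R-Sq in both forms. The data values and variable names are assumptions chosen for the example; they do not come from Minitab or from the original post.

```python
# Illustrative data only (made up for this sketch).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Closed-form least-squares fit of y = intercept + slope * x.
s_xx = sum((xi - x_mean) ** 2 for xi in x)
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean
fitted = [intercept + slope * xi for xi in x]

# The three sums of squares from the decomposition.
ss_total = sum((yi - y_mean) ** 2 for yi in y)
ss_model = sum((fi - y_mean) ** 2 for fi in fitted)
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Both forms of R-Sq should agree up to rounding error.
r_sq_from_model = ss_model / ss_total
r_sq_from_error = 1.0 - ss_error / ss_total

print(f"SStotal={ss_total:.4f}  SSmodel+SSerror={ss_model + ss_error:.4f}")
print(f"R-Sq = {r_sq_from_model:.4f} (SSmodel/SStotal), "
      f"{r_sq_from_error:.4f} (1 - SSerror/SStotal)")
```

The two printed R-Sq values coincide because the decomposition holds exactly for a least-squares fit that includes a constant term.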
If the independent variable, which is normally treated as a controllable quantity, is also regarded as a random variable, then the correlation coefficient between it and the response can be calculated, and R-Sq is exactly the square of that correlation coefficient, so its meaning is easy to understand. With several independent variables the definition is unchanged: it generalizes to the "coefficient of multiple determination" and still represents the proportion of SSmodel in SStotal.
R² has a drawback, however. Whenever an independent variable is added to the model, even just one, R² (R-Sq) never decreases and in practice usually rises a little, regardless of whether the added variable is significant. R² is therefore of no use in deciding whether a variable should be added to the regression equation. For this purpose we introduce the adjusted R², written R²adj, which is defined as:
R²adj = 1 - [SSerror/(n-p)] / [SStotal/(n-1)]
Above, n is the total number of observations and p is the total number of terms in the regression equation (including the constant term). In other words, R²adj (R-Sq(adj)) is R² with the effect of the number of terms in the regression equation removed, so it reflects the quality of the model more accurately. As with R², the closer it is to 1 the better. In practice, because the number of terms p is always at least 1, the factor (n-1)/(n-p) is at least 1, and it is easy to see that R²adj is never larger than R² (the two are equal only when p = 1 or when the fit is perfect).
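To make the effect of the adjustment concrete, here is a minimal sketch that assumes the standard definition R²adj = 1 - [SSerror/(n-p)] / [SStotal/(n-1)], with p counting the constant term. The function names and all the numbers are hypothetical, chosen only to show a case where adding a nearly useless term raises R-Sq but lowers R-Sq(adj).

```python
def r_sq(ss_error, ss_total):
    """R-Sq = 1 - SSerror/SStotal."""
    return 1.0 - ss_error / ss_total


def r_sq_adj(ss_error, ss_total, n, p):
    """R-Sq(adj) = 1 - [SSerror/(n-p)] / [SStotal/(n-1)]; p includes the constant."""
    if n <= p:
        raise ValueError("need more observations than terms (n > p)")
    return 1.0 - (ss_error / (n - p)) / (ss_total / (n - 1))


# Hypothetical numbers, not taken from any real data set.
n, ss_total = 20, 100.0
before = {"p": 3, "ss_error": 20.0}  # constant + 2 predictors
after = {"p": 4, "ss_error": 19.5}   # one extra, nearly useless predictor

for label, m in (("before", before), ("after", after)):
    print(label,
          round(r_sq(m["ss_error"], ss_total), 4),
          round(r_sq_adj(m["ss_error"], ss_total, n, m["p"]), 4))
```

With these made-up numbers R-Sq rises from 0.80 to 0.805, while R-Sq(adj) falls from about 0.776 to about 0.768, because the small drop in SSerror does not pay for the lost degree of freedom.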
Therefore, the quality of a model can be judged from how close R-Sq(adj) is to R-Sq: the smaller the difference, the better the model. We often compare the "full model", which contains all the independent variables, with the "reduced model", from which all non-significant terms have been deleted, to see which one is better. If the deleted terms really had no significant effect, the two values move closer together after deletion, which indicates that removing those terms did improve the model.
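The full-versus-reduced comparison can be sketched as follows. This is only an illustration: the `ModelSummary` class is a hypothetical helper, the sums of squares are invented, and the adjusted value again assumes R²adj = 1 - [SSerror/(n-p)] / [SStotal/(n-1)] with p including the constant term.

```python
from dataclasses import dataclass


@dataclass
class ModelSummary:
    name: str
    n: int            # number of observations
    p: int            # number of terms, including the constant
    ss_error: float
    ss_total: float

    @property
    def r_sq(self):
        return 1.0 - self.ss_error / self.ss_total

    @property
    def r_sq_adj(self):
        return 1.0 - (self.ss_error / (self.n - self.p)) / (self.ss_total / (self.n - 1))

    @property
    def gap(self):
        # Algebraically equal to (1 - R-Sq) * (p - 1) / (n - p).
        return self.r_sq - self.r_sq_adj


# Hypothetical summaries, made up for illustration.
full = ModelSummary("full", n=30, p=6, ss_error=38.0, ss_total=200.0)
reduced = ModelSummary("reduced", n=30, p=3, ss_error=40.0, ss_total=200.0)

for m in (full, reduced):
    print(f"{m.name:8s} R-Sq={m.r_sq:.4f}  R-Sq(adj)={m.r_sq_adj:.4f}  gap={m.gap:.4f}")
```

In this invented case, deleting three terms costs only a little R-Sq (0.81 to 0.80), but the gap between R-Sq and R-Sq(adj) shrinks from about 0.040 to about 0.015 and R-Sq(adj) itself goes up, which is the pattern the text describes for a successful reduction.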