One of the assumptions of the standard linear regression model is homoscedasticity: the variance of the dependent variable (or of the residuals) does not change with the fitted values or with the independent variables. Sometimes this assumption is violated, a situation called heteroscedasticity. For example, suppose the dependent variable is household savings and the independent variable is household income. High-income households have more disposable income, so their savings amounts vary widely, while low-income households have fewer options and therefore save in a more planned, regular way.
If heteroscedasticity is present and you still estimate with ordinary least squares, the following problems arise:
1. The estimates are still unbiased, but they are no longer efficient.
2. The significance tests of the coefficients lose their validity.
3. Because the variance of the estimators increases, the model's prediction error grows and accuracy declines.
How can you tell whether heteroscedasticity is present?
1. By professional judgment, as with savings and household income in the example above.
2. By plotting the independent variables against the residuals and checking whether the spread shows a trend.
3. By hypothesis tests, such as the Park test, the Glejser test, the Goldfeld-Quandt test, and the White test.
Heteroscedasticity can be corrected with the weighted least squares (WLS) method. The basic idea is to assign each observation a weight according to its error variance: observations with smaller variance get larger weights and observations with larger variance get smaller weights, so that all observations contribute to the fit on a comparable footing.
In SPSS, weighted least squares can be run through two procedures. The first is to supply a WLS weight variable directly in the Linear Regression dialog, which suits the case where the weights are already known. If the weights are unknown, they must be estimated in the dedicated Weight Estimation procedure. Let's look at the two procedures in turn.
1. Analyze - Regression - Linear
This dataset regresses y on x. If there were only these two variables, simple linear regression would do. But the data also contain a sample-size variable n: each row summarizes n underlying observations. Ordinary least squares treats every row equally, implicitly assuming the number of samples behind each row does not affect the result, which is clearly unreasonable: a row based on a large sample varies less than one based on a small sample. So we use weighted least squares with the sample size n as the weight, and for comparison we fit the model both ways.
2. Analyze - Regression - Weight Estimation
In the example above we already knew that the sample size served as the weight; the weights were known in advance. Sometimes, however, the weights are unclear and have to be determined gradually during fitting, so we turn to the other WLS procedure, Weight Estimation. This procedure first requires a weight source variable: one of the variables in the analysis, chosen on professional judgment, whose values are believed to drive the error variance. SPSS then tries weights of the form 1/(source variable)^power over a range of powers and selects the power that maximizes the log-likelihood. In this example we again use n as the weight variable.