R in Action reading notes (9), Chapter 8: Regression, regression diagnostics

Source: Internet
Author: User

8.3 Regression diagnostics

> fit <- lm(weight ~ height, data = women)

> par(mfrow = c(2, 2))

> plot(fit)

To understand these graphs, let's review the statistical assumptions of OLS regression.

Normality: If the dependent variable is normally distributed for a fixed set of predictor values, then the residuals should be normally distributed with a mean of 0. The normal Q-Q plot (Normal Q-Q, upper right) is a probability plot of the standardized residuals against the values that would be expected under normality. If the normality assumption is met, the points on this graph should fall on the straight 45-degree line; if they do not, the assumption is violated.
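The normal Q-Q panel can also be drawn by hand, which makes it clearer what is being plotted. The following is a minimal sketch using base R only; the variable name std_res and the plot title are chosen here for illustration and are not from the notes.

```r
# Refit the model from the notes so this block runs on its own.
fit <- lm(weight ~ height, data = women)

# Standardized residuals: residuals rescaled to roughly unit variance.
std_res <- rstandard(fit)

# Plot sample quantiles of the standardized residuals against
# theoretical standard normal quantiles.
qqnorm(std_res, main = "Normal Q-Q (standardized residuals)")

# 45-degree reference line (intercept 0, slope 1).
abline(0, 1, lty = 2)
```

Points that bend away from the dashed line, especially at the ends, hint at a departure from normality.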

Independence: You cannot tell from these plots whether the dependent variable values are independent; you have to rely on your understanding of how the data was collected. In the example above, there is no a priori reason to believe that one woman's weight influences another woman's weight. If you found that the data was sampled from families, you might have to adjust your assumption of independence.

Linearity: If the dependent variable is linearly related to the independent variables, there should be no systematic relationship between the residuals and the predicted (fitted) values. In other words, the model should capture all the systematic variance present in the data, leaving nothing but random noise. The Residuals vs Fitted plot (upper left) shows clear evidence of a curved relationship, which suggests that you may want to add a quadratic term to the regression.
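One way to act on the curved residual pattern is to refit the model with a squared height term and compare the two fits. This is a sketch under that assumption; the name fit2 and the side-by-side plot layout are illustrative choices.

```r
# Original straight-line model and a model with an added quadratic term.
fit  <- lm(weight ~ height, data = women)
fit2 <- lm(weight ~ height + I(height^2), data = women)

# Residuals vs fitted values for both models, side by side.
par(mfrow = c(1, 2))
plot(fitted(fit), residuals(fit), main = "Linear",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
plot(fitted(fit2), residuals(fit2), main = "Quadratic",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# F test for whether the quadratic term improves the fit.
print(anova(fit, fit2))
```

If the quadratic term is doing its job, the curve in the left panel should largely disappear in the right one.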

Homoscedasticity: If the constant variance assumption is met, the points in the Scale-Location plot (bottom left) should form a random band around a horizontal line. This plot appears to satisfy the assumption.

Finally, the Residuals vs Leverage plot (lower right) provides information about individual observations that you may wish to attend to. It identifies outliers, high-leverage points, and influential observations.
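The quantities behind the Residuals vs Leverage panel can be inspected directly in base R. The sketch below is illustrative only: the cutoffs (2p/n for leverage, 4/(n - p) for Cook's distance) are common rules of thumb assumed here, not thresholds stated in the notes, and the variable names are hypothetical.

```r
fit <- lm(weight ~ height, data = women)

n <- nrow(women)        # number of observations
p <- length(coef(fit))  # number of estimated coefficients (incl. intercept)

# Leverage: diagonal of the hat matrix.
lev <- hatvalues(fit)

# Cook's distance: a measure of each observation's influence on the fit.
cook <- cooks.distance(fit)

# Flag observations that exceed the assumed rule-of-thumb cutoffs.
high_leverage <- which(lev > 2 * p / n)
influential   <- which(cook > 4 / (n - p))

print(high_leverage)
print(influential)
```

Flagged observations are candidates for a closer look, not automatic deletions.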

8.3.2 An enhanced approach

The car package provides the following functions for regression diagnostics:

qqPlot() Quantile comparison plot

durbinWatsonTest() Durbin-Watson test for autocorrelated errors

crPlots() Component plus residual plots

ncvTest() Score test for non-constant error variance

spreadLevelPlot() Spread-level plot

outlierTest() Bonferroni outlier test

avPlots() Added-variable plots

influencePlot() Regression influence plot

scatterplot() Enhanced scatter plot

scatterplotMatrix() Enhanced scatter plot matrix

vif() Variance inflation factors
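A few of these functions can be tried as follows, using the mixed-case names that the car package exports (R is case-sensitive). This sketch assumes car is installed; the multiple regression on the built-in state.x77 data is chosen here only for illustration, since vif() needs more than one predictor, and it is not part of the notes above.

```r
# Illustrative multi-predictor model built from the base R state.x77 matrix.
states <- as.data.frame(
  state.x77[, c("Murder", "Population", "Illiteracy", "Income", "Frost")]
)
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)

if (requireNamespace("car", quietly = TRUE)) {
  print(car::vif(fit))               # variance inflation factors
  print(car::durbinWatsonTest(fit))  # autocorrelated errors
  print(car::ncvTest(fit))           # non-constant error variance
  print(car::outlierTest(fit))       # Bonferroni outlier test
  car::qqPlot(fit)                   # quantile comparison plot
  car::crPlots(fit)                  # component plus residual plots
} else {
  message("Package 'car' is not installed; try install.packages(\"car\").")
}
```

Calling the functions with the car:: prefix avoids attaching the package and makes it explicit where each diagnostic comes from.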
