Uses of the multivariate linear regression model:
1. To fit a regression that explains a phenomenon;
2. To build a predictive model relating the observed data to the independent variables;
3. To quantify the strength of the correlation between Y and the independent variables.
Assumptions:
1. The observations are independent of each other.
2. The random errors are normally distributed with equal variance.
Principle:
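A compact statement of the model, in standard notation consistent with the two assumptions above (the symbols n, p, \beta_j, \varepsilon_i, \sigma^2 and the design matrix X are notation introduced here, not taken from the original):

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2), \quad i = 1, \dots, n

\hat{\beta} = (X^\top X)^{-1} X^\top y
```

Here X is the n x (p+1) matrix whose first column is all ones, and \hat{\beta} is the ordinary least-squares estimate that lm in R and regress in MATLAB return.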
######## R language ########
1. ######## Check correlations in the data ########
data <- iris
round(cor(data[, 1:4]), 3)   # correlation matrix of the four numeric columns, rounded to 3 decimals
plot(data[, 1], data[, 2])   # scatter plot of the first two variables
2. ######## Initial modeling ########
lm1 <- lm(y ~ x1 + x2 + x3, data = a1)   # a1: data frame used to fit the model
summary(lm1)
The F-test shows whether the initial model as a whole has a significant linear relationship. The t-tests show which independent variables have a significant linear relationship with the dependent variable; the remaining variables can be considered for exclusion.
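As a minimal sketch of how the F-test and t-tests can be read off programmatically, the example below uses R's built-in mtcars data rather than the a1 data from the original. The dataset, the formula and the names fit, p_f, coefs and significant are all illustrative assumptions.

```r
# Illustrative data: built-in mtcars, not the original a1.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
s <- summary(fit)

# Overall F-test: s$fstatistic holds the F value and its two degrees of freedom.
f <- s$fstatistic
p_f <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# Per-coefficient t-tests: one row per term, p-values in the "Pr(>|t|)" column.
coefs <- coef(s)
p_t <- coefs[, "Pr(>|t|)"]

print(p_f)
print(round(coefs, 4))

# Terms (excluding the intercept) whose t-test is significant at the 5% level.
significant <- names(p_t)[p_t < 0.05 & names(p_t) != "(Intercept)"]
print(significant)
```

A small p_f supports the model as a whole, while the rows of coefs indicate which individual terms carry the relationship.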
3. ######## Model diagnostics, i.e. checking the model assumptions ########
par(mfrow = c(2, 2))        # arrange the plots in a 2 x 2 grid
# Draw the four diagnostic plots for lm1, including the residual plot (do the random
# errors have equal variance?), the Q-Q plot (are the random errors normally
# distributed?) and the Cook's distance plot (outlier check).
plot(lm1, which = c(1:4))
a1 <- a1[-47, ]             # if there is an outlier (here row 47), remove it
######## Refit after removal and check the effect ########
lm2 <- lm(y ~ x1 + x2 + x3, data = a1)
summary(lm2)
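One way to flag candidate outliers numerically rather than by eye is to compute Cook's distance directly. The sketch below uses the built-in mtcars data as a stand-in for a1, and the 4/n cutoff is a common rule of thumb chosen here as an assumption, not a threshold from the original.

```r
# Illustrative data: built-in mtcars, not the original a1.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Cook's distance for every observation (the quantity shown in diagnostic plot 4).
cd <- cooks.distance(fit)

# Assumed cutoff: 4 / n.
n <- nrow(mtcars)
flagged <- which(cd > 4 / n)
print(names(flagged))

# Drop the flagged rows (if any) and refit, mirroring the a1[-47, ] step.
cleaned <- if (length(flagged) > 0) mtcars[-flagged, ] else mtcars
fit2 <- lm(mpg ~ wt + hp + qsec, data = cleaned)
print(summary(fit2)$r.squared)
```

A flagged point is only a candidate; whether to remove it is still a judgment call based on the diagnostic plots and the data.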
4. ######## Check the independent variables for multicollinearity ########
######## Variance inflation factor (VIF) test ########
library(car)
round(vif(lm2), 2)
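The variance inflation factor of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. The hand-rolled sketch below, using mtcars and a hypothetical helper named vif_manual, shows that computation without the car package; for a model with only numeric main effects it should agree with vif.

```r
# Hypothetical helper: VIF for each column of a numeric predictor data frame.
vif_manual <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    # Regress predictor v on all remaining predictors and take its R^2.
    r2 <- summary(lm(reformulate(others, response = v), data = predictors))$r.squared
    1 / (1 - r2)
  })
}

# Illustrative predictors from the built-in mtcars data.
x <- mtcars[, c("wt", "hp", "qsec")]
v <- vif_manual(x)
print(round(v, 2))
```

As a rule of thumb, values well above roughly 5 to 10 are often read as a sign of troublesome collinearity, though the exact cutoff is a matter of convention.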
######## Model selection by AIC and BIC ########
lm.aic <- step(lm2, trace = FALSE)                     # stepwise selection by AIC (default k = 2)
summary(lm.aic)
lm.bic <- step(lm2, k = log(nrow(a1)), trace = FALSE)  # k = log(n) gives the BIC penalty
summary(lm.bic)
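The only difference between the two step calls is the penalty k per parameter: k = 2 gives AIC and k = log(n) gives BIC. A small check on mtcars (an illustrative dataset and formula, not the original a1 model):

```r
# Illustrative full model on the built-in mtcars data.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
n <- nrow(mtcars)

# AIC() with k = log(n) reproduces BIC().
print(AIC(full, k = log(n)))
print(BIC(full))

# Backward stepwise selection under each penalty (step's default when no scope is given).
sel_aic <- step(full, trace = 0)
sel_bic <- step(full, k = log(n), trace = 0)

# BIC penalises each extra term more heavily whenever log(n) > 2,
# so it tends to keep no more terms than AIC.
print(formula(sel_aic))
print(formula(sel_bic))
```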
5. ######## Model performance ########
y1 <- predict(lm2, a2)      # a2: data frame to predict on
y2 <- predict(lm.aic, a2)
y3 <- predict(lm.bic, a2)
y0 <- a2[, 10]              # observed response (column 10 of a2)
r0 <- y0 - a2$roet
r1 <- y0 - y1
r2 <- y0 - y2
r3 <- y0 - y3
resid <- abs(as.data.frame(cbind(r0, r1, r2, r3)))
sapply(resid, mean)         # mean absolute error of each set of predictions
######## MATLAB language ########
1. b = regress(y, X), where b is the vector of regression coefficient estimates.
2. [b, bint, r, rint, stats] = regress(y, X, alpha)
alpha is the significance level (default 0.05); b and bint are the regression coefficient estimates and their confidence intervals; r and rint are the residuals (a vector) and their confidence intervals; stats holds the statistics used to test the regression model. It has four values: the first is R^2, the second is the F statistic, the third is the p-value corresponding to F (if p < alpha, the null hypothesis is rejected and the regression model holds), and the fourth is an estimate of the error variance.
3. The residuals and their confidence intervals can be plotted with rcoplot(r, rint).
######## If the residual confidence intervals of all points except the ?-th contain 0, that point is treated as an outlier; remove it and recompute ########
4. ######## Variable selection ########
stepwise(X, y, inmodel, alpha), where X is the independent-variable data and y is the dependent-variable data, n x m and n x 1 matrices respectively; inmodel is a vector of column indices of X giving the subset of variables included in the initial model (default: empty); alpha is the significance level.
Theory and practice of multivariate linear regression