Regression analysis builds a function that predicts a dependent variable (also called the response variable) from one or more independent variables (also called predictor variables).
For example, a bank assesses the risk of a mortgage applicant based on factors such as age, income, expenses, occupation, number of dependents, and total credit limit.
Linear regression
Linear regression is a statistical method that predicts the response variable with a linear combination of the predictor variables. The linear regression model has the following form:
y = c0 + c1*x1 + c2*x2 + ... + ck*xk
where x1, x2, ..., xk are the predictor variables and y is the response variable to be predicted.
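To make the model form concrete, here is a minimal sketch on made-up data. The vectors x1, x2, y and the name toy_fit are hypothetical and are not part of the CPI example. Because y is built as an exact linear combination, lm() should return coefficients very close to the ones used to generate it.

```r
# Hypothetical toy data: y is an exact linear combination of x1 and x2.
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(2, 1, 4, 3, 6, 5)
y  <- 2 + 3 * x1 - 1 * x2   # c0 = 2, c1 = 3, c2 = -1

# Fit y = c0 + c1*x1 + c2*x2 by least squares.
toy_fit <- lm(y ~ x1 + x2)

# Estimated c0, c1, c2 (intercept first, then one coefficient per predictor).
coef(toy_fit)
```

With real data the response is never an exact linear combination, so the estimates only approximate the underlying relationship.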
Below, the function lm() is used to run a linear regression on Australian Consumer Price Index (CPI) data.
The data are the quarterly CPI values for Australia from 2008 to 2010.
1. First, create the dataset and draw a scatter plot. In the code below, xaxt="n" suppresses the default horizontal axis, the function axis() then adds it manually, and the parameter las=3 sets the tick labels vertical.
year <- rep(2008:2010, each=4)
quarter <- rep(1:4, 3)
cpi <- c(162.2, 164.6, 166.5, 166.0,
         166.2, 167.0, 168.6, 169.5,
         171.0, 172.1, 173.3, 174.0)
plot(cpi, xaxt="n", ylab="CPI", xlab="")
# draw the x-axis
axis(1, labels=paste(year, quarter, sep="Q"), at=1:12, las=3)
Figure: Australian quarterly consumer price index, 2008-2010
2. Check the correlation coefficients between CPI and the other two variables, year and quarter.
cor(year, cpi)
cor(quarter, cpi)
3. Use the function lm() on the data above to build a linear regression model in which year and quarter are the predictor variables and CPI is the response variable.
fit <- lm(cpi ~ year + quarter)
fit
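The correlation check in step 2 can also be reproduced by hand. The sketch below assumes cor() is used with its default Pearson method and computes the same quantity from its definition: the covariance divided by the product of the standard deviations. The name r_manual is introduced here only for illustration.

```r
# Same data as in step 1.
year <- rep(2008:2010, each = 4)
cpi  <- c(162.2, 164.6, 166.5, 166.0,
          166.2, 167.0, 168.6, 169.5,
          171.0, 172.1, 173.3, 174.0)

# Pearson correlation from its definition:
# sum of cross-deviations over the square root of the product of squared deviations.
dy <- year - mean(year)
dc <- cpi - mean(cpi)
r_manual <- sum(dy * dc) / sqrt(sum(dy^2) * sum(dc^2))

r_manual        # manual value
cor(year, cpi)  # built-in value, should agree with r_manual
```

A coefficient near 1 or -1 suggests a strong linear relationship, which is why this check is a reasonable first step before fitting a linear model.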
With the linear model built above, CPI is calculated as:
cpi = c0 + c1 * year + c2 * quarter
where c0, c1, and c2 are the coefficients of the fitted model fit. The CPI values for the four quarters of 2011 can therefore be calculated as follows.
(cpi2011 <- fit$coefficients[[1]] + fit$coefficients[[2]]*2011 + fit$coefficients[[3]]*(1:4))
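The same calculation can be written as a matrix product, which scales better when there are many predictors. This is only a sketch; the names X2011 and cpi2011_matrix are introduced here for illustration and do not appear in the original walkthrough.

```r
# Same data and model as above.
year    <- rep(2008:2010, each = 4)
quarter <- rep(1:4, 3)
cpi     <- c(162.2, 164.6, 166.5, 166.0,
             166.2, 167.0, 168.6, 169.5,
             171.0, 172.1, 173.3, 174.0)
fit <- lm(cpi ~ year + quarter)

# One row per quarter of 2011; columns are intercept (1), year, quarter,
# in the same order as the coefficients returned by coef(fit).
X2011 <- cbind(1, 2011, 1:4)

# Each prediction is the row of X2011 times the coefficient vector.
cpi2011_matrix <- as.vector(X2011 %*% coef(fit))
cpi2011_matrix
```

The column order of the matrix has to match the order of the coefficients, otherwise the product silently gives wrong values.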
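More details of the model can be obtained with the following code.
attributes(fit)
fit$coefficients
The differences between the observed values and the fitted values (the residuals) are returned by the function residuals(), and summary() prints a fuller report of the fit.
# differences between observed values and fitted values
residuals(fit)
summary(fit)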
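To show what these two functions report, the sketch below recomputes the residuals and the R-squared value by hand. The names res_manual, rss, tss, and r2_manual are introduced here for illustration only.

```r
# Same data and model as above.
year    <- rep(2008:2010, each = 4)
quarter <- rep(1:4, 3)
cpi     <- c(162.2, 164.6, 166.5, 166.0,
             166.2, 167.0, 168.6, 169.5,
             171.0, 172.1, 173.3, 174.0)
fit <- lm(cpi ~ year + quarter)

# Residual = observed value minus fitted value.
res_manual <- cpi - fitted(fit)

# R-squared = 1 - (residual sum of squares / total sum of squares).
rss <- sum(res_manual^2)
tss <- sum((cpi - mean(cpi))^2)
r2_manual <- 1 - rss / tss

r2_manual               # manual value
summary(fit)$r.squared  # value reported by summary(), should agree
```

R-squared measures the share of the variance in CPI that the model explains; with only 12 observations it should be read with some caution.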
Prediction diagram of linear regression model
The following code plots the fitted model. For an lm object, plot() draws a set of diagnostic plots, such as residuals against fitted values.
plot(fit)
3D plot of the fitted model
The fitted model can also be drawn in 3D. In the following code, the function scatterplot3d() creates a 3D scatter plot and plane3d() draws the fitted plane.
library(scatterplot3d)
s3d <- scatterplot3d(year, quarter, cpi, highlight.3d=TRUE, type="h", lab=c(2,3))
s3d$plane3d(fit)
Prediction with the fitted model
With the fitted model, the CPI values for 2011 can be predicted as follows. In the figure that follows, the predicted values are drawn as small triangles.
data2011 <- data.frame(year=2011, quarter=1:4)
cpi2011 <- predict(fit, newdata=data2011)
style <- c(rep(1,12), rep(2,4))
plot(c(cpi, cpi2011), xaxt="n", ylab="CPI", xlab="", pch=style, col=style)
axis(1, at=1:16, las=3,
     labels=c(paste(year, quarter, sep="Q"), "2011Q1", "2011Q2", "2011Q3", "2011Q4"))
Figure: Predicted 2011 CPI values based on the linear regression model
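Point predictions alone say nothing about how uncertain they are. As a hedged extension that is not part of the original walkthrough, predict() for lm objects accepts interval="prediction", which returns a lower and upper bound for each predicted value. The name pred_int is introduced here for illustration.

```r
# Same data and model as above.
year    <- rep(2008:2010, each = 4)
quarter <- rep(1:4, 3)
cpi     <- c(162.2, 164.6, 166.5, 166.0,
             166.2, 167.0, 168.6, 169.5,
             171.0, 172.1, 173.3, 174.0)
fit <- lm(cpi ~ year + quarter)

# New data for the four quarters of 2011.
data2011 <- data.frame(year = 2011, quarter = 1:4)

# Returns a matrix with columns fit (point prediction),
# lwr and upr (bounds of the 95% prediction interval).
pred_int <- predict(fit, newdata = data2011,
                    interval = "prediction", level = 0.95)
pred_int
```

Since 2011 lies outside the range of years used for fitting, these intervals rest on the assumption that the linear trend continues, which the data cannot confirm.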
This article comes from the "CAS Computer Training" blog; reprinting is not permitted.
In R-language data mining, how does "regression analysis" work?