How does "regression analysis" work in data mining with R?

Source: Internet
Author: User

Regression analysis builds a function of one or more independent variables (also called predictor variables) to predict the value of a dependent variable (also called the response variable).


For example, a bank assesses an applicant's mortgage risk based on factors such as age, income, expenses, occupation, number of dependents, and overall credit limit.


Linear regression


Linear regression is a statistical method that predicts the response variable with a linear combination of the predictor variables. The linear regression model has the following form:

y = c0 + c1*x1 + c2*x2 + ... + ck*xk

Here x1, x2, ..., xk are the predictor variables, y is the response variable to be predicted, and c0, c1, ..., ck are the coefficients of the model.
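As a rough illustration of how the coefficients of such a model are estimated, the sketch below fits a model with two predictors on synthetic data and compares the result of lm() with the least-squares solution of the normal equations. The data, the "true" coefficients (2, 3, -1.5) and the names fit_demo, X and c_hat are assumptions made for this example only; they do not come from the article.

```r
# Synthetic data, assumed for illustration only
set.seed(1)
x1 <- runif(50)
x2 <- runif(50)
y  <- 2 + 3 * x1 - 1.5 * x2 + rnorm(50, sd = 0.1)

# Least-squares estimates of c0, c1, c2 obtained with lm()
fit_demo <- lm(y ~ x1 + x2)

# The same estimates from the normal equations (X'X) c = X'y
X     <- cbind(1, x1, x2)                # design matrix with an intercept column
c_hat <- solve(t(X) %*% X, t(X) %*% y)

print(coef(fit_demo))
print(drop(c_hat))
```

Both print statements should show practically the same three numbers, close to the coefficients used to generate the data.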


The following example uses the function lm() to run a linear regression on Australian consumer price index (CPI) data. The data are the quarterly CPI values for Australia from 2008 to 2010.

1. First, create the dataset and draw a scatter plot. In the code below, the function axis() adds the horizontal axis manually, and the parameter las=3 sets the axis labels vertically.


year <- rep(2008:2010, each=4)
quarter <- rep(1:4, 3)
cpi <- c(162.2, 164.6, 166.5, 166.0,
         166.2, 167.0, 168.6, 169.5,
         171.0, 172.1, 173.3, 174.0)
# scatter plot without the default x-axis
plot(cpi, xaxt="n", ylab="CPI", xlab="")
# draw the x-axis with labels such as 2008Q1
axis(1, labels=paste(year, quarter, sep="Q"), at=1:12, las=3)


Figure: Quarterly consumer price index (CPI) for Australia, 2008-2010


2. Check the correlation coefficients between CPI and the other two variables, year and quarter.

cor(year, cpi)
cor(quarter, cpi)
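To make clear what cor() computes by default, the sketch below rebuilds the Pearson correlation coefficient from its definition and compares it with cor(). The helper name pearson is hypothetical and introduced only for this example; the data are repeated so the block runs on its own.

```r
# CPI data from the article
year    <- rep(2008:2010, each = 4)
quarter <- rep(1:4, 3)
cpi     <- c(162.2, 164.6, 166.5, 166.0,
             166.2, 167.0, 168.6, 169.5,
             171.0, 172.1, 173.3, 174.0)

# Pearson correlation computed from its definition (hypothetical helper)
pearson <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

print(c(manual = pearson(year, cpi),    builtin = cor(year, cpi)))
print(c(manual = pearson(quarter, cpi), builtin = cor(quarter, cpi)))
```

On these data the correlation with year is much stronger than the correlation with quarter, which matches the steady year-on-year rise visible in the scatter plot.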


3. Use the function lm() on the data above to build a linear regression model in which year and quarter are the predictor variables and CPI is the response variable.

fit <- lm(cpi ~ year + quarter)
fit


Based on the linear model built above, CPI is calculated as:

CPI = c0 + c1 * year + c2 * quarter

where c0, c1 and c2 are the coefficients of the fitted model fit. The CPI values for the four quarters of 2011 can therefore be calculated as follows.

(cpi2011 <- fit$coefficients[[1]] + fit$coefficients[[2]]*2011 + fit$coefficients[[3]]*(1:4))
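The same calculation can be written more generally as a matrix product between a matrix of new predictor values and the coefficient vector. The following is a minimal sketch, assuming the model defined above; the names b, new_X and cpi2011_manual are introduced only for this example.

```r
# CPI data and model from the article
year    <- rep(2008:2010, each = 4)
quarter <- rep(1:4, 3)
cpi     <- c(162.2, 164.6, 166.5, 166.0,
             166.2, 167.0, 168.6, 169.5,
             171.0, 172.1, 173.3, 174.0)
fit <- lm(cpi ~ year + quarter)

b     <- coef(fit)               # c0 (intercept), c1 (year), c2 (quarter)
new_X <- cbind(1, 2011, 1:4)     # one row per quarter of 2011

# Each row gives c0 + c1 * 2011 + c2 * quarter
cpi2011_manual <- drop(new_X %*% b)
print(cpi2011_manual)
```

Writing the prediction this way scales to any number of predictors, because only the columns of new_X change.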


More details of the model can be obtained with the following code.

attributes(fit)
fit$coefficients


The differences between the observed values and the fitted values (the residuals) are obtained with the function residuals(), and summary() prints a fuller report of the model.

residuals(fit)
summary(fit)
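As a quick check of what these functions report, the sketch below recomputes the residuals as observed minus fitted values and derives R-squared from them. The names res_manual, rss, tss and r2 are introduced only for this example, and the data are repeated so the block runs on its own.

```r
# CPI data and model from the article
year    <- rep(2008:2010, each = 4)
quarter <- rep(1:4, 3)
cpi     <- c(162.2, 164.6, 166.5, 166.0,
             166.2, 167.0, 168.6, 169.5,
             171.0, 172.1, 173.3, 174.0)
fit <- lm(cpi ~ year + quarter)

# Residual = observed value - fitted value
res_manual <- cpi - fitted(fit)

# R-squared = 1 - (residual sum of squares / total sum of squares)
rss <- sum(res_manual^2)
tss <- sum((cpi - mean(cpi))^2)
r2  <- 1 - rss / tss

print(r2)
print(summary(fit)$r.squared)
```

The two printed values should agree, and res_manual should match the output of residuals(fit).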


Plots of the linear regression model

The following code plots the fitted model. For a model returned by lm(), plot() draws a series of diagnostic plots, such as residuals against fitted values.

plot(fit)
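By default the diagnostic plots appear one after another. A common way to see them on a single page is to split the plotting area into a 2 x 2 grid first. The following is a minimal sketch; the use of par(mfrow=...) is a choice made for this example and is not part of the original code.

```r
# CPI data and model from the article
year    <- rep(2008:2010, each = 4)
quarter <- rep(1:4, 3)
cpi     <- c(162.2, 164.6, 166.5, 166.0,
             166.2, 167.0, 168.6, 169.5,
             171.0, 172.1, 173.3, 174.0)
fit <- lm(cpi ~ year + quarter)

op <- par(mfrow = c(2, 2))   # 2 x 2 grid; op keeps the previous setting
plot(fit)                    # the four default diagnostic plots for an lm object
par(op)                      # restore the previous layout
```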




3D plot of the fitted model

You can also draw the fitted model in 3D. The code below uses the function scatterplot3d() to create a 3D scatter plot and plane3d() to add the fitted regression plane.

library(scatterplot3d)
s3d <- scatterplot3d(year, quarter, cpi, highlight.3d=TRUE, type="h", lab=c(2,3))
s3d$plane3d(fit)




Prediction with the fitted model

Based on the fitted model, the CPI values for 2011 can be predicted as follows. In the figure that follows, the predicted values are shown as small triangles.

data2011 <- data.frame(year=2011, quarter=1:4)
cpi2011 <- predict(fit, newdata=data2011)
# style 1 for the 12 observed values, style 2 for the 4 predicted values
style <- c(rep(1,12), rep(2,4))
plot(c(cpi, cpi2011), xaxt="n", ylab="CPI", xlab="", pch=style, col=style)
axis(1, at=1:16, las=3,
     labels=c(paste(year, quarter, sep="Q"), "2011Q1", "2011Q2", "2011Q3", "2011Q4"))


Figure: Predicted CPI values for 2011 based on the linear regression model
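A point prediction says nothing about its uncertainty. As a possible extension, predict() can also return prediction intervals for an lm model. The sketch below assumes the same data and model as above; the 95% level and the name pred are choices made for this example only.

```r
# CPI data and model from the article
year    <- rep(2008:2010, each = 4)
quarter <- rep(1:4, 3)
cpi     <- c(162.2, 164.6, 166.5, 166.0,
             166.2, 167.0, 168.6, 169.5,
             171.0, 172.1, 173.3, 174.0)
fit <- lm(cpi ~ year + quarter)

data2011 <- data.frame(year = 2011, quarter = 1:4)

# Matrix with columns fit (prediction), lwr and upr (interval bounds)
pred <- predict(fit, newdata = data2011, interval = "prediction", level = 0.95)
print(pred)
```

The fit column holds the same values as cpi2011 above. Since 2011 lies outside the range of the data used to fit the model, these values are extrapolations and should be read with some caution.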


This article comes from the "CAS Computer Training" blog. Please do not reprint it.
