Introduction to Linear Regression
As shown, if the independent variable (also called the explanatory variable) and the dependent variable (also called the response variable) are drawn on two-dimensional coordinates, each record corresponds to a point. The most common application of linear regression is to fit a straight line to the known points and then predict the y value for a given x value. All we have to do is find a suitable line, which means finding the right slope and intercept.
SSE & RMSE
SSE stands for sum of squared errors, i.e. the sum of the squares of the differences between the predicted values and the actual values, $SSE = \sum_{i=1}^{N}(\hat{y}_i - y_i)^2$; it can be used to judge a model's error. However, SSE has some drawbacks as a measure of the model: it depends on the number of points, and its units are awkward (the square of the variable's units). So another value is used to measure model error: RMSE (root mean square error), $RMSE = \sqrt{SSE / N}$. It is normalized by N, and its units are the same as the variable's units.
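As a minimal sketch in R (the vectors `pred` and `actual` below are hypothetical predicted and observed values):

```r
# Hypothetical predicted and observed values
pred   <- c(2.5, 0.0, 2.1, 7.8)
actual <- c(3.0, -0.5, 2.0, 7.5)

sse  <- sum((pred - actual)^2)      # sum of squared errors
rmse <- sqrt(sse / length(actual))  # normalized by N; same units as the variable
```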
Case
Many studies have shown that the global average temperature has risen over the past few decades, leading to rising sea levels and extreme weather that affect countless people. The case in this article studies the relationship between the global average temperature and several other factors.
The dataset used here, climate_change.csv, can be downloaded from:
https://courses.edx.org/c4x/MITx/15.071x_2/asset/climate_change.csv
This dataset contains data from May 1983 to December 2008.
In this example, we use the data from May 1983 to December 2006 as the training set, and the remaining data (January 2007 to December 2008) as the test set.
Data
First, load the data.
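A minimal loading-and-splitting sketch in R, assuming climate_change.csv has been downloaded into the working directory:

```r
# Read the dataset (assumes climate_change.csv is in the working directory)
climate <- read.csv("climate_change.csv")

# Training set: May 1983 through December 2006; test set: the remaining rows
train <- subset(climate, Year <= 2006)
test  <- subset(climate, Year >  2006)

str(train)  # inspect the columns that were read in
```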
Data interpretation
- Year: the year of the observation
- Month: the month of the observation
- MEI: the multivariate El Niño/Southern Oscillation index, a measure of El Niño strength
- Temp: the difference between the global average temperature in the period and a reference value
- CO2, N2O, CH4, CFC.11, CFC.12: atmospheric concentrations of these gases
- TSI: the total solar irradiance
- Aerosols: the mean stratospheric aerosol optical depth
Model selection
Building a linear regression model consists of two parts:
1. Select the target features. Our data contains multiple features, but not all of them help with prediction, nor do they all need to act together; we need to sift out the smallest feature combination whose predictions come closest to the facts.
2. Determine the feature coefficients. Once the features are selected, we want to determine the weight each feature contributes to the predicted result; these weights are the coefficients.
Selecting a model by example
Start with all features
Select all features to build the first model, model1, and use the summary function to compute its adjusted R², which is 0.7371 (see the sketch below).
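A sketch of this step, assuming the `train` data frame from the loading step and that the columns read in as Temp, MEI, CO2, CH4, N2O, CFC.11, CFC.12, TSI and Aerosols:

```r
# model1: regress Temp on all candidate features
model1 <- lm(Temp ~ MEI + CO2 + CH4 + N2O + CFC.11 + CFC.12 + TSI + Aerosols,
             data = train)
summary(model1)$adj.r.squared  # 0.7371 in this case study
```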
Remove features one at a time
Remove each feature from model1 in turn and note the corresponding adjusted R²:
| Features | Adjusted R² |
| --- | --- |
| CO2 + CH4 + N2O + cfc.11 + cfc.12 + TSI + aerosols | 0.6373 |
| MEI + CH4 + N2O + cfc.11 + cfc.12 + TSI + aerosols | 0.7331 |
| MEI + CO2 + N2O + cfc.11 + cfc.12 + TSI + aerosols | 0.738 |
| MEI + CO2 + CH4 + cfc.11 + cfc.12 + TSI + aerosols | 0.7339 |
| MEI + CO2 + CH4 + N2O + cfc.12 + TSI + aerosols | 0.7163 |
| MEI + CO2 + CH4 + N2O + cfc.11 + TSI + aerosols | 0.7172 |
| MEI + CO2 + CH4 + N2O + cfc.11 + cfc.12 + aerosols | 0.697 |
| MEI + CO2 + CH4 + N2O + cfc.11 + cfc.12 + TSI | 0.6883 |
This round yields model2: temp ~ MEI + CO2 + N2O + cfc.11 + cfc.12 + TSI + aerosols (removing CH4 gives the highest adjusted R², 0.738, up from model1's 0.7371).
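Under the same assumptions as above, model2 drops CH4, the removal that yielded the highest adjusted R² this round:

```r
# model2: the same regression without CH4
model2 <- lm(Temp ~ MEI + CO2 + N2O + CFC.11 + CFC.12 + TSI + Aerosols,
             data = train)
summary(model2)$adj.r.squared  # 0.738 in this round
```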
Next, remove each feature from model2 in turn and note the corresponding adjusted R²:
| Features | Adjusted R² |
| --- | --- |
| CO2 + N2O + cfc.11 + cfc.12 + TSI + aerosols | 0.6377 |
| MEI + N2O + cfc.11 + cfc.12 + TSI + aerosols | 0.7339 |
| MEI + CO2 + cfc.11 + cfc.12 + TSI + aerosols | 0.7346 |
| MEI + CO2 + N2O + cfc.12 + TSI + aerosols | 0.7171 |
| MEI + CO2 + N2O + cfc.11 + TSI + aerosols | 0.7166 |
| MEI + CO2 + N2O + cfc.11 + cfc.12 + aerosols | 0.698 |
| MEI + CO2 + N2O + cfc.11 + cfc.12 + TSI | 0.6891 |
Every combination in this round has a smaller adjusted R² than the previous round's best (0.738), so the previous round's feature combination is selected as the final model, i.e. temp ~ MEI + CO2 + N2O + cfc.11 + cfc.12 + TSI + aerosols.
The coefficient of each feature can be obtained from summary(model2), as sketched below.
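A sketch, continuing from the fitted model2 (the `test` frame is the held-out split from the loading step):

```r
summary(model2)  # full coefficient table: estimates, std. errors, t-values, p-values
coef(model2)     # just the fitted coefficients, including the intercept

# The final model can then be scored on the held-out test set
pred <- predict(model2, newdata = test)
```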
Introduction to Linear Regression
In linear regression, data are modeled using linear predictor functions, and the unknown model parameters are estimated from the data; such models are called linear models. Most commonly, linear regression models the conditional mean of y given the value of X as an affine function of X.
Linear regression was the first type of regression analysis to be rigorously studied and widely used in practical applications. This is because a model that depends linearly on its unknown parameters is easier to fit than a model that depends nonlinearly on its parameters, and the statistical properties of the resulting estimators are easier to determine.
The above definition comes from Wikipedia.
The error estimation function is

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2,$$

i.e. the sum of the squared differences between the estimates $h_\theta(x^{(i)})$ and the true values $y^{(i)}$. The factor $\frac{1}{2m}$ in front is there so that the 2 cancels when taking the derivative. As for why the squared error is chosen as the error function, that is explained from the perspective of probability distributions: under the assumption of Gaussian noise, minimizing the squared error is equivalent to maximum-likelihood estimation.
There are many ways to adjust θ so that J(θ) attains its minimum; this article focuses on the gradient descent method and the normal equation method.
Gradient Descent
After the linear regression model is selected, it can be used for prediction only once the parameter θ is determined, and θ must be chosen to make J(θ) smallest. So the problem reduces to finding a minimum.
The gradient descent process is as follows:
1. First assign a value to θ; it can be random, or θ can be the all-zero vector.
2. Change the value of θ so that J(θ) decreases along the direction of gradient descent.
The gradient direction is given by the partial derivative of J(θ) with respect to θ; since we are seeking a minimum, θ is moved in the direction opposite to the gradient. The update formula is

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) = \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}.$$
This method requires evaluating the error over all of the training data before each update of θ (α is the learning rate).
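A minimal batch-gradient-descent sketch in R for the linear case; the design matrix X (with a leading column of ones), the learning rate alpha, and the iteration count below are illustrative choices:

```r
# Batch gradient descent for linear regression.
# X: m x (n+1) design matrix with a leading column of ones; y: length-m response.
gradient_descent <- function(X, y, alpha = 0.01, iters = 1000) {
  m     <- nrow(X)
  theta <- rep(0, ncol(X))                    # start from the all-zero vector
  for (k in seq_len(iters)) {
    grad  <- t(X) %*% (X %*% theta - y) / m   # (1/m) * X^T (X*theta - y)
    theta <- theta - alpha * grad             # step against the gradient
  }
  as.vector(theta)
}

# Tiny illustrative example: y = 1 + 2x plus noise
set.seed(1)
x <- runif(100)
y <- 1 + 2 * x + rnorm(100, sd = 0.1)
X <- cbind(1, x)
gradient_descent(X, y, alpha = 0.5, iters = 5000)  # approximately c(1, 2)
```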
Normal equation
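This heading refers to the standard closed-form least-squares solution: setting the gradient of J(θ) to zero gives

$$\theta = (X^{T}X)^{-1}X^{T}y,$$

which needs no learning rate or iteration but assumes $X^{T}X$ is invertible. Reusing the X and y from the sketch above:

```r
# Closed-form solution of the normal equation (assumes t(X) %*% X is invertible)
theta <- solve(t(X) %*% X, t(X) %*% y)
```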