One-dimensional linear regression, the least squares method, and a C++ implementation

In supervised learning, if the predicted variable is discrete, the task is called classification (examples include decision trees and SVMs); if the predicted variable is continuous, it is called regression. In regression analysis, if there is only one independent variable and one dependent variable, and the relationship between them can be represented by a straight line, the analysis is called one-dimensional (simple) linear regression. If the regression includes two or more independent variables, and the dependent variable is linearly related to them, it is called multivariate linear regression. In two-dimensional space a linear relationship is a straight line; in three-dimensional space it is a plane; in higher-dimensional spaces it is a hyperplane. Here we discuss the simplest case: the one-dimensional linear regression model.

1. One-dimensional Linear Regression Model

The model is as follows:

    Y = a·X + b + μ

where a is the slope, b is the intercept, and μ is a random error term, so that the conditional mean of Y given X is E(Y|X) = a·X + b.

In the population regression function, the relationship between Y and X can be either linear or non-linear. There are two interpretations of "linear" in a regression model:

(1) Linear in the variable: the conditional mean of Y is a linear function of X.

(2) Linear in the parameters: the conditional mean of Y is a linear function of the parameters.

In a linear regression model, "linear" refers to the parameters. As long as the model is linear in its parameters, it can be estimated with the same methods; for example, Y = a·X² + b is non-linear in X but still linear in the parameters a and b.

2. Parameter Estimation: The Least Squares Method

For a linear regression model, assume that n pairs of observations (x1, y1), (x2, y2), ..., (xn, yn) are drawn from the population. Countless curves could be fitted through these n points in the plane, but the sample regression function should fit them as well as possible; intuitively, the best line passes through the center of the sample data. The criterion for the best-fitting line is to minimize the total fitting error (the total residual). Three criteria could be chosen:

(1) Minimizing the sum of the residuals is one way to determine the line's position. However, it was soon discovered that positive and negative residuals cancel each other out in this sum.
(2) Minimizing the sum of the absolute residuals is another way. However, absolute values are inconvenient to work with analytically.
(3) The principle of the least squares method is to determine the line's position by minimizing the sum of the squared residuals. Besides computational convenience, least squares has excellent statistical properties, although the method is quite sensitive to outliers.

Ordinary Least Squares (OLS): the chosen regression line should minimize the sum of the squared residuals over all observed values. Denote this sum by Q.

Sample regression model:

    ŷi = a·xi + b,   i = 1, ..., n

Sum of the squared residuals:

    Q = Σ (yi − ŷi)² = Σ (yi − a·xi − b)²

The line is then determined by minimizing Q: treating Q as a function of the parameters a and b turns the fitting problem into an extremum problem that can be solved by differentiation. Setting the partial derivatives of Q with respect to the two parameters to zero:

    ∂Q/∂a = −2 Σ xi·(yi − a·xi − b) = 0
    ∂Q/∂b = −2 Σ (yi − a·xi − b) = 0

Solving this pair of equations gives:

    a = (n·Σ xi·yi − Σ xi · Σ yi) / (n·Σ xi² − (Σ xi)²)
    b = (Σ xi² · Σ yi − Σ xi · Σ xi·yi) / (n·Σ xi² − (Σ xi)²)

(equivalently, b = (Σ yi − a·Σ xi) / n).

3. Least Squares: C++ Implementation

#include <iostream>
#include <fstream>
#include <vector>
using namespace std;

class LeastSquare {
    double a, b;   // slope and intercept of the fitted line y = a*x + b
public:
    LeastSquare(const vector<double>& x, const vector<double>& y)
    {
        double t1 = 0, t2 = 0, t3 = 0, t4 = 0;
        for (size_t i = 0; i < x.size(); ++i) {
            t1 += x[i] * x[i];   // sum of x^2
            t2 += x[i];          // sum of x
            t3 += x[i] * y[i];   // sum of x*y
            t4 += y[i];          // sum of y
        }
        a = (t3 * x.size() - t2 * t4) / (t1 * x.size() - t2 * t2);
        // b = (t4 - a * t2) / x.size();   // equivalent form
        b = (t1 * t4 - t2 * t3) / (t1 * x.size() - t2 * t2);
    }

    double getY(const double x) const
    {
        return a * x + b;
    }

    void print() const
    {
        cout << "y = " << a << "x + " << b << "\n";
    }
};

int main(int argc, char* argv[])
{
    if (argc != 2) {
        cout << "Usage: DataFile.txt" << endl;
        return -1;
    }

    // Read all numbers from the file; the first half are the x values,
    // the second half the corresponding y values.
    vector<double> x;
    ifstream in(argv[1]);
    for (double d; in >> d; )
        x.push_back(d);
    int sz = x.size();
    vector<double> y(x.begin() + sz / 2, x.end());
    x.resize(sz / 2);

    LeastSquare ls(x, y);
    ls.print();

    cout << "Input x:\n";
    double x0;
    while (cin >> x0) {
        cout << "y = " << ls.getY(x0) << endl;
        cout << "Input x:\n";
    }
}

(Sina Weibo: @quanliang_machine learning)
