Original: http://blog.csdn.net/qll125596718/article/details/8248249
In supervised learning, if the predicted variable is discrete we call the task classification (e.g. decision trees, support vector machines); if the predicted variable is continuous we call it regression. In regression analysis, if there is only one independent variable and one dependent variable, and the relationship between the two can be approximated by a straight line, the analysis is called unary (simple) linear regression. If the regression includes two or more independent variables, and the dependent variable is linear in them, it is called multivariate linear regression. In two-dimensional space a linear relationship is a straight line, in three-dimensional space it is a plane, and in higher-dimensional space it is a hyperplane. Here we discuss the simplest case: the unary linear regression model.
1. Unary linear regression model
The model is as follows (the original figure showed the standard form, reconstructed here):

    Y = β0 + β1·X + μ

where β0 and β1 are parameters and μ is a random error term. In a general regression function the relationship between Y and X may be linear or non-linear, so "linear" in "linear regression model" can be interpreted in two ways:

(1) Linear in the variable: the conditional mean of Y is a linear function of X.

(2) Linear in the parameters: the conditional mean of Y is a linear function of the parameters.

"Linear regression model" mainly refers to models that are linear in the parameters, because as long as the model is linear in the parameters, the same estimation method applies.
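For example (an illustration not in the original post), a model can be nonlinear in X yet still linear in the parameters, in which case OLS still applies by treating the transformed variable as a regressor:

```latex
% Linear in the parameters \beta_0, \beta_1 (OLS applies, with X^2 as the regressor):
E(Y \mid X) = \beta_0 + \beta_1 X^2
% Nonlinear in the parameter \beta_1 (OLS does not apply directly):
E(Y \mid X) = \beta_0 X^{\beta_1}
```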
2. Parameter estimation--least squares
For a unary linear regression model, suppose n observations (x1, y1), (x2, y2), ..., (xn, yn) are drawn from the population. Infinitely many curves could be fit to these n points in the plane, so we require the sample regression line to fit this set of points as well as possible; intuitively, the most reasonable line lies in the central position of the sample data. A criterion for choosing the best-fitting line is to minimize the total fitting error (the total residual). Three candidate criteria are:
(1) Minimize the sum of the residuals. This seems natural, but positive and negative residuals cancel each other out, so a badly fitting line can still have a small residual sum.

(2) Minimize the sum of the absolute residuals. This avoids cancellation, but absolute values are awkward to work with analytically.

(3) Minimize the sum of the squared residuals; this is the principle of least squares. Besides being easy to compute, the resulting estimators have good statistical properties. Its drawback is that it is sensitive to outliers.
Ordinary least squares (OLS) is the most commonly used criterion: the regression line is chosen so that Q, the sum of the squared residuals of all observations, is minimized.
Sample regression model (the original formulas were images; reconstructed here to match the code below):

    ŷ_i = a·x_i + b   (i = 1, ..., n)

Sum of squared residuals:

    Q = Σ (y_i − ŷ_i)² = Σ (y_i − a·x_i − b)²

The line is determined by minimizing Q. Viewing Q as a function of the two parameters a and b turns this into an extremum problem that can be solved by differentiation. Setting the partial derivatives of Q with respect to the two parameters to zero:

    ∂Q/∂a = −2 Σ x_i (y_i − a·x_i − b) = 0
    ∂Q/∂b = −2 Σ (y_i − a·x_i − b) = 0

Solving for a and b:

    a = (n Σ x_i y_i − Σ x_i · Σ y_i) / (n Σ x_i² − (Σ x_i)²)
    b = (Σ x_i² · Σ y_i − Σ x_i · Σ x_i y_i) / (n Σ x_i² − (Σ x_i)²)
3. Least-squares C++ implementation
#include <iostream>
#include <fstream>
#include <vector>
using namespace std;

class LeastSquare {
    double a, b;  // slope a and intercept b of the fitted line y = a*x + b
public:
    LeastSquare(const vector<double>& x, const vector<double>& y)
    {
        double t1 = 0, t2 = 0, t3 = 0, t4 = 0;  // sums of x*x, x, x*y, y
        for (size_t i = 0; i < x.size(); ++i)
        {
            t1 += x[i] * x[i];
            t2 += x[i];
            t3 += x[i] * y[i];
            t4 += y[i];
        }
        a = (t3 * x.size() - t2 * t4) / (t1 * x.size() - t2 * t2);
        // b = (t4 - a * t2) / x.size();  // equivalent formula
        b = (t1 * t4 - t2 * t3) / (t1 * x.size() - t2 * t2);
    }

    double getY(const double x) const
    {
        return a * x + b;
    }

    void print() const
    {
        cout << "y = " << a << "x + " << b << "\n";
    }
};

int main(int argc, char* argv[])
{
    if (argc != 2)
    {
        cout << "Usage: DataFile.txt" << endl;
        return -1;
    }
    else
    {
        vector<double> x;
        ifstream in(argv[1]);
        for (double d; in >> d; )
            x.push_back(d);
        size_t sz = x.size();
        vector<double> y(x.begin() + sz / 2, x.end());  // second half of the file holds the y values
        x.resize(sz / 2);                               // first half holds the x values
        LeastSquare ls(x, y);
        ls.print();

        cout << "Input x:\n";
        double x0;
        while (cin >> x0)
        {
            cout << "y = " << ls.getY(x0) << endl;
            cout << "Input x:\n";
        }
    }
}
Unary linear regression model, the least-squares method, and its C++ implementation