Original: http://blog.csdn.net/qll125596718/article/details/8248249
In supervised learning, if the predicted variable is discrete we call the task classification (e.g. decision trees, support vector machines); if the predicted variable is continuous we call it regression. In regression analysis, if there is only one independent variable and one dependent variable, and the relationship between the two can be approximated by a straight line, the analysis is called unary (simple) linear regression. If the regression includes two or more independent variables, and the dependent variable is linear in them, it is called multivariate linear regression. In two-dimensional space a linear relationship is a straight line, in three-dimensional space it is a plane, and in higher-dimensional space it is a hyperplane. Here we discuss the simplest case: the unary linear regression model.
1. Unary linear regression model
The model is as follows (the original figure showed the standard form, reconstructed here):

    Y = β0 + β1·X + μ

where β0 and β1 are parameters and μ is a random error term. In a general regression function the relationship between Y and X may be linear or non-linear, so "linear" in "linear regression model" can be interpreted in two ways:

(1) Linear in the variable: the conditional mean of Y is a linear function of X.

(2) Linear in the parameters: the conditional mean of Y is a linear function of the parameters.

"Linear regression model" mainly refers to models that are linear in the parameters, because as long as the model is linear in the parameters, the same estimation method applies.
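For example (an illustration not in the original post), a model can be nonlinear in X yet still linear in the parameters, in which case OLS still applies by treating the transformed variable as a regressor:

```latex
% Linear in the parameters \beta_0, \beta_1 (OLS applies, with X^2 as the regressor):
E(Y \mid X) = \beta_0 + \beta_1 X^2
% Nonlinear in the parameter \beta_1 (OLS does not apply directly):
E(Y \mid X) = \beta_0 X^{\beta_1}
```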
2. Parameter estimation--least squares
For a unary linear regression model, suppose n observations (x1, y1), (x2, y2), ..., (xn, yn) are drawn from the population. Infinitely many curves could be fit to these n points in the plane, so we require the sample regression line to fit this set of points as well as possible; intuitively, the most reasonable line lies in the central position of the sample data. A criterion for choosing the best-fitting line is to minimize the total fitting error (the total residual). Three candidate criteria are:
(1) Minimize the sum of the residuals. This seems natural, but positive and negative residuals cancel each other out, so a badly fitting line can still have a small residual sum.

(2) Minimize the sum of the absolute residuals. This avoids cancellation, but absolute values are awkward to work with analytically.

(3) Minimize the sum of the squared residuals; this is the principle of least squares. Besides being easy to compute, the resulting estimators have good statistical properties. Its drawback is that it is sensitive to outliers.
Ordinary least squares (OLS) is the most commonly used criterion: the regression line is chosen so that Q, the sum of the squared residuals of all observations, is minimized.
Sample regression model (the original formulas were images; reconstructed here to match the code below):

    ŷ_i = a·x_i + b   (i = 1, ..., n)

Sum of squared residuals:

    Q = Σ (y_i − ŷ_i)² = Σ (y_i − a·x_i − b)²

The line is determined by minimizing Q. Viewing Q as a function of the two parameters a and b turns this into an extremum problem that can be solved by differentiation. Setting the partial derivatives of Q with respect to the two parameters to zero:

    ∂Q/∂a = −2 Σ x_i (y_i − a·x_i − b) = 0
    ∂Q/∂b = −2 Σ (y_i − a·x_i − b) = 0

Solving for a and b:

    a = (n Σ x_i y_i − Σ x_i · Σ y_i) / (n Σ x_i² − (Σ x_i)²)
    b = (Σ x_i² · Σ y_i − Σ x_i · Σ x_i y_i) / (n Σ x_i² − (Σ x_i)²)
3. Least-squares C++ implementation
#include <iostream>
#include <fstream>
#include <vector>
using namespace std;

class LeastSquare {
    double a, b;  // slope a and intercept b of the fitted line y = a*x + b
public:
    LeastSquare(const vector<double>& x, const vector<double>& y)
    {
        double t1 = 0, t2 = 0, t3 = 0, t4 = 0;  // sums of x*x, x, x*y, y
        for (size_t i = 0; i < x.size(); ++i)
        {
            t1 += x[i] * x[i];
            t2 += x[i];
            t3 += x[i] * y[i];
            t4 += y[i];
        }
        a = (t3 * x.size() - t2 * t4) / (t1 * x.size() - t2 * t2);
        // b = (t4 - a * t2) / x.size();  // equivalent formula
        b = (t1 * t4 - t2 * t3) / (t1 * x.size() - t2 * t2);
    }

    double getY(const double x) const
    {
        return a * x + b;
    }

    void print() const
    {
        cout << "y = " << a << "x + " << b << "\n";
    }
};

int main(int argc, char* argv[])
{
    if (argc != 2)
    {
        cout << "Usage: DataFile.txt" << endl;
        return -1;
    }
    else
    {
        vector<double> x;
        ifstream in(argv[1]);
        for (double d; in >> d; )
            x.push_back(d);
        size_t sz = x.size();
        vector<double> y(x.begin() + sz / 2, x.end());  // second half of the file holds the y values
        x.resize(sz / 2);                               // first half holds the x values
        LeastSquare ls(x, y);
        ls.print();

        cout << "Input x:\n";
        double x0;
        while (cin >> x0)
        {
            cout << "y = " << ls.getY(x0) << endl;
            cout << "Input x:\n";
        }
    }
}
Unary linear regression model, the least-squares method, and its C++ implementation