Unary linear regression model, the least squares method, and a C++ implementation


Original: http://blog.csdn.net/qll125596718/article/details/8248249

In supervised learning, if the predicted variable is discrete, the task is called classification (handled by, e.g., decision trees or support vector machines); if the predicted variable is continuous, it is called regression. In regression analysis, if there is only one independent variable and one dependent variable, and the relationship between the two can be approximated by a straight line, the analysis is called unary (simple) linear regression. If the regression includes two or more independent variables, and the dependent variable is linear in them, it is called multivariate linear regression. In two-dimensional space the linear fit is a straight line, in three-dimensional space it is a plane, and in higher-dimensional spaces it is a hyperplane. Here, we discuss the simplest case: the unary linear regression model.

1. Unary linear regression model

The model is as follows:

Y = b0 + b1*X + e

where b0 and b1 are the parameters to be estimated and e is a random error term.

In a general regression function, the relationship between Y and X may be linear or non-linear. "Linear" in a linear regression model can be interpreted in two ways:

(1) Linear in the variable: the conditional mean of Y is a linear function of X.

(2) Linear in the parameters: the conditional mean of Y is a linear function of the parameters.

The term "linear regression model" mainly refers to linearity in the parameters, because as long as the model is linear in its parameters, the same estimation method can be used to find them.

2. Parameter estimation--least squares

For a unary linear regression model, assume that n pairs of observations (X1, Y1), (X2, Y2), ..., (Xn, Yn) are drawn from the population. Infinitely many curves could be fit through these n points in the plane, but the sample regression function should fit this set of values as well as possible; intuitively, the best line passes through the central position of the sample data. A criterion for selecting the best-fit line is to minimize the total fit error (the total residual). Three candidate criteria are:

(1) Minimize the sum of residuals. However, positive and negative residuals cancel each other out, so this sum can be small even for a poor fit.
(2) Minimize the sum of absolute residuals. This avoids cancellation, but absolute values are awkward to work with analytically.
(3) Minimize the sum of squared residuals: this is the principle of least squares. The resulting estimators have good statistical properties, although the method is sensitive to outliers.

Ordinary least squares (OLS) is the most commonly used method: choose the regression line that minimizes Q, the sum of squared residuals over all observations.

Sample regression model:

Yi = a*Xi + b + ei,  i = 1, ..., n

Sum of squared residuals:

Q = sum_{i=1..n} (Yi - a*Xi - b)^2

The line is determined by minimizing Q. Treating Q as a function of the parameters a and b turns this into an extremum problem, solved by setting the partial derivatives of Q with respect to the two parameters to zero:

dQ/da = -2 * sum Xi*(Yi - a*Xi - b) = 0
dQ/db = -2 * sum (Yi - a*Xi - b) = 0

Solving this system gives:

a = (n*sum(Xi*Yi) - sum(Xi)*sum(Yi)) / (n*sum(Xi^2) - (sum(Xi))^2)
b = (sum(Xi^2)*sum(Yi) - sum(Xi)*sum(Xi*Yi)) / (n*sum(Xi^2) - (sum(Xi))^2)

3. Least squares C++ implementation

    #include <iostream>
    #include <fstream>
    #include <vector>
    using namespace std;

    class LeastSquare {
        double a, b;
    public:
        LeastSquare(const vector<double>& x, const vector<double>& y)
        {
            double t1 = 0, t2 = 0, t3 = 0, t4 = 0;
            for (size_t i = 0; i < x.size(); ++i)
            {
                t1 += x[i] * x[i];   // sum of x^2
                t2 += x[i];          // sum of x
                t3 += x[i] * y[i];   // sum of x*y
                t4 += y[i];          // sum of y
            }
            a = (t3 * x.size() - t2 * t4) / (t1 * x.size() - t2 * t2);
            // b = (t4 - a * t2) / x.size();
            b = (t1 * t4 - t2 * t3) / (t1 * x.size() - t2 * t2);
        }

        double getY(const double x) const
        {
            return a * x + b;
        }

        void print() const
        {
            cout << "y = " << a << "x + " << b << "\n";
        }
    };

    int main(int argc, char* argv[])
    {
        if (argc != 2)
        {
            cout << "Usage: DataFile.txt" << endl;
            return -1;
        }
        else
        {
            vector<double> x;
            ifstream in(argv[1]);
            for (double d; in >> d; )
                x.push_back(d);
            // First half of the file holds the x values, second half the y values.
            int sz = x.size();
            vector<double> y(x.begin() + sz / 2, x.end());
            x.resize(sz / 2);
            LeastSquare ls(x, y);
            ls.print();

            cout << "Input x:\n";
            double x0;
            while (cin >> x0)
            {
                cout << "y = " << ls.getY(x0) << endl;
                cout << "Input x:\n";
            }
        }
    }
