One-dimensional linear regression, the least squares method, and a C++ implementation

In supervised learning, if the predicted variable is discrete, the task is called classification (examples include decision trees and SVMs); if the predicted variable is continuous, it is called regression. In regression analysis, if there is only one independent variable and one dependent variable, and the relationship between them can be represented by a straight line, the analysis is called one-dimensional (simple) linear regression. If the regression includes two or more independent variables, and the dependent variable is linearly related to them, it is called multivariate linear regression. In two-dimensional space a linear relationship is a straight line; in three-dimensional space it is a plane; in higher-dimensional spaces it is a hyperplane. Here we discuss the simplest case: the one-dimensional linear regression model.

1. One-dimensional Linear Regression Model

The model is as follows:

    Y = a·X + b + μ

where a is the slope, b is the intercept, and μ is a random error term, so that the conditional mean of Y given X is E(Y|X) = a·X + b.

In the population regression function, the relationship between Y and X can be either linear or non-linear. There are two interpretations of "linear" in a regression model:

(1) Linear in the variable: the conditional mean of Y is a linear function of X.

(2) Linear in the parameters: the conditional mean of Y is a linear function of the parameters.

In a linear regression model, "linear" refers to the parameters. As long as the model is linear in its parameters, it can be estimated with the same methods; for example, Y = a·X² + b is non-linear in X but still linear in the parameters a and b.

2. Parameter Estimation: The Least Squares Method

For a linear regression model, assume that n pairs of observations (x1, y1), (x2, y2), ..., (xn, yn) are drawn from the population. Countless curves could be fitted through these n points in the plane, but the sample regression function should fit them as well as possible; intuitively, the best line passes through the center of the sample data. The criterion for the best-fitting line is to minimize the total fitting error (the total residual). Three criteria could be chosen:

(1) Minimizing the sum of the residuals is one way to determine the line's position. However, it was soon discovered that positive and negative residuals cancel each other out in this sum.
(2) Minimizing the sum of the absolute residuals is another way. However, absolute values are inconvenient to work with analytically.
(3) The principle of the least squares method is to determine the line's position by minimizing the sum of the squared residuals. Besides computational convenience, least squares has excellent statistical properties, although the method is quite sensitive to outliers.

Ordinary Least Squares (OLS): the chosen regression line should minimize the sum of the squared residuals over all observed values. Denote this sum by Q.

Sample regression model:

    ŷi = a·xi + b,   i = 1, ..., n

Sum of the squared residuals:

    Q = Σ (yi − ŷi)² = Σ (yi − a·xi − b)²

The line is then determined by minimizing Q: treating Q as a function of the parameters a and b turns the fitting problem into an extremum problem that can be solved by differentiation. Setting the partial derivatives of Q with respect to the two parameters to zero:

    ∂Q/∂a = −2 Σ xi·(yi − a·xi − b) = 0
    ∂Q/∂b = −2 Σ (yi − a·xi − b) = 0

Solving this pair of equations gives:

    a = (n·Σ xi·yi − Σ xi · Σ yi) / (n·Σ xi² − (Σ xi)²)
    b = (Σ xi² · Σ yi − Σ xi · Σ xi·yi) / (n·Σ xi² − (Σ xi)²)

(equivalently, b = (Σ yi − a·Σ xi) / n).

3. Least Squares: C++ Implementation

#include <iostream>
#include <fstream>
#include <vector>
using namespace std;

class LeastSquare {
    double a, b;   // slope and intercept of the fitted line y = a*x + b
public:
    LeastSquare(const vector<double>& x, const vector<double>& y)
    {
        double t1 = 0, t2 = 0, t3 = 0, t4 = 0;
        for (size_t i = 0; i < x.size(); ++i) {
            t1 += x[i] * x[i];   // sum of x^2
            t2 += x[i];          // sum of x
            t3 += x[i] * y[i];   // sum of x*y
            t4 += y[i];          // sum of y
        }
        a = (t3 * x.size() - t2 * t4) / (t1 * x.size() - t2 * t2);
        // b = (t4 - a * t2) / x.size();   // equivalent form
        b = (t1 * t4 - t2 * t3) / (t1 * x.size() - t2 * t2);
    }

    double getY(const double x) const
    {
        return a * x + b;
    }

    void print() const
    {
        cout << "y = " << a << "x + " << b << "\n";
    }
};

int main(int argc, char* argv[])
{
    if (argc != 2) {
        cout << "Usage: DataFile.txt" << endl;
        return -1;
    }

    // Read all numbers from the file; the first half are the x values,
    // the second half the corresponding y values.
    vector<double> x;
    ifstream in(argv[1]);
    for (double d; in >> d; )
        x.push_back(d);
    int sz = x.size();
    vector<double> y(x.begin() + sz / 2, x.end());
    x.resize(sz / 2);

    LeastSquare ls(x, y);
    ls.print();

    cout << "Input x:\n";
    double x0;
    while (cin >> x0) {
        cout << "y = " << ls.getY(x0) << endl;
        cout << "Input x:\n";
    }
}

(Sina Weibo: @quanliang_machine learning)
