Simple (One-Variable) Linear Regression Analysis of a Scatter Plot

Source: Internet
Author: User

Yes whitenan... * _ *... Nan is a beautiful colleague of the Organization...
This article covers the XY scatter chart in Excel: the mathematics behind it and how to implement it in C or C++.




First, the mathematical notation used in this article. Why not just skip this? Some special symbols cannot be typed or displayed here, so I had to substitute plain-text symbols and describe them in words.

xoy denotes the usual coordinate system from higher mathematics, with the x and y axes at 90° to each other. The professional term is the Cartesian coordinate system.
Y denotes the y value of a sample point.
Y' denotes a regression value, that is, the y value obtained by substituting an x value into the regression equation.
(Yi)' | (i = 1, 2, ..., n) is the general expression for the regression value of the i-th sample point.
The sample index i ranges from 1 to n.
sum((Ei)^2) | (i = 1, 2, ..., n): sum() denotes the sum, over i, of the expression in parentheses.
(Ei)^2 denotes the square of Ei.
Y" and X" denote the average of the y values and the average of the x values of the sample points. Note that other articles may write the averages differently.
A/B denotes division.




P.S. A linear regression equation (a statistical concept) is not the same thing as the equation of a straight line (a function concept, relating the x and y coordinates of the points on a line). What does the word "regression" add? In the function sense, every value of x has exactly one corresponding value of y. A linear regression equation only gives an estimate: from a given value of x it derives the most probable value of y, and the actual value falls in a range around that estimate.


The mathematical principle behind the XY scatter chart in Excel is linear regression analysis.
Simple (one-variable) linear regression is the simplest model for the relationship between two variables. Its object of study is the linear correlation between them. Plot each pair of values of the variables x and y as a point (Xi, Yi) in the xoy Cartesian coordinate system. If the plot shows that the data points fall roughly along a straight line, then x and y are approximately linearly related. However, because not all points lie exactly on the line, the relationship between x and y is not an exact functional relationship in which each x value uniquely determines a y value.


This relationship between x and y is described by a linear fit: y' = bx + a (fun-1-1). Note that, to distinguish it from the ordinary equation of a line y = bx + a, the y carries an extra prime. In y' = bx + a, a and b are undetermined coefficients, called the regression coefficients. In principle, there are infinitely many candidate pairs (a, b).

The purpose of regression analysis is to find the best linear fit.
Geometrically, this is the line that the sample points are closest to overall. Let (Yi)' (i = 1, 2, ..., n) be the values calculated from the regression equation y' = bx + a, where n is the number of samples, that is, the number of points used to fit the regression line. The difference between the measured value Yi and the calculated value (Yi)' is the residual, written Ei (i = 1, 2, ..., n). The sum of the squared residuals therefore measures how closely the measured values follow the regression line. It is defined as:

Q(a, b) = sum((Ei)^2) | (i = 1, 2, ..., n)

where sum denotes summation and ^2 denotes squaring. In words: add up the squares of all Ei.
Here Ei = Yi - (Yi)', where (Yi)' = b * Xi + a.
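The definition of Q(a, b) above can be sketched in C++ as follows. This is a minimal illustration rather than the article's own code: the function name residualSumOfSquares and the use of std::vector are assumptions made for the example, with a taken as the intercept and b as the slope of y' = b * x + a.

```cpp
#include <cstddef>
#include <vector>

// Residual sum of squares Q(a, b) for the candidate line y' = b * x + a.
// Hypothetical helper: assumes x and y have the same length.
double residualSumOfSquares(const std::vector<double>& x,
                            const std::vector<double>& y,
                            double a, double b)
{
    double q = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double predicted = b * x[i] + a;  // (Yi)'
        const double e = y[i] - predicted;      // Ei = Yi - (Yi)'
        q += e * e;                             // accumulate (Ei)^2
    }
    return q;
}
```

Calling it with different (a, b) pairs for the same data shows which candidate line leaves the smaller total squared residual.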

The least squares method is commonly used: choose a and b so that Q(a, b) is as small as possible. Since Q(a, b) is a non-negative quadratic function of a and b, it always has a minimum.

Use the method from differential calculus for finding an extreme value. At the minimum of Q(a, b), both partial derivatives must be zero: (partial Q)/(partial a) = 0 and (partial Q)/(partial b) = 0.
Solving this pair of equations gives
a = Y" - b * X"
b = Lxy / Lxx
Y" is the average of the y values of the sample points, and X" is the average of the x values.
Lxy is the sum of the products of the x and y deviations from their averages, and Lxx is the sum of the squared deviations of x from its average.
We can see that the regression line passes through the point (X", Y") and the point (0, a).
From a mechanical point of view, the point (X", Y") is the center of gravity of the n sample points.
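For readers who want the intermediate steps, the derivation can be sketched as follows. Here \bar{x} and \bar{y} stand for the averages written X" and Y" in the text, a is the intercept and b is the slope. It assumes the x values are not all equal, so that L_{xx} > 0.

```latex
\begin{aligned}
Q(a,b) &= \sum_{i=1}^{n} \bigl(y_i - a - b x_i\bigr)^2 \\
\frac{\partial Q}{\partial a} &= -2 \sum_{i=1}^{n} \bigl(y_i - a - b x_i\bigr) = 0
  \;\Longrightarrow\; a = \bar{y} - b\,\bar{x} \\
\frac{\partial Q}{\partial b} &= -2 \sum_{i=1}^{n} x_i \bigl(y_i - a - b x_i\bigr) = 0 \\
\text{substituting } a:\quad
  \sum_{i=1}^{n} x_i (y_i - \bar{y}) &= b \sum_{i=1}^{n} x_i (x_i - \bar{x}) \\
b &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
   = \frac{L_{xy}}{L_{xx}}
\end{aligned}
```

The last step uses \sum (y_i - \bar{y}) = 0 and \sum (x_i - \bar{x}) = 0, which let x_i be replaced by (x_i - \bar{x}) on both sides without changing the sums.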

Here, Lxy = sum((Xi - X") * (Yi - Y")) | (i = 1, 2, ..., n).
In words: for every sample point, multiply {Xi minus X" (the average of the Xi)} by {Yi minus Y" (the average of the Yi)}, then add up the products.
Lxx = sum((Xi - X")^2) | (i = 1, 2, ..., n).
In words: add up the squares of {Xi minus X" (the average of the Xi)} over all sample points.

An indicator of the quality of the fitted regression equation is R^2, the square of the correlation coefficient: R^2 = Lxy^2 / (Lxx * Lyy), where Lyy = sum((Yi - Y")^2). The closer this value is to 1, the better: it means the points all lie near the line. When the value is 1, every point lies exactly on the line, and the line accounts for 100% of the variation in y.

In Excel, b is obtained with the formula sum((Xi - X") * (Yi - Y")) / sum((Xi - X")^2),
and a = Y" - b * X".

// y' = a + b * x; n is the number of sample points
// b = Lxy / Lxx
// a = sum(y) / n - b * sum(x) / n

#include <math.h>
#include <stdio.h>

int main()
{
    double dconcent[4];  // x values (concentration)
    double damps[4];     // y values (amps)
    double A, B, R;      // results: intercept, slope, R^2

    dconcent[0] = 5.0;
    dconcent[1] = 10.0;
    dconcent[2] = 20.0;
    dconcent[3] = 40.0;
    damps[0] = 3.73;
    damps[1] = 10.708;
    damps[2] = 10.468;
    damps[3] = 19.101;

    // Get sum x and sum x * x
    // x is the concentration
    double dsumconcent = 0.0;
    double dsumconcentsquart = 0.0;

    for (int i = 0; i < 4; i++) {
        dsumconcent += dconcent[i];
        dsumconcentsquart += pow(dconcent[i], 2);
    }

    // Get sum y and sum y * y
    // y is the amps
    double dsumamps = 0.0;
    double dsumampssquart = 0.0;
    double dsumdoublec = 0.0;

    for (int i = 0; i < 4; i++) {
        dsumamps += damps[i];
        dsumampssquart += pow(damps[i], 2);

        // sum x * y
        dsumdoublec += dconcent[i] * damps[i];
    }

    double daverageamps = dsumamps / 4.0;
    double daverageconcent = dsumconcent / 4.0;
    double dvarbup = 0.0;    // Lxy
    double dvarbdown = 0.0;  // Lxx
    double dvarb = 0.0;
    for (int i = 0; i < 4; i++) {
        dvarbup += (dconcent[i] - daverageconcent) * (damps[i] - daverageamps);
        dvarbdown += pow(dconcent[i] - daverageconcent, 2);
    }
    dvarb = dvarbup / dvarbdown;  // slope b

    double dvara = daverageamps - dvarb * daverageconcent;  // intercept a
    double drelpow2 = pow(dvarbup, 2);  // Lxy^2
    double dxxpow2 = 0.0;
    double dyypow2 = 0.0;
    for (int i = 0; i < 4; i++) {
        dxxpow2 += pow(dconcent[i] - daverageconcent, 2);
        dyypow2 += pow(damps[i] - daverageamps, 2);
    }
    dxxpow2 *= dyypow2;   // Lxx * Lyy
    drelpow2 /= dxxpow2;  // R^2 = Lxy^2 / (Lxx * Lyy)

    A = dvara;

    B = dvarb;

    R = drelpow2;

    printf("a = %f, b = %f, R^2 = %f\n", A, B, R);
    return 0;
}


The program above was built with VS2005 C++ on XP SP3, and its results are the same as Excel's.


BTW: for display, you can convert the floating-point number to a string with a fixed number of digits after the decimal point, but keep using the original value for exact calculation.
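One possible way to do this in C++ is sketched below with std::snprintf. The helper name formatFixed is hypothetical. The point is that only the returned string is rounded, while the double passed in stays unchanged for later calculations.

```cpp
#include <cstdio>
#include <string>

// Format a value with a fixed number of digits after the decimal point,
// for display only. Hypothetical helper: the 64-byte buffer is an assumption
// that is ample for typical regression coefficients; longer output would be
// truncated by snprintf rather than overflowing.
std::string formatFixed(double value, int digits)
{
    char buffer[64];
    std::snprintf(buffer, sizeof(buffer), "%.*f", digits, value);
    return std::string(buffer);
}
```

For example, formatFixed(b, 3) gives a three-decimal string for a label, while b itself is still used at full precision when computing a.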


In addition, the experiment normally used five sample points, but one run turned out to have only four. I wanted to reuse the code written for five sample points and simply set the last x and y to x = 0.0, y = 0.0. I assumed that mathematically this would not affect the result, but it actually does: the extra (0, 0) is treated as a real sample point, so it changes the averages and therefore the fitted line. If you are interested, try it.
