Linear regression: the least squares method (I)


Most of us have met linear regression in mathematical statistics. This article explains univariate (simple) linear regression and walks through the complete process of using the least squares method to find the optimal solution of the linear regression loss function: first the least squares formulas are derived, then they are used to fit a straight line to a simple data set.

Linear regression

Linear regression assumes that there is a linear relationship between the features of the data set and the result:

Equation: y = mx + c

  Here y is the result, x is the feature, m is the coefficient, and c is the error term; geometrically, m is the gradient (slope) and c is the intercept.

Given this equation, we need to find the m and c that make the error between the estimate mx + c and the true value y as small as possible. The squared difference is used to measure the error between the estimated value and the true value (the plain difference would not work, since positive and negative errors could cancel). The function that measures the error between the true and predicted values is called the squared loss function; writing L for the loss, for a single point we have:

L = (y - ŷ)² = (y - (mx + c))²

The average loss over the entire data set of n points is:

L = (1/n) Σᵢ (yᵢ - (m·xᵢ + c))²

We want the best-matching m and c, i.e. those that make L smallest. As a mathematical expression:

(m, c) = argmin over m, c of (1/n) Σᵢ (yᵢ - (m·xᵢ + c))²

  Finding the best match by minimizing the sum of squared errors in this way is called the least squares method; here it is used to obtain the optimal solution of the linear regression.
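As a concrete illustration, the average squared loss can be computed directly from its definition. This is a minimal Python sketch; the function name and sample data are my own, not from the original article:

```python
def average_squared_loss(xs, ys, m, c):
    """Average squared loss: L = (1/n) * sum((y_i - (m*x_i + c))**2)."""
    n = len(xs)
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / n

# A line that fits the data exactly has zero loss; a worse line has a larger one:
xs, ys = [1, 2, 3], [2, 4, 6]              # points on the line y = 2x
print(average_squared_loss(xs, ys, 2, 0))  # 0.0
print(average_squared_loss(xs, ys, 1, 0))  # larger than 0
```

Least squares searches for the (m, c) pair that makes this quantity as small as possible.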

Least squares

To make the least squares derivation easier to follow, assume the data set consists of n points numbered 1 … n, each consisting of a pair (xᵢ, yᵢ), where x represents the feature and y the result. The linear regression model is defined as:

ŷᵢ = m·xᵢ + c

The average loss function is defined as:

L = (1/n) Σᵢ (yᵢ - (m·xᵢ + c))²
To make L as small as possible, set its partial derivatives with respect to c and m to 0; solving the resulting equations for c and m yields the values that minimize L, i.e. the c and m that best match the model.

Partial derivative with respect to c:

Since we are differentiating with respect to c, the terms of L that do not contain c can be dropped (their derivative is 0). Expanding the square and keeping only the terms containing c:

L_c = (1/n) Σᵢ (c² + 2m·xᵢ·c - 2yᵢ·c)

Rearranging, the factors that do not involve the summation index are moved outside the sums; writing mean(x) = (1/n)Σᵢxᵢ and mean(y) = (1/n)Σᵢyᵢ:

L_c = c² + 2m·c·mean(x) - 2c·mean(y)

The partial derivative with respect to c is then:

∂L/∂c = 2c + 2m·mean(x) - 2·mean(y)

Partial derivative with respect to m:

Likewise, for the partial derivative with respect to m, drop the terms of L that do not contain m:

L_m = (1/n) Σᵢ (m²·xᵢ² + 2m·c·xᵢ - 2m·xᵢ·yᵢ)

Rearranging, the factors that do not involve the summation index are moved outside the sums; writing mean(x²) = (1/n)Σᵢxᵢ² and mean(xy) = (1/n)Σᵢxᵢyᵢ:

L_m = m²·mean(x²) + 2m·c·mean(x) - 2m·mean(xy)

The partial derivative with respect to m is then:

∂L/∂m = 2m·mean(x²) + 2c·mean(x) - 2·mean(xy)
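Before solving, the two partial derivatives can be sanity-checked against a numerical finite-difference approximation of the loss. A minimal sketch (the function names, sample data, and the step size h are my own choices, not from the original article):

```python
def loss(xs, ys, m, c):
    """Average squared loss L = (1/n) * sum((y_i - (m*x_i + c))**2)."""
    n = len(xs)
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / n

def partials(xs, ys, m, c):
    """Analytic partial derivatives derived above:
    dL/dm = 2m*mean(x^2) + 2c*mean(x) - 2*mean(xy)
    dL/dc = 2c + 2m*mean(x) - 2*mean(y)"""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    xybar = sum(x * y for x, y in zip(xs, ys)) / n
    x2bar = sum(x * x for x in xs) / n
    dm = 2 * m * x2bar + 2 * c * xbar - 2 * xybar
    dc = 2 * c + 2 * m * xbar - 2 * ybar
    return dm, dc

# Compare against central finite differences at an arbitrary (m, c):
xs, ys, m, c, h = [2, 6, 9, 13], [4, 8, 12, 21], 1.0, 0.5, 1e-6
dm, dc = partials(xs, ys, m, c)
num_dm = (loss(xs, ys, m + h, c) - loss(xs, ys, m - h, c)) / (2 * h)
num_dc = (loss(xs, ys, m, c + h) - loss(xs, ys, m, c - h)) / (2 * h)
print(abs(dm - num_dm) < 1e-4, abs(dc - num_dc) < 1e-4)  # True True
```

Because L is quadratic in m and c, the central difference agrees with the analytic derivatives up to floating-point error.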

Set the partial derivative with respect to c equal to 0 and solve:

2c + 2m·(1/n)Σᵢxᵢ - 2·(1/n)Σᵢyᵢ = 0

c = (1/n)Σᵢyᵢ - m·(1/n)Σᵢxᵢ

The two sums in this solution are mean values, so it can be rewritten as:

c = mean(y) - m·mean(x)

Set the partial derivative with respect to m equal to 0 and solve. This partial derivative depends on c; substituting the solution c = mean(y) - m·mean(x) obtained above gives:

2m·mean(x²) + 2·(mean(y) - m·mean(x))·mean(x) - 2·mean(xy) = 0

Collecting the terms containing m:

m·(mean(x²) - mean(x)²) = mean(xy) - mean(x)·mean(y)

Solving:

m = (mean(xy) - mean(x)·mean(y)) / (mean(x²) - mean(x)²)

To summarize, with the mean values mean(x) = (1/n)Σᵢxᵢ, mean(y) = (1/n)Σᵢyᵢ, mean(xy) = (1/n)Σᵢxᵢyᵢ and mean(x²) = (1/n)Σᵢxᵢ², the least squares solution is:

m = (mean(xy) - mean(x)·mean(y)) / (mean(x²) - mean(x)²)
c = mean(y) - m·mean(x)
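The closed-form least squares solution translates directly into code. A minimal sketch (the function name and test data are my own):

```python
def least_squares_fit(xs, ys):
    """Fit y = m*x + c by the closed-form least squares solution:
    m = (mean(xy) - mean(x)*mean(y)) / (mean(x^2) - mean(x)^2)
    c = mean(y) - m*mean(x)"""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    xybar = sum(x * y for x, y in zip(xs, ys)) / n
    x2bar = sum(x * x for x in xs) / n
    m = (xybar - xbar * ybar) / (x2bar - xbar ** 2)
    c = ybar - m * xbar
    return m, c

m, c = least_squares_fit([0, 1, 2, 3], [1, 3, 5, 7])  # points on y = 2x + 1
print(m, c)  # 2.0 1.0
```

No iterative optimization is needed: four mean values fully determine the best-fitting line.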

Example:

The following data set is fitted with a straight line using the least squares formulas obtained above:

n     x     y      xy      x²
1     2     4      8       4
2     6     8      48      36
3     9     12     108     81
4     13    21     273     169
mean  7.5   11.25  109.25  72.5

(Figure: distribution of the data points; scatter plot omitted.)

Based on the least squares formulas above, the optimal m and c for the current data set are:

m = (109.25 - 7.5 · 11.25) / (72.5 - 7.5²) = 24.875 / 16.25 ≈ 1.5307

c = 11.25 - 1.5307 · 7.5 ≈ -0.23

Finally, we conclude that the fitted linear function is:

y = 1.5307x - 0.23

Calculate the predicted value for each point:

y₁ = 1.5307 · 2 - 0.23 = 2.8314
y₂ = 1.5307 · 6 - 0.23 = 8.9542
y₃ = 1.5307 · 9 - 0.23 = 13.5463
y₄ = 1.5307 · 13 - 0.23 = 19.6691
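The worked example can be reproduced in a few lines of Python; this sketch recomputes m, c and the predictions from the table's data:

```python
# Data from the table above
xs = [2, 6, 9, 13]
ys = [4, 8, 12, 21]
n = len(xs)

xbar = sum(xs) / n                              # mean(x)   = 7.5
ybar = sum(ys) / n                              # mean(y)   = 11.25
xybar = sum(x * y for x, y in zip(xs, ys)) / n  # mean(xy)  = 109.25
x2bar = sum(x * x for x in xs) / n              # mean(x^2) = 72.5

m = (xybar - xbar * ybar) / (x2bar - xbar ** 2)
c = ybar - m * xbar
print(round(m, 4), round(c, 4))  # 1.5308 -0.2308 (the article rounds to 1.5307 and -0.23)

# Predictions for each data point (these differ in the last digits from the
# hand computation above, which uses the rounded m and c):
for x in xs:
    print(round(m * x + c, 4))
```

Working with the unrounded m and c gives slightly more accurate predictions than the hand computation.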

(Figure: the fitted line plotted against the data points; plot omitted.)

Resources:
https://zh.wikipedia.org/zh/%E6%9C%80%E5%B0%8F%E4%BA%8C%E4%B9%98%E6%B3%95
A First Course in Machine Learning

Originally published at: Solinx
http://www.solinx.co/archives/648
