Linear model (1)--Multivariate linear regression

Outline:

    1. Basic forms of linear models
    2. Loss function of multivariate linear regression
    3. Solving the parameters of multivariate linear regression by the least squares method
    4. The difference between least squares and stochastic gradient descent
    5. Questions
    6. Learning and references

1. Basic forms of linear models

The linear model is simple, easy to model, and highly interpretable. It makes its prediction through a linear combination of the attributes, with the following basic form:

    f(x) = w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + b    (1)

Converted to vector form, this is written as:

    f(x) = w^T x + b, \quad \text{where } w = (w_1; w_2; \ldots; w_d)    (2)

Why is it so interpretable? Because the model's weight vector expresses very intuitively how important each attribute is to the prediction. For example, to predict whether it will rain today, once the weight vector and intercept b have been learned from historical data, you can weigh each attribute to judge whether it will rain today:

    f_{\text{rain}}(x) = w_{\text{humidity}} \cdot x_{\text{humidity}} + w_{\text{cloud}} \cdot x_{\text{cloud}} + \cdots + b    (3)
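As a minimal sketch of this interpretability (the attribute names, weight values, and intercept below are hypothetical, not learned from any data), a prediction is just the weighted sum of the attributes:

    import numpy as np

    # Hypothetical weights for three weather attributes (humidity, cloud
    # cover, pressure drop); the values are made up for illustration.
    w = np.array([0.6, 0.3, 0.1])
    b = -0.5  # intercept

    def f(x):
        # Linear model in vector form: f(x) = w^T x + b
        return w @ x + b

    # Today's (normalized) attribute values
    x_today = np.array([0.9, 0.7, 0.2])
    print(f(x_today))  # humidity dominates the prediction: its weight is largest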

2. Loss function of multivariate linear regression

In the multivariate linear regression task, the mean square error is a common loss function, and the learning task is to solve for the model parameters that minimize it. The loss function is as follows:

    E(w, b) = \sum_{i=1}^{m} (f(x_i) - y_i)^2    (4)

where m is the number of samples, y_i is the true value of sample i, and f(x_i) is the predicted value.

By merging the intercept b in formula (4) into w, the weight vector gains one dimension: \hat{w} = (w; b) (all w below take this form), and each sample x_i correspondingly gains a constant dimension of 1: x_i = (x_{i1}, x_{i2}, \ldots, x_{id}, 1).

The loss function can then be written in the following form:

    E_{\hat{w}} = (y - X\hat{w})^T (y - X\hat{w})    (5)

where y = (y_1; y_2; \ldots; y_m) is the vector of sample labels, and X is the m \times (d+1) sample matrix whose i-th row is the augmented sample x_i.
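A small numpy sketch (on randomly generated data, purely for illustration) confirming that formula (5), with the augmented X and \hat{w}, computes the same value as the sample-by-sample sum in formula (4):

    import numpy as np

    rng = np.random.default_rng(0)
    m, d = 5, 3
    X_raw = rng.normal(size=(m, d))  # m samples with d attributes each
    y = rng.normal(size=m)           # true values
    w = rng.normal(size=d)
    b = rng.normal()

    # Formula (4): sum of squared errors, sample by sample
    loss_sum = sum((y[i] - (w @ X_raw[i] + b)) ** 2 for i in range(m))

    # Formula (5): absorb b into w_hat and append a constant-1 column to X
    w_hat = np.append(w, b)                   # w_hat = (w; b)
    X = np.hstack([X_raw, np.ones((m, 1))])   # m x (d+1) sample matrix
    r = y - X @ w_hat
    loss_mat = r @ r                          # (y - X w_hat)^T (y - X w_hat)

    print(np.isclose(loss_sum, loss_mat))  # True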

3. Solving the parameters of multivariate linear regression by the least squares method

In learning the model, what we want is for the predicted values to be as close as possible to the true values, that is, to minimize the error; the mean square error is one expression of that error. So solving the multivariate linear regression model means solving for the parameters that minimize the mean square error:

    \hat{w}^* = \arg\min_{\hat{w}} \, (y - X\hat{w})^T (y - X\hat{w})    (6)

where \hat{w}^* is the solution of the model, that is, the weight vector that minimizes the mean square error.

So, how should we find \hat{w}^*? Here we can use the least squares method to estimate the model parameters. The method is: differentiate the loss function with respect to the parameters to be solved, set the derivative to zero, and solve for the corresponding parameters.

Here we need to differentiate formula (5). Before doing so, recall two matrix-derivative identities:

    \frac{\partial a^T x}{\partial x} = a    (7)

    \frac{\partial x^T A x}{\partial x} = (A + A^T)\, x    (8)

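The original post attached a handwritten derivation; a sketch of it, assuming the standard expansion: multiplying out formula (5) gives

    E_{\hat{w}} = y^T y - 2\, \hat{w}^T X^T y + \hat{w}^T X^T X\, \hat{w}

Identity (7) with a = X^T y differentiates the middle term, and identity (8) with A = X^T X handles the last term; since X^T X is symmetric, (A + A^T)\hat{w} = 2 X^T X \hat{w}.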

Differentiating the loss function with respect to the parameters then gives:

    \frac{\partial E_{\hat{w}}}{\partial \hat{w}} = 2 X^T (X\hat{w} - y)    (9)

Setting formula (9) to zero and solving:

    \hat{w}^* = (X^T X)^{-1} X^T y    (10)

The above is the closed-form solution for the optimal parameter \hat{w}. Note, however, that computing \hat{w}^* involves inverting a matrix, which imposes a restriction: the formula applies only when X^T X is a full-rank or positive definite matrix. In real-world tasks, X^T X is often not full-rank, which leads to multiple solutions; all of them minimize the mean square error, but not all are suitable for the prediction task, because some may overfit.
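A minimal numpy sketch of this closed-form solution (the function name and data are made up for illustration); it uses np.linalg.pinv, the Moore-Penrose pseudo-inverse, which still returns a minimum-norm solution when X^T X is not full-rank:

    import numpy as np

    def fit_least_squares(X_raw, y):
        # Append the constant-1 column so that w_hat = (w; b)
        X = np.hstack([X_raw, np.ones((len(X_raw), 1))])
        # Formula (10): w_hat* = (X^T X)^{-1} X^T y. np.linalg.pinv is the
        # Moore-Penrose pseudo-inverse, so this still returns a (minimum-norm)
        # solution even when X^T X is not full-rank.
        return np.linalg.pinv(X.T @ X) @ X.T @ y

    rng = np.random.default_rng(1)
    X_raw = rng.normal(size=(100, 3))
    y = X_raw @ np.array([2.0, -1.0, 0.5]) + 0.3 + 0.01 * rng.normal(size=100)

    print(fit_least_squares(X_raw, y))  # approx. [ 2.  -1.   0.5  0.3]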

4. The difference between least squares and stochastic gradient descent

While studying this, I wondered about the difference between the two. At first I only knew roughly the following:

The least squares method minimizes the mean square error, and when X^T X is full-rank the closed-form solution for the parameters can be obtained directly; stochastic gradient descent instead updates the parameters iteratively, and its solution is not necessarily the global optimum.
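For contrast, a minimal stochastic-gradient-descent sketch on the same loss (the learning rate and epoch count are arbitrary illustrative choices): instead of solving formula (10) in one step, it updates \hat{w} one sample at a time with the per-sample gradient:

    import numpy as np

    def fit_sgd(X_raw, y, lr=0.01, epochs=200, seed=0):
        rng = np.random.default_rng(seed)
        X = np.hstack([X_raw, np.ones((len(X_raw), 1))])  # absorb b into w_hat
        w_hat = np.zeros(X.shape[1])
        for _ in range(epochs):
            for i in rng.permutation(len(X)):
                # Gradient of the single-sample loss (y_i - x_i^T w_hat)^2
                grad = -2 * (y[i] - X[i] @ w_hat) * X[i]
                w_hat -= lr * grad
        return w_hat

    rng = np.random.default_rng(1)
    X_raw = rng.normal(size=(100, 3))
    y = X_raw @ np.array([2.0, -1.0, 0.5]) + 0.3

    print(fit_sgd(X_raw, y))  # converges toward [2.0, -1.0, 0.5, 0.3]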

But while writing this post I browsed https://www.zhihu.com/question/20822481, and the answer there from the user "Summer Morning" was enlightening...

5. Questions

Linear models can rely on weights to judge the importance of features, but how reliable is that judgment? Collinearity between features lets them share information with one another, so how do we determine how much of a feature's importance belongs to it alone rather than being shared with other features?
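One way to see the issue concretely (a contrived example): duplicate a feature so two columns are perfectly collinear; the weight the single feature would have received can then be split between the two copies in infinitely many ways without changing the predictions:

    import numpy as np

    rng = np.random.default_rng(2)
    x1 = rng.normal(size=100)
    # Two identical (perfectly collinear) feature columns, plus the bias column
    X = np.column_stack([x1, x1, np.ones(100)])
    y = 3.0 * x1 + 1.0

    # X^T X is singular here, so infinitely many weight vectors minimize the
    # loss; the pseudo-inverse picks the minimum-norm one, which splits the
    # weight 3.0 evenly between the two copies.
    w_hat = np.linalg.pinv(X.T @ X) @ X.T @ y
    print(w_hat)  # approx. [1.5, 1.5, 1.0]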

6. Learning and references

Zhou Zhihua, "Machine Learning" (机器学习)
