Linear model (1)--Multivariate linear regression

Outline:

    1. Basic forms of linear models
    2. Loss function of multivariate linear regression
    3. Solving the parameters of multivariate linear regression by the least squares method
    4. The difference between least squares and stochastic gradient descent
    5. Questions
    6. Learning and references

1. Basic forms of linear models

The linear model is simple, easy to model, and highly interpretable. It makes its prediction through a linear combination of the attributes, with the following basic form:

    f(x) = w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + b    (1)

Converted to vector form, this is written as:

    f(x) = w^T x + b, \quad \text{where } w = (w_1; w_2; \ldots; w_d)    (2)

Why is it so interpretable? Because the model's weight vector expresses very intuitively how important each attribute is to the prediction. For example, to predict whether it will rain today, once the weight vector and intercept b have been learned from historical data, you can weigh each attribute to judge whether it will rain today:

    f_{\text{rain}}(x) = w_{\text{humidity}} \cdot x_{\text{humidity}} + w_{\text{cloud}} \cdot x_{\text{cloud}} + \cdots + b    (3)
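As a minimal sketch of this interpretability (the attribute names, weight values, and intercept below are hypothetical, not learned from any data), a prediction is just the weighted sum of the attributes:

    import numpy as np

    # Hypothetical weights for three weather attributes (humidity, cloud
    # cover, pressure drop); the values are made up for illustration.
    w = np.array([0.6, 0.3, 0.1])
    b = -0.5  # intercept

    def f(x):
        # Linear model in vector form: f(x) = w^T x + b
        return w @ x + b

    # Today's (normalized) attribute values
    x_today = np.array([0.9, 0.7, 0.2])
    print(f(x_today))  # humidity dominates the prediction: its weight is largest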

2. Loss function of multivariate linear regression

In the multivariate linear regression task, the mean square error is a common loss function, and the learning task is to solve for the model parameters that minimize it. The loss function is as follows:

    E(w, b) = \sum_{i=1}^{m} (f(x_i) - y_i)^2    (4)

where m is the number of samples, y_i is the true value of sample i, and f(x_i) is the predicted value.

By merging the intercept b in formula (4) into w, the weight vector gains one dimension: \hat{w} = (w; b) (all w below take this form), and each sample x_i correspondingly gains a constant dimension of 1: x_i = (x_{i1}, x_{i2}, \ldots, x_{id}, 1).

The loss function can then be written in the following form:

    E_{\hat{w}} = (y - X\hat{w})^T (y - X\hat{w})    (5)

where y = (y_1; y_2; \ldots; y_m) is the vector of sample labels, and X is the m \times (d+1) sample matrix whose i-th row is the augmented sample x_i.
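A small numpy sketch (on randomly generated data, purely for illustration) confirming that formula (5), with the augmented X and \hat{w}, computes the same value as the sample-by-sample sum in formula (4):

    import numpy as np

    rng = np.random.default_rng(0)
    m, d = 5, 3
    X_raw = rng.normal(size=(m, d))  # m samples with d attributes each
    y = rng.normal(size=m)           # true values
    w = rng.normal(size=d)
    b = rng.normal()

    # Formula (4): sum of squared errors, sample by sample
    loss_sum = sum((y[i] - (w @ X_raw[i] + b)) ** 2 for i in range(m))

    # Formula (5): absorb b into w_hat and append a constant-1 column to X
    w_hat = np.append(w, b)                   # w_hat = (w; b)
    X = np.hstack([X_raw, np.ones((m, 1))])   # m x (d+1) sample matrix
    r = y - X @ w_hat
    loss_mat = r @ r                          # (y - X w_hat)^T (y - X w_hat)

    print(np.isclose(loss_sum, loss_mat))  # True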

3. Solving the parameters of multivariate linear regression by the least squares method

In learning the model, what we want is for the predicted values to be as close as possible to the true values, that is, to minimize the error; the mean square error is one expression of that error. So solving the multivariate linear regression model means solving for the parameters that minimize the mean square error:

    \hat{w}^* = \arg\min_{\hat{w}} \, (y - X\hat{w})^T (y - X\hat{w})    (6)

where \hat{w}^* is the solution of the model, that is, the weight vector that minimizes the mean square error.

So, how should we find \hat{w}^*? Here we can use the least squares method to estimate the model parameters. The method is: differentiate the loss function with respect to the parameters to be solved, set the derivative to zero, and solve for the corresponding parameters.

Here we need to differentiate formula (5). Before doing so, recall two matrix-derivative identities:

    \frac{\partial a^T x}{\partial x} = a    (7)

    \frac{\partial x^T A x}{\partial x} = (A + A^T)\, x    (8)

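The original post attached a handwritten derivation; a sketch of it, assuming the standard expansion: multiplying out formula (5) gives

    E_{\hat{w}} = y^T y - 2\, \hat{w}^T X^T y + \hat{w}^T X^T X\, \hat{w}

Identity (7) with a = X^T y differentiates the middle term, and identity (8) with A = X^T X handles the last term; since X^T X is symmetric, (A + A^T)\hat{w} = 2 X^T X \hat{w}.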

Differentiating the loss function with respect to the parameters then gives:

    \frac{\partial E_{\hat{w}}}{\partial \hat{w}} = 2 X^T (X\hat{w} - y)    (9)

Setting formula (9) to zero and solving:

    \hat{w}^* = (X^T X)^{-1} X^T y    (10)

The above is the closed-form solution for the optimal parameter \hat{w}. Note, however, that computing \hat{w}^* involves inverting a matrix, which imposes a restriction: the formula applies only when X^T X is a full-rank or positive definite matrix. In real-world tasks, X^T X is often not full-rank, which leads to multiple solutions; all of them minimize the mean square error, but not all are suitable for the prediction task, because some may overfit.
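A minimal numpy sketch of this closed-form solution (the function name and data are made up for illustration); it uses np.linalg.pinv, the Moore-Penrose pseudo-inverse, which still returns a minimum-norm solution when X^T X is not full-rank:

    import numpy as np

    def fit_least_squares(X_raw, y):
        # Append the constant-1 column so that w_hat = (w; b)
        X = np.hstack([X_raw, np.ones((len(X_raw), 1))])
        # Formula (10): w_hat* = (X^T X)^{-1} X^T y. np.linalg.pinv is the
        # Moore-Penrose pseudo-inverse, so this still returns a (minimum-norm)
        # solution even when X^T X is not full-rank.
        return np.linalg.pinv(X.T @ X) @ X.T @ y

    rng = np.random.default_rng(1)
    X_raw = rng.normal(size=(100, 3))
    y = X_raw @ np.array([2.0, -1.0, 0.5]) + 0.3 + 0.01 * rng.normal(size=100)

    print(fit_least_squares(X_raw, y))  # approx. [ 2.  -1.   0.5  0.3]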

4. The difference between least squares and stochastic gradient descent

While studying this, I wondered about the difference between the two. At first I only knew roughly the following:

The least squares method minimizes the mean square error, and when X^T X is full-rank the closed-form solution for the parameters can be obtained directly; stochastic gradient descent instead updates the parameters iteratively, and its solution is not necessarily the global optimum.
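For contrast, a minimal stochastic-gradient-descent sketch on the same loss (the learning rate and epoch count are arbitrary illustrative choices): instead of solving formula (10) in one step, it updates \hat{w} one sample at a time with the per-sample gradient:

    import numpy as np

    def fit_sgd(X_raw, y, lr=0.01, epochs=200, seed=0):
        rng = np.random.default_rng(seed)
        X = np.hstack([X_raw, np.ones((len(X_raw), 1))])  # absorb b into w_hat
        w_hat = np.zeros(X.shape[1])
        for _ in range(epochs):
            for i in rng.permutation(len(X)):
                # Gradient of the single-sample loss (y_i - x_i^T w_hat)^2
                grad = -2 * (y[i] - X[i] @ w_hat) * X[i]
                w_hat -= lr * grad
        return w_hat

    rng = np.random.default_rng(1)
    X_raw = rng.normal(size=(100, 3))
    y = X_raw @ np.array([2.0, -1.0, 0.5]) + 0.3

    print(fit_sgd(X_raw, y))  # converges toward [2.0, -1.0, 0.5, 0.3]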

But while writing this post I browsed https://www.zhihu.com/question/20822481, and the answer there from the user "Summer Morning" was enlightening...

5. Questions

Linear models can rely on weights to judge the importance of features, but how reliable is that judgment? Collinearity between features lets them share information with one another, so how do we determine how much of a feature's importance belongs to it alone rather than being shared with other features?
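One way to see the issue concretely (a contrived example): duplicate a feature so two columns are perfectly collinear; the weight the single feature would have received can then be split between the two copies in infinitely many ways without changing the predictions:

    import numpy as np

    rng = np.random.default_rng(2)
    x1 = rng.normal(size=100)
    # Two identical (perfectly collinear) feature columns, plus the bias column
    X = np.column_stack([x1, x1, np.ones(100)])
    y = 3.0 * x1 + 1.0

    # X^T X is singular here, so infinitely many weight vectors minimize the
    # loss; the pseudo-inverse picks the minimum-norm one, which splits the
    # weight 3.0 evenly between the two copies.
    w_hat = np.linalg.pinv(X.T @ X) @ X.T @ y
    print(w_hat)  # approx. [1.5, 1.5, 1.0]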

6. Learning and references

Zhou Zhihua, "Machine Learning" (机器学习)
