Reprint: Derivation of the normal equation and its application

Source: Internet
Author: User

Reprint: Original address is http://www.cnblogs.com/elaron/archive/2013/05/20/3088894.html

Elar

Many Thanks

----------------------------------------------------------------------------------------------------

The origin of the normal equation

Suppose we have m samples, and each feature vector has n dimensions. The training set is then {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))}, where each sample is x^(i) = {x1^(i), x2^(i), ..., xn^(i)}. Let h(θ) = θ0 + θ1x1 + θ2x2 + ... + θnxn.

The θ0 term must not be left out, so θ is (n+1)-dimensional; with the extra component x0 = 1 added, each x is also (n+1)-dimensional.

If we want h(θ) = y for every sample, then stacking the samples as the rows of a matrix X gives

X · θ = y
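Written out in full, the system can be sketched as follows. This assumes the usual convention that every sample gets a leading component x0 = 1, so X has m rows and n + 1 columns; the layout is a reconstruction from the definitions above, not a copy of a figure from the original post.

```latex
\underbrace{\begin{bmatrix}
1 & x_1^{(1)} & x_2^{(1)} & \cdots & x_n^{(1)} \\
1 & x_1^{(2)} & x_2^{(2)} & \cdots & x_n^{(2)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_1^{(m)} & x_2^{(m)} & \cdots & x_n^{(m)}
\end{bmatrix}}_{X \,\in\, \mathbb{R}^{m \times (n+1)}}
\underbrace{\begin{bmatrix}
\theta_0 \\ \theta_1 \\ \vdots \\ \theta_n
\end{bmatrix}}_{\theta}
=
\underbrace{\begin{bmatrix}
y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)}
\end{bmatrix}}_{y}
```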

Let's recall two concepts, the identity matrix and the inverse of a matrix, and see what properties they have.

(1) Identity matrix E

AE = EA = A

(2) Inverse A^(-1) of a matrix

Requirement: A must be a square matrix (and non-singular)

Property: A A^(-1) = A^(-1) A = E

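Now look at the formula X · θ = y again.

To solve for θ, we need to transform it:

Step 1: First turn the matrix to the left of θ into a square matrix. This is done by multiplying both sides on the left by X^T:

X^T X θ = X^T y

Step 2: Turn the part to the left of θ into the identity matrix so that it drops out. Assuming X^T X is invertible, multiply both sides on the left by (X^T X)^(-1):

(X^T X)^(-1) (X^T X) θ = (X^T X)^(-1) X^T y

Step 3: Because (X^T X)^(-1) (X^T X) = E, the equation becomes

E θ = (X^T X)^(-1) X^T y

E can be dropped, which gives

θ = (X^T X)^(-1) X^T y

This is what we call the normal equation. When X · θ = y has no exact solution, which is the usual case when m > n + 1, this θ is the least-squares solution: it minimizes the squared error ||X θ - y||^2.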
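The formula can be checked numerically on a small example. The following is a minimal sketch in MATLAB on made-up toy data (the matrix features and the vector y are hypothetical, not the data set used later in this post). It computes θ once with the literal formula and once with the backslash operator, which solves the same least-squares problem without forming an explicit inverse.

```matlab
% Hypothetical toy data: m = 5 samples, n = 2 features.
% y was generated as 1 + x1 + 2*x2, so theta should come out close to [1; 1; 2].
features = [1 2; 2 1; 3 4; 4 3; 5 5];
y = [6; 5; 12; 11; 16];

m = size(features, 1);
X = [ones(m, 1), features];          % add the x0 = 1 column

% Literal translation of theta = (X^T X)^(-1) X^T y
theta_inv = inv(X' * X) * X' * y;

% Same least-squares problem, solved without an explicit inverse
theta_bs = X \ y;

% Both quantities should be zero up to rounding error
residual   = norm(X' * X * theta_inv - X' * y);   % normal equation holds
difference = norm(theta_inv - theta_bs);          % both solutions agree
```

In practice the backslash form is usually preferred over inv for numerical reasons; the inv form is shown only because it mirrors the derivation.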

Normal equation vs. gradient descent

Like gradient descent, the normal equation can be used to compute the weight vector θ. Compared with gradient descent, it has both advantages and disadvantages.

Advantage:

The normal equation does not care about the scale of the features in X. For example, take the feature vector x = {x1, x2}, where x1 ranges over 1~2000 and x2 ranges over 1~4, so the two ranges differ by a factor of 500. With gradient descent, this makes the contours of the cost function long, narrow ellipses, so descent becomes slow and may even fail to converge (because the derivative multiplied by the step size can overshoot the ellipse). With the normal equation you do not have to worry about this problem, because it is a pure matrix computation.
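The effect can be reproduced with a small experiment. The sketch below uses synthetic data (theta_true, the step size alpha = 0.01 and the 20 iterations are all arbitrary assumptions chosen for illustration) with x1 in 1~2000 and x2 in 1~4, runs plain gradient descent on the unscaled features, and compares it with solving the normal equation.

```matlab
% Synthetic data (hypothetical): x1 spans 1..2000, x2 cycles through 1..4
m  = 50;
x1 = linspace(1, 2000, m)';
x2 = mod((0:m-1)', 4) + 1;
X  = [ones(m, 1), x1, x2];
theta_true = [5; 0.3; -2];           % assumed "true" weights
y  = X * theta_true;                 % noise-free targets

% Cost J(theta) = 1/(2m) * ||X*theta - y||^2
J = @(theta) sum((X * theta - y) .^ 2) / (2 * m);

% Gradient descent on the unscaled features
alpha    = 0.01;
theta_gd = zeros(3, 1);
J_start  = J(theta_gd);
for k = 1:20
    theta_gd = theta_gd - (alpha / m) * (X' * (X * theta_gd - y));
end
J_gd = J(theta_gd);

% Normal equation, solved as a linear system
theta_ne = (X' * X) \ (X' * y);
```

With these numbers J_gd ends up larger than J_start, i.e. gradient descent diverges, while theta_ne stays close to theta_true. Gradient descent would need a much smaller step size or feature scaling to converge here.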

Disadvantage:

Compared with gradient descent, the normal equation requires a large amount of matrix computation, in particular the matrix inverse. X^T X is an (n+1) × (n+1) matrix, and inverting it costs roughly O(n^3) operations, so for a large number of features both the computational cost and the memory requirements grow sharply.

Under what circumstances is X^T X non-invertible, so that the normal equation cannot be applied directly, and how should this be addressed?

(1) When the feature vector has too many dimensions (e.g., m <= n)

Workaround: ① Use regularization

Or ② delete some of the feature dimensions

(2) Redundant features (also known as linearly dependent features)

For example, x1 = size in feet^2

x2 = size in m^2

Since 1 m ≈ 3.28 feet, x1 ≈ 3.28^2 * x2, so x1 and x2 are linearly dependent (in other words, there is redundancy between x1 and x2)

Workaround: Find the redundant feature dimensions and delete them.
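The redundant-feature case and both workarounds can be tried out numerically. The following is a minimal sketch on made-up numbers (the sizes and prices are hypothetical, and lambda = 1 is an arbitrary choice). It detects the redundancy with rank, then drops the redundant column, and alternatively adds a regularization term lambda * L to X^T X, where L is the identity matrix with its first diagonal entry set to 0 so that θ0 is not penalized.

```matlab
% Hypothetical data: the same house size given in m^2 and in feet^2
size_m2  = [50; 80; 100; 120; 150];
size_ft2 = 3.28^2 * size_m2;           % redundant: a multiple of size_m2
y        = [150; 240; 300; 360; 450];  % made-up prices, equal to 3 * size_m2

m = numel(y);
X = [ones(m, 1), size_ft2, size_m2];

% rank(X) smaller than the number of columns means X' * X is singular
is_singular = rank(X) < size(X, 2);

% Workaround A: delete the redundant feature and solve as usual
X_reduced     = [ones(m, 1), size_m2];
theta_reduced = X_reduced \ y;         % close to [0; 3] for this data

% Workaround B: regularization, which makes the matrix invertible again
lambda = 1;
L = eye(size(X, 2));
L(1, 1) = 0;                           % do not penalize theta0
theta_reg = (X' * X + lambda * L) \ (X' * y);
```

Regularization changes the solution slightly because it shrinks the weights, so theta_reg is not expected to reproduce an unregularized fit exactly.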

Practice

For the description of the exercise, see Ng's OpenClassroom page "Exercise: Multivariate Linear Regression".

Download the data on the page and load it into MATLAB.

y(i) represents the price, and x(i) gives the size of the house and the number of rooms:

Number of samples m=47.

Step 1: Preprocess the data

For each x vector, add an x0 = 1 component.

m = 47;
X = [ones(m, 1), ex3x];   % prepend a column of ones (x0 = 1) to the features

To view the X matrix:

Step 2: Substitute into the normal equation θ = (X^T X)^(-1) X^T y to solve for the weight vector.

y = ex3y;
theta = inv(X' * X) * X' * y;   % literal form of the formula; (X' * X) \ (X' * y) is numerically preferable

The θ vector is calculated as

If I want to predict the price of a "1650-square-foot house with 3 bedrooms", then x · θ = y gives:

price = [1, 1650, 3] * theta;

We turn off scientific notation in MATLAB and look at the value of price:

>> format long g
>> price

price = 293081.464334897

Among the samples we were given, we look for one that is close:

The 23rd sample is a house of size 1604 with 3 rooms, and its price is

We can try to plot the h(θ) function to see what it looks like:

The min and max functions are first used to find the minimum and maximum of the house area (x1) and the number of rooms (x2):

x1 ∈ [852, 4478]

x2 ∈ [1, 5]
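The two ranges come from calling min and max on the columns of the feature matrix. A minimal sketch, using a few stand-in rows instead of the full data set (ex3x_demo and its values are assumptions for illustration, laid out like ex3x with the area in column 1 and the number of rooms in column 2):

```matlab
% Stand-in rows (hypothetical values) in the same layout as ex3x
ex3x_demo = [2104 3; 1600 3; 2400 3; 1416 2; 3000 4];

% [minimum, maximum] of each feature column
x1_range = [min(ex3x_demo(:, 1)), max(ex3x_demo(:, 1))];   % house area
x2_range = [min(ex3x_demo(:, 2)), max(ex3x_demo(:, 2))];   % number of rooms
```

Applied to the real ex3x, the same two lines give the intervals quoted above.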

x1 = linspace(852, 4478, 47);
x2 = linspace(1, 5, 47);
[xx1, xx2] = meshgrid(x1, x2);   % grid over area and number of rooms
h_theta = theta(1) * ones(47, 47) + theta(2) * xx1 + theta(3) * xx2;
surf(xx1, xx2, h_theta);

You can see that h(θ) is the following plane:

Category: Recommender System
