Reprint: original address is http://www.cnblogs.com/elaron/archive/2013/05/20/3088894.html
by Elaron. Many thanks.
----------------------------------------------------------------------------------------------------
The origin of the normal equation
Suppose we have m samples, each with an n-dimensional feature vector. The sample set is {(x(1), y(1)), (x(2), y(2)), ..., (x(m), y(m))}, where each sample is x(i) = {x1(i), x2(i), ..., xn(i)}. Let h(θ) = θ0 + θ1x1 + θ2x2 + ... + θnxn.

The intercept θ0 has no matching feature component, so we add a constant x0 = 1 to every sample; θ and each x then become (n+1)-dimensional vectors, and X becomes an m × (n+1) matrix.

If we want h(θ) = y to hold for all samples, then in matrix form

X·θ = y
Let's recall two concepts, the identity matrix and the matrix inverse, and review their properties.

(1) Identity matrix E

AE = EA = A

(2) Inverse A⁻¹ of a matrix A

Requirement: A must be square.

Property: AA⁻¹ = A⁻¹A = E
Now look again at the formula X·θ = y.

To solve for θ, we need a few transformations:

Step 1: First turn the matrix to the left of θ into a square matrix. Multiplying both sides by Xᵀ achieves this:

XᵀX·θ = Xᵀy

Step 2: Turn the part to the left of θ into an identity matrix, so that it vanishes. Multiply both sides by (XᵀX)⁻¹:

(XᵀX)⁻¹(XᵀX)·θ = (XᵀX)⁻¹Xᵀy

Step 3: Because (XᵀX)⁻¹(XᵀX) = E, the equation becomes

E·θ = (XᵀX)⁻¹Xᵀy

E can be dropped, giving

θ = (XᵀX)⁻¹Xᵀy
This is what we call the normal equation.
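The derivation above can be sketched in a few lines of Python with NumPy (the article itself uses MATLAB; this is just an illustrative translation on made-up toy data):

```python
import numpy as np

# Toy data: m = 4 samples, n = 1 feature (house size), made up for illustration.
sizes = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])          # exactly y = 1 + 2*size

# Add the constant component x0 = 1, so X is m x (n+1).
X = np.column_stack([np.ones_like(sizes), sizes])

# Normal equation: theta = (X^T X)^(-1) X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)  # [1. 2.] up to floating-point rounding
```

Because the toy data lie exactly on a line, the recovered θ matches the generating coefficients.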
Normal equation vs. gradient descent

The normal equation, like gradient descent, can be used to compute the weight vector θ. Compared with gradient descent, it has both advantages and disadvantages.
Advantage:
The normal equation is insensitive to the scale of the features in X. For example, suppose the feature vector is x = {x1, x2}, where x1 ranges over 1~2000 and x2 over 1~4; their ranges differ by a factor of 500. With gradient descent, such a difference makes the contours of the cost function very narrow, elongated ellipses, so descent becomes difficult and may even fail to converge (a gradient step scaled by the learning rate can overshoot the ellipse). With the normal equation this is not a concern, because it is a purely algebraic matrix computation and requires no feature scaling.
Disadvantage:
Compared with gradient descent, the normal equation requires a large amount of matrix computation, in particular inverting XᵀX. When the matrix is large, both the computational cost (inverting an (n+1)×(n+1) matrix is roughly O(n³)) and the demands on the computer's memory grow rapidly.
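As an aside not in the original article: in practice the explicit inverse is usually avoided in favor of a least-squares solver, which computes the same θ more stably. A sketch comparing the two on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])  # design matrix with x0 = 1
true_theta = np.array([4.0, 1.0, -2.0, 0.5])
y = X @ true_theta                                  # noiseless, so the fit is exact

theta_inv = np.linalg.inv(X.T @ X) @ X.T @ y        # explicit inverse, as in the article
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None) # SVD-based solve, preferred in practice

print(np.allclose(theta_inv, theta_lstsq))  # True
```

Both routes agree here; the solver route simply avoids forming and inverting XᵀX, which matters as n grows or XᵀX becomes ill-conditioned.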
Under what circumstances is XᵀX non-invertible, so that the normal equation cannot be applied directly, and how should this be handled?

(1) When there are too many feature dimensions relative to the number of samples (e.g., m ≤ n)

Workarounds: ① use regularization, or ② delete some of the feature dimensions.
(2) Redundant features (also known as linearly dependent features)

For example:

x1 = size in feet²
x2 = size in m²

Since 1 m ≈ 3.28 feet, x1 ≈ 3.28² · x2 ≈ 10.76 · x2, so x1 and x2 are linearly dependent (there is redundancy between x1 and x2).
Workaround: find the redundant feature dimensions and delete them.
Practice
For practice, see Ng's OpenClassroom exercise: Multivariate Linear Regression.

Download the data from the exercise page and load it into MATLAB.

y(i) represents the price; x(i) contains the size of the house and the number of rooms.

The number of samples is m = 47.
Step 1: Preprocess the data

For each x vector, add a component x0 = 1.

m = 47;
X = [ones(m, 1), ex3x];
To view the X matrix:
Step 2: Substitute into the normal equation θ = (XᵀX)⁻¹Xᵀy to solve for the weight vector.

y = ex3y;
theta = inv(X'*X)*X'*y;
The θ vector is calculated as
To predict the price of a 1650-square-foot house with 3 bedrooms, evaluate x·θ:

price = [1, 1650, 3] * theta;

Turning off scientific notation in MATLAB, we can see the predicted price:

>> format long g
>> price
price = 293081.464334897
Among our samples we can look for a close one: the 23rd sample is a house of size 1604 with 3 rooms, and its price is
We can also try to plot the h(θ) function:

First use the min and max functions to find the ranges of the house area (x1) and the number of rooms (x2):

x1 ∈ [852, 4478]
x2 ∈ [1, 5]

x1 = linspace(852, 4478, 47); x2 = linspace(1, 5, 47);
[xx1, xx2] = meshgrid(x1, x2);
h_theta = theta(1)*ones(47, 47) + theta(2)*xx1 + theta(3)*xx2;
surf(xx1, xx2, h_theta);
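The grid construction behind that surface plot can be sketched in Python as well. The θ values below are made up for illustration only (the article's fitted θ vector appears in an image not reproduced here); the x1 and x2 ranges are the ones from the exercise data:

```python
import numpy as np

# Hypothetical weight vector for illustration, NOT the article's fitted values.
theta = np.array([1000.0, 150.0, -500.0])

x1 = np.linspace(852, 4478, 47)   # house area range
x2 = np.linspace(1, 5, 47)        # number-of-rooms range
xx1, xx2 = np.meshgrid(x1, x2)

# h(theta) = theta0 + theta1*x1 + theta2*x2, evaluated on the 47x47 grid.
h_theta = theta[0] + theta[1] * xx1 + theta[2] * xx2

print(h_theta.shape)  # (47, 47)
```

Because h(θ) is linear in x1 and x2, the resulting surface is a plane, which is what the MATLAB surf call displays.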
You can see that H (θ) is the following plane: