Machine Learning (CS229): Linear Regression (2)


This blog discusses the learning rate in gradient descent for linear regression, a topic Andrew Ng covers in his public CS229 lectures, and uses an example to look at how the learning rate and the initial value of gradient descent behave.

The learning rate in gradient descent for linear regression

In the previous blog, we derived linear regression and used gradient descent to solve for its parameters, but we did not consider the choice of learning rate.
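
For reference, with the squared-error cost $J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\big(y^{(i)}-\theta^{T}x^{(i)}\big)^{2}$ from that derivation, one step of batch gradient descent updates each parameter as

$$\theta_j := \theta_j + \frac{\alpha}{m}\sum_{i=1}^{m}\big(y^{(i)}-\theta^{T}x^{(i)}\big)\,x_j^{(i)},$$

where $\alpha$ is the learning rate discussed in this post. (This is just a restatement in this post's notation; the MATLAB code further down uses the same update with a constant factor of 2 folded into the step.)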

We still follow the earlier intuitive picture of linear regression: you stand on a mountainside, look around, take a small step in the direction in which the mountain drops fastest, then look around again and take another step downhill, and after many iterations you reach the lowest point. In this picture, what is the learning rate? The learning rate is how far you step each time.

A learning rate that is too large may make each step too big, so you stride right over the minimum we are looking for; a learning rate that is too small makes each step tiny, so even after a great many steps you may still be far from the target point.

Tuning the learning rate is therefore the key to the gradient descent algorithm.
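
To see this concretely, here is a minimal sketch (not from the original post) of gradient descent on the one-dimensional function f(t) = (t - 3)^2, whose minimum is at t = 3, with one learning rate that is too small and one that is too large:

% Minimal illustration: gradient descent on f(t) = (t-3)^2 with two learning rates.
for alpha = [0.01, 1.05]            % too small, then too large
    t = 0;                          % starting point
    for k = 1:20
        grad = 2*(t - 3);           % derivative f'(t)
        t = t - alpha*grad;         % gradient descent step
    end
    fprintf('alpha = %.2f -> t after 20 steps = %.2f\n', alpha, t);
end

With alpha = 0.01 the iterate crawls toward 3 and is still far away after 20 steps; with alpha = 1.05 every step overshoots the minimum and the iterate moves farther and farther away.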

In books on neural networks I have seen the following result: Haykin (1996) proved that the LMS algorithm converges as long as the learning rate α satisfies the condition below. (P.S. I have not read the original reference yet, so for now I can only quote the conclusion.)

$$0 < \alpha < \frac{2}{\lambda_{\max}},$$

where $\lambda_{\max}$ is the largest eigenvalue of the autocorrelation matrix $R$ formed from the input vectors $x(n)$. Because $\lambda_{\max}$ is usually unknown, it is commonly replaced by the trace of $R$, giving the more convenient condition

$$0 < \alpha < \frac{2}{\operatorname{tr}(R)},$$

where $\operatorname{tr}(R)$ is simply the sum of the mean-square values of the inputs.

So we now at least have an upper bound on the learning rate α that guarantees gradient descent will converge.
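
As a small sketch (assuming the inputs are stacked as the rows of an n-by-d matrix X), both bounds can be computed directly:

% Sketch: estimate the autocorrelation matrix and the two learning-rate bounds.
R = (X' * X) / size(X, 1);       % sample autocorrelation matrix of the inputs
alpha_eig   = 2 / max(eig(R));   % bound using the largest eigenvalue
alpha_trace = 2 / trace(R);      % more conservative bound using the trace

Since tr(R) is at least as large as the largest eigenvalue, the trace-based bound is the smaller (safer) of the two.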

I wrote a small program of my own that uses batch gradient descent and stochastic gradient descent to test the effect of the learning rate and the initial value of gradient descent.

I used a data set from Mathematical Algorithms for Linear Regression, Academic Press, 1991, page 304, ISBN 0-12-656460-4.

This data set contains the systolic blood pressure of 30 people of different ages; each record has 4 fields:

I, the index;
A0, a constant 1;
A1, the age;
B, the systolic blood pressure.

Here x is the age and y is the corresponding systolic blood pressure.

From this data we also obtain an upper bound on the learning rate using the condition above (this is what the variable mm computes in the code below).

Next, I'll show the fitted results of the three methods:

the red line shows the result of batch gradient descent;

the green line shows the result of stochastic gradient descent;

the blue line shows the result of computing the parameters directly (the normal equation, given just below).
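
For the blue line, the parameters are obtained in closed form from the normal equation, which is exactly what the code below evaluates as inv(X'*X)*X'*y:

$$\theta = (X^{T}X)^{-1}X^{T}y$$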

The first test fixes an initial value and a learning rate and runs 10,000 iterations, producing the fit shown in the first plot.

It can be seen that batch gradient descent agrees almost exactly with the directly computed parameters and has essentially converged to the minimum of the MSE, while the result of stochastic gradient descent is poor.

The second test, with its own initial value and learning rate, again runs 10,000 iterations.

It can be seen that batch gradient descent has not fully converged here, while stochastic gradient descent has essentially converged to about the same values obtained in the first test.

The third test, with its own initial value and learning rate, again runs 10,000 iterations.

In this test the learning rate is too large, so the values of θ blow up and the iteration no longer converges.
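
This matches the convergence condition quoted earlier. For a quadratic cost such as the MSE of linear regression, the error of batch gradient descent evolves roughly as

$$\theta_{k+1}-\theta^{*} = (I-\alpha H)\,(\theta_{k}-\theta^{*}),$$

where $H$ is the Hessian of the cost (proportional to the autocorrelation matrix $R$ of the inputs), so once $\alpha$ exceeds $2/\lambda_{\max}$ some component of the error is multiplied by a factor larger than 1 in magnitude at every step and θ diverges.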

From these results we can see the advantages and disadvantages of batch gradient descent and stochastic gradient descent.

Batch gradient descent — advantages: the resulting parameters are very accurate, and it is not prone to getting stuck at a poor local minimum;

disadvantage: slow convergence.

Stochastic gradient descent — advantage: fast convergence;

disadvantages: the resulting parameters are not very accurate, and it is more prone to ending up near a local minimum.

Here is the code (I rarely write MATLAB, so in the end it basically reads like C...).

% data: a is assumed to be a 30x2 matrix loaded beforehand,
% with a(:,1) = age and a(:,2) = systolic blood pressure
x(:,1) = ones(30,1);
x(:,2) = a(:,1);
y = a(:,2);
fig = figure;
set(fig, 'Name', 'sample image');
plot(x(:,2), y, '*');
axis([10,70,100,230]);

% compute the sum of the squared input vectors (used for the learning-rate bound)
mm = 0;
for i = 1:30
    mm = mm + x(i,1)^2 + x(i,2)^2;
end
mm = 2/mm;    % upper bound on the learning rate

% batch gradient descent
mse = 100;
m = 0.1;
theta = [100, 1];
alpha = 0.0001;
times = 0;
while mse > m && times < 10000
    times = times + 1;
    tot1 = 0;
    tot2 = 0;
    mse = 0;
    for i = 1:30
        tot1 = tot1 + (y(i) - (theta(1)*x(i,1) + theta(2)*x(i,2))) * x(i,1);
        tot2 = tot2 + (y(i) - (theta(1)*x(i,1) + theta(2)*x(i,2))) * x(i,2);
        mse = mse + (y(i) - (theta(1)*x(i,1) + theta(2)*x(i,2)))^2 / 2;
    end
    theta(1) = theta(1) + alpha*tot1/30*2;
    theta(2) = theta(2) + alpha*tot2/30*2;
    mse = mse/30;
end
hold on;
yfit = theta(1) + theta(2)*x(:,2);
plot(x(:,2), yfit, 'Color', [1,0,0]);    % red: batch gradient descent

% stochastic gradient descent
mse = 100;
m = 0.1;
theta = [100, 1];
alpha = 0.0001;
times = 0;
while mse > m && times < 10000
    times = times + 1;
    mse = 0;
    for i = 1:30
        % update theta after every single sample
        tot1 = (y(i) - (theta(1)*x(i,1) + theta(2)*x(i,2))) * x(i,1);
        tot2 = (y(i) - (theta(1)*x(i,1) + theta(2)*x(i,2))) * x(i,2);
        theta(1) = theta(1) + alpha*tot1*2;
        theta(2) = theta(2) + alpha*tot2*2;
    end
    for i = 1:30
        mse = mse + (y(i) - (theta(1)*x(i,1) + theta(2)*x(i,2)))^2 / 2;
    end
    mse = mse/30;
end
hold on;
yfit = theta(1) + theta(2)*x(:,2);
plot(x(:,2), yfit, 'Color', [0,1,0]);    % green: stochastic gradient descent

% closed-form (normal equation) solution for theta
theta0 = inv(x'*x)*x'*y;
hold on;
yfit = theta0(1) + theta0(2)*x(:,2);
plot(x(:,2), yfit, 'Color', [0,0,1]);    % blue: direct calculation

