Data
Data download: Ex3data.zip. Students who have taken Andrew Ng's machine learning course at Stanford will recognize this house-price data set. The set used here contains 47 samples: X holds the house size and the number of bedrooms, and y holds the house price. Our goal is to predict the price of a house with an area of 1650 square feet and 3 bedrooms.
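As a starting point, a minimal sketch of loading the data (assuming the files ex3x.dat and ex3y.dat, which the main program below also uses, unpack from Ex3data.zip):

x = load('ex3x.dat');   % 47 x 2: house size and number of bedrooms
y = load('ex3y.dat');   % 47 x 1: house price
m = size(x, 1);         % m = 47 training samples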
Data preprocessing
We first add an intercept term x0 to the data, set to 1 for every sample:
x = [ones(m, 1), x];
Differences in the scale of the features have a large influence on the gradient descent algorithm, so we should standardize the data. The method is to first compute the standard deviation and the mean of each feature, then subtract the mean from the data and divide by the standard deviation.
sigma = std(x);   % standard deviation of each column
mu = mean(x);     % mean of each column
x(:,2) = (x(:,2) - mu(2)) ./ sigma(2);   % standardize the second column (house area)
x(:,3) = (x(:,3) - mu(3)) ./ sigma(3);   % standardize the third column (number of bedrooms)
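Keep in mind that a new input must be scaled with the same mu and sigma before prediction, a detail the main program below also relies on. A sketch for the 1650 square foot, 3-bedroom house:

% Scale the new input with the training mu/sigma before predicting (sketch):
x_new = [1, (1650 - mu(2))/sigma(2), (3 - mu(3))/sigma(3)];
% price = theta' * x_new';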
Gradient Descent
The relevant formulas for gradient descent can be found in the previous blog post and are not repeated here.
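For convenience, here is a brief restatement of the two formulas the code below uses (batch gradient descent for linear regression, with m training samples):

J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)

\theta := \theta - \alpha \cdot \frac{1}{m} X^T (X\theta - y)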
The choice of learning rate
The learning rate is generally chosen by running gradient descent with several candidate values and observing how the value of the loss function changes over the iterations, then adjusting until the best learning rate is found.
theta = zeros(size(x(1,:)))';   % initialize fitting parameters
alpha = %% Your initial learning rate %%
J = zeros(50, 1);
for num_iterations = 1:50
    J(num_iterations) = %% Calculate your cost function here %%
    theta = %% Result of gradient descent update %%
end
% Now plot J
% Technically, the first J starts at the zero-eth iteration
% but Matlab/Octave doesn't have a zero index
figure;
plot(0:49, J(1:50), '-')
xlabel('Number of iterations')
ylabel('Cost J')
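One way to fill in the two placeholders above, matching the full program further down (a sketch; sample_num = size(x,1) is the number of training samples):

% Vectorized cost and gradient-descent update for the placeholders:
J(num_iterations) = (1/(2*sample_num)) * (x*theta - y)' * (x*theta - y);
grad = (1/sample_num) * x' * (x*theta - y);
theta = theta - alpha * grad;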
If you choose a learning rate within a good range, you should see a curve like the one shown. If your graph looks very different, try adjusting the learning rate.
Of course, you can also use the normal equations to solve the problem.
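For reference, the closed-form solution given by the normal equations (a standard result; X here includes the intercept column):

\theta = (X^T X)^{-1} X^T y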
Below we give the main program code:
% Method One: gradient descent
x = load('ex3x.dat');
y = load('ex3y.dat');
x = [ones(size(x,1), 1), x];
meanx = mean(x);    % mean of each column
sigmax = std(x);    % standard deviation of each column
x(:,2) = (x(:,2) - meanx(2)) ./ sigmax(2);
x(:,3) = (x(:,3) - meanx(3)) ./ sigmax(3);
figure
itera_num = 100;           % number of iterations to attempt
sample_num = size(x, 1);   % number of training samples
alpha = [0.01, 0.03, 0.1, 0.3, 1, 1.3];   % each candidate rate is roughly 3x the previous, so enumerate them directly
plotstyle = {'b', 'r', 'g', 'k', 'b--', 'r--'};
theta_grad_descent = zeros(size(x(1,:)));
for alpha_i = 1:length(alpha)        % try each learning rate to see which is best
    theta = zeros(size(x,2), 1);     % initialize theta to 0
    Jtheta = zeros(itera_num, 1);
    for i = 1:itera_num              % run itera_num iterations for learning rate alpha(alpha_i)
        Jtheta(i) = (1/(2*sample_num)) .* (x*theta - y)' * (x*theta - y);   % Jtheta records the cost at each iteration
        grad = (1/sample_num) .* x' * (x*theta - y);
        theta = theta - alpha(alpha_i) .* grad;
    end
    plot(0:49, Jtheta(1:50), char(plotstyle(alpha_i)), 'LineWidth', 2)   % the cell must be converted with char
    hold on
    if (1 == alpha(alpha_i))   % experiments show alpha = 1 works best; its final theta is the value we want
        theta_grad_descent = theta
    end
end
legend('0.01', '0.03', '0.1', '0.3', '1', '1.3');
xlabel('Number of iterations')
ylabel('Cost function')
% prediction, scaling the new input with the same mean and standard deviation
price_grad_descend = theta_grad_descent' * [1, (1650 - meanx(2))/sigmax(2), (3 - meanx(3))/sigmax(3)]'

%% Method Two: normal equations
x = load('ex3x.dat');
y = load('ex3y.dat');
x = [ones(size(x,1), 1), x];
theta_norequ = inv(x' * x) * x' * y
price_norequ = theta_norequ' * [1, 1650, 3]'
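A small design note on the normal-equation line: inv(x'*x)*x'*y is fine for a 3-parameter problem like this one, but Matlab/Octave's backslash operator is the more numerically stable idiom and gives the same result here:

theta_norequ = (x' * x) \ (x' * y);   % solves the linear system without forming the explicit inverse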
The curves obtained by gradient descent for the different learning rates are as follows:
The graph shows that convergence is fastest when the learning rate is 1, so we choose a learning rate of 1. The resulting parameters and the predicted price are as follows:
When we use the normal equations instead, the parameters and the predicted price are as follows:
The two methods give slightly different results, but the difference is small; both predicted prices are a little over 290,000.
Reference article:
http://www.cnblogs.com/tornadomeet/archive/2013/03/15/2962116.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=DeepLearning&doc=exercises/ex3/ex3.html