First download the training set data, Ex2data.zip. It contains 50 training samples: X holds the ages of 50 children (from 2 to 8 years old) and Y holds the height of the corresponding child; both age and height are given as decimals. The task is to use these 50 samples to predict the heights of children aged 3.5 and 7.
Below, we first draw a scatter plot of the 50 samples, using MATLAB.
Step one: load the data.
x = load('ex2x.dat');
y = load('ex2y.dat');
Step two: draw a scatter plot.
figure              % open a new figure window
plot(x, y, 'o')
ylabel('Height in meters')
xlabel('Age in years')
From the scatter plot we can see intuitively that these data are well suited to a linear regression model. We solve the problem in two ways: with the normal equation and with gradient descent.
Method one: the normal equation
In "Linear regression, gradient descent, and the normal equation -- Stanford ML open course notes 1-2", we discussed how to solve the normal equation. The final closed-form solution is w = inv(X'*X) * X'*y.
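In brief, the derivation goes as follows (a sketch in MATLAB-style notation, using the same squared-error cost that the scripts below evaluate):

J(w) = (1/(2m)) * (X*w - y)' * (X*w - y)
Setting the gradient to zero: (1/m) * X' * (X*w - y) = 0
=> X'*X*w = X'*y  =>  w = inv(X'*X) * X'*y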
The MATLAB implementation is as follows:
%% Method one: the normal equation
x = load('ex2x.dat');            % load the data
y = load('ex2y.dat');
plot(x, y, 'o')                  % draw the scatter plot
xlabel('Age')                    % axis labels (x is age, y is height)
ylabel('Height')
x = [ones(length(x), 1), x];     % ones(length(x),1) is a 50x1 column of ones;
                                 % prepending it to x as a first column gives a 50x2 matrix
w = inv(x'*x) * x'*y             % our normal equation formula; inv() is the matrix inverse, x' is the transpose of x
hold on                          % keep drawing on the scatter plot canvas above
plot(x(:,2), 0.0639*x(:,2) + 0.7502)   % w = inv(x'*x)*x'*y gives w0 = 0.7502 and w1 = 0.0639,
                                       % so the fitted line is 0.0639*x(:,2) + 0.7502, where x(:,2)
                                       % is the second column of x, the age data from the original training set
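With w in hand, the two predictions the problem asks for follow directly (a minimal sketch; w is the vector computed by the script above, and the numeric values use the rounded w0 and w1 printed there):

predict1 = [1, 3.5] * w    % about 0.7502 + 0.0639*3.5 = 0.974
predict2 = [1, 7] * w      % about 0.7502 + 0.0639*7   = 1.198

In practice the backslash operator, w = (x'*x) \ (x'*y), is preferred over inv() for numerical stability; the gradient descent script below uses exactly that form for its reference solution.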
Method two: gradient descent
Here we need to set the number of iterations (1500) and the learning rate (alpha = 0.07), then repeatedly update theta by stepping along the negative gradient.
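Each iteration performs one batch gradient-descent step on the squared-error cost; in MATLAB-style notation this is the same update that appears inside the loop of the script below:

theta := theta - alpha * (1/m) * X' * (X*theta - y)

The concrete implementation is as follows: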
clear all; close all; clc
x = load('ex2x.dat');
y = load('ex2y.dat');
m = length(y);                         % number of training examples

% Plot the training data
figure;                                % open a new figure window
plot(x, y, 'o');
ylabel('Height')
xlabel('Age')

% Gradient descent
x = [ones(m, 1) x];                    % add a column of ones to x
theta = zeros(size(x(1,:)))';          % initialize fitting parameters: a zero vector the size
                                       % of one row of x (1x2), transposed to a column vector (2x1)
max_itr = 1500;                        % maximum number of iterations
alpha = 0.07;                          % learning rate

for num_iterations = 1:max_itr
    grad = (1/m) .* x' * ((x * theta) - y);   % this is the gradient
    theta = theta - alpha .* grad;            % the actual update
end

% Print theta to screen
theta

% Plot the linear fit
hold on;                               % keep previous plot visible
plot(x(:,2), x*theta, '-')
legend('Training data', 'Linear regression')  % label the meaning of each curve in the figure
hold off                               % don't overlay any more plots on this figure

% Closed form solution for reference
exact_theta = (x' * x) \ x' * y

% Predict values for age 3.5 and 7
predict1 = [1, 3.5] * theta
predict2 = [1, 7] * theta

% Calculate the loss function J over a grid of (theta0, theta1) values
theta0_vals = linspace(-3, 3, 100);    % 100 evenly spaced values from -3 to 3
theta1_vals = linspace(-1, 1, 100);

% Initialize J_vals to a matrix of zeros
J_vals = zeros(length(theta0_vals), length(theta1_vals));
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i,j) = (0.5/m) .* (x * t - y)' * (x * t - y);
    end
end

% Because of the way meshgrids work in the surf command, we need to
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';

% Surface plot
figure;
surf(theta0_vals, theta1_vals, J_vals)
xlabel('\theta_0'); ylabel('\theta_1');

% Contour plot
figure;
% Plot J_vals as 15 contours spaced logarithmically between 0.01 and 100
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 2, 15))
xlabel('\theta_0'); ylabel('\theta_1');
% '\theta_0' uses MATLAB's TeX interpreter: \ acts like an escape character
% for the Greek letter, and _ subscripts the single character (0~9) after it
The three figures above show, respectively, the fitted line over the data, the surface of the loss function J, and the contour map of J.
By comparing the results, we see that gradient descent arrives at essentially the same parameters as the normal equation, which is very interesting.
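This is easy to check numerically (a one-line sketch; theta and exact_theta are the variables from the script above):

norm(theta - exact_theta)   % should be very close to 0 after 1500 iterations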
References:
http://www.cnblogs.com/tornadomeet/archive/2013/03/15/2961660.html
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=DeepLearning&doc=exercises/ex2/ex2.html (Linear regression exercise)