First, the theory
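The code below implements standard univariate linear regression, so as a quick reference (these are the usual textbook formulas, restated here rather than taken from this post) the hypothesis, the cost function, and the gradient descent update rule are:

h_\theta(x) = \theta_0 + \theta_1 x

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right), \qquad
\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}

where m is the number of training examples, alpha is the learning rate, and both parameters are updated simultaneously on each iteration.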
Second, the data set
The data set is ex1data1.txt from the exercise: 97 rows, each a comma-separated pair (city population in 10,000s, profit in $10,000s). The first few rows look like this:

6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
...
5.4369,0.61705
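A minimal sketch for checking that the file loaded correctly (assuming it is saved as ex1data1.txt in the working directory; the size check reflects the 97 rows of the exercise file):

data = load('ex1data1.txt');   % each row is one (population, profit) pair
size(data)                     % expect 97 rows, 2 columns
data(1:3, :)                   % peek at the first three examples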
Third, the code implementation
clear all; clc;
data = load('ex1data1.txt');
X = data(:, 1); y = data(:, 2);
m = length(y);            % number of training examples
plot(X, y, 'rx');         % scatter plot of the raw data

%% =================== Part 3: Gradient descent ===================
fprintf('Running gradient descent ...\n')

% Add a column of ones to X so that theta(1) is multiplied by 1 when computing J
X = [ones(m, 1), data(:, 1)];
theta = zeros(2, 1);      % initialize fitting parameters

% Some gradient descent settings
iterations = 1500;
alpha = 0.01;

% Compute and display the initial cost
computeCost(X, y, theta)

% Run gradient descent
[theta, J_history] = gradientDescent(X, y, theta, alpha, iterations);

hold on;                  % keep previous plot visible
plot(X(:, 2), X*theta, '-')
legend('Training data', 'Linear regression')
hold off                  % don't overlay any more plots on this figure

% Predict values for population sizes of 35,000 and 70,000
predict1 = [1, 3.5] * theta;
fprintf('For population = 35,000, we predict a profit of %f\n', predict1*10000);
predict2 = [1, 7] * theta;
fprintf('For population = 70,000, we predict a profit of %f\n', predict2*10000);

% Grid over which we will calculate J
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);

% Initialize J_vals to a matrix of zeros
J_vals = zeros(length(theta0_vals), length(theta1_vals));

% Fill out J_vals
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i, j) = computeCost(X, y, t);
    end
end

% Because of the way meshgrids work in the surf command, we need to
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';

% Surface plot
figure;
surf(theta0_vals, theta1_vals, J_vals)
xlabel('\theta_0'); ylabel('\theta_1');

% Contour plot: J_vals as contours spaced logarithmically (base 10)
% between 10^-2 and 10^3, i.e. logspace(-2, 3, 20)
figure;
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);
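As an optional sanity check (my addition, not part of the original script), the closed-form normal equation should give nearly the same parameters that gradient descent converges to:

% Assumes X already contains the column of ones and theta was returned by gradientDescent above
theta_normal = pinv(X' * X) * X' * y;    % closed-form least-squares solution
fprintf('Gradient descent: theta = [%f, %f]\n', theta(1), theta(2));
fprintf('Normal equation:  theta = [%f, %f]\n', theta_normal(1), theta_normal(2));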
...................
function J = computeCost(X, y, theta)
m = length(y);   % number of training examples
J = 0;
for i = 1:m
    J = J + (theta(1)*X(i,1) + theta(2)*X(i,2) - y(i))^2;
end
% Divide by 2m: the factor of 2 cancels the 2 produced when differentiating the
% squared term, and dividing by m keeps J from growing with the number of examples
% (the loop over i = 1:m has already accumulated m terms).
J = J / (2*m);
end
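The loop above applies the cost formula one example at a time. For comparison, here is an equivalent vectorized version (a sketch of a common alternative, not the function used by the script above), which computes the same J:

function J = computeCostVec(X, y, theta)
% Vectorized cost: X*theta is the m x 1 vector of predictions,
% so J is the sum of squared errors divided by 2m
m = length(y);
J = sum((X * theta - y) .^ 2) / (2 * m);
end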
......
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y);   % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    % J_1 and J_2 accumulate the partial derivatives of J with respect to
    % theta(1) and theta(2); they are reset at the start of every iteration
    J_1 = 0;
    J_2 = 0;
    for i = 1:m
        J_1 = J_1 + (theta(1)*X(i,1) + theta(2)*X(i,2) - y(i));
        J_2 = J_2 + (theta(1)*X(i,1) + theta(2)*X(i,2) - y(i)) * X(i,2);
    end
    % The 1/m factor is not applied inside the loop above; dividing once here is enough
    J_1 = J_1 / m;
    J_2 = J_2 / m;
    % Update both parameters simultaneously (J_1 and J_2 were computed with the old theta)
    theta(1) = theta(1) - alpha * J_1;
    theta(2) = theta(2) - alpha * J_2;
    % Save the cost J on every iteration
    J_history(iter) = computeCost(X, y, theta);
end
end
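For reference, the same update can be written without the inner loop over examples. This vectorized form (a sketch of a common alternative, not the code used in this post) computes the whole gradient as a matrix product and updates both parameters at once:

function [theta, J_history] = gradientDescentVec(X, y, theta, alpha, num_iters)
% Vectorized gradient descent: the gradient of J is (1/m) * X' * (X*theta - y)
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    J_history(iter) = computeCost(X, y, theta);   % reuses computeCost defined above
end
end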
Fourth, the results of running the code
Linear regression fit obtained by gradient descent.