Machine learning algorithms are now used very widely in classification, regression, data mining, and many other problems. For beginners, hearing the word "algorithm" or other specialized terms can make the subject feel impenetrable, and many people are scared off, which means they lose a very useful tool for the problems they work on. Machine learning algorithms are not that advanced. Here I will explain, in the plainest language I can, what each algorithm actually means. Many people are also put off by the implementation side: there are plenty of ready-made programs, but most of them have no comments, so reading them can take a lot of effort, and sometimes you still cannot work out how they solve the problem. So I will also walk through the code of each algorithm, explaining it as carefully as I can and making the logic of the program clear, so that readers learning these algorithms will get much more out of it!
If you reprint this post, please credit http://www.cnblogs.com/happylion/ or http://blog.sina.com.cn/ahappylion
Let's get started!
---------------------------------------- split line ----------------------------------------
The previous post already covered the main idea of linear regression. Put simply: you have a sample x = [x1, x2, ..., xn], and you need to find a set of parameters w = [w1, w2, ..., wn] such that the linear combination w1*x1 + w2*x2 + ... + wn*xn of the sample's elements is as close as possible to the sample's label. So, with m training samples (x_i, y_i), our cost function is

J(w) = (1/(2m)) * sum_{i=1..m} (w'*x_i - y_i)^2
In other words, the goal is to penalize samples whose linear combination is not equal to their label. We then minimize this cost function, and once it has converged, the resulting parameters are the w we need. There are two ways to optimize the parameters. As the previous post mentioned, the linear regression parameters have an explicit solution: this is the normal equations result from the previous section, w = inv(X'*X)*X'*y (where each row of X is one sample). Alternatively, we can obtain the parameters with gradient descent; gradient descent itself will be explained in a later post. A quick sketch of where the closed-form solution comes from is given below, and after that we work through a concrete example using both approaches:
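As a quick sketch of where that closed-form solution comes from (this derivation is my addition, built on the cost function above): stack the m samples as the rows of a matrix X and the labels in a vector y, so the cost can be written in matrix form as

J(w) = \frac{1}{2m}\,\lVert Xw - y \rVert^2 .

Its gradient is \nabla_w J(w) = \frac{1}{m} X^\top (Xw - y); setting the gradient to zero gives the normal equations X^\top X\, w = X^\top y, i.e. w = (X^\top X)^{-1} X^\top y, which is exactly inv(X'*X)*X'*y in MATLAB notation.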
The problem: there are 50 data sample points, where x holds the ages of 50 children, ranging from 2 to 8 years (ages may be fractional), and y holds the corresponding heights of those 50 children, also given as decimal values. Based on these 50 training samples, estimate the heights of children aged 3.5 and 7. (Data download)
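Before fitting anything, it can help to sanity-check the data. This short snippet is my addition and assumes the two files ex2x.dat and ex2y.dat used in the code below sit in the current working directory:

x = load('ex2x.dat');    % ages of the 50 children, in years
y = load('ex2y.dat');    % corresponding heights, in meters
length(x)                % should report 50 samples
[min(x), max(x)]         % the ages should fall roughly between 2 and 8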
Method one: solving with the normal equations
% Method one: normal equations
x = load('ex2x.dat');
y = load('ex2y.dat');
plot(x, y, '*')
xlabel('age')
ylabel('height')
x = [ones(size(x,1), 1), x];  % size(x) returns both dimensions of the vector x; we only need the first
                              % one (the number of samples). We prepend a column of ones so that the
                              % affine model w*x + b becomes the homogeneous linear form w'*x.
w = inv(x'*x)*x'*y            % this is the closed-form (normal equations) solution
hold on
plot(x(:,2), 0.0639*x(:,2) + 0.7502)  % here 0.7502 is the first entry of the resulting w, i.e. the b
                                      % in w*x + b; the second entry of w is the slope w of w*x + b
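With the fitted w from the block above, the two heights asked for in the problem statement can be read off directly. This check is my addition; the numbers 0.0639 and 0.7502 are the values quoted in the plot command above, so treat the exact outputs as approximate:

predict_35 = [1, 3.5] * w   % height estimate at age 3.5, about 0.7502 + 0.0639*3.5, i.e. roughly 0.97
predict_7  = [1, 7]   * w   % height estimate at age 7,   about 0.7502 + 0.0639*7,   i.e. roughly 1.20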
Method two: solving for the coefficients iteratively with gradient descent
% Method two: gradient descent
clear all; close all; clc
x = load('ex2x.dat');
y = load('ex2y.dat');
m = length(y);  % number of training examples

% Plot the training data
figure;          % open a new figure window (optional; nothing breaks without it)
plot(x, y, 'o'); % draw each data point as a circle
ylabel('Height in meters')  % what the y values mean
xlabel('Age in years')      % what the x values mean

% Gradient descent
x = [ones(m, 1), x];  % add a column of ones to x: each data point gains one extra dimension fixed at 1,
                      % which makes the model homogeneous, w'*x = y; x is now two-dimensional, and y is
                      % the value predicted by the trained w'*x
theta = zeros(size(x(1,:)))';  % initialize the fitting parameters w to [0; 0]
MAX_ITR = 1500;   % number of iterations
alpha = 0.07;     % learning rate

for num_iterations = 1:MAX_ITR
    grad = (1/m) .* x' * ((x * theta) - y);
    % How grad is obtained is derived below; note that grad is a 2x1 vector. The code looks slightly
    % different from the per-sample formula because there each x_i is a vector, while here x is a matrix
    % whose rows are samples, so x' appears in front instead of behind. Also, .* is element-wise
    % multiplication, not an inner product: an inner product of vectors would give a number, while this
    % stays a vector.
    theta = theta - alpha .* grad;
    % If we instead set grad = 0 and solved for the parameters, we would recover the previous method;
    % here we do not set grad = 0 but iterate over and over until we reach the minimum.
end

hold on;  % keep the previous plot visible
plot(x(:,2), x*theta, '-')  % this draws the regression line of the figure
legend('Training data', 'Linear regression')  % what each marker in the figure means, i.e. what the
                                              % circles and the line segment stand for
hold off  % don't overlay any more plots on this figure

% Closed form solution, for reference: the backslash operator solves (x'*x)*theta = x'*y directly,
% which is the normal-equations solution again
exact_theta = (x' * x) \ x' * y

% Predict values for age 3.5 and 7
predict1 = [1, 3.5] * theta
predict2 = [1, 7] * theta

% Grid over which we will calculate J
theta0_vals = linspace(-3, 3, 100);  % 100 evenly spaced values from -3 to 3
theta1_vals = linspace(-1, 1, 100);  % 100 evenly spaced values from -1 to 1

% Initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));

for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i,j) = (0.5/m) .* (x * t - y)' * (x * t - y);
        % the parameters are sampled uniformly over the rectangle from (-3,-1) to (3,1), giving
        % 100*100 parameter pairs; J_vals(i,j) is the error of the regression equation over all
        % samples x_i for the parameter pair (theta0_vals(i), theta1_vals(j))
    end
end
J_vals = J_vals';

% Surface plot
figure;
surf(theta0_vals, theta1_vals, J_vals)  % draw the loss function as a surface over the parameters.
% Note that surf(x, y, z) is a bit of a pain: x and y are vectors, z is a matrix, and the surface is
% built from the grid spanned by x and y (100*100 points) together with the corresponding entries of z.
% The catch is the correspondence: the point formed by the second element of x and the first element of
% y is NOT paired with z(2,1) but with z(1,2); z(2,1) pairs the first element of x with the second
% element of y. That is why J_vals is transposed above before plotting.
xlabel('\theta_0'); ylabel('\theta_1');

% Contour plot
figure;
% Plot J_vals as contours spaced logarithmically between 0.01 and 100
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 2, 15))  % draw the contour lines
xlabel('\theta_0'); ylabel('\theta_1');  % the backslash here works like an escape character, so
                                         % \theta_0 renders as the Greek letter theta with a subscript;
                                         % the subscript can only be a single character 0~9
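On the question raised in the comments above about where the 1/m in grad comes from (this derivation is my addition, using the cost function from the top of the post): differentiating the squared term produces a factor of 2 that cancels the 1/2 in front of the cost, so only 1/m survives. If your own derivation gives 2/m, the cost you started from most likely lacks the 1/2; the extra constant only rescales the effective learning rate.

\frac{\partial J}{\partial \theta_j}
  = \frac{1}{2m} \sum_{i=1}^{m} 2\,(\theta^\top x_i - y_i)\, x_{i,j}
  = \frac{1}{m} \sum_{i=1}^{m} (\theta^\top x_i - y_i)\, x_{i,j}

Collecting the components and stacking the samples as the rows of x, this becomes \nabla_\theta J = \frac{1}{m}\, x^\top (x\theta - y), which is exactly grad = (1/m) .* x' * ((x * theta) - y) in the code.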
Experimental results. Scatter plot of the training samples with the fitted regression line:
Surface plot of the loss function over the parameters:
Reference: http://www.cnblogs.com/tornadomeet/archive/2013/03/15/2961660.html
Linear regression, from the series "Machine learning explained carefully, with line-by-line code comments"