Continuing from the previous lesson (http://www.cnblogs.com/tornadomeet/archive/2013/03/15/2962116.html), where the learning rate was held fixed, the goal here is to find a better learning rate. The idea is simple: for each candidate learning rate, plot the loss value against the number of iterations, and pick the rate whose curve converges fastest. The candidate rates tried are 0.01, 0.03, 0.1, 0.3, 1, and 1.3; once the learning rate is chosen, everything else is the same as in the previous lesson. The problem to solve: given 47 training samples, where the target y is the house price and x has 2 attributes (the size of the house and the number of bedrooms), learn a linear model from these data and use it to predict the price of a house of size 1650 with 3 bedrooms.
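For reference, these are the cost function and gradient descent update implemented in the code below, with m = 47 training samples, design matrix X (one row per sample, first column all ones), and learning rate \alpha:

    J(\theta) = \frac{1}{2m}\,(X\theta - y)^{\top}(X\theta - y), \qquad \theta \leftarrow \theta - \alpha \cdot \frac{1}{m}\,X^{\top}(X\theta - y)

The factor 1/(2m) is purely a convention: the 1/2 cancels the 2 that appears when the square is differentiated, so the gradient formula carries no stray constant.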
The code is as follows:
x = load('ex3x.dat');
y = load('ex3y.dat');

% Each row of x is one sample; prepend a column of ones so that
% w'x + b can be written homogeneously as theta'*x (see the previous lesson).
x = [ones(size(x,1),1) x];

% Standardize every feature dimension (except the constant first column):
% subtract the mean and divide by the standard deviation. Without this
% linear feature scaling, gradient descent converges poorly (see lesson one).
meanx = mean(x);    % per-column means
sigmax = std(x);    % per-column standard deviations
x(:,2) = (x(:,2) - meanx(2)) ./ sigmax(2);
x(:,3) = (x(:,3) - meanx(3)) ./ sigmax(3);

figure
itera_num = 100;          % number of iterations to try
sample_num = size(x,1);   % number of training samples
% Candidate learning rates, each roughly 3x the previous, so they are enumerated directly.
alpha = [0.01, 0.03, 0.1, 0.3, 1, 1.3];
% One plot style per learning rate: b = blue, r = red, g = green, k = black;
% 'b--' is a blue dashed line, while plain letters give solid lines.
plotstyle = {'b', 'r', 'g', 'k', 'b--', 'r--'};

theta_grad_descent = zeros(size(x(1,:)));
for alpha_i = 1:length(alpha)   % alpha_i = 1,2,...,6 indexes both alpha and plotstyle
    % theta holds the model parameters, initialized to a zero vector
    % (3x1 here: x has as many dimensions as theta has parameters).
    theta = zeros(size(x,2), 1);
    % Jtheta is an itera_num x 1 (here 100x1) vector; its n-th element is the
    % cost (total mean squared error between predicted and true y) at iteration n.
    Jtheta = zeros(itera_num, 1);
    for i = 1:itera_num
        % (x*theta - y)'*(x*theta - y) is the squared-error term of the cost
        % function written as an inner product after a transpose, since there is
        % no direct "square of a vector"; the result is a scalar, so the leading
        % coefficient can be applied with plain * instead of .*.
        % The 1/2 in 1/(2*sample_num) cancels the factor of 2 produced when the
        % square is differentiated, keeping the gradient formula clean.
        Jtheta(i) = (1/(2*sample_num)) .* (x*theta - y)' * (x*theta - y);
        grad = (1/sample_num) .* x' * (x*theta - y);
        theta = theta - alpha(alpha_i) .* grad;
    end
    % Plot the first 50 cost values. The char conversion matters: indexing the
    % cell array with () returns a cell, which plot cannot use as a line style,
    % so either convert with char() or index with {} and skip the conversion.
    plot(0:49, Jtheta(1:50), char(plotstyle(alpha_i)), 'LineWidth', 2)
    hold on   % keep this curve so the next learning rate is drawn on the same axes

    if (1 == alpha(alpha_i))
        % Experiment shows alpha = 1 works best, so the theta reached after
        % iterating with it is taken as the final parameter vector.
        theta_grad_descent = theta;
    end
end
legend('0.01', '0.03', '0.1', '0.3', '1', '1.3');
xlabel('Number of iterations')
ylabel('Cost function')

% Prediction: 1650 square feet, 3 bedrooms, scaled with the training statistics.
price_grad_descend = theta_grad_descent' * ...
    [1 (1650 - meanx(2))/sigmax(2) (3 - meanx(3))/sigmax(3)]'
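As a cross-check, the closed-form normal equation gives the parameters directly, with no feature scaling and no learning rate to tune. This snippet is not part of the listing above, just a minimal sketch assuming the same ex3x.dat/ex3y.dat files are in the working directory:

% Minimal sketch: normal-equation solution as a sanity check (assumed, not from
% the listing above). Reloads the raw, unscaled data.
x = load('ex3x.dat');
y = load('ex3y.dat');
x = [ones(size(x,1),1) x];            % add the intercept column
theta_normal = (x' * x) \ (x' * y);   % solves (X'X) * theta = X' * y exactly
price_normal = theta_normal' * [1 1650 3]'   % raw features, no scaling needed

Because this solves the least-squares problem exactly, the raw feature values 1650 and 3 are used as-is; its prediction should closely match price_grad_descend, which is a handy check that the chosen learning rate actually converged.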
Experimental results:
[Figure: cost function value versus number of iterations for each learning rate (legend: 0.01, 0.03, 0.1, 0.3, 1, 1.3); the alpha = 1 curve converges fastest.]