UFLDL Learning Notes and Programming Assignments: Multi-Layer Neural Networks (multilayer neural networks + handwritten digit recognition programming)
UFLDL has released a new tutorial, which I find better than the old one: it starts from the basics, is systematic and clear, and also includes programming exercises.
In a deep learning discussion group I heard some experienced people say that you do not need to dig into other machine learning algorithms first; you can go straight to learning DL.
So I recently started working through it. The tutorial combined with MATLAB programming is a great fit.
The new tutorial is at: http://ufldl.stanford.edu/tutorial/
The page for this section: http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
The general procedure for training a neural network:
1. Forward propagation: compute the activations of each layer and the total cost.
For the hidden layers, the previous layer's activations are multiplied by the weights, the bias is added, and the result is passed through an activation function such as the sigmoid.
The values of the output layer are perhaps better called feature values than activations. Taking softmax as an example, the previous layer's activations serve as the feature input x, the weights W play the role of the parameter theta, and h is computed from the softmax formula. (A minimal sketch of this forward pass follows the list.)
2. Backpropagation.
First compute the residual of the output layer; it can be obtained directly by differentiating the loss function.
The gradients of layer l's W and b can be obtained from the residual of layer l+1 and the activations of layer l.
The residual of layer l can be obtained from the residual of layer l+1, the weights W of layer l, and the derivative of layer l's activation function. (A sketch of this backward pass follows the list.)
3. Add weight decay to prevent overfitting. The cost and the gradients need to be adjusted accordingly. (A sketch of this adjustment also follows the list.)
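To make step 1 concrete, here is a minimal sketch of the forward pass for one hidden layer plus a softmax output. The names X, W1, b1, W2, b2 are illustrative only and are not the stack fields used in the assignment code below; X holds one sample per column.

z1 = bsxfun(@plus, W1 * X, b1);                 % weighted input of the hidden layer
a1 = 1 ./ (1 + exp(-z1));                       % sigmoid activation
z2 = bsxfun(@plus, W2 * a1, b2);                % scores of the softmax output layer
e  = exp(bsxfun(@minus, z2, max(z2, [], 1)));   % subtract the column max for numerical stability
h  = bsxfun(@rdivide, e, sum(e, 1));            % class probabilities, one column per sample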
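For step 2, the corresponding backward pass in the same illustrative notation. Y is assumed to be a numClasses-by-numSamples 0/1 indicator matrix of the labels; again this is a sketch, not the assignment's variable names.

delta2 = h - Y;                              % output-layer residual for softmax with cross-entropy
gradW2 = delta2 * a1';                       % gradient w.r.t. W2
gradb2 = sum(delta2, 2);                     % gradient w.r.t. b2
delta1 = (W2' * delta2) .* a1 .* (1 - a1);   % hidden-layer residual; a1.*(1-a1) is the sigmoid derivative
gradW1 = delta1 * X';                        % gradient w.r.t. W1
gradb1 = sum(delta1, 2);                     % gradient w.r.t. b1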
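And for step 3, a sketch of how L2 weight decay changes the cost and the weight gradients in the same notation (lambda is the decay coefficient; the bias terms are not penalized).

cost   = cost   + 0.5 * lambda * (sum(W1(:).^2) + sum(W2(:).^2));   % penalty on the weights only
gradW1 = gradW1 + lambda * W1;
gradW2 = gradW2 + lambda * W2;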
Here is my code for supervised_dnn_cost.m:
function [cost, grad, pred_prob] = supervised_dnn_cost(theta, ei, data, labels, pred_only)
%SPNETCOSTSLAVE Slave cost function for simple phone net
%   Does all the work of cost / gradient computation
%   Returns cost broken into cross-entropy, weight norm, and prox reg
%   components (ceCost, wCost, pCost)

%% default values
po = false;
if exist('pred_only', 'var')
    po = pred_only;
end;

%% reshape into network
numHidden = numel(ei.layer_sizes) - 1;
numSamples = size(data, 2);
hAct = cell(numHidden+1, 1);
gradStack = cell(numHidden+1, 1);
stack = params2stack(theta, ei);

%% forward prop
%%% YOUR CODE HERE %%%
for l = 1:numHidden                      % hidden-layer activations
    if (l == 1)
        z = stack{l}.W * data;
    else
        z = stack{l}.W * hAct{l-1};
    end
    z = bsxfun(@plus, z, stack{l}.b);
    hAct{l} = sigmoid(z);
end
% output layer (softmax)
h = stack{numHidden+1}.W * hAct{numHidden};
h = bsxfun(@plus, h, stack{numHidden+1}.b);
e = exp(h);
pred_prob = bsxfun(@rdivide, e, sum(e, 1));   % probability table
hAct{numHidden+1} = pred_prob;
%[~, pred_labels] = max(pred_prob, [], 1);

%% return here if only predictions desired.
if po
    cost = -1; ceCost = -1; wCost = -1; numCorrect = -1;
    grad = [];
    return;
end;

%% compute cost (softmax cross-entropy at the output layer)
%%% YOUR CODE HERE %%%
ceCost = 0;
c = log(pred_prob);
%fprintf('%d,%d\n', size(labels,1), size(labels,2));   % 60000,1
% linear indices into c: rows given by labels, columns by 1:size(c,2)
I = sub2ind(size(c), labels', 1:size(c,2));
values = c(I);
ceCost = -sum(values);

%% compute gradients using backpropagation
%%% YOUR CODE HERE %%%
% cross-entropy gradient
%d = full(sparse(labels, 1:size(c,2), 1));
d = zeros(size(pred_prob));
d(I) = 1;
error = pred_prob - d;       % residual of the output layer, propagated backwards
for l = numHidden+1:-1:1
    gradStack{l}.b = sum(error, 2);
    if (l == 1)
        gradStack{l}.W = error * data';
        break;               % l == 1: the first hidden layer, no further propagation of the residual is needed
    else
        gradStack{l}.W = error * hAct{l-1}';
    end
    % the trailing factor is the derivative of the sigmoid activation
    error = (stack{l}.W)' * error .* hAct{l-1} .* (1 - hAct{l-1});
end

%% compute weight penalty cost and gradient for non-bias terms
%%% YOUR CODE HERE %%%
wCost = 0;
for l = 1:numHidden+1
    wCost = wCost + .5 * ei.lambda * sum(stack{l}.W(:) .^ 2);   % sum of squares of all weights
end
cost = ceCost + wCost;

% computing the gradient of the weight decay
for l = numHidden:-1:1
    gradStack{l}.W = gradStack{l}.W + ei.lambda * stack{l}.W;   % no weight-decay term is applied to the softmax layer here
end

%% reshape gradients into vector
[grad] = stack2params(gradStack);
end
The original training set has 60,000 samples, which takes a while to train, so I modified run_train.m to use only 10,000 training samples (a sketch of this change is below).
Of course, this affects accuracy.
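For reference, a sketch of this kind of change. The variable names data_train and labels_train are my assumption about how run_train.m stores the MNIST training set (one sample per column), so adjust them to match the actual script:

% keep only the first 10,000 training samples
data_train   = data_train(:, 1:10000);
labels_train = labels_train(1:10000);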
Author of this article: linger
This article link: http://blog.csdn.net/lingerlanlan/article/details/38464317