These notes are mainly from my self-study of deep learning; a brief record follows:
(1) Deep networks are more expressive than shallow ones: they can represent a much larger set of functions in a compact, concise way.
(2) Simply extending traditional shallow neural networks to many layers runs into three problems: availability of (labeled) data, local optima, and diffusion of gradients.
(3) A stacked autoencoder network is a neural network built from multiple layers of sparse autoencoders, with a softmax (or logistic regression) classifier as the last layer. Its initial parameters are obtained by greedy layer-wise training, which can make full use of unlabeled data; this is also called pre-training. The whole network is then fine-tuned with a labeled data set, as sketched below.
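A minimal sketch of that pipeline; trainSparseAutoencoder, feedForward, trainSoftmax and fineTune are hypothetical placeholder names (not functions from the exercise code), and the concrete minFunc calls appear in the answers below:

% Hypothetical sketch only: trainSparseAutoencoder, feedForward, trainSoftmax
% and fineTune are placeholder names, not UFLDL functions.
features = trainData;                                 % start from the raw inputs
for l = 1:numHiddenLayers
    aeTheta{l} = trainSparseAutoencoder(features);    % unsupervised, one layer at a time
    features   = feedForward(aeTheta{l}, features);   % layer l's activations feed layer l+1
end
softmaxTheta = trainSoftmax(features, trainLabels);                      % supervised output layer
finalTheta   = fineTune(aeTheta, softmaxTheta, trainData, trainLabels);  % joint backprop over the whole stack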
Exercise Answers:
(1) %% STEP 2: Train the first sparse autoencoder
addpath minFunc/
options.Method = 'lbfgs';    % Here, we use L-BFGS to optimize our cost
                             % function. Generally, for minFunc to work, you
                             % need a function pointer with two outputs: the
                             % function value and the gradient. In our problem,
                             % sparseAutoencoderCost.m satisfies this.
options.maxIter = 400;       % Maximum number of iterations of L-BFGS to run
options.display = 'on';

[sae1OptTheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
                                     inputSize, hiddenSizeL1, ...
                                     lambda, sparsityParam, ...
                                     beta, trainData), ...
                                sae1Theta, options);
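Before the second autoencoder can be trained, the first layer's features have to be computed; the exercise skeleton does this with feedForwardAutoencoder (written in the self-taught-learning exercise), roughly as follows:

% Forward the raw training data through the first trained autoencoder to get
% the features that the second autoencoder is trained on (skeleton code).
[sae1Features] = feedForwardAutoencoder(sae1OptTheta, hiddenSizeL1, ...
                                        inputSize, trainData);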
(2) %% STEP 2: Train the second sparse autoencoder
options.Method = 'lbfgs';    % Here, we use L-BFGS to optimize our cost
                             % function. Generally, for minFunc to work, you
                             % need a function pointer with two outputs: the
                             % function value and the gradient. In our problem,
                             % sparseAutoencoderCost.m satisfies this.
options.maxIter = 400;       % Maximum number of iterations of L-BFGS to run
options.display = 'on';

[sae2OptTheta, cost2] = minFunc( @(p) sparseAutoencoderCost(p, ...
                                      hiddenSizeL1, hiddenSizeL2, ...
                                      lambda, sparsityParam, ...
                                      beta, sae1Features), ...
                                 sae2Theta, options);
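Likewise, the softmax classifier in the next step is trained on the second hidden layer's activations, which the skeleton computes with another feedForwardAutoencoder call:

% Features of the second hidden layer, used as input to the softmax classifier
[sae2Features] = feedForwardAutoencoder(sae2OptTheta, hiddenSizeL2, ...
                                        hiddenSizeL1, sae1Features);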
(3) %% STEP 3: Train the softmax classifier
addpath minFunc/
options.Method = 'lbfgs';    % Here, we use L-BFGS to optimize our cost
                             % function. Generally, for minFunc to work, you
                             % need a function pointer with two outputs: the
                             % function value and the gradient. In our problem,
                             % softmaxCost.m satisfies this.
minFuncOptions.display = 'on';

lambda = 1e-4;
[saeSoftmaxOptTheta, cost3] = minFunc( @(p) softmaxCost(p, ...
                                            numClasses, hiddenSizeL2, lambda, ...
                                            sae2Features, trainLabels), ...
                                       saeSoftmaxTheta, options);
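The fine-tuning call in the next step takes stackedAETheta and netconfig as given; the exercise skeleton assembles them from the pre-trained weights roughly along these lines (reproduced from memory of the starter code, so treat the index arithmetic as a sketch):

% Build the stack from W1 and b1 of each pre-trained autoencoder, then put the
% softmax parameters on top to form the initial parameter vector stackedAETheta.
stack = cell(2,1);
stack{1}.w = reshape(sae1OptTheta(1:hiddenSizeL1*inputSize), ...
                     hiddenSizeL1, inputSize);
stack{1}.b = sae1OptTheta(2*hiddenSizeL1*inputSize+1:2*hiddenSizeL1*inputSize+hiddenSizeL1);
stack{2}.w = reshape(sae2OptTheta(1:hiddenSizeL2*hiddenSizeL1), ...
                     hiddenSizeL2, hiddenSizeL1);
stack{2}.b = sae2OptTheta(2*hiddenSizeL2*hiddenSizeL1+1:2*hiddenSizeL2*hiddenSizeL1+hiddenSizeL2);

% Flatten the stack and record its layer sizes for stackedAECost
[stackparams, netconfig] = stack2params(stack);
stackedAETheta = [ saeSoftmaxOptTheta ; stackparams ];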
(4) %% STEP 5: Finetune softmax model
addpath minFunc/
options.Method = 'lbfgs';    % Here, we use L-BFGS to optimize our cost
                             % function. Generally, for minFunc to work, you
                             % need a function pointer with two outputs: the
                             % function value and the gradient. In our problem,
                             % stackedAECost.m satisfies this.
minFuncOptions.display = 'on';

[stackedAEOptTheta, cost3] = minFunc( @(p) stackedAECost(p, ...
                                           inputSize, hiddenSizeL2, ...
                                           numClasses, netconfig, ...
                                           lambda, trainData, trainLabels), ...
                                      stackedAETheta, options);
stackedAECost.m
depth = numel(stack);
z = cell(depth+1, 1);                   % input + hidden layers: weighted sums
a = cell(depth+1, 1);                   % input + hidden layers: activations
a{1} = data;

for i = 1:depth                         % forward pass: compute z and activation a for each hidden layer
    z{i+1} = stack{i}.w * a{i} + repmat(stack{i}.b, 1, numCases);
    a{i+1} = sigmoid(z{i+1});
end

M = softmaxTheta * a{depth+1};          % softmax scores of the last hidden layer's activations
M = bsxfun(@minus, M, max(M, [], 1));
M = exp(M);
P = bsxfun(@rdivide, M, sum(M));

% cost function
cost = -1/numCases .* sum(groundTruth(:)' * log(P(:))) + lambda/2 * sum(softmaxTheta(:).^2);

% gradient with respect to the softmax parameters
softmaxThetaGrad = -1/numCases .* (groundTruth - P) * a{depth+1}' + lambda * softmaxTheta;

delta = cell(depth+1);                  % error terms only need to be computed for the hidden layers
% error term of the last hidden layer, propagated back from the softmax layer
delta{depth+1} = -(softmaxTheta' * (groundTruth - P)) .* a{depth+1} .* (1 - a{depth+1});

for layer = depth:-1:2                  % back-propagate the error to the earlier hidden layers
                                        % (the weight-decay coefficient and bias terms do not enter here)
    delta{layer} = (stack{layer}.w' * delta{layer+1}) .* a{layer} .* (1 - a{layer});
end

for layer = depth:-1:1                  % gradients of each hidden layer's parameters W and b
    stackgrad{layer}.w = delta{layer+1} * a{layer}' ./ numCases;
    stackgrad{layer}.b = sum(delta{layer+1}, 2) ./ numCases;
end
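The snippet above is only the part the exercise asks you to fill in. In stackedAECost.m the skeleton already unrolls softmaxTheta and the stack from theta, provides the one-hot label matrix and the sample count (named M there; the code above calls it numCases), defines the sigmoid helper, and packs the final gradient, roughly:

% Provided by the skeleton before the filled-in section:
numCases = size(data, 2);
groundTruth = full(sparse(labels, 1:numCases, 1));   % one-hot matrix of the labels

% Provided after the filled-in section: pack both gradients into one vector,
% in the same layout as theta
grad = [softmaxThetaGrad(:) ; stack2params(stackgrad)];

% Sigmoid helper at the bottom of the file
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end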
(5) %% STEP 6: Test (stackedAEPredict.m)
numCases = size(data, 2);
depth = numel(stack);
z = cell(depth+1, 1);                   % input + hidden layers: weighted sums
a = cell(depth+1, 1);                   % input + hidden layers: activations
a{1} = data;

for i = 1:depth                         % forward pass through the stacked layers
    z{i+1} = stack{i}.w * a{i} + repmat(stack{i}.b, 1, numCases);
    a{i+1} = sigmoid(z{i+1});
end

[~, pred] = max(softmaxTheta * a{depth+1});   % predicted class = index of the largest softmax score
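The accuracy figures quoted in the next line come from the skeleton's STEP 6 driver code, which calls stackedAEPredict before and after fine-tuning, roughly:

% Evaluate the model with the pre-fine-tuning and post-fine-tuning parameters
[pred] = stackedAEPredict(stackedAETheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);
acc = mean(testLabels(:) == pred(:));
fprintf('Before Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

[pred] = stackedAEPredict(stackedAEOptTheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);
acc = mean(testLabels(:) == pred(:));
fprintf('After Finetuning Test Accuracy: %0.3f%%\n', acc * 100);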
In the end I got: Before Finetuning Test Accuracy: 92.150%, After Finetuning Test Accuracy: 96.680%, which differs slightly from the result quoted in the exercise.
UFLDL Tutorial Notes and Exercise Answers IV (Building Deep Networks for Classification)