Implementation Process
Step 1: Generate the training input and test sample sets
Step 2: Train the sparse autoencoder
Step 3: Extract features
Step 4: Train and test the softmax classifier
Step 5: Classify the test sample set and calculate the accuracy
3. Key points, code, and comments for each step
Step 1: Generate the training input and test sample sets
Use loadMNISTImages.m and loadMNISTLabels.m to load the data from the MNIST database. Pay attention to the paths and names of the data files.
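As a reference, a minimal sketch of the loading step is shown below; the file paths under mnist/ and the variable names mnistData and mnistLabels are assumptions and should be adapted to your local setup.

% Load the MNIST images and labels (adjust the paths to your local copy)
mnistData   = loadMNISTImages('mnist/train-images-idx3-ubyte');
mnistLabels = loadMNISTLabels('mnist/train-labels-idx1-ubyte');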
Step 2: Train the sparse autoencoder
Use the unlabeled training images as input and train the sparse autoencoder to obtain the optimal weights. This step calls minFunc.m and sparseAutoencoderCost.m from the previous experiment.
The specific implementation code is as follows:
%  Find opttheta by running the sparse autoencoder on
%  the unlabeled training images
opttheta = theta;

%  [cost, grad] = sparseAutoencoderCost(theta, inputSize, hiddenSize, lambda, ...
%                                       sparsityParam, beta, unlabeledData);

%  Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs';  % Here, we use L-BFGS to optimize our cost
                           % function. Generally, for minFunc to work, you
                           % need a function pointer with two outputs: the
                           % function value and the gradient. In our problem,
                           % sparseAutoencoderCost.m satisfies this.
options.maxIter = 400;     % Maximum number of iterations of L-BFGS to run
options.display = 'on';

[opttheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
                                   inputSize, hiddenSize, ...
                                   lambda, sparsityParam, ...
                                   beta, unlabeledData), ...
                            theta, options);
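The code above assumes that the hyper-parameters and the initial parameter vector theta have already been defined. A minimal sketch of that setup is shown below; the specific values are only typical choices for this MNIST exercise and may need tuning, and initializeParameters.m is assumed to be available from the previous experiment.

inputSize     = 28 * 28;  % each MNIST image is 28x28 pixels
hiddenSize    = 200;      % number of hidden units (learned features)
sparsityParam = 0.1;      % desired average activation of the hidden units (rho)
lambda        = 3e-3;     % weight decay parameter
beta          = 3;        % weight of the sparsity penalty term
theta = initializeParameters(hiddenSize, inputSize);  % random initialization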
Step 3: Extract features
This step calls feedForwardAutoencoder.m to compute the outputs (activations) of the hidden-layer units of the trained sparse autoencoder. These outputs are the higher-order features the network learned from the unlabeled training images.
Add the following code to feedForwardAutoencoder.m:
% Compute the activation of the hidden layer for the sparse autoencoder.
m = size(data, 2);
z2 = W1 * data + repmat(b1, 1, m);
activation = sigmoid(z2);
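Once feedForwardAutoencoder.m is complete, it can be applied to produce the feature sets used in Step 4. A sketch, assuming the standard argument order (theta, hiddenSize, visibleSize, data) and that trainData and testData hold the labeled training and test images:

trainFeatures = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, trainData);
testFeatures  = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, testData);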
Step 4: Train and test the softmax classifier
Use softmaxCost.m and softmaxTrain.m from the previous experiment to train on the features extracted in Step 3 together with the training label set trainLabels, obtaining a multi-class classifier.
The specific implementation code is as follows:
inputSize = hiddenSize;

% C = unique(A) for the array A returns the same values as in A but with
% no repetitions. C will be sorted.
%   A = [9 9 9 9 9 9 8 8 8 7 7 7 6 6 6 5 4 2 1]
%   C = unique(A)  ->  C = [1 2 4 5 6 7 8 9]
numClasses = numel(unique(trainLabels));
lambda1 = 1e-4;
options.maxIter = 100;
softmaxModel = softmaxTrain(inputSize, numClasses, lambda1, ...
                            trainFeatures, trainLabels, options);
Step 5: Classify the test sample set and calculate the accuracy
Call softmaxPredict.m from the previous experiment to predict the labels of the test sample set and compute the classification accuracy. The specific implementation code is as follows:
[pred] = softmaxPredict(softmaxModel, testFeatures);
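The accuracy can then be obtained by comparing the predictions with the test labels, for example (assuming the test labels are stored in testLabels):

acc = mean(testLabels(:) == pred(:));
fprintf('Test accuracy: %0.3f%%\n', acc * 100);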
Appendix: key code and explanation of the sparse autoencoder
The output (activation) of hidden unit i for an input x is:

    a_i(2) = f( sum_j W1_ij * x_j + b1_i )

where f is the sigmoid activation function, f(z) = 1 / (1 + exp(-z)). It can also be expressed in two steps:

    z_i(2) = sum_j W1_ij * x_j + b1_i,    a_i(2) = f(z_i(2))

The vectorized expression is:

    z(2) = W1 * x + b1,    a(2) = f(z(2))

This step is called forward propagation. More generally, for layers l and l+1 in a neural network:

    z(l+1) = W(l) * a(l) + b(l),    a(l+1) = f(z(l+1))
The cost function consists of three terms: the average squared reconstruction error, the weight decay term, and the sparsity penalty:

    J = Jcost + Jweight + Jsparse

where

    Jcost   = 1/(2m) * sum_i || a(3)(x_i) - x_i ||^2
    Jweight = (lambda/2) * ( sum(W1.^2) + sum(W2.^2) )

and Jsparse is a sum of KL divergences between the desired average activation rho (sparsityParam) and the actual average activation rho_hat_j of each hidden unit j:

    rho_hat_j = (1/m) * sum_i a_j(2)(x_i)
    Jsparse   = beta * sum_j [ rho * log(rho / rho_hat_j) + (1 - rho) * log((1 - rho) / (1 - rho_hat_j)) ]

Through the iterations, the optimization tries to make rho_hat_j ≈ rho, so that each hidden unit is active only for a small fraction of the inputs.
The gradient of the cost function is computed with the back-propagation algorithm, which propagates the prediction error backwards through the network:

    d(3) = -(x - a(3)) .* f'(z(3))
    d(2) = ( W2' * d(3) + beta * ( -rho ./ rho_hat + (1 - rho) ./ (1 - rho_hat) ) ) .* f'(z(2))

    dJ/dW1 = (1/m) * d(2) * x'    + lambda * W1        dJ/db1 = (1/m) * sum(d(2), 2)
    dJ/dW2 = (1/m) * d(3) * a(2)' + lambda * W2        dJ/db2 = (1/m) * sum(d(3), 2)
minFunc() is then called with this cost and gradient to iteratively update the parameters W and b and obtain a better model.
The key to vectorizing the code is keeping track of the dimensions of each variable. With n = visibleSize, h = hiddenSize, and m training samples, the dimensions are:

    data            n x m
    W1, W1grad      h x n
    W2, W2grad      n x h
    b1, b1grad      h x 1
    b2, b2grad      n x 1
    z2, a2, d2      h x m
    z3, a3, d3      n x m
    rho_hat         h x 1
The key implementation code is as follows:
function [cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
                                              lambda, sparsityParam, beta, data)
% W1 (hiddenSize x visibleSize), W2 (visibleSize x hiddenSize), b1 and b2 are
% unrolled from the parameter vector theta by the starter code above this point.

% ---------- YOUR CODE HERE --------------------------------------
[n, m] = size(data);   % m is the number of training samples, n is the number of features

% Forward propagation
% B = repmat(A, m, n) -> replicate and tile an array m x n times
% b1 is a column vector (hiddenSize x 1), so it is tiled m times along the columns
z2 = W1 * data + repmat(b1, 1, m);
a2 = sigmoid(z2);
z3 = W2 * a2 + repmat(b2, 1, m);
a3 = sigmoid(z3);

% Compute the first part of the cost: the average squared reconstruction error
Jcost = 0.5 / m * sum(sum((a3 - data) .^ 2));

% Compute the weight decay term
Jweight = lambda / 2 * (sum(sum(W1 .^ 2)) + sum(sum(W2 .^ 2)));

% Compute the sparsity penalty
% sparsityParam (rho): the desired average activation of the hidden units
% rho_hat: the actual average activation of the hidden units
rho_hat = 1 / m * sum(a2, 2);
Jsparse = beta * sum(sparsityParam .* log(sparsityParam ./ rho_hat) + ...
                     (1 - sparsityParam) .* log((1 - sparsityParam) ./ (1 - rho_hat)));

% The complete cost function
cost = Jcost + Jweight + Jsparse;

% Backward propagation
% Compute the error terms (deltas); sigmoidGradient(z) = sigmoid(z) .* (1 - sigmoid(z))
d3 = -(data - a3) .* sigmoidGradient(z3);
% Since we introduced the sparsity term Jsparse into the cost function,
% an extra term appears in the hidden-layer delta
extra_term = beta * (-sparsityParam ./ rho_hat + (1 - sparsityParam) ./ (1 - rho_hat));
d2 = (W2' * d3 + repmat(extra_term, 1, m)) .* sigmoidGradient(z2);

% Compute W1grad
W1grad = 1 / m * d2 * data' + lambda * W1;
% Compute W2grad
W2grad = 1 / m * d3 * a2' + lambda * W2;
% Compute b1grad
b1grad = 1 / m * sum(d2, 2);
% Compute b2grad
b2grad = 1 / m * sum(d3, 2);

% The starter code then packs the gradients back into a single vector:
% grad = [W1grad(:); W2grad(:); b1grad(:); b2grad(:)];
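Before running the full training it is worth verifying the analytic gradient against a numerical gradient on a small input, as in the earlier gradient-checking experiment. A sketch, assuming computeNumericalGradient.m from that experiment is available and that grad has been packed into a single vector as described above:

[cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
                                     lambda, sparsityParam, beta, data);
numgrad = computeNumericalGradient(@(p) sparseAutoencoderCost(p, visibleSize, hiddenSize, ...
                                     lambda, sparsityParam, beta, data), theta);
diff = norm(numgrad - grad) / norm(numgrad + grad);
disp(diff);   % should be very small, e.g. less than 1e-9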