Implementation Process
Step 1: Generate the training sample set
Step 2: Sparse autoencoder objective: compute the cost function and gradients
Step 3: Gradient check (if the difference is too large, return to Step 2)
Step 4: Train the sparse autoencoder and update the parameters
Step 5: Visualize the hidden layer units
3. Key points, code, and notes for each step
Step 1: Generate Training Sample Set
From each of 10 images, 1,000 8x8 pixel patches are sampled at random, giving 10,000 patches in total. Each patch is reshaped into a column vector and stacked into the matrix patches, which serves as the training set for this experiment; patches is therefore a 64x10000 matrix.
At the same time, 204 randomly selected patches are displayed, as shown below:
The images have been whitened in advance, which reduces the correlation between neighboring pixels.
The key implementation code for random sampling is as follows:
function patches = sampleIMAGES()
% sampleIMAGES
% Returns 10000 patches for training

load IMAGES;        % load the pre-whitened images provided with the exercise (array IMAGES)

patchsize = 8;      % we'll use 8x8 patches
numpatches = 10000;
patches = zeros(patchsize*patchsize, numpatches);

%% ---------- YOUR CODE HERE --------------------------------------
for imageNum = 1:10
    [rowNum, colNum] = size(IMAGES(:, :, imageNum));
    % Select 1000 patches from every image; patch size is 8x8.
    % randi([imin, imax], m, n) draws uniform random integers;
    % reshape(x, m, n) returns an m-by-n matrix, here 8x8 -> 64x1,
    % so patches ends up 64x10000.
    for patchNum = 1:1000
        xPos = randi([1, rowNum - patchsize + 1]);
        yPos = randi([1, colNum - patchsize + 1]);
        patches(:, (imageNum-1)*1000 + patchNum) = ...
            reshape(IMAGES(xPos:xPos+patchsize-1, yPos:yPos+patchsize-1, imageNum), 64, 1);
    end
end
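As a usage sketch (assuming IMAGES.mat and the display_network.m helper provided with the exercise are on the MATLAB path), the patches can be generated and the 204 random patches mentioned above displayed like this:
% Sketch: sample the training patches and display 204 of them at random
patches = sampleIMAGES();
display_network(patches(:, randi(size(patches, 2), 204, 1)), 8);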
Step 2: Sparse autoencoder objective: compute the cost function and gradients
The output (activation) of hidden unit i is:
a_i(2) = f( sum_j W_ij(1) * x_j + b_i(1) ),  where f(z) = 1/(1 + exp(-z)) is the sigmoid function.
It can also be written in two steps as:
z_i(2) = sum_j W_ij(1) * x_j + b_i(1),   a_i(2) = f(z_i(2))
The vectorized expressions are:
z(2) = W(1)*x + b(1),  a(2) = f(z(2));   z(3) = W(2)*a(2) + b(2),  a(3) = f(z(3))
This step is called forward propagation. More generally, for layers l and l+1 of a neural network:
z(l+1) = W(l)*a(l) + b(l),   a(l+1) = f(z(l+1))
The cost function is composed of three terms: the average reconstruction error, a weight decay (regularization) term, and a sparsity penalty:
J(W, b) = (1/m) * sum_i (1/2)*|| a(3)(x(i)) - x(i) ||^2 + (lambda/2) * sum over all weights (W_ji(l))^2 + beta * sum_j KL( rho || rho_hat_j )
where the sparsity penalty is the KL divergence
KL( rho || rho_hat_j ) = rho*log(rho/rho_hat_j) + (1 - rho)*log((1 - rho)/(1 - rho_hat_j))
and rho_hat_j = (1/m) * sum_i a_j(2)(x(i)) is the average activation of hidden unit j over the training set.
Over its iterations the algorithm tries to make rho_hat_j close to the target sparsity rho (sparsityParam), so that each hidden unit is active only for a small fraction of the inputs.
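As a quick numerical illustration of the sparsity penalty (the values here are chosen for illustration only, not taken from the experiment):
% Sketch: KL-divergence penalty for one hidden unit, with an illustrative
% target activation rho = 0.01 and a measured average activation rho_hat = 0.05
rho = 0.01; rho_hat = 0.05;
kl = rho*log(rho/rho_hat) + (1-rho)*log((1-rho)/(1-rho_hat))   % approx. 0.0247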
The backpropagation (backward propagation) algorithm is then used to compute the error terms and, from them, the gradient of the cost function:
delta(3) = -( x - a(3) ) .* f'(z(3))
delta(2) = ( (W(2))' * delta(3) + beta*( -rho./rho_hat + (1 - rho)./(1 - rho_hat) ) ) .* f'(z(2))
grad_W(l) = (1/m) * delta(l+1) * (a(l))' + lambda * W(l),   grad_b(l) = (1/m) * sum of delta(l+1) over the m examples
The algorithm then calls minFunc() to update the parameters W and b and obtain a better model.
The key to vectorization is keeping track of the dimensions of each variable. With visibleSize = 64 inputs and hiddenSize hidden units, the dimensions are: data 64x10000, W1 hiddenSize x 64, W2 64 x hiddenSize, b1 hiddenSize x 1, b2 64 x 1, z2 and a2 hiddenSize x 10000, z3 and a3 64x10000.
The key implementation code is as follows:
function [cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
                                              lambda, sparsityParam, beta, data)
%% ---------- YOUR CODE HERE --------------------------------------
% Unpack the parameters from the flat vector theta
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

[n, m] = size(data);   % m is the number of training examples, n the number of features

% Forward propagation
% repmat(b1, 1, m) replicates the column vector b1 across the m examples
z2 = W1*data + repmat(b1, 1, m);
a2 = sigmoid(z2);
z3 = W2*a2 + repmat(b2, 1, m);
a3 = sigmoid(z3);

% First part of the cost: average reconstruction error
Jcost = 0.5/m * sum(sum((a3 - data).^2));
% Weight decay term
Jweight = lambda/2 * (sum(sum(W1.^2)) + sum(sum(W2.^2)));
% Sparsity penalty
% sparsityParam (rho): the desired average activation of the hidden units
% rho (rho-hat): the actual average activation of the hidden units
rho = 1/m * sum(a2, 2);
Jsparse = beta * sum(sparsityParam.*log(sparsityParam./rho) + ...
    (1-sparsityParam).*log((1-sparsityParam)./(1-rho)));
% Total cost
cost = Jcost + Jweight + Jsparse;

% Backward propagation: compute the gradients
d3 = -(data - a3) .* sigmoidGradient(z3);
% Extra term introduced by the sparsity penalty Jsparse in the cost function
extra_term = beta * (-sparsityParam./rho + (1-sparsityParam)./(1-rho));
d2 = (W2'*d3 + repmat(extra_term, 1, m)) .* sigmoidGradient(z2);
W1grad = 1/m * d2 * data' + lambda*W1;
W2grad = 1/m * d3 * a2' + lambda*W2;
b1grad = 1/m * sum(d2, 2);
b2grad = 1/m * sum(d3, 2);
% Pack the gradients into a single vector for the optimizer
grad = [W1grad(:); W2grad(:); b1grad(:); b2grad(:)];
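The code above relies on two small helper functions that are not shown in this write-up. A minimal sketch of them, assuming sigmoid is the logistic function and sigmoidGradient computes its derivative f'(z) = f(z)(1 - f(z)):
function sigm = sigmoid(x)
    % Logistic sigmoid activation
    sigm = 1 ./ (1 + exp(-x));
end

function g = sigmoidGradient(z)
    % Derivative of the sigmoid: f'(z) = f(z).*(1 - f(z))
    g = sigmoid(z) .* (1 - sigmoid(z));
end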
Step 3: Gradient check (if the difference is too large, return to Step 2)
checkNumericalGradient.m defines a simple quadratic function h(x) = x1^2 + 3*x1*x2 and verifies that the gradient at the point x = (4, 10)^T is computed correctly. This helps us confirm that our gradient code is implemented correctly.
The numerical approximation of the gradient is:
grad_i(theta) ≈ ( J(theta + epsilon*e_i) - J(theta - epsilon*e_i) ) / (2*epsilon),  with epsilon = 1e-4
computeNumericalGradient.m then checks the accuracy of the analytically computed gradient in detail: the analytical gradient should be as close as possible to the numerical approximation, and in this experiment the difference must be below 1e-9. If the difference is too large, the implementation of the algorithm should be re-examined.
The key implementation code for the numerical gradient approximation is as follows:
function numgrad = computeNumericalGradient(J, theta)
%% ---------- YOUR CODE HERE --------------------------------------
epsilon = 1e-4;
n = size(theta, 1);
E = eye(n);                  % e_i is the i-th column of the identity matrix
numgrad = zeros(n, 1);
for i = 1:n
    delta = E(:, i) * epsilon;                                   % perturb only component i
    numgrad(i) = (J(theta + delta) - J(theta - delta)) / (2.0 * epsilon);
end
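As a usage sketch, the analytical and numerical gradients can be compared as follows. The hyperparameter values below are the standard settings of the exercise and are not stated in this write-up, so treat them as an assumption:
% Sketch: compare the analytical gradient with its numerical approximation.
% Hyperparameter values are assumed (standard exercise defaults).
visibleSize = 8*8;        % 64 input units (one per pixel of an 8x8 patch)
hiddenSize = 25;          % number of hidden units
lambda = 0.0001;          % weight decay parameter
sparsityParam = 0.01;     % desired average activation rho
beta = 3;                 % weight of the sparsity penalty
theta = initializeParameters(hiddenSize, visibleSize);
[cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, lambda, ...
                                     sparsityParam, beta, patches);
numgrad = computeNumericalGradient(@(p) sparseAutoencoderCost(p, visibleSize, ...
                                   hiddenSize, lambda, sparsityParam, beta, ...
                                   patches), theta);
diff = norm(numgrad - grad) / norm(numgrad + grad)   % should be below 1e-9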
Step 4: Train the sparse autoencoder and update the parameters
This experiment uses minFunc, an optimization toolbox for MATLAB written by Mark Schmidt; the optimization is performed with the limited-memory BFGS (L-BFGS) algorithm.
The optimization code looks like this:
% Randomly initialize the parameters
theta = initializeParameters(hiddenSize, visibleSize);

% Use minFunc to minimize the function
addpath minfunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost function.
                          % Generally, for minFunc to work, you need a
                          % function pointer with two outputs: the function
                          % value and the gradient. In our problem,
                          % sparseAutoencoderCost.m satisfies this.
options.maxIter = 400;    % Maximum number of iterations of L-BFGS to run
options.display = 'on';

[opttheta, cost] = minFunc(@(p) sparseAutoencoderCost(p, ...
                                visibleSize, hiddenSize, ...
                                lambda, sparsityParam, ...
                                beta, patches), ...
                           theta, options);
Step 5: Visualize hidden layer units
Finally, display_network.m is called to visualize the hidden layer units and save the result to the file weights.jpg.
W1 = reshape(opttheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
display_network(W1', 12);
print -djpeg weights.jpg   % Save the visualization to a file
What do the weight images in the experimental results represent? If the input is constrained to have 2-norm at most 1, i.e. sum_j x_j^2 <= 1, then it can be shown that the activation of hidden unit i is maximized when each component of the input satisfies x_j = W_ij(1) / sqrt( sum_j (W_ij(1))^2 ). In other words, a hidden unit responds most strongly to an input that is positively correlated with its weight vector, so each tile in the weight image shows the input pattern that maximally activates the corresponding hidden unit.
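A minimal sketch of this interpretation (Xmax is a hypothetical name introduced here, not part of the exercise code): normalizing each row of W1 gives the norm-constrained input that maximally activates each hidden unit, which is essentially what the weight visualization displays.
% Sketch: the norm-constrained input that maximally activates each hidden unit
% is that unit's weight vector scaled to unit length (Xmax is a hypothetical name)
Xmax = W1 ./ repmat(sqrt(sum(W1.^2, 2)), 1, size(W1, 2));   % normalize each row of W1
display_network(Xmax', 12);                                 % comparable to weights.jpg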
4. Experimental results and operating environment
Experimental results
The gradient check results in a difference of 7.0949e-11, much less than 1.0e-9, satisfying the condition.
The resulting hidden layer unit visualization is shown below:
We can see that the hidden layer units learn higher-order features, such as edges in the image.
Gradient check time: 1261.874 seconds, approx. 21 minutes
With gradient checking turned off, training time: 85.03 seconds
Operating Environment
Processor: AMD A6-3420M APU with Radeon(tm) HD Graphics, 1.50 GHz
RAM: 4.00 GB (2.24 GB available)
OS: Windows 7, 32-bit
MATLAB: R2012b (8.0.0.783)