Implementation Process
Step 1: Generate the training sample set
Step 2: Sparse autoencoder objective: compute the cost function and gradients
Step 3: Gradient check (if the difference is too large, return to Step 2)
Step 4: Train the sparse autoencoder and update the parameters
Step 5: Visualize the hidden layer units
3. Key points, code, and notes for each step
Step 1: Generate Training Sample Set
From each of 10 images, 1,000 8x8 pixel patches are sampled at random, giving 10,000 patches in total. Each patch is reshaped into a column vector and stacked into the matrix patches, which serves as the training set for this experiment; patches is therefore a 64x10000 matrix.
At the same time, 204 randomly selected patches are displayed, as shown below:
The images have been whitened in advance, which reduces the correlation between neighboring pixels.
The key implementation code for random sampling is as follows:
function patches = sampleIMAGES()
% sampleIMAGES
% Returns 10000 patches for training

load IMAGES;        % load the pre-whitened images provided with the exercise (array IMAGES)

patchsize = 8;      % we'll use 8x8 patches
numpatches = 10000;
patches = zeros(patchsize*patchsize, numpatches);

%% ---------- YOUR CODE HERE --------------------------------------
for imageNum = 1:10
    [rowNum, colNum] = size(IMAGES(:, :, imageNum));
    % Select 1000 patches from every image; patch size is 8x8.
    % randi([imin, imax], m, n) draws uniform random integers;
    % reshape(x, m, n) returns an m-by-n matrix, here 8x8 -> 64x1,
    % so patches ends up 64x10000.
    for patchNum = 1:1000
        xPos = randi([1, rowNum - patchsize + 1]);
        yPos = randi([1, colNum - patchsize + 1]);
        patches(:, (imageNum-1)*1000 + patchNum) = ...
            reshape(IMAGES(xPos:xPos+patchsize-1, yPos:yPos+patchsize-1, imageNum), 64, 1);
    end
end
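As a usage sketch (assuming IMAGES.mat and the display_network.m helper provided with the exercise are on the MATLAB path), the patches can be generated and the 204 random patches mentioned above displayed like this:
% Sketch: sample the training patches and display 204 of them at random
patches = sampleIMAGES();
display_network(patches(:, randi(size(patches, 2), 204, 1)), 8);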
Step 2: Sparse autoencoder objective: compute the cost function and gradients
The output (activation) of hidden unit i is:
a_i(2) = f( sum_j W_ij(1) * x_j + b_i(1) ),  where f(z) = 1/(1 + exp(-z)) is the sigmoid function.
It can also be written in two steps as:
z_i(2) = sum_j W_ij(1) * x_j + b_i(1),   a_i(2) = f(z_i(2))
The vectorized expressions are:
z(2) = W(1)*x + b(1),  a(2) = f(z(2));   z(3) = W(2)*a(2) + b(2),  a(3) = f(z(3))
This step is called forward propagation. More generally, for layers l and l+1 of a neural network:
z(l+1) = W(l)*a(l) + b(l),   a(l+1) = f(z(l+1))
The cost function is composed of three terms: the average reconstruction error, a weight decay (regularization) term, and a sparsity penalty:
J(W, b) = (1/m) * sum_i (1/2)*|| a(3)(x(i)) - x(i) ||^2 + (lambda/2) * sum over all weights (W_ji(l))^2 + beta * sum_j KL( rho || rho_hat_j )
where the sparsity penalty is the KL divergence
KL( rho || rho_hat_j ) = rho*log(rho/rho_hat_j) + (1 - rho)*log((1 - rho)/(1 - rho_hat_j))
and rho_hat_j = (1/m) * sum_i a_j(2)(x(i)) is the average activation of hidden unit j over the training set.
Over its iterations the algorithm tries to make rho_hat_j close to the target sparsity rho (sparsityParam), so that each hidden unit is active only for a small fraction of the inputs.
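As a quick numerical illustration of the sparsity penalty (the values here are chosen for illustration only, not taken from the experiment):
% Sketch: KL-divergence penalty for one hidden unit, with an illustrative
% target activation rho = 0.01 and a measured average activation rho_hat = 0.05
rho = 0.01; rho_hat = 0.05;
kl = rho*log(rho/rho_hat) + (1-rho)*log((1-rho)/(1-rho_hat))   % approx. 0.0247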
The backpropagation (backward propagation) algorithm is then used to compute the error terms and, from them, the gradient of the cost function:
delta(3) = -( x - a(3) ) .* f'(z(3))
delta(2) = ( (W(2))' * delta(3) + beta*( -rho./rho_hat + (1 - rho)./(1 - rho_hat) ) ) .* f'(z(2))
grad_W(l) = (1/m) * delta(l+1) * (a(l))' + lambda * W(l),   grad_b(l) = (1/m) * sum of delta(l+1) over the m examples
The algorithm then calls minFunc() to update the parameters W and b and obtain a better model.
The key to vectorization is keeping track of the dimensions of each variable. With visibleSize = 64 inputs and hiddenSize hidden units, the dimensions are: data 64x10000, W1 hiddenSize x 64, W2 64 x hiddenSize, b1 hiddenSize x 1, b2 64 x 1, z2 and a2 hiddenSize x 10000, z3 and a3 64x10000.
The key implementation code is as follows:
function [cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
                                              lambda, sparsityParam, beta, data)
%% ---------- YOUR CODE HERE --------------------------------------
% Unpack the parameters from the flat vector theta
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

[n, m] = size(data);   % m is the number of training examples, n the number of features

% Forward propagation
% repmat(b1, 1, m) replicates the column vector b1 across the m examples
z2 = W1*data + repmat(b1, 1, m);
a2 = sigmoid(z2);
z3 = W2*a2 + repmat(b2, 1, m);
a3 = sigmoid(z3);

% First part of the cost: average reconstruction error
Jcost = 0.5/m * sum(sum((a3 - data).^2));
% Weight decay term
Jweight = lambda/2 * (sum(sum(W1.^2)) + sum(sum(W2.^2)));
% Sparsity penalty
% sparsityParam (rho): the desired average activation of the hidden units
% rho (rho-hat): the actual average activation of the hidden units
rho = 1/m * sum(a2, 2);
Jsparse = beta * sum(sparsityParam.*log(sparsityParam./rho) + ...
    (1-sparsityParam).*log((1-sparsityParam)./(1-rho)));
% Total cost
cost = Jcost + Jweight + Jsparse;

% Backward propagation: compute the gradients
d3 = -(data - a3) .* sigmoidGradient(z3);
% Extra term introduced by the sparsity penalty Jsparse in the cost function
extra_term = beta * (-sparsityParam./rho + (1-sparsityParam)./(1-rho));
d2 = (W2'*d3 + repmat(extra_term, 1, m)) .* sigmoidGradient(z2);
W1grad = 1/m * d2 * data' + lambda*W1;
W2grad = 1/m * d3 * a2' + lambda*W2;
b1grad = 1/m * sum(d2, 2);
b2grad = 1/m * sum(d3, 2);
% Pack the gradients into a single vector for the optimizer
grad = [W1grad(:); W2grad(:); b1grad(:); b2grad(:)];
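The code above relies on two small helper functions that are not shown in this write-up. A minimal sketch of them, assuming sigmoid is the logistic function and sigmoidGradient computes its derivative f'(z) = f(z)(1 - f(z)):
function sigm = sigmoid(x)
    % Logistic sigmoid activation
    sigm = 1 ./ (1 + exp(-x));
end

function g = sigmoidGradient(z)
    % Derivative of the sigmoid: f'(z) = f(z).*(1 - f(z))
    g = sigmoid(z) .* (1 - sigmoid(z));
end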
Step 3: Gradient check (if the difference is too large, return to Step 2)
checkNumericalGradient.m defines a simple quadratic function h(x) = x1^2 + 3*x1*x2 and verifies that the gradient at the point x = (4, 10)^T is computed correctly. This helps us confirm that our gradient code is implemented correctly.
The numerical approximation of the gradient is:
grad_i(theta) ≈ ( J(theta + epsilon*e_i) - J(theta - epsilon*e_i) ) / (2*epsilon),  with epsilon = 1e-4
computeNumericalGradient.m then checks the accuracy of the analytically computed gradient in detail: the analytical gradient should be as close as possible to the numerical approximation, and in this experiment the difference must be below 1e-9. If the difference is too large, the implementation of the algorithm should be re-examined.
The key implementation code for the numerical gradient approximation is as follows:
function numgrad = computeNumericalGradient(J, theta)
%% ---------- YOUR CODE HERE --------------------------------------
epsilon = 1e-4;
n = size(theta, 1);
E = eye(n);                  % e_i is the i-th column of the identity matrix
numgrad = zeros(n, 1);
for i = 1:n
    delta = E(:, i) * epsilon;                                   % perturb only component i
    numgrad(i) = (J(theta + delta) - J(theta - delta)) / (2.0 * epsilon);
end
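As a usage sketch, the analytical and numerical gradients can be compared as follows. The hyperparameter values below are the standard settings of the exercise and are not stated in this write-up, so treat them as an assumption:
% Sketch: compare the analytical gradient with its numerical approximation.
% Hyperparameter values are assumed (standard exercise defaults).
visibleSize = 8*8;        % 64 input units (one per pixel of an 8x8 patch)
hiddenSize = 25;          % number of hidden units
lambda = 0.0001;          % weight decay parameter
sparsityParam = 0.01;     % desired average activation rho
beta = 3;                 % weight of the sparsity penalty
theta = initializeParameters(hiddenSize, visibleSize);
[cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, lambda, ...
                                     sparsityParam, beta, patches);
numgrad = computeNumericalGradient(@(p) sparseAutoencoderCost(p, visibleSize, ...
                                   hiddenSize, lambda, sparsityParam, beta, ...
                                   patches), theta);
diff = norm(numgrad - grad) / norm(numgrad + grad)   % should be below 1e-9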
Step 4: Train the sparse autoencoder and update the parameters
This experiment uses minFunc, an optimization toolbox for MATLAB written by Mark Schmidt; the optimization is performed with the limited-memory BFGS (L-BFGS) algorithm.
The optimization code looks like this:
% Randomly initialize the parameters
theta = initializeParameters(hiddenSize, visibleSize);

% Use minFunc to minimize the function
addpath minfunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost function.
                          % Generally, for minFunc to work, you need a
                          % function pointer with two outputs: the function
                          % value and the gradient. In our problem,
                          % sparseAutoencoderCost.m satisfies this.
options.maxIter = 400;    % Maximum number of iterations of L-BFGS to run
options.display = 'on';

[opttheta, cost] = minFunc(@(p) sparseAutoencoderCost(p, ...
                                visibleSize, hiddenSize, ...
                                lambda, sparsityParam, ...
                                beta, patches), ...
                           theta, options);
Step 5: Visualize hidden layer units
Finally, display_network.m is called to visualize the hidden layer units and save the result to the file weights.jpg.
W1 = reshape(opttheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
display_network(W1', 12);
print -djpeg weights.jpg   % Save the visualization to a file
What do the weight images in the experimental results represent? If the input is constrained to have 2-norm at most 1, i.e. sum_j x_j^2 <= 1, then it can be shown that the activation of hidden unit i is maximized when each component of the input satisfies x_j = W_ij(1) / sqrt( sum_j (W_ij(1))^2 ). In other words, a hidden unit responds most strongly to an input that is positively correlated with its weight vector, so each tile in the weight image shows the input pattern that maximally activates the corresponding hidden unit.
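A minimal sketch of this interpretation (Xmax is a hypothetical name introduced here, not part of the exercise code): normalizing each row of W1 gives the norm-constrained input that maximally activates each hidden unit, which is essentially what the weight visualization displays.
% Sketch: the norm-constrained input that maximally activates each hidden unit
% is that unit's weight vector scaled to unit length (Xmax is a hypothetical name)
Xmax = W1 ./ repmat(sqrt(sum(W1.^2, 2)), 1, size(W1, 2));   % normalize each row of W1
display_network(Xmax', 12);                                 % comparable to weights.jpg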
4. Experimental results and operating environment
Experimental results
The gradient check results in a difference of 7.0949e-11, much less than 1.0e-9, satisfying the condition.
The resulting hidden layer unit visualization is shown below:
We can see that the hidden layer units learn higher-order features, such as edges in the image.
Gradient check time: 1261.874 seconds, approx. 21 minutes
With gradient checking turned off, training time: 85.03 seconds
Operating Environment
Processor: AMD A6-3420M APU with Radeon(tm) HD Graphics, 1.50 GHz
RAM: 4.00 GB (2.24 GB available)
OS: Windows 7, 32-bit
MATLAB: R2012b (8.0.0.783)