Sparse Autoencoder and Its Implementation: Learning the Bases


What is an autoencoder?

An autoencoder is itself a BP (backpropagation) neural network, trained with an unsupervised learning algorithm.

We all know that a neural network can approximate any function to arbitrary precision. Here we make the network's target value equal to its input value $x$, that is, we make it learn the identity function:

$$h_{W,b}(x) \approx x$$

It sounds boring, right? What is the point of a network whose output equals its input? However, once we add some constraints to the autoencoder, things change. Figure 1 shows a basic autoencoder network; notice that the number of hidden-layer nodes is smaller than the number of input-layer nodes.


Figure 1

For example, if the input is a 10x10 image, there are 100 pixels, so the input and output layers each have 100 nodes, while the hidden layer has only 25 nodes. This forces the hidden layer to learn a compressed representation of the input: the network must reconstruct the 100-dimensional data from only 25 dimensions. That reconstruction is the learning process.
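As a rough illustration of the shapes involved, here is a minimal sketch of the forward pass of such a 100-25-100 network. The variable names, the random initialization, and the sigmoid choice are my own assumptions for illustration only; no training happens here.

% Minimal sketch of a 100-25-100 autoencoder forward pass (illustrative only).
visibleSize = 100;            % 10x10 input image flattened to 100 pixels
hiddenSize  = 25;             % compressed representation

x  = rand(visibleSize, 1);                 % one example input
W1 = 0.01*randn(hiddenSize, visibleSize);  % encoder weights (25x100)
b1 = zeros(hiddenSize, 1);
W2 = 0.01*randn(visibleSize, hiddenSize);  % decoder weights (100x25)
b2 = zeros(visibleSize, 1);

sigmoid = @(z) 1 ./ (1 + exp(-z));
a2 = sigmoid(W1*x + b1);   % 25-dimensional code
a3 = sigmoid(W2*a2 + b2);  % 100-dimensional reconstruction; training pushes a3 toward x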

This is very similar to how we study. Suppose an exam covers 100 test points, but you can summarize all of them with 25 knowledge points; that act of summarizing is the learning process.

 

What is a sparse autoencoder?

More generally, even when the number of hidden-layer nodes is large, perhaps larger than the number of input-layer nodes, we can still use the autoencoding algorithm, but we must add a sparsity constraint. This gives the sparse autoencoder.

 

What is the sparsity constraint?

Simply put, the hidden neurons must be kept in an inhibited state most of the time. Concretely, with a sigmoid activation most outputs should be close to 0, and with tanh most outputs should be close to -1. What is the benefit? It forces the hidden neurons to work at their full potential and learn the true underlying features under these unfavorable conditions.

 

How can we measure the activation of a hidden neuron?

Just take the average value. Let $a_j^{(2)}(x)$ denote the activation of hidden unit $j$ given input $x$. Its average activation over the $m$ training examples is then

$$\hat{\rho}_j = \frac{1}{m}\sum_{i=1}^{m}\left[a_j^{(2)}\big(x^{(i)}\big)\right]$$
 

The sparsity penalty term: relative entropy (KL divergence)

To keep the average activation as small as we want, we enforce a constraint of the form

$$\hat{\rho}_j = \rho,$$

where $\rho$ is the sparsity parameter, a small value close to zero. To achieve this, we add an extra penalty term to the optimization objective, based on the relative entropy (KL divergence):

$$\sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \sum_{j=1}^{s_2}\left[\rho\log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}\right],$$

where $s_2$ is the number of hidden units. This penalty term has the following property: when $\hat{\rho}_j = \rho$ (the sparsity parameter here is 0.2), it equals 0, and as $\hat{\rho}_j$ moves away from $\rho$ toward 0 or 1, the relative entropy grows rapidly toward infinity, as shown in Figure 2:

 

Figure 2
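The curve in Figure 2 is easy to reproduce. The following short plotting snippet is my own sketch, not part of the original exercise code; it evaluates the KL penalty for a single hidden unit with $\rho = 0.2$.

% Plot KL(rho || rhoHat) for a single hidden unit, with rho = 0.2.
rho    = 0.2;
rhoHat = linspace(0.001, 0.999, 500);
kl     = rho*log(rho./rhoHat) + (1-rho)*log((1-rho)./(1-rhoHat));
plot(rhoHat, kl);
xlabel('average activation \rho-hat');
ylabel('KL(\rho || \rho-hat)');
% The penalty is 0 at rhoHat = 0.2 and blows up as rhoHat approaches 0 or 1.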

Training a sparse autoencoder compared with a plain BP neural network

The cost function of the sparse autoencoder is the cost function of the BP neural network plus the sparsity penalty term:

$$J_{\text{sparse}}(W,b) = J(W,b) + \beta\sum_{j=1}^{s_2}\mathrm{KL}(\rho\,\|\,\hat{\rho}_j),$$

where $\beta$ controls the weight of the sparsity penalty.
Correspondingly, the residual (delta) formula used in backpropagation for the hidden layer must also be corrected:

$$\delta_i^{(2)} = \left(\left(\sum_{j=1}^{s_3} W_{ji}^{(2)}\,\delta_j^{(3)}\right) + \beta\left(-\frac{\rho}{\hat{\rho}_i} + \frac{1-\rho}{1-\hat{\rho}_i}\right)\right) f'\big(z_i^{(2)}\big)$$
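The extra term in the parentheses comes directly from differentiating the KL penalty with respect to the average activation; written out (an intermediate step of my own, not shown in the original):

$$\frac{\partial}{\partial\hat{\rho}_j}\,\mathrm{KL}(\rho\,\|\,\hat{\rho}_j) = -\frac{\rho}{\hat{\rho}_j} + \frac{1-\rho}{1-\hat{\rho}_j}$$

This is exactly the Penalty term that appears in the vectorized cost code later on.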
 

Implementing a sparse autoencoder


Data Overview

Here we use the example from Andrew Ng's well-known UFLDL tutorial, where the data file IMAGES.mat holds a 512*512*10 three-dimensional array: 10 images of 262,144 pixels each. Run

load IMAGES;
imagesc(IMAGES(:,:,6));
colormap gray;


to display the 6th image. Figure 3 shows a scene of forest and snow-covered mountains.


Figure 3


Data Sampling

Since each image is quite large, we cannot start by training a neural network with more than 260,000 nodes in each layer. Instead, we first sample 10,000 small patches at random from the 10 images. Each patch is an 8*8 pixel fragment. The training set is then a 64*10000 matrix, where each column is one patch pulled out into a column vector.

The code is as follows:

load IMAGES;    % load images from disk
patchsize = 8;  % we'll use 8x8 patches
% numpatches = 10;
numpatches = 10000;

% Initialize patches with zeros. Your code will fill in this matrix --
% one column per patch, 10000 columns.
patches = zeros(patchsize*patchsize, numpatches);

tic
image_size = size(IMAGES);
i = randi(image_size(1)-patchsize+1, 1, numpatches);
j = randi(image_size(2)-patchsize+1, 1, numpatches);
k = randi(image_size(3), 1, numpatches);
for num = 1:numpatches
    patches(:,num) = reshape(IMAGES(i(num):i(num)+patchsize-1, ...
                                    j(num):j(num)+patchsize-1, ...
                                    k(num)), [], 1);
end
toc

Here, randi(IMAX, m, n) generates an m*n matrix of random integers drawn from the closed range [1, IMAX], which gives us random top-left corners and image indices for the patches. The reshape call then pulls each 8*8 patch out into a column vector.
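A quick toy example of my own (values chosen arbitrarily) showing how these two functions behave:

% randi: a 1x3 row of random integers in [1, 505] -- e.g. candidate row offsets.
corners = randi(505, 1, 3);

% reshape: pull a 2x2 block out into a 4x1 column vector (column-major order).
block = [1 3; 2 4];
v = reshape(block, [], 1);   % v = [1; 2; 3; 4]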

The first 200 patches are shown in Figure 4:


Figure 4

sparseAutoencoderCost.m

This part of the code is the core and the most important. As mentioned above, the optimization objective has three parts: the mean squared error, the weight decay, and the sparsity penalty.

The program should be debugged piece by piece; otherwise it is easy to end up with results that look correct even though the code is not. My latest version of the code is attached here. It removes all explicit for loops in favor of vectorized programming, which both simplifies the code and speeds it up, at some cost to readability.

numpatches = size(patches, 2);

% Forward pass
a2 = sigmoid(W1*patches + repmat(b1, 1, numpatches));
a3 = sigmoid(W2*a2 + repmat(b2, 1, numpatches));

% Average activation of each hidden unit, and the derivative of the KL penalty
Rho = sum(a2, 2) / numpatches;
Penalty = -sparsityParam./Rho + (1-sparsityParam)./(1-Rho);

% Backpropagated residuals: output layer, then hidden layer with the sparsity term
Delta3 = (a3 - patches) .* a3 .* (1-a3);
Delta2 = (W2'*Delta3 + beta*repmat(Penalty, 1, numpatches)) .* a2 .* (1-a2);

% Cost = mean squared error + weight decay + sparsity penalty
cost1 = sumsqr(a3 - patches) / numpatches / 2;
cost2 = (sumsqr(W1) + sumsqr(W2)) * lambda / 2;
cost3 = beta * sum(sparsityParam*log(sparsityParam./Rho) + ...
                   (1-sparsityParam)*log((1-sparsityParam)./(1-Rho)));
cost = cost1 + cost2 + cost3;

% Gradients
W2grad = Delta3*a2'/numpatches + lambda*W2;
b2grad = sum(Delta3, 2)/numpatches;
W1grad = Delta2*patches'/numpatches + lambda*W1;
b1grad = sum(Delta2, 2)/numpatches;

Here, repmat is used to replicate the bias vectors (and the penalty vector) across all columns. It is worth sparing no effort to remove every explicit for loop from this part of the code; otherwise execution takes far too long.
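The code above also assumes a sigmoid helper, and sumsqr comes from the Neural Network Toolbox. If either is missing in your environment, minimal stand-ins (a sketch of my own, placed for example at the bottom of sparseAutoencoderCost.m) look like this:

% Element-wise logistic sigmoid.
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end

% Sum of squared elements; a drop-in for sumsqr when the toolbox is unavailable.
function s = sumsqr(x)
    s = sum(x(:).^2);
end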

 

Gradient Verification

Gradient checking is a powerful weapon for debugging this code. My implementation uses a for loop and is very inefficient, so during debugging I shrink the problem to 2 hidden nodes and 100 samples, which greatly speeds up the check; otherwise it takes a very long time.

EPSILON = 0.0001;
numgrad = zeros(size(theta));          % numerical gradient estimate
thetaepsilon = zeros(size(theta));     % perturbation vector, one component at a time
for i = 1:numel(theta)
    thetaepsilon(i) = EPSILON;
    numgrad(i) = (J(theta + thetaepsilon) - J(theta - thetaepsilon)) / 2 / EPSILON;
    thetaepsilon(i) = 0;
end

Note: for the final full run, you must turn off gradient checking; otherwise it will run essentially forever.
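In practice the numerical gradient is compared against the analytic one. The sketch below wraps the loop above in a computeNumericalGradient helper, as the UFLDL starter code does; the function and parameter names follow that starter code but should be treated as assumptions here.

% Compare the analytic gradient with the numerical estimate on a tiny problem.
[cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
                                     lambda, sparsityParam, beta, patches);
J = @(t) sparseAutoencoderCost(t, visibleSize, hiddenSize, ...
                               lambda, sparsityParam, beta, patches);
numgrad = computeNumericalGradient(J, theta);

% Relative difference; it should be tiny (around 1e-9) when the code is correct.
diff = norm(numgrad - grad) / norm(numgrad + grad);
disp(diff);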

Gradient checking should be done gradually, debugging the three parts of the objective function step by step. Once the mean squared error term is correct, you obtain the image shown in Figure 5.


Figure 5

When both the mean squared error term and the weight decay term are correct, the image looks like this:


Figure 6

When the full objective function has been debugged, it is time to witness the miracle:


Figure 7

These are the bases of the images. Through the sparse autoencoder we have learned 25 bases of 8x8 pixels each, which correspond roughly to what individual cells in our visual system respond to; when such cells are arranged in arrays and stacked in layers, we can see everything.
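To reproduce Figure 7, the learned bases are simply the rows of W1 displayed as 8x8 tiles. A sketch of that step, assuming the display_network helper shipped with the UFLDL starter code and the opttheta parameter vector returned by the optimizer:

% Extract W1 from the flat parameter vector and show each row as an 8x8 basis.
W1 = reshape(opttheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
display_network(W1', 12);
print -djpeg weights.jpg   % save the visualization to disk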

As a basic building block of unsupervised learning, the sparse autoencoder is the first step into deep learning.
