UFLDL Experimental Report 2: Sparse Autoencoder


Experimental report on the Sparse Autoencoder.

1. Sparse Autoencoder experiment description

An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation while setting the target values equal to the inputs, i.e. $y^{(i)} = x^{(i)}$. The autoencoder tries to learn a function $h_{W,b}(x) \approx x$; in other words, it tries to approximate the identity function so that its output is close to its input. When we place restrictions on the network, such as a sparsity constraint on the hidden neurons, the autoencoder can still discover interesting structure in the input data even when the number of hidden neurons is large. Sparsity can be explained simply as follows: if a neuron's output is close to 1 we consider it activated, and if its output is close to 0 we consider it suppressed; the sparsity constraint requires each neuron to be suppressed most of the time. By training an autoencoder with a sparsity constraint we can extract higher-level features from the training samples. For example, in this experiment a sparse autoencoder is trained on image patches, and the hidden units learn edge-like features, so the algorithm learns features at a higher level than raw pixels.
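Stated a little more precisely (this is the standard UFLDL formulation, restated here for clarity; the specific value of $\rho$ below is only an example), the sparsity constraint requires the average activation of each hidden unit to be small:

$$\hat\rho_j = \frac{1}{m}\sum_{i=1}^{m} a_j^{(2)}\big(x^{(i)}\big) \approx \rho,$$

where $a_j^{(2)}(x^{(i)})$ is the activation of hidden unit $j$ on input $x^{(i)}$, $m$ is the number of training samples, and $\rho$ is a small sparsity target (for example $\rho = 0.05$, so a unit is strongly active on only a small fraction of the inputs). Step 2 shows how this constraint enters the cost function as a penalty term.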

  1. Implementation Process

    Step 1: Generate the training sample set

    Step 2: Sparse autoencoder objective: compute the cost function and gradient

    Step 3: Gradient check (if the check error is too large, return to Step 2)

    Step 4: Train the sparse autoencoder and update the parameters

    Step 5: Visualize the hidden-layer units (see the driver-script sketch after this list)
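    The five steps above correspond to the stages of the driver script (train.m in the UFLDL starter code). The following is a minimal sketch of how they fit together; the parameter values shown are the common UFLDL defaults and are assumptions, since the report does not state the exact settings used.

    % Sketch of the driver script; parameter values are assumed defaults,
    % not taken from this report.
    visibleSize   = 8*8;     % number of input units (8x8 patches)
    hiddenSize    = 25;      % number of hidden units (assumed)
    sparsityParam = 0.01;    % desired average activation rho (assumed)
    lambda        = 0.0001;  % weight decay parameter (assumed)
    beta          = 3;       % weight of the sparsity penalty (assumed)

    patches = sampleIMAGES();                                  % Step 1
    theta   = initializeParameters(hiddenSize, visibleSize);
    [cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
                                         lambda, sparsityParam, beta, patches);  % Step 2
    % Step 3: gradient check with computeNumericalGradient (see below)
    % Step 4: training with minFunc (L-BFGS), see below
    % Step 5: visualization with display_network, see below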

    3. Key points, code, and notes for each step

    Step 1: Generate the training sample set

    From each of 10 images, 1,000 8x8 pixel patches are sampled at random, giving 10,000 patches in total. Each patch is reshaped into a column vector and the columns are assembled into the matrix patches, which serves as the training set for this experiment; patches is therefore a 64x10,000 matrix.

    At the same time, 204 randomly selected patches are displayed, as shown below:

    These images have been whitened in advance, which reduces the correlation between neighboring pixels.

    The key implementation code for random sampling is as follows:

    function patches = sampleIMAGES()
    % sampleIMAGES
    % Returns 10000 patches for training

    %% ---------- YOUR CODE HERE --------------------------------------
    load IMAGES;                 % load the pre-whitened images (starter code)
    patchsize  = 8;              % here, patch size is 8x8
    numpatches = 10000;
    patches = zeros(patchsize*patchsize, numpatches);   % patches: 64x10000

    for imageNum = 1:10
        [rowNum, colNum] = size(IMAGES(:, :, imageNum));
        % Select 1000 patches from every image.
        % randi([imin, imax], m, n) draws uniform random integers.
        % reshape(x, m, n) returns an m-by-n matrix; here 8x8 -> 64x1.
        for patchNum = 1:1000
            xPos = randi([1, rowNum - patchsize + 1]);
            yPos = randi([1, colNum - patchsize + 1]);
            patches(:, (imageNum-1)*1000 + patchNum) = ...
                reshape(IMAGES(xPos:xPos+patchsize-1, yPos:yPos+patchsize-1, imageNum), 64, 1);
        end
    end
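    In the UFLDL starter code, sampleIMAGES also normalizes the sampled patches before returning them; this step is not spelled out in the report. A minimal sketch of such a normalization, assuming the standard starter-code behaviour of trimming outliers and rescaling into [0.1, 0.9] to match the sigmoid output range:

    function patches = normalizeData(patches)
    % Squash the data to [0.1, 0.9] since a sigmoid is used as the activation.
    patches = bsxfun(@minus, patches, mean(patches));    % remove the DC component (per-patch mean)
    pstd = 3 * std(patches(:));                          % truncate to +/- 3 standard deviations
    patches = max(min(patches, pstd), -pstd) / pstd;     % now in [-1, 1]
    patches = (patches + 1) * 0.4 + 0.1;                 % rescale from [-1, 1] to [0.1, 0.9]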

    Step 2: Sparse autoencoder objective: compute the cost function and gradient

    The output (activation) of hidden unit $i$ is given by:

    $$a_i^{(2)} = f\Big(\sum_{j=1}^{64} W_{ij}^{(1)} x_j + b_i^{(1)}\Big), \qquad f(z) = \frac{1}{1+e^{-z}} \text{ (the sigmoid function)}$$

    It can also be expressed in terms of the pre-activation $z^{(2)}$:

    $$z^{(2)} = W^{(1)} x + b^{(1)}, \qquad a^{(2)} = f(z^{(2)})$$

    The vectorized expressions for the whole network are:

    $$z^{(2)} = W^{(1)} x + b^{(1)}, \quad a^{(2)} = f(z^{(2)}), \quad z^{(3)} = W^{(2)} a^{(2)} + b^{(2)}, \quad h_{W,b}(x) = a^{(3)} = f(z^{(3)})$$

    This step is called forward propagation. More generally, for layers $l$ and $l+1$ of a neural network:

    $$z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}, \qquad a^{(l+1)} = f(z^{(l+1)})$$

    The cost function is composed of three terms (reconstruction error, weight decay, and sparsity penalty):

    $$J_{\text{sparse}}(W,b) = \frac{1}{m}\sum_{i=1}^{m} \frac{1}{2}\big\|h_{W,b}(x^{(i)}) - x^{(i)}\big\|^2 + \frac{\lambda}{2}\sum_{l}\sum_{i}\sum_{j}\big(W_{ji}^{(l)}\big)^2 + \beta\sum_{j=1}^{s_2} \mathrm{KL}(\rho\,\|\,\hat\rho_j)$$

    where

    $$\mathrm{KL}(\rho\,\|\,\hat\rho_j) = \rho\log\frac{\rho}{\hat\rho_j} + (1-\rho)\log\frac{1-\rho}{1-\hat\rho_j}$$

    and $\hat\rho_j$ is the average activation of hidden unit $j$ over the training set, as defined in the introduction.

    Over the iterations, the optimization tries to make $\hat\rho_j \approx \rho$ while reducing the reconstruction error.

    Backpropagation (backward propagation) is used to compute the error terms and the gradient of the cost function:

    $$\delta^{(3)} = -(x - a^{(3)}) \odot f'(z^{(3)}), \qquad \delta^{(2)} = \Big(\big(W^{(2)}\big)^T \delta^{(3)} + \beta\Big(-\frac{\rho}{\hat\rho} + \frac{1-\rho}{1-\hat\rho}\Big)\Big) \odot f'(z^{(2)})$$

    $$\nabla_{W^{(l)}} J = \frac{1}{m}\,\delta^{(l+1)}\big(a^{(l)}\big)^T + \lambda W^{(l)}, \qquad \nabla_{b^{(l)}} J = \frac{1}{m}\sum_{i=1}^{m}\delta^{(l+1)}\big(x^{(i)}\big)$$

    The algorithm then calls minFunc() to update the parameters W and b and obtain a better model.

    The key to vectorization is keeping track of the dimensions of each variable. With m = 10,000 training samples, visibleSize = 64 input/output units and hiddenSize hidden units, the dimensions are:

        data, z3, a3: visibleSize x m        W1: hiddenSize x visibleSize        b1: hiddenSize x 1
        z2, a2:       hiddenSize x m         W2: visibleSize x hiddenSize        b2: visibleSize x 1

    The key implementation code is as follows:

    function [cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
                                                  lambda, sparsityParam, beta, data)

    %% ---------- YOUR CODE HERE --------------------------------------
    % (W1, W2, b1 and b2 have already been unpacked from theta by the starter code.)
    [n, m] = size(data);   % m is the number of training samples, n the number of features

    % Forward propagation
    % repmat(a, m, n) replicates and tiles an array; b1 and b2 are column
    % vectors, so they are tiled across the m training samples.
    z2 = W1*data + repmat(b1, 1, m);
    a2 = sigmoid(z2);
    z3 = W2*a2 + repmat(b2, 1, m);
    a3 = sigmoid(z3);

    % First part of the cost: average squared reconstruction error
    Jcost = 0.5/m * sum(sum((a3 - data).^2));

    % Weight decay term
    Jweight = lambda/2 * sum(sum(W1.^2)) + lambda/2 * sum(sum(W2.^2));

    % Sparsity penalty
    % sparsityParam (rho): the desired average activation of the hidden units
    % rho (rho-hat): the actual average activation of the hidden units
    rho = 1/m * sum(a2, 2);
    Jsparse = beta * sum(sparsityParam.*log(sparsityParam./rho) + ...
        (1 - sparsityParam).*log((1 - sparsityParam)./(1 - rho)));

    % Total cost
    cost = Jcost + Jweight + Jsparse;

    % Backpropagation: compute the gradients
    % sigmoidGradient(z) = sigmoid(z).*(1 - sigmoid(z)) is a helper defined elsewhere.
    d3 = -(data - a3) .* sigmoidGradient(z3);
    % Because the cost function contains the sparsity term Jsparse,
    % an extra term has to be added to the hidden-layer error.
    extra_term = beta * (-sparsityParam./rho + (1 - sparsityParam)./(1 - rho));
    d2 = (W2'*d3 + repmat(extra_term, 1, m)) .* sigmoidGradient(z2);

    W1grad = 1/m * d2 * data' + lambda*W1;
    W2grad = 1/m * d3 * a2' + lambda*W2;
    b1grad = 1/m * sum(d2, 2);
    b2grad = 1/m * sum(d3, 2);
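    After these gradient computations, the starter-code skeleton rolls the four gradient matrices back into the single vector that minFunc expects; shown here for completeness (this line sits below the "YOUR CODE" section in the skeleton):

    grad = [W1grad(:); W2grad(:); b1grad(:); b2grad(:)];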

    Step 3: Gradient check (if the check error is too large, return to Step 2)

    checkNumericalGradient.m defines a simple quadratic function $h(x) = x_1^2 + 3x_1 x_2$ and checks that the gradient at the point $x = (4, 10)^T$ is computed correctly. This helps us verify that our gradient code is implemented correctly.
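    For reference, the gradient of this check function can be worked out by hand and compared against the numerical estimate:

    $$\nabla h(x) = \begin{pmatrix} 2x_1 + 3x_2 \\ 3x_1 \end{pmatrix}, \qquad \nabla h\big((4, 10)^T\big) = \begin{pmatrix} 38 \\ 12 \end{pmatrix}$$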

    The numerical approximation of the gradient is:

    $$\frac{\partial J(\theta)}{\partial \theta_i} \approx \frac{J(\theta + \epsilon\, e_i) - J(\theta - \epsilon\, e_i)}{2\epsilon}, \qquad \epsilon = 10^{-4}$$

    computeNumericalGradient.m lets us check in detail how accurate the analytically computed gradients are. The analytic gradient should be as close as possible to the numerical one; in this experiment the difference must be less than 1e-9. If the difference is too large, the implementation of the algorithm should be re-examined.

    The key implementation code for the numerical gradient approximation is as follows:

    function numgrad = computeNumericalGradient(J, theta)

    %% ---------- YOUR CODE HERE --------------------------------------
    epsilon = 1e-4;
    n = size(theta, 1);
    numgrad = zeros(n, 1);          % preallocate (provided by the starter code)
    E = eye(n);
    for i = 1:n
        delta = E(:, i) * epsilon;  % perturb only the i-th component of theta
        numgrad(i) = (J(theta + delta) - J(theta - delta)) / (2.0*epsilon);
    end
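    The numerical gradient is then compared against the analytic gradient returned by sparseAutoencoderCost; the relative difference is the figure reported in the results section. A minimal sketch of that comparison, using the relative-difference formula from the UFLDL exercise (the variable names are those used in the code above):

    numgrad = computeNumericalGradient(@(x) sparseAutoencoderCost(x, visibleSize, ...
                  hiddenSize, lambda, sparsityParam, beta, patches), theta);
    diff = norm(numgrad - grad) / norm(numgrad + grad);
    disp(diff);                     % should be well below 1e-9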

    Step 4: Train the sparse autoencoder and update the parameters

    In this experiment the optimization is carried out with minFunc, a MATLAB optimization toolbox written by Mark Schmidt; here the limited-memory BFGS (L-BFGS) algorithm is used.

    The optimization code looks like this:

    %  Randomly initialize the parameters
    theta = initializeParameters(hiddenSize, visibleSize);

    %  Use minFunc to minimize the function
    % addpath minFunc/
    options.Method  = 'lbfgs';  % Here, we use L-BFGS to optimize our cost
                                % function. Generally, for minFunc to work, you
                                % need a function pointer with two outputs: the
                                % function value and the gradient. In our problem,
                                % sparseAutoencoderCost.m satisfies this.
    options.maxIter = 400;      % Maximum number of iterations of L-BFGS to run
    options.display = 'on';

    [opttheta, cost] = minFunc(@(p) sparseAutoencoderCost(p, ...
                                        visibleSize, hiddenSize, ...
                                        lambda, sparsityParam, ...
                                        beta, patches), ...
                               theta, options);

    Step 5: Visualize the hidden-layer units

    Finally, display_network.m is called to visualize the hidden-layer weights, and the result is saved to the weights.jpg file.

    W1 = reshape(opttheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
    display_network(W1', 12);
    print -djpeg weights.jpg    % save the visualization to a file

    What do the weight images in the results represent? The image for hidden unit $i$ shows the input that maximally activates that unit. If the input is constrained to have squared norm at most 1, i.e.

    $$\|x\|^2 = \sum_{j=1}^{64} x_j^2 \le 1,$$

    then it can be shown that the activation of hidden unit $i$ is maximized when each component of the input satisfies

    $$x_j = \frac{W_{ij}^{(1)}}{\sqrt{\sum_{j=1}^{64}\big(W_{ij}^{(1)}\big)^2}},$$

    that is, the maximizing input is positively correlated with (proportional to) the weights of that unit.
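    As an illustration (this snippet is not part of the original report), the norm-constrained input that maximally activates each hidden unit can be computed directly from the learned weights, which is exactly what the visualization displays:

    % Each row of W1 corresponds to one hidden unit. Normalizing the row to
    % unit length gives the input x (with ||x|| <= 1) that maximizes that
    % unit's activation, so the displayed images are just rescaled rows of W1.
    Xmax = bsxfun(@rdivide, W1, sqrt(sum(W1.^2, 2)));   % hiddenSize x visibleSize
    display_network(Xmax', 12);                         % same visualization call as above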

    4. Experimental results and operating environment

    Experimental results

    The gradient check results in a difference of 7.0949e-11, much less than 1.0e-9, satisfying the condition.

    The resulting visualization of the hidden-layer units is shown below:

    We can see that the hidden-layer units have learned higher-level, edge-like features of the images.

    Gradient check time: 1261.874 seconds (approx. 21 minutes)

    With the gradient check turned off, training time: 85.03 seconds

    Operating Environment

    Processor: AMD A6-3420M APU with Radeon(tm) HD Graphics, 1.50 GHz

    RAM: 4.00 GB (2.24 GB available)

    OS: Windows 7, 32-bit

    MATLAB: R2012b (8.0.0.783)

  2. Appendix: Actual Running Results

    ... skip some results

    >> train
     Iteration   FunEvals     Step Length    Function Val        Opt Cond
             1          4     7.63802e-02     8.26408e+00     2.99753e+02
             2          5     1.00000e+00     4.00717e+00     1.59412e+02
             3          6     1.00000e+00     1.63622e+00     6.87329e+01
             4          7     1.00000e+00     8.46885e-01     3.03970e+01
             5          8     1.00000e+00     5.82961e-01     1.13785e+01
             6          9     1.00000e+00     5.27282e-01     3.28861e+00
             7                1.00000e+00     5.21369e-01     5.66333e-01
             8                1.00000e+00     5.21182e-01     6.87621e-02
             9                1.00000e+00     5.21174e-01     5.95455e-02
                              1.00000e+00     5.21153e-01     1.00395e-01
                              1.00000e+00     5.21136e-01     8.79291e-02
                              1.00000e+00     5.21108e-01     8.22846e-02
                              1.00000e+00     5.21027e-01     1.21261e-01
    ....
           395        410     1.00000e+00     4.46807e-01     4.30329e-02
           396        411     1.00000e+00     4.46794e-01     5.90697e-02
           397        412     1.00000e+00     4.46780e-01     6.49777e-02
           398        413     1.00000e+00     4.46768e-01     4.46670e-02
           399        414     1.00000e+00     4.46761e-01     2.51915e-02
                      415     1.00000e+00     4.46758e-01     2.03033e-02
    Exceeded Maximum number of iterations
    >>

