I recently started studying deep learning, beginning with the UFLDL (Unsupervised Feature Learning and Deep Learning) tutorial. I am posting my answers to the exercises here as notes.
Notes:
1: The autoencoder is an unsupervised learning algorithm that learns hW,b(x) ≈ x, so the output layer has the same number of units as the input layer. If the middle hidden layer has fewer units than the input, the network is forced to learn a compressed representation of the input, somewhat like PCA.
2: Visualizing the autoencoder. What the exercise visualizes is W1, i.e. the learned parameters W1. At first I did not understand this; later I reasoned that since the input is the pixels of an image, each hidden unit computes something like a1(2) = w11*x1 + w12*x2 + w13*x3 + ..., but I still don't fully get it, so I will come back to this as I learn more.
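For what it's worth, the UFLDL notes explain this with the "maximally activating input" argument: under the constraint ||x|| <= 1, the input that maximally activates hidden unit i is proportional to row i of W1, so each row of W1 can be reshaped and shown as an 8x8 image. A minimal sketch (W1 and patchsize are assumed to be the variables from the exercise):

i = 1;                                          % hidden unit to visualize
xi = W1(i, :) ./ norm(W1(i, :));                % norm-constrained input that maximally activates unit i
imagesc(reshape(xi, patchsize, patchsize));     % display it as a patchsize*patchsize grayscale image
colormap gray; axis off;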
Exercise answers:
1: Sparse autoencoder
Step 1: In sampleIMAGES.m, write the code that generates the training set; tic and toc are used to time it.
tic
image_size = size(IMAGES);
i = randi(image_size(1)-patchsize+1, 1, numpatches);   % 1*10000 random numbers in [1, image_size(1)-patchsize+1]
j = randi(image_size(2)-patchsize+1, 1, numpatches);
k = randi(image_size(3), 1, numpatches);               % randomly pick one of the images, 10000 times
for num = 1:numpatches
    patches(:, num) = reshape(IMAGES(i(num):i(num)+patchsize-1, j(num):j(num)+patchsize-1, k(num)), ...
                              1, patchsize*patchsize);
end
toc
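For context, the starter code's train.m calls this function roughly as follows to sanity-check the patches (a sketch; sampleIMAGES and display_network come with the starter code, and the second argument to display_network just controls the layout):

patches = sampleIMAGES;                                           % generate the 10000 training patches
display_network(patches(:, randi(size(patches, 2), 200, 1)), 8);  % show 200 random patches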
Step 2: In sparseAutoencoderCost.m, complete the forward propagation, back propagation, and related code.
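For reference, the cost to compute here (as in the UFLDL notes) is the average reconstruction error plus a weight-decay term and a KL-divergence sparsity penalty, where rho is sparsityParam and rho_hat_j is the average activation of hidden unit j over the training set:

J(W,b) = \frac{1}{m}\sum_{k=1}^{m}\frac{1}{2}\left\| h_{W,b}(x^{(k)}) - x^{(k)} \right\|^2
       + \frac{\lambda}{2}\sum_{l,i,j}\left(W^{(l)}_{ji}\right)^2
       + \beta\sum_{j=1}^{s_2}\mathrm{KL}\!\left(\rho \,\middle\|\, \hat{\rho}_j\right),
\qquad
\mathrm{KL}\!\left(\rho \,\middle\|\, \hat{\rho}_j\right)
  = \rho\log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}.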
% 1. forward propagation
data_size = size(data);                                     % data is 64*10000, one column per sample
active_value2 = repmat(b1, 1, data_size(2));                % expand b1 to 10000 columns, 25*10000
active_value3 = repmat(b2, 1, data_size(2));                % expand b2 to 10000 columns, 64*10000
active_value2 = sigmoid(W1*data + active_value2);           % hidden-layer activations, 25*10000, one column per sample
active_value3 = sigmoid(W2*active_value2 + active_value3);  % output-layer activations, 64*10000, one column per sample

% 2. computing error term and cost
ave_square = sum(sum((active_value3 - data).^2)) / 2 / data_size(2);       % first cost term: average squared error
weight_decay = lambda/2 * (sum(sum(W1.^2)) + sum(sum(W2.^2)));             % second cost term: weight decay over all weights
p_real = sum(active_value2, 2) ./ data_size(2);             % estimated average activation rho_hat, 25*1
p_para = repmat(sparsityParam, hiddenSize, 1);              % target sparsity rho
sparsity = beta .* sum(p_para.*log(p_para./p_real) + (1-p_para).*log((1-p_para)./(1-p_real)));  % KL divergence
cost = ave_square + weight_decay + sparsity;                % final cost function

delta3 = (active_value3 - data) .* active_value3 .* (1 - active_value3);   % output-layer error, 64*10000, one column per sample
average_sparsity = repmat(sum(active_value2, 2) ./ data_size(2), 1, data_size(2));  % rho_hat term used in the error
default_sparsity = repmat(sparsityParam, hiddenSize, data_size(2));                 % sparsity parameter rho
sparsity_penalty = beta .* (-(default_sparsity./average_sparsity) + ((1-default_sparsity)./(1-average_sparsity)));
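From here the hidden-layer error and the gradients follow the standard back-propagation formulas with the sparsity term added; a minimal sketch using the same variable names (my own reconstruction, so double-check the transposes against your derivation):

delta2 = (W2'*delta3 + sparsity_penalty) .* active_value2 .* (1 - active_value2);  % hidden-layer error, 25*10000
W1grad = delta2*data'          ./ data_size(2) + lambda.*W1;   % gradient of W1
W2grad = delta3*active_value2' ./ data_size(2) + lambda.*W2;   % gradient of W2
b1grad = sum(delta2, 2) ./ data_size(2);                       % gradient of b1
b2grad = sum(delta3, 2) ./ data_size(2);                       % gradient of b2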
Step 3: Gradient checking
EPSILON = 0.0001;
for i = 1:size(theta, 1)            % loop over every parameter (numgrad is preallocated by the starter code)
    theta_plus = theta;
    theta_minu = theta;
    theta_plus(i) = theta_plus(i) + EPSILON;
    theta_minu(i) = theta_minu(i) - EPSILON;
    numgrad(i) = (J(theta_plus) - J(theta_minu)) / (2*EPSILON);
end
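In train.m the numerical gradient is then compared with the analytic gradient from sparseAutoencoderCost, roughly like this (the exact call follows my memory of the starter code, so treat it as an assumption):

[cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, lambda, ...
                                     sparsityParam, beta, patches);
numgrad = computeNumericalGradient(@(x) sparseAutoencoderCost(x, visibleSize, hiddenSize, ...
                                   lambda, sparsityParam, beta, patches), theta);
diff = norm(numgrad - grad) / norm(numgrad + grad);   % should be very small, on the order of 1e-9
disp(diff);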
Step 4: Visualization. Run train.m with the gradient-checking code commented out, since that part is quite time-consuming.
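For reference, the training in train.m uses minFunc with L-BFGS and then visualizes W1; roughly like this (the options and reshape indices follow my memory of the starter code, so treat them as assumptions):

theta = initializeParameters(hiddenSize, visibleSize);   % random initialization
addpath minFunc/
options.Method = 'lbfgs';                                % use L-BFGS
options.maxIter = 400;                                   % number of iterations
options.display = 'on';
[opttheta, cost] = minFunc(@(p) sparseAutoencoderCost(p, visibleSize, hiddenSize, ...
                           lambda, sparsityParam, beta, patches), theta, options);
W1 = reshape(opttheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
display_network(W1', 12);                                % visualize the learned features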
2: Vectorized implementation
Only small changes to the code above are needed.
Step 1: First set the parameters to:
visibleSize = 28*28;     % number of input units
hiddenSize = 196;        % number of hidden units
sparsityParam = 0.1;     % desired average activation of the hidden units
                         % (this is denoted by the Greek letter rho in the lecture notes)
lambda = 3e-3;           % weight decay parameter
beta = 3;                % weight of sparsity penalty term
Step 2: Replace the code that generated the training set in Step 1 of the sparse autoencoder with the following, which samples 10,000 whole 28x28 MNIST images instead of 8x8 natural-image patches:
images = loadMNISTImages('train-images.idx3-ubyte');
display_network(images(:, 1:100));                       % show the first 100 images
patches = images(:, randi(size(images, 2), 1, 10000));   % sample 10000 images as the training set
This gives the following visualization: