Exercise: Learning color features with Sparse Autoencoders
Exercise link: Exercise: Learning color features with Sparse Autoencoders
sparseAutoencoderLinearCost.m
function [cost, grad] = sparseAutoencoderLinearCost(theta, visibleSize, hiddenSize, ...
                                                    lambda, sparsityParam, beta, data)
% visibleSize: the number of input units (probably -)
% hiddenSize: the number of hidden units (probably -)
% lambda: weight decay parameter
% sparsityParam: the desired average activation for the hidden units (denoted in the lecture
%                notes by the Greek letter rho, which looks like a lower-case "p").
% beta: weight of the sparsity penalty term
% data: our 64x10000 matrix containing the training data, so data(:, i) is the i-th training example.

% The input theta is a vector (because minFunc expects the parameters to be a vector).
% We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so this
% follows the notation convention of the lecture notes.

% W1(i,j) denotes the weight from the j-th node in the input layer to the i-th node
% in the hidden layer. Thus it is a hiddenSize*visibleSize matrix.
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
% W2(i,j) denotes the weight from the j-th node in the hidden layer to the i-th node
% in the output layer. Thus it is a visibleSize*hiddenSize matrix.
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
% b1(i) denotes the bias from the input layer into the i-th node in the hidden layer.
% Thus it is a hiddenSize*1 vector.
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
% b2(i) denotes the bias from the hidden layer into the i-th node in the output layer.
% Thus it is a visibleSize*1 vector.
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

%% ---------- YOUR CODE HERE --------------------------------------
% Instructions: Compute the cost/optimization objective J_sparse(W,b) for the sparse autoencoder,
%               and the corresponding gradients W1grad, W2grad, b1grad, b2grad.
%
% W1grad, W2grad, b1grad and b2grad should be computed using backpropagation.
% Note that W1grad has the same dimensions as W1, b1grad has the same dimensions
% as b1, etc. Your code should set W1grad to be the partial derivative of J_sparse(W,b) with
% respect to W1, i.e. W1grad(i,j) should be the partial derivative of J_sparse(W,b)
% with respect to the input parameter W1(i,j). Thus, W1grad should be equal to the term
% [(1/m) \Delta W^{(1)} + \lambda W^{(1)}] in the last block of pseudo-code in Section 2.2
% of the lecture notes (and similarly for W2grad, b1grad, b2grad).
%
% Stated differently, if we were using batch gradient descent to optimize the parameters,
% the gradient descent update to W1 would be W1 := W1 - alpha * W1grad, and similarly for W2, b1, b2.

%% 1. Set \Delta W^{(l)}, \Delta b^{(l)} to 0 for all layers l.
% Cost and gradient variables (your code needs to compute these values).
% Here, we initialize them to zeros.
W1grad = zeros(size(W1));
W2grad = zeros(size(W2));
b1grad = zeros(size(b1));
b2grad = zeros(size(b2));

m = size(data, 2);
% For a small data set we can save the activations computed in the feedforward
% pass and reuse them when computing rho.

% 2a. Use backpropagation to compute diff(J_sparse(W,b;x,y), W^{(l)})
%     and diff(J_sparse(W,b;x,y), b^{(l)}).
% 2a.1. Perform a feedforward pass, computing the activations for the
%       hidden layer and the output layer.
% z2 is a hiddenSize*m matrix
z2 = W1*data + repmat(b1, 1, m);
% a2 is a hiddenSize*m matrix
a2 = sigmoid(z2);
% z3 is a visibleSize*m matrix
z3 = W2*a2 + repmat(b2, 1, m);
% a3 is a visibleSize*m matrix (linear decoder: the output activation is the identity)
a3 = z3;

% rho is a hiddenSize*1 vector
rho = sum(a2, 2);
rho = rho ./ m;
% KLterm is a hiddenSize*1 vector
KLterm = beta * (-sparsityParam ./ rho + (1 - sparsityParam) ./ (1 - rho));

% Accumulate the cost
cost = 1/2 * sum(sum((data - a3) .* (data - a3)));

% 2a.2. For the output layer, set delta3.
% delta3 is a visibleSize*m matrix
delta3 = -(data - a3);
% 2a.3. For the hidden layer, set delta2.
% delta2 is a hiddenSize*m matrix
delta2 = (W2' * delta3 + repmat(KLterm, 1, m)) .* sigmoidDiff(z2);

% 2a.4. Compute the desired partial derivatives.
% JW1diff is a hiddenSize*visibleSize matrix
JW1diff = delta2 * data';
% Jb1diff is a hiddenSize*m matrix
Jb1diff = delta2;
% JW2diff is a visibleSize*hiddenSize matrix
JW2diff = delta3 * a2';
% Jb2diff is a visibleSize*m matrix
Jb2diff = delta3;

% 2b. Update \Delta W^{(l)}
W1grad = W1grad + JW1diff;
W2grad = W2grad + JW2diff;
% 2c. Update \Delta b^{(l)}
b1grad = b1grad + sum(Jb1diff, 2);
b2grad = b2grad + sum(Jb2diff, 2);

% Compute the KL penalty term
KLpen = beta * sum(sparsityParam * log(sparsityParam ./ rho) + ...
                   (1 - sparsityParam) * log((1 - sparsityParam) ./ (1 - rho)));
% Compute the weight decay term
tempW1 = W1 .* W1;
tempW2 = W2 .* W2;
WD = (lambda/2) * (sum(sum(tempW1)) + sum(sum(tempW2)));

cost = cost ./ m + WD + KLpen;

W1grad = W1grad ./ m + lambda .* W1;
W2grad = W2grad ./ m + lambda .* W2;
b1grad = b1grad ./ m;
b2grad = b2grad ./ m;

%-------------------------------------------------------------------
% 3. Update the parameters. After computing the cost and gradient, we
% convert the gradients back to a vector format (suitable for minFunc).
% Specifically, we unroll the gradient matrices into a vector.
grad = [W1grad(:); W2grad(:); b1grad(:); b2grad(:)];

end

%-------------------------------------------------------------------
% Here is an implementation of the sigmoid function, which you may find useful
% in your computation of the costs and the gradients. This takes a (row or
% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)).
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end

% Define the derivative of the sigmoid.
function sigmDiff = sigmoidDiff(x)
    sigmDiff = sigmoid(x) .* (1 - sigmoid(x));
end
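Before running the full exercise, the analytic gradient returned above can be compared against a finite-difference estimate. The following is a minimal sketch and not part of the original exercise code: it assumes the function above is saved as sparseAutoencoderLinearCost.m on the MATLAB path, and the tiny sizes and hyperparameter values are placeholders chosen only so the check runs quickly.

% Sanity-check sketch (hypothetical sizes and values, chosen only for speed).
visibleSize   = 8;       % placeholder input dimension
hiddenSize    = 5;       % placeholder hidden dimension
lambda        = 3e-3;    % weight decay
sparsityParam = 0.035;   % target average activation rho
beta          = 5;       % weight of the sparsity penalty

data = randn(visibleSize, 10);                          % 10 fake training examples
r    = sqrt(6) / sqrt(hiddenSize + visibleSize + 1);
W    = rand(2*hiddenSize*visibleSize, 1) * 2 * r - r;   % random weights in [-r, r]
theta = [W; zeros(hiddenSize + visibleSize, 1)];        % [W1(:); W2(:); b1; b2]

[cost, grad] = sparseAutoencoderLinearCost(theta, visibleSize, hiddenSize, ...
                                           lambda, sparsityParam, beta, data);

% Compare a few analytic gradient entries with central finite differences.
epsilon = 1e-4;
idx = randperm(numel(theta));
for k = idx(1:5)
    e = zeros(size(theta));  e(k) = epsilon;
    cPlus  = sparseAutoencoderLinearCost(theta + e, visibleSize, hiddenSize, ...
                                         lambda, sparsityParam, beta, data);
    cMinus = sparseAutoencoderLinearCost(theta - e, visibleSize, hiddenSize, ...
                                         lambda, sparsityParam, beta, data);
    fprintf('param %4d: analytic %10.6f, numeric %10.6f\n', ...
            k, grad(k), (cPlus - cMinus) / (2*epsilon));
end

The two columns printed should agree to several decimal places; a large discrepancy usually points at the sparsity or weight decay terms being added to the cost but not to the gradients.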
Results:
If your result comes out looking like this, the likely cause is having written a3 = sigmoid(z3) where the linear decoder requires a3 = z3.
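For reference, the only place the linear decoder differs from the ordinary sparse autoencoder is the output activation (and, as a consequence, the output-layer delta). A small illustrative snippet with made-up numbers, showing why a sigmoid output cannot reproduce whitened inputs that fall outside [0, 1]:

% Illustrative only (hypothetical numbers): a linear decoder can reproduce
% values outside (0, 1); a sigmoid output cannot.
sigmoid = @(x) 1 ./ (1 + exp(-x));
z3 = [-1.3; 0.2; 2.7];            % pre-activations of three output units

a3_linear  = z3;                  % linear decoder used in this exercise
a3_sigmoid = sigmoid(z3);         % ordinary autoencoder output, squashed into (0, 1)
disp([a3_linear, a3_sigmoid]);

% The output-layer delta also differs, because f'(z3) = 1 for the identity
% but f'(z3) = a3 .* (1 - a3) for the sigmoid:
%   delta3 = -(data - a3);                     % linear decoder (this exercise)
%   delta3 = -(data - a3) .* a3 .* (1 - a3);   % sigmoid output layer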
"Deeplearning" Exercise:learning color features with Sparse autoencoders