Preface:
This experiment is mainly intended as practice in implementing self-taught learning. Reference: http://deeplearning.stanford.edu/wiki/index.php/Exercise:Self-Taught_Learning. Self-taught learning first uses unsupervised learning to learn the parameters of a feature extractor, and then uses supervised learning to train a classifier on the extracted features. Here a sparse autoencoder and softmax regression are used. The data for the experiment is again the MNIST handwritten digit dataset.
Lab basics:
From the previous exercises we know that a sparse autoencoder is trained so that its output has the same size as the input and reproduces it as closely as possible. So how do we extract feature vectors from a sparse autoencoder we have already trained? In fact, the learned representation of an input sample is simply the output of the hidden layer. First, recall the classic sparse autoencoder model, as shown below:
After the output layer is removed, the activations of the hidden layer are exactly the features we need, as shown below:
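To make this concrete, here is a minimal sketch of the idea. The names W1, b1 and the sigmoid activation follow the feedForwardAutoencoder.m code later in this post; the snippet assumes data stores one training example per column.

% Hidden-layer activations of a trained sparse autoencoder serve as features.
% W1: hiddenSize x visibleSize weight matrix, b1: hiddenSize x 1 bias vector.
z2 = W1 * data + repmat(b1, 1, size(data, 2));   % pre-activation of the hidden layer
features = 1 ./ (1 + exp(-z2));                  % sigmoid; one feature column per example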
The tutorial points out that there are two settings for learning from unlabeled data that need to be distinguished: self-taught learning and semi-supervised learning. Self-taught learning is completely unsupervised. The tutorial gives an example to illustrate the difference. Suppose we need to design a system that classifies cars versus motorcycles. If the unlabeled training images are downloaded at random from the natural world (that is, they may happen to contain cars or motorcycles, but in most cases contain neither), and these samples are used to learn the feature model, the approach is called self-taught learning. If instead the unlabeled training images are all car and motorcycle images, but we do not know which image is a car and which is a motorcycle (that is, they come from the task's own classes but carry no labels), the setting can no longer be called strictly unsupervised feature learning; it is called semi-supervised learning.
Some MATLAB functions:
numel:
For example, n = numel(a) returns the number of elements in the matrix a.
unique:
unique returns the distinct elements of a vector, with duplicates removed and the result sorted in ascending order. Both functions are illustrated briefly below.
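A quick illustration of both functions (the values are chosen only for demonstration):

a = [3 1 2; 2 3 1];      % a 2x3 matrix
n = numel(a);            % n = 6, the total number of elements
v = [5 7 5 9 7];
u = unique(v);           % u = [5 7 9], duplicates removed and sorted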
Experiment results:
The samples of digits 5-9 are used for unsupervised training. A sparse autoencoder is trained on these data, and the learned input-to-hidden weights are visualized as the following image:
However, the classification task in this experiment only involves the digits 0-4. Even though the unsupervised feature learning is performed on the digit 5-9 training samples, this does not harm the subsequent results. The classifier itself is a softmax regression, so that stage is supervised. In the end, consistent with the results on the official website, the accuracy is about 98%; designing the classifier directly on the raw pixels not only gives worse performance (only about 96%) but also slows down training.
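For reference, the raw-pixel baseline mentioned above can be reproduced with a sketch like the following. This is not part of the official exercise code; it assumes the same softmaxTrain/softmaxPredict functions from the previous exercise and the trainData/trainLabels/testData/testLabels variables defined in stlExercise.m below.

% Baseline: train softmax regression directly on the 28x28 raw pixels
% instead of on the autoencoder features.
lambda = 1e-4;
numClasses = numel(unique(trainLabels));
options.maxIter = 100;
rawModel = softmaxTrain(28*28, numClasses, lambda, trainData, trainLabels, options);
pred = softmaxPredict(rawModel, testData);
fprintf('Raw-pixel test accuracy: %f%%\n', 100*mean(pred(:) == testLabels(:)));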
Code of the experiment:
stlExercise.m:
%% CS294A/CS294W Self-taught Learning Exercise

%  Instructions
%  ------------
%  This file contains code that helps you get started on the
%  self-taught learning. You will need to complete code in feedForwardAutoencoder.m
%  You will also need to have implemented sparseAutoencoderCost.m and
%  softmaxCost.m from previous exercises.
%
%% ======================================================================
%  STEP 0: Here we provide the relevant parameters values that will
%  allow your sparse autoencoder to get good filters; you do not need to
%  change the parameters below.

inputSize  = 28 * 28;
numLabels  = 5;
hiddenSize = 200;
sparsityParam = 0.1; % desired average activation of the hidden units.
                     % (This was denoted by the Greek letter rho, which looks
                     %  like a lower-case "p", in the lecture notes).
lambda = 3e-3;       % weight decay parameter
beta = 3;            % weight of sparsity penalty term
maxIter = 400;

%% ======================================================================
%  STEP 1: Load data from the MNIST database
%
%  This loads our training and test data from the MNIST database files.
%  We have sorted the data for you in this so that you will not have to
%  change it.

% Load MNIST database files
mnistData   = loadMNISTImages('train-images.idx3-ubyte');
mnistLabels = loadMNISTLabels('train-labels.idx1-ubyte');

% Set Unlabeled Set (All Images)

% Simulate a Labeled and Unlabeled set
labeledSet   = find(mnistLabels >= 0 & mnistLabels <= 4);
unlabeledSet = find(mnistLabels >= 5);
% Added line: keep only one third of the unlabeled samples
unlabeledSet = unlabeledSet(1:end/3);

% numTrain = round(numel(labeledSet)/2); % take half of the samples to train
numTrain = round(numel(labeledSet)/3);
trainSet = labeledSet(1:numTrain);
testSet  = labeledSet(numTrain+1:2*numTrain);

unlabeledData = mnistData(:, unlabeledSet);
% Why do these two lines cause an error when run together?
% pack;

trainData   = mnistData(:, trainSet);
trainLabels = mnistLabels(trainSet)' + 1; % Shift Labels to the Range 1-5

% mnistData2 = mnistData;
testData   = mnistData(:, testSet);
testLabels = mnistLabels(testSet)' + 1;   % Shift Labels to the Range 1-5

% Output Some Statistics
fprintf('# examples in unlabeled set: %d\n', size(unlabeledData, 2));
fprintf('# examples in supervised training set: %d\n\n', size(trainData, 2));
fprintf('# examples in supervised testing set: %d\n\n', size(testData, 2));

%% ======================================================================
%  STEP 2: Train the sparse autoencoder
%  This trains the sparse autoencoder on the unlabeled training images.

%  Randomly initialize the parameters
theta = initializeParameters(hiddenSize, inputSize);

%% ----------------- YOUR CODE HERE ----------------------
%  Find opttheta by running the sparse autoencoder on the unlabeled
%  training images.

opttheta = theta;

addpath minFunc/
options.Method = 'lbfgs';
options.maxIter = 400;
options.display = 'on';
[opttheta, loss] = minFunc( @(p) sparseAutoencoderLoss(p, ...
                            inputSize, hiddenSize, ...
                            lambda, sparsityParam, ...
                            beta, unlabeledData), ...
                            theta, options);

%% -----------------------------------------------------

% Visualize weights
W1 = reshape(opttheta(1:hiddenSize * inputSize), hiddenSize, inputSize);
display_network(W1');

%% ======================================================================
%  STEP 3: Extract Features from the Supervised Dataset
%
%  You need to complete the code in feedForwardAutoencoder.m so that the
%  following command will extract features from the data.

trainFeatures = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, ...
                                       trainData);

testFeatures = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, ...
                                      testData);

%% ======================================================================
%  STEP 4: Train the softmax classifier

softmaxModel = struct;
%% ----------------- YOUR CODE HERE ----------------------
%  Use softmaxTrain.m from the previous exercise to train a multi-class
%  classifier.
%  Use lambda = 1e-4 for the weight regularization for softmax.

lambda = 1e-4;
inputSize = hiddenSize;
numClasses = numel(unique(trainLabels)); % unique returns the distinct labels, sorted
options.maxIter = 100;
softmaxModel = softmaxTrain(inputSize, numClasses, lambda, ...
                            trainFeatures, trainLabels, options);

%% -----------------------------------------------------

%% ======================================================================
%  STEP 5: Testing

%% ----------------- YOUR CODE HERE ----------------------
%  Compute predictions on the test set (testFeatures) using softmaxPredict
%  and softmaxModel.

[pred] = softmaxPredict(softmaxModel, testFeatures);

%% -----------------------------------------------------

% Classification Score
fprintf('Test Accuracy: %f%%\n', 100*mean(pred(:) == testLabels(:)));

% (note that we shift the labels by 1, so that digit 0 now corresponds to label 1)
% Accuracy is the proportion of correctly classified images.
% The results for our implementation was:
% Accuracy: 98.3%
feedForwardAutoencoder.m:
function [activation] = feedForwardAutoencoder(theta, hiddenSize, visibleSize, data)

% theta:       trained weights from the autoencoder
% visibleSize: the number of input units (probably 64)
% hiddenSize:  the number of hidden units (probably 25)
% data:        our matrix containing the training data as columns.
%              So, data(:,i) is the i-th training example.

% We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this
% follows the notation convention of the lecture notes.

W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the activation of the hidden layer for the Sparse Autoencoder.

activation = sigmoid(W1*data + repmat(b1, [1, size(data,2)]));

%-------------------------------------------------------------------

end

%-------------------------------------------------------------------
% Here's an implementation of the sigmoid function, which you may find useful
% in your computation of the costs and the gradients. This inputs a (row or
% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)).

function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
References:
http://deeplearning.stanford.edu/wiki/index.php/Exercise:Self-Taught_Learning
MNIST dataset