I. Softmax
The Softmax model assumes that the posterior probability P(y|x) follows a multinomial distribution over y = 1, 2, ..., K, i.e. over K classes. With n = 1 this is also known as the categorical distribution, defined as follows:
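Written out explicitly, using φ_i = P(y = i) with Σ φ_i = 1, and 1{·} for the indicator function, the categorical distribution is:

\[
p(y;\, \phi) = \prod_{i=1}^{K} \phi_i^{\,1\{y = i\}}, \qquad \sum_{i=1}^{K} \phi_i = 1.
\]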
II. Deriving the Softmax Model from the Generalized Linear Model
Our goal is, given x, to determine the parameters φ; that is, we need to model φ as a function of x. The derivation of the model is given below.
In the following, we write the posterior probability in exponential-family form and derive the softmax model from it, as sketched below.
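Sketching the standard derivation (η_i denotes the natural parameters and θ_i the weight vector of class i): writing the categorical distribution in exponential-family form gives η_i = log(φ_i / φ_K); inverting this link function and setting η_i = θ_i^T x yields the softmax posterior:

\[
\phi_i = \frac{e^{\eta_i}}{\sum_{j=1}^{K} e^{\eta_j}}
\qquad\Longrightarrow\qquad
P(y = i \mid x;\, \theta) = \frac{e^{\theta_i^{T} x}}{\sum_{j=1}^{K} e^{\theta_j^{T} x}}.
\]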
III. Objective Function and Gradient

Now that we have established a model of φ as a function of x, we need to estimate the parameter θ; we use maximum likelihood estimation.
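Given m training examples (x^(i), y^(i)), maximizing the likelihood is equivalent to minimizing the negative log-likelihood, so the objective function is:

\[
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{K} 1\{y^{(i)} = j\}\, \log \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{K} e^{\theta_l^{T} x^{(i)}}}
\]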
Next we solve for the gradient:
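Differentiating with respect to the weight vector θ_j of class j gives:

\[
\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} x^{(i)} \Big( 1\{y^{(i)} = j\} - P\big(y^{(i)} = j \mid x^{(i)};\, \theta\big) \Big)
\]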
IV. Weight Decay Penalty

To make the objective function strictly convex, so that it has a unique minimum, we add a weight-decay penalty and obtain a new objective function and gradient:
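With weight-decay coefficient λ (the lambda parameter in the code below) and n the input dimension, the penalized objective and gradient are:

\[
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{K} 1\{y^{(i)} = j\}\, \log \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{K} e^{\theta_l^{T} x^{(i)}}} + \frac{\lambda}{2} \sum_{i=1}^{K} \sum_{j=1}^{n} \theta_{ij}^2
\]
\[
\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} x^{(i)} \Big( 1\{y^{(i)} = j\} - P\big(y^{(i)} = j \mid x^{(i)};\, \theta\big) \Big) + \lambda\, \theta_j
\]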
V. MATLAB Experiment

The experiment uses the MNIST database and classifies the 10 handwritten digits.
%% CS294A/CS294W Softmax Exercise
%  Instructions
%  ------------
%  This file contains code that helps get you started on the softmax
%  exercise. You'll need to write the softmax cost function in
%  softmaxCost.m and the softmax prediction function in softmaxPred.m.
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%  (However, you may be required to do so in later exercises.)

%%======================================================================
%% STEP 0: Initialise constants and parameters
%  Here we define and initialise some constants which allow your code
%  to be used more generally on any arbitrary input.
%  We also initialise some parameters used for tuning the model.

inputSize = 28 * 28;   % Size of input vector (MNIST images are 28x28)
numClasses = 10;       % Number of classes (MNIST images fall into 10 classes)
lambda = 1e-4;         % Weight decay parameter

%%======================================================================
%% STEP 1: Load data
%  In this section, we load the input and output data.
%  For softmax regression on MNIST pixels, the input data is the images,
%  and the output data is the labels.

%  Change the filenames if you've saved the files under different names.
%  On some platforms, the files might be saved as
%  train-images.idx3-ubyte / train-labels.idx1-ubyte
images = loadMNISTImages('mnist/train-images-idx3-ubyte');
labels = loadMNISTLabels('mnist/train-labels-idx1-ubyte');
labels(labels==0) = 10;   % Remap 0 to 10
inputData = images;

%  For debugging purposes, you may wish to reduce the size of the input
%  data in order to speed up gradient checking. Here, we create a
%  synthetic dataset using random data for testing.
% DEBUG = true;   % Set DEBUG to true when debugging.
% if DEBUG
%     inputSize = 8;
%     inputData = randn(8, 100);
%     labels = randi(10, 100, 1);
% end

% Randomly initialise theta
theta = 0.005 * randn(numClasses * inputSize, 1);

%%======================================================================
%% STEP 2: Implement softmaxCost
%  Implement softmaxCost in softmaxCost.m.
[cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, inputData, labels);

%%======================================================================
%% STEP 3: Gradient checking
%  As with any learning algorithm, you should always check that your
%  gradients are correct before learning the parameters.
% if DEBUG
%     numGrad = computeNumericalGradient( @(x) softmaxCost(x, numClasses, ...
%                                     inputSize, lambda, inputData, labels), theta);
%     % Use this to visually compare the gradients side by side
%     disp([numGrad grad]);
%     % Compare numerically computed gradients with those computed analytically
%     diff = norm(numGrad - grad) / norm(numGrad + grad);
%     disp(diff);
%     % The difference should be small.
%     % In our implementation, these values are usually less than 1e-7.
%     % When your gradients are correct, congratulations!
% end

%%======================================================================
%% STEP 4: Learning parameters
%  Once you have verified that your gradients are correct, you can start
%  training your softmax regression code using softmaxTrain (which uses minFunc).
options.maxIter = 100;
softmaxModel = softmaxTrain(inputSize, numClasses, lambda, ...
                            inputData, labels, options);
%  Although we only use 100 iterations here to train a classifier for the
%  MNIST data set, in practice, training for more iterations is usually beneficial.

%%======================================================================
%% STEP 5: Testing
%  You should now test your model against the test images.
%  To do this, you'll first need to write softmaxPredict (in
%  softmaxPredict.m), which should return predictions given a softmax
%  model and the input data.
images = loadMNISTImages('mnist/t10k-images-idx3-ubyte');
labels = loadMNISTLabels('mnist/t10k-labels-idx1-ubyte');
labels(labels==0) = 10;   % Remap 0 to 10
inputData = images;

%  You'll have to implement softmaxPredict in softmaxPredict.m
[pred] = softmaxPredict(softmaxModel, inputData);

acc = mean(labels(:) == pred(:));
fprintf('Accuracy: %0.3f%%\n', acc * 100);
%  Accuracy is the proportion of correctly classified images.
%  After 100 iterations, the results for our implementation were:
%  Accuracy: 92.200%
%  If your values are too low (accuracy less than 0.91), you should check
%  your code for errors, and make sure you are training on the entire data
%  set of 60000 28x28 training images (unless you modified the loading
%  code, this should be the case).
function [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels)
% numClasses - the number of classes
% inputSize  - the size N of the input vector
% lambda     - weight decay parameter
% data       - the N x M input matrix, where each column data(:, i) corresponds to
%              a single test set
% labels     - an M x 1 matrix containing the labels corresponding to the input data

% Unroll the parameters from theta
theta = reshape(theta, numClasses, inputSize);
numCases = size(data, 2);

groundTruth = full(sparse(labels, 1:numCases, 1));
cost = 0;
thetagrad = zeros(numClasses, inputSize);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the cost and gradient for softmax regression.
%                You need to compute thetagrad and cost.
%                The groundTruth matrix might come in handy.

[n, m] = size(data);
% Subtract the column-wise maximum before exponentiating; the softmax is
% invariant to this shift, and it avoids numerical overflow.
eta = bsxfun(@minus, theta * data, max(theta * data, [], 1));
eta = exp(eta);
pij = bsxfun(@rdivide, eta, sum(eta));   % class probabilities, numClasses x M
cost = -1/m * sum(sum(groundTruth .* log(pij))) + lambda/2 * sum(sum(theta.^2));
thetagrad = -1/m .* (groundTruth - pij) * data' + lambda .* theta;

% ------------------------------------------------------------------
% Unroll the gradient matrices into a vector for minFunc
grad = [thetagrad(:)];
end
function [softmaxModel] = softmaxTrain(inputSize, numClasses, lambda, inputData, labels, options)
%softmaxTrain Train a softmax model with the given parameters on the given
% data. Returns softmaxOptTheta, a vector containing the trained parameters
% for the model.
% inputSize:  the size of an input vector x^(i)
% numClasses: the number of classes
% lambda:     weight decay parameter
% inputData:  an N by M matrix containing the input data, such that
%             inputData(:, c) is the cth input
% labels:     M by 1 matrix containing the class labels for the
%             corresponding inputs; labels(c) is the class label for the cth input
% options (optional): options
%   options.maxIter: number of iterations to train for

if ~exist('options', 'var')
    options = struct;
end
if ~isfield(options, 'maxIter')
    options.maxIter = 400;
end

% Initialize parameters
theta = 0.005 * randn(numClasses * inputSize, 1);

% Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost function.
                          % Generally, for minFunc to work, you need a function
                          % pointer with two outputs: the function value and the
                          % gradient. In our problem, softmaxCost.m satisfies this.
minFuncOptions.display = 'on';

[softmaxOptTheta, cost] = minFunc( @(p) softmaxCost(p, ...
                                   numClasses, inputSize, lambda, ...
                                   inputData, labels), ...
                                   theta, options);

% Fold softmaxOptTheta into a nicer format
softmaxModel.optTheta = reshape(softmaxOptTheta, numClasses, inputSize);
softmaxModel.inputSize = inputSize;
softmaxModel.numClasses = numClasses;
end
function [pred] = softmaxPredict(softmaxModel, data)
% softmaxModel - model trained using softmaxTrain
% data - the N x M input matrix, where each column data(:, i) corresponds to
%        a single test set
%
% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).

% Unroll the parameters from theta
theta = softmaxModel.optTheta;  % this provides a numClasses x inputSize matrix
pred = zeros(1, size(data, 2));

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute pred using theta assuming that the labels start from 1.

% The predicted class is the row index with the largest score theta*data;
% exponentiation and normalisation are monotone, so they can be skipped.
[prob, pred] = max(theta * data);

% ---------------------------------------------------------------------
end
To be continued ....