First, the meaning of Softmax. The Softmax model assumes that the posterior probability P(y|x) follows a multinomial distribution over y = 1, 2, ..., K, that is, over K classes. By the definition of the multinomial distribution (with n = 1, also known as the categorical distribution):
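For reference, a sketch of this definition, with class probabilities phi_1, ..., phi_K, is:

P(y = k) = \phi_k, \qquad k = 1, \dots, K, \qquad \sum_{k=1}^{K} \phi_k = 1,

or equivalently, in indicator notation,

p(y) = \prod_{k=1}^{K} \phi_k^{\,1\{y = k\}}.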
Second, derive the Softmax model from the generalized linear model
Our goal is, given x, to find the parameters phi; that is, we need to build a model of the parameters phi as a function of x. The derivation of the model is given below.
In the following, we write the posterior probability in exponential-family form and carry out the derivation.
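As a sketch of the result of this derivation (consistent with the MATLAB implementation below), writing the categorical distribution as an exponential family with natural parameters eta_k = theta_k^T x and solving for phi_k gives the softmax form, where theta_k is the parameter vector of class k:

\phi_k = P(y = k \mid x; \theta) = \frac{e^{\theta_k^{T} x}}{\sum_{j=1}^{K} e^{\theta_j^{T} x}}, \qquad k = 1, \dots, K.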
Third, the optimization function and gradient. Now that we have built the model from x to the parameters phi, what remains is to estimate the value of the parameter theta, using maximum likelihood estimation.
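As a sketch matching the cost computed in softmaxCost.m below, maximizing the likelihood over the m training examples is equivalent to minimizing the negative average log-likelihood:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} 1\{y^{(i)} = k\} \log \frac{e^{\theta_k^{T} x^{(i)}}}{\sum_{j=1}^{K} e^{\theta_j^{T} x^{(i)}}}.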
Next, we solve for the gradient:
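A sketch of the gradient with respect to the parameter vector theta_k of class k (matching the thetagrad computation in the code below) is:

\nabla_{\theta_k} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} x^{(i)} \left( 1\{y^{(i)} = k\} - P(y^{(i)} = k \mid x^{(i)}; \theta) \right).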
Fourth, the regularization penalty. In order to make the objective function strictly convex, so that it has a unique minimum, we add a weight decay penalty and obtain a new objective function and gradient:
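As a sketch matching the lambda terms in softmaxCost.m, the regularized objective and its gradient are:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} 1\{y^{(i)} = k\} \log \frac{e^{\theta_k^{T} x^{(i)}}}{\sum_{j=1}^{K} e^{\theta_j^{T} x^{(i)}}} + \frac{\lambda}{2} \sum_{k=1}^{K} \sum_{j=1}^{n} \theta_{kj}^{2},

\nabla_{\theta_k} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} x^{(i)} \left( 1\{y^{(i)} = k\} - P(y^{(i)} = k \mid x^{(i)}; \theta) \right) + \lambda \theta_k.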
Fifth, MATLAB experiment. The experimental data come from the MNIST database; the task is to recognize the 10 handwritten digits.
%% CS294A/CS294W Softmax Exercise

%  Instructions
%  ------------
%  This file contains code that helps you get started on the
%  softmax exercise. You will need to write the softmax cost function
%  in softmaxCost.m and the softmax prediction function in softmaxPred.m.
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%  (However, you may be required to do so in later exercises)

%%======================================================================
%% STEP 0: Initialise constants and parameters
%
%  Here we define and initialise some constants which allow your code
%  to be used more generally on any arbitrary input.
%  We also initialise some parameters used for tuning the model.

inputSize = 28 * 28; % Size of input vector (MNIST images are 28x28)
numClasses = 10;     % Number of classes (MNIST images fall into 10 classes)

lambda = 1e-4;       % Weight decay parameter

%%======================================================================
%% STEP 1: Load data
%
%  In this section, we load the input and output data.
%  For softmax regression on MNIST pixels,
%  the input data is the images, and
%  the output data is the labels.

% Change the filenames if you've saved the files under different names.
% On some platforms, the files might be saved as
% train-images.idx3-ubyte / train-labels.idx1-ubyte

images = loadMNISTImages('mnist/train-images-idx3-ubyte');
labels = loadMNISTLabels('mnist/train-labels-idx1-ubyte');
labels(labels==0) = 10; % Remap 0 to 10

inputData = images;

% For debugging purposes, you may wish to reduce the size of the input data
% in order to speed up gradient checking.
% Here, we create a synthetic dataset using random data for testing

% DEBUG = true; % Set DEBUG to true when debugging.
% if DEBUG
%     inputSize = 8;
%     inputData = randn(8, 100);
%     labels = randi(10, 100, 1);
% end

% Randomly initialise theta
theta = 0.005 * randn(numClasses * inputSize, 1);

%%======================================================================
%% STEP 2: Implement softmaxCost
%
%  Implement softmaxCost in softmaxCost.m.

[cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, inputData, labels);

%%======================================================================
%% STEP 3: Gradient checking
%
%  As with any learning algorithm, you should always check that your
%  gradients are correct before learning the parameters.

% if DEBUG
%     numGrad = computeNumericalGradient( @(x) softmaxCost(x, numClasses, ...
%                                         inputSize, lambda, inputData, labels), theta);
%
%     % Use this to visually compare the gradients side by side
%     disp([numGrad grad]);
%
%     % Compare numerically computed gradients with those computed analytically
%     diff = norm(numGrad - grad) / norm(numGrad + grad);
%     disp(diff);
%     % The difference should be small.
%     % In our implementation, these values are usually less than 1e-7.
%
%     % When your gradients are correct, congratulations!
% end

%%======================================================================
%% STEP 4: Learning parameters
%
%  Once you have verified that your gradients are correct,
%  you can start training your softmax regression code using softmaxTrain
%  (which uses minFunc).

options.maxIter = 100;
softmaxModel = softmaxTrain(inputSize, numClasses, lambda, ...
                            inputData, labels, options);

% Although we only use 100 iterations here to train a classifier for the
% MNIST data set, in practice, training for more iterations is usually
% beneficial.

%%======================================================================
%% STEP 5: Testing
%
%  You should now test your model against the test images.
%  To do this, you will first need to write softmaxPredict
%  (in softmaxPredict.m), which should return predictions
%  given a softmax model and the input data.

images = loadMNISTImages('mnist/t10k-images-idx3-ubyte');
labels = loadMNISTLabels('mnist/t10k-labels-idx1-ubyte');
labels(labels==0) = 10; % Remap 0 to 10

inputData = images;

% You will have to implement softmaxPredict in softmaxPredict.m
[pred] = softmaxPredict(softmaxModel, inputData);

acc = mean(labels(:) == pred(:));
fprintf('Accuracy: %0.3f%%\n', acc * 100);

% Accuracy is the proportion of correctly classified images.
% After 100 iterations, the results for our implementation were:
%
% Accuracy: 92.200%
%
% If your values are too low (accuracy less than 0.91), you should check
% your code for errors, and make sure you are training on the
% entire data set of 60000 28x28 training images
% (unless you modified the loading code, this should be the case)
function [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels)

% numClasses - the number of classes
% inputSize - the size N of the input vector
% lambda - weight decay parameter
% data - the N x M input matrix, where each column data(:, i) corresponds to
%        a single test set
% labels - an M x 1 matrix containing the labels corresponding for the input data

% Unroll the parameters from theta
theta = reshape(theta, numClasses, inputSize);

numCases = size(data, 2);

groundTruth = full(sparse(labels, 1:numCases, 1));
cost = 0;

thetagrad = zeros(numClasses, inputSize);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the cost and gradient for softmax regression.
%                You need to compute thetagrad and cost.
%                The groundTruth matrix might come in handy.

[n, m] = size(data);
% Subtract the column-wise maximum before exponentiating, for numerical stability
eta = bsxfun(@minus, theta*data, max(theta*data, [], 1));
eta = exp(eta);
% Normalise each column so that pij(k, i) = P(y^(i) = k | x^(i); theta)
pij = bsxfun(@rdivide, eta, sum(eta));
% Cross-entropy cost plus the weight decay penalty
cost = -1./m * sum(sum(groundTruth .* log(pij))) + lambda/2 * sum(sum(theta.^2));
% Gradient of the cost, including the weight decay term lambda * theta
thetagrad = -1/m .* (groundTruth - pij) * data' + lambda .* theta;

% ------------------------------------------------------------------
% Unroll the gradient matrices into a vector for minFunc
grad = [thetagrad(:)];
end
function [softmaxModel] = softmaxTrain(inputSize, numClasses, lambda, inputData, labels, options)
%softmaxTrain Train a softmax model with the given parameters on the given
% data. Returns softmaxOptTheta, a vector containing the trained parameters
% for the model.
%
% inputSize: the size of an input vector x^(i)
% numClasses: the number of classes
% lambda: weight decay parameter
% inputData: an N by M matrix containing the input data, such that
%            inputData(:, c) is the cth input
% labels: M by 1 matrix containing the class labels for the
%            corresponding inputs. labels(c) is the class label for
%            the cth input
% options (optional): options
%   options.maxIter: number of iterations to train for

if ~exist('options', 'var')
    options = struct;
end

if ~isfield(options, 'maxIter')
    options.maxIter = 400;
end

% Initialize parameters
theta = 0.005 * randn(numClasses * inputSize, 1);

% Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost
                          % function. Generally, for minFunc to work, you
                          % need a function pointer with two outputs: the
                          % function value and the gradient. In our problem,
                          % softmaxCost.m satisfies this.
minFuncOptions.display = 'on';

[softmaxOptTheta, cost] = minFunc( @(p) softmaxCost(p, ...
                                   numClasses, inputSize, lambda, ...
                                   inputData, labels), ...
                                   theta, options);

% Fold softmaxOptTheta into a nicer format
softmaxModel.optTheta = reshape(softmaxOptTheta, numClasses, inputSize);
softmaxModel.inputSize = inputSize;
softmaxModel.numClasses = numClasses;

end
function [pred] = softmaxPredict(softmaxModel, data)

% softmaxModel - model trained using softmaxTrain
% data - the N x M input matrix, where each column data(:, i) corresponds to
%        a single test set
%
% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).

% Unroll the parameters from theta
theta = softmaxModel.optTheta;  % this provides a numClasses x inputSize matrix
pred = zeros(1, size(data, 2));

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute pred using theta assuming that the labels start from 1.

% The predicted class for each column is the row with the largest score theta*data
[prob, pred] = max(theta * data);

% ---------------------------------------------------------------------
end
To be continued...