Softmax kept me tangled up for two days; the cause turned out to be that I had accidentally changed the main program while pasting code in as usual. If you need the exercise itself, see the UFLDL tutorial. The results are the same as UFLDL's, so I won't repost the figures. PS: the code is MATLAB, not Python.

PCA and Whitening: pca_gen.m

[matlab]
%% Step 0a: Load data
x = sampleIMAGESRAW();
figure('name', 'Raw images');
randsel = randi(size(x, 2), 200, 1);   % A random selection of samples for visualization
display_network(x(:, randsel));

%% Step 0b: Zero-mean the data
avg = mean(x, 2);
x = x - repmat(avg, 1, size(x, 2));

%% Step 1a: Implement PCA to obtain xRot
xRot = zeros(size(x));                 % You need to compute this
[u, s, v] = svd(x);
xRot = u' * x;

%% Step 1b: Check your implementation of PCA
covar = zeros(size(x, 1));             % You need to compute this
covar = cov(xRot');
figure('name', 'Visualisation of covariance matrix');
imagesc(covar);

%% Step 2: Find k, the number of components to retain
k = 0;                                 % Set k accordingly
egis = eig(covar);
egis = sort(egis, 'descend');
for i = 1:size(covar, 1)
    if (sum(egis(1:i)) / sum(egis) > 0.99)
        k = i;
        break;
    end
end

%% Step 3: Implement PCA with dimension reduction
xHat = zeros(size(x));                 % You need to compute this
xHat = u * [xRot(1:k, :); zeros(size(xHat(k+1:end, :)))];

% Visualise the data, and compare it to the raw data.
% You should observe that the raw and processed data are of comparable quality.
% For comparison, you may wish to generate a PCA reduced image which
% retains only 90% of the variance.
figure('name', ['PCA processed images ', sprintf('(%d / %d dimensions)', k, size(x, 1))]);
display_network(xHat(:, randsel));
figure('name', 'Raw images');
display_network(x(:, randsel));

%% Step 4a: Implement PCA with whitening and regularisation
% Implement PCA with whitening and regularisation to produce the matrix
% xPCAWhite.
epsilon = 0.1;
xPCAWhite = zeros(size(x));

avg = mean(x, 1);                      % Compute the mean pixel intensity value separately for each patch.
x = x - repmat(avg, size(x, 1), 1);

sigma = x * x' / size(x, 2);
[u, s, v] = svd(sigma);
xRot = u' * x;                         % rotated version of the data
xTilde = u(:, 1:k)' * x;               % reduced dimension representation of the data,
                                       % where k is the number of eigenvectors to keep
xPCAWhite = diag(1 ./ sqrt(diag(s) + epsilon)) * u' * x;

%% Step 4b: Check your implementation of PCA whitening
% Visualise the covariance matrix. You should see a red line across the
% diagonal against a blue background.
covar = cov(xPCAWhite');
figure('name', 'Visualisation of covariance matrix');
imagesc(covar);

%% Step 5: Implement ZCA whitening
xZCAWhite = zeros(size(x));
xZCAWhite = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u' * x;

% Visualise the ZCA whitened data and compare it to the raw data.
figure('name', 'ZCA whitened images');
display_network(xZCAWhite(:, randsel));
figure('name', 'Raw images');
display_network(x(:, randsel));
[/matlab]
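If you want to convince yourself that the whitening formulas above do what they should without the image data or the UFLDL helper functions (sampleIMAGESRAW, display_network), here is a minimal sanity-check sketch on synthetic data; everything in it (sizes, variable names, the small epsilon) is illustrative only and not part of the exercise code:

[matlab]
% Minimal sanity check of PCA / ZCA whitening on synthetic correlated data.
n = 10; m = 5000;
x = randn(n, n) * randn(n, m);          % correlated Gaussian data, n x m
x = x - repmat(mean(x, 2), 1, m);       % zero-mean each row

sigma = x * x' / m;                     % covariance matrix
[u, s, v] = svd(sigma);
epsilon = 1e-5;                         % small regulariser, same role as in the exercise

xPCAWhite = diag(1 ./ sqrt(diag(s) + epsilon)) * u' * x;
xZCAWhite = u * xPCAWhite;              % ZCA = rotate the PCA-whitened data back to the original basis

% Both whitened covariances should be approximately the identity matrix,
% so these norms should be small.
disp(norm(xPCAWhite * xPCAWhite' / m - eye(n)));
disp(norm(xZCAWhite * xZCAWhite' / m - eye(n)));
[/matlab]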
Softmax Regression: softmaxCost.m

[matlab]
function [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels)
% numClasses - the number of classes
% inputSize  - the size N of the input vector
% lambda     - weight decay parameter
% data       - the N x M input matrix, where each column data(:, i) corresponds to
%              a single test set
% labels     - an M x 1 matrix containing the labels corresponding to the input data

% Unroll the parameters from theta
theta = reshape(theta, numClasses, inputSize);
numCases = size(data, 2);

groundTruth = full(sparse(labels, 1:numCases, 1));   % numClasses x M
cost = 0;
thetagrad = zeros(numClasses, inputSize);

M = theta * data;                                    % (numClasses x N) * (N x M)
M = bsxfun(@minus, M, max(M, [], 1));                % subtract the column max for numerical stability
h = exp(M);
h = bsxfun(@rdivide, h, sum(h));                     % column-wise softmax probabilities

cost = -1/numCases * sum(sum(groundTruth .* log(h))) + lambda/2 * sum(sum(theta.^2));
thetagrad = -1/numCases * ((groundTruth - h) * data') + lambda * theta;

% The key section above, without vectorization:
% for i = 1:numCases
%     s = groundTruth(:, i) .* log(h(:, i));
%     cost = cost + sum(s);
% end
% cost = cost * (-1/numCases) + lambda/2 * sum(sum(theta.^2));
% for i = 1:numClasses
%     for j = 1:numCases
%         k = (groundTruth(:, j) - h(:, j)) * data(:, j)';
%         thetagrad(i, :) = thetagrad(i, :) + k(i, :);
%     end
%     thetagrad(i, :) = -thetagrad(i, :)/numCases + lambda * theta(i, :);
% end

grad = [thetagrad(:)];
end
[/matlab]
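The exercise's own starter code ships a gradient checker, but a self-contained central-difference check is a quick way to catch exactly the kind of main-program mix-up mentioned at the top. This is a sketch assuming only softmaxCost above is on the path; the sizes, lambda, and the step delta are made-up illustrative values:

[matlab]
% Numerical gradient check for softmaxCost (illustrative sizes only).
numClasses = 4; inputSize = 8; numCases = 100; lambda = 1e-4;
data   = randn(inputSize, numCases);
% Ensure every class appears at least once so groundTruth has numClasses rows.
labels = [(1:numClasses)'; randi(numClasses, numCases - numClasses, 1)];
theta  = 0.005 * randn(numClasses * inputSize, 1);

[cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels);

% Central-difference approximation of the gradient.
numgrad = zeros(size(theta));
delta = 1e-4;
for i = 1:numel(theta)
    e = zeros(size(theta)); e(i) = delta;
    numgrad(i) = (softmaxCost(theta + e, numClasses, inputSize, lambda, data, labels) ...
                - softmaxCost(theta - e, numClasses, inputSize, lambda, data, labels)) / (2 * delta);
end

% Relative difference; should be tiny (UFLDL suggests on the order of 1e-9).
disp(norm(numgrad - grad) / norm(numgrad + grad));
[/matlab]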