1. Independent Component Analysis Modeling
The objective of independent component analysis is to learn a complete orthonormal basis from a mass of data, which leads to the following optimization problem:

$$\min_W \; \|Wx\|_1 \qquad \text{s.t.} \quad WW^T = I$$

The first part is the sparsity objective and the second is the complete orthonormal basis constraint. Readers familiar with sparse representation may notice that the problem contains no explicit reconstruction term $\|W^TWx - x\|^2$: because $W$ is an orthogonal matrix, $W^TWx = x$, so that error is identically 0.
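As a quick sanity check of this claim, here is a minimal MATLAB sketch (my own illustration, not part of the exercise) that builds a random square orthonormal W and verifies that the reconstruction error vanishes:

% Illustration only: for a square orthonormal W, W'*W*x recovers x exactly,
% which is why no explicit reconstruction term is needed in the objective.
n = 16;
W = orth(randn(n));              % random square orthonormal matrix (W*W' = I)
x = randn(n, 1);                 % an arbitrary data vector
fprintf('reconstruction error: %g\n', norm(W' * W * x - x));  % ~1e-15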
2. Solving the optimization problem of independent component analysis
(1) Sparse constrained optimization: since the L1 norm is not differentiable at zero, it is smoothed with a small $\epsilon$, converting the problem to minimizing the following energy functional over the $m$ training samples:

$$J(W) = \frac{1}{m}\sum_{i=1}^{m} \sum_k \sqrt{\big(Wx^{(i)}\big)_k^2 + \epsilon}$$

Its derivative with respect to $W$ is (division taken element-wise, $X$ the matrix whose columns are the samples $x^{(i)}$):

$$\nabla_W J = \frac{1}{m}\,\Big(\frac{WX}{\sqrt{(WX)^2+\epsilon}}\Big)\,X^T$$
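As a sanity check of these formulas, the following minimal sketch (my own illustration, with toy dimensions) evaluates the smoothed-L1 cost and its analytic gradient on random data and compares one entry against a central-difference approximation:

% Illustration only: smoothed-L1 cost/gradient on toy data plus a
% finite-difference check of one gradient entry.
m = 50; n = 8; k = 6; eps_l1 = 1e-6;
X = randn(n, m);                               % columns are samples
W = randn(k, n);
aux  = sqrt((W*X).^2 + eps_l1);                % smoothed |WX|
cost = sum(aux(:)) / m;
grad = ((W*X) ./ aux) * X' / m;                % analytic gradient
h = 1e-6; Wp = W; Wp(1,1) = Wp(1,1) + h; Wm = W; Wm(1,1) = Wm(1,1) - h;
cp = sum(sum(sqrt((Wp*X).^2 + eps_l1))) / m;
cm = sum(sum(sqrt((Wm*X).^2 + eps_l1))) / m;
fprintf('analytic %g vs numeric %g\n', grad(1,1), (cp - cm) / (2*h));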
(2) Complete orthonormal basis constraint optimization: the constraint is enforced by projection; after each gradient step, $W$ is projected back onto the constraint set via

$$W \leftarrow (WW^T)^{-1/2}\,W$$

Note that since our complete orthonormal basis is learned from the data, the data samples must be ZCA-whitened before iterative training begins. Writing $\Sigma = XX^T$ and $[U, S, V] = \mathrm{svd}(\Sigma)$, the whitening transform is:

$$x_{\mathrm{ZCA}} = U\,\mathrm{diag}\big(1/\sqrt{\mathrm{diag}(S)}\big)\,U^T\,x$$
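The projection itself is one line in MATLAB; this small sketch (illustration only, toy sizes) applies it to an arbitrary matrix and verifies the constraint:

% Illustration only: project W onto the set of matrices with W*W' = I.
k = 6; n = 8;
W = randn(k, n);
W = (W * W')^(-0.5) * W;                 % the projection (WW^T)^(-1/2) W
fprintf('||W*W'' - I||_F = %g\n', norm(W * W' - eye(k), 'fro'));  % ~1e-15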
(3) Optimization steps:
Step 1: Center the training data and apply ZCA whitening.
Step 2: Update W with a gradient descent step.
Step 3: Project W back onto the orthonormality constraint, $W \leftarrow (WW^T)^{-1/2}W$.
Step 4: Check the convergence condition; if it is satisfied, stop; otherwise repeat Steps 2-3. (A compact toy version of this loop is sketched below, before the full exercise code.)
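Before the full exercise code in (4), here is a minimal toy sketch of the whole loop, assuming synthetic Gaussian data, a hypothetical square 16x16 basis, and a fixed step size in place of the backtracking line search used in the exercise:

% Illustration only: toy version of Steps 1-4 (synthetic data, fixed step
% size, fixed iteration budget in place of a convergence test).
m = 2000; n = 16; eps_l1 = 1e-6; alpha = 0.1;
X = randn(n, m);                             % toy data, columns are samples
X = bsxfun(@minus, X, mean(X, 2));           % Step 1: center ...
sigma = X * X';
[u, s, v] = svd(sigma);
X = u * diag(1 ./ sqrt(diag(s))) * u' * X;   % ... and ZCA whiten
W = rand(n, n);
for it = 1:200
    aux  = sqrt((W*X).^2 + eps_l1);          % smoothed |WX|
    grad = ((W*X) ./ aux) * X' / m;
    W = W - alpha * grad;                    % Step 2: gradient step
    W = (W * W')^(-0.5) * W;                 % Step 3: project onto WW' = I
end                                          % Step 4: fixed budget here
fprintf('final smoothed-L1 cost: %g\n', sum(sum(sqrt((W*X).^2 + eps_l1))) / m);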
(4) Sample code:
%% CS294A/CS294W Independent Component Analysis (ICA) Exercise

%  Instructions
%  ------------
%  This file contains code that helps you get started on the
%  ICA exercise. In this exercise, you will need to modify
%  orthonormalICACost.m and a small part of this file, ICAExercise.m.

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

numPatches = 20000;
numFeatures = 121;
imageChannels = 3;
patchDim = 8;
visibleSize = patchDim * patchDim * imageChannels;

outputDir = '.';
epsilon = 1e-6; % L1-regularisation epsilon: |Wx| ~ sqrt((Wx).^2 + epsilon)

%%======================================================================
%% STEP 1: Sample patches

patches = load('stlSampledPatches.mat');
patches = patches.patches(:, 1:numPatches);
displayColorNetwork(patches(:, 1:100));

%%======================================================================
%% STEP 2: ZCA whiten patches
%  In this step, we ZCA whiten the sampled patches. This is necessary for
%  orthonormal ICA to work.

patches = patches / 255;
meanPatch = mean(patches, 2);
patches = bsxfun(@minus, patches, meanPatch);

% ZCA-whiten the data
sigma = patches * patches';
[u, s, v] = svd(sigma);
ZCAWhite = u * diag(1 ./ sqrt(diag(s))) * u';
patches = ZCAWhite * patches;

%%======================================================================
%% STEP 3: ICA cost functions
%  Implement the cost function for orthonormal ICA (you don't have to
%  enforce the orthonormality constraint in the cost function)
%  in the function orthonormalICACost in orthonormalICACost.m.
%  Once you have implemented the function, check the gradient.

% Use fewer features and smaller patches for speed
debug = false; % when set to true, the gradient check below is run
if debug
    numFeatures = 5;
    patches = patches(1:3, 1:5);
    visibleSize = 3;
    numPatches = 5;

    weightMatrix = rand(numFeatures, visibleSize);

    [cost, grad] = orthonormalICACost(weightMatrix, visibleSize, numFeatures, patches, epsilon);

    numGrad = computeNumericalGradient(@(x) orthonormalICACost(x, visibleSize, numFeatures, patches, epsilon), weightMatrix(:));
    % Uncomment to display the numeric and analytic gradients side-by-side
    % disp([numGrad grad]);
    diff = norm(numGrad - grad) / norm(numGrad + grad);
    fprintf('Orthonormal ICA difference: %g\n', diff);
    assert(diff < 1e-7, 'Difference too large. Check your analytic gradients.');

    fprintf('Congratulations! Your gradients seem okay.\n');
end

%%======================================================================
%% STEP 4: Optimization for orthonormal ICA
%  Optimize for the orthonormal ICA objective, enforcing the orthonormality
%  constraint.
%  Code has been provided to do the gradient descent with a
%  backtracking line search using the orthonormalICACost function
%  (for more information on backtracking line search, you can read the
%  appendix of the exercise).
%
%  However, you will need to write code to enforce the orthonormality
%  constraint by projecting weightMatrix back into the space of matrices
%  satisfying WW^T = I.
%
%  Once you are done, you can run the code. 10000 iterations of gradient
%  descent will take around 2 hours, and only a few bases will be
%  completely learned within 10000 iterations. This highlights one of the
%  weaknesses of orthonormal ICA - it is difficult to optimize for the
%  objective function while enforcing the orthonormality constraint -
%  convergence using gradient descent and projection is very slow.

weightMatrix = rand(numFeatures, visibleSize);

[cost, grad] = orthonormalICACost(weightMatrix(:), visibleSize, numFeatures, patches, epsilon); % compute gradient

fprintf('%11s%16s%10s\n', 'Iteration', 'Cost', 't');

startTime = tic();

% Initialize some parameters for the backtracking line search
alpha = 0.5;
t = 0.02;
lastCost = 1e40;

% Do the iterations of gradient descent
for iteration = 1:50000

    grad = reshape(grad, size(weightMatrix));
    newCost = Inf;
    linearDelta = sum(sum(grad .* grad));

    % Perform the backtracking line search
    while 1
        considerWeightMatrix = weightMatrix - alpha * grad; % gradient descent update of W

        % -------------------- YOUR CODE HERE --------------------
        % Instructions:
        %   Write code to project considerWeightMatrix back into the space
        %   of matrices satisfying WW^T = I.
        %
        %   Once that is done, verify that your projection is correct by
        %   using the checking code below. After you have verified your
        %   code, comment out the checking code before running the
        %   optimization.

        % Project considerWeightMatrix such that it satisfies WW^T = I
        % error('Fill in the code for the projection here');
        considerWeightMatrix = (considerWeightMatrix * considerWeightMatrix')^(-0.5) * considerWeightMatrix; % projection of W

        % Verify that the projection is correct
        temp = considerWeightMatrix * considerWeightMatrix';
        temp = temp - eye(numFeatures);
        assert(sum(temp(:).^2) < 1e-23, 'considerWeightMatrix does not satisfy WW^T = I. Check your projection again');
        % error('Projection seems okay. Comment out verification code before running optimization.');

        % -------------------- YOUR CODE HERE --------------------

        [newCost, newGrad] = orthonormalICACost(considerWeightMatrix(:), visibleSize, numFeatures, patches, epsilon);
        if newCost > lastCost - alpha * t * linearDelta
            % fprintf('%14.6f %14.6f\n', newCost, lastCost - alpha * t * linearDelta);
            t = 0.9 * t;
        else
            break;
        end
    end

    lastCost = newCost;
    weightMatrix = considerWeightMatrix;

    fprintf('  %9d  %14.6f  %8.7g\n', iteration, newCost, t);

    t = 1.1 * t;

    cost = newCost;
    grad = newGrad;

    % Visualize the learned bases as we go along
    if mod(iteration, 1000) == 0
        duration = toc(startTime);
        % Visualize the learned bases over time in different figures so
        % we can get a feel for the slow rate of convergence
        figure(floor(iteration / 1000));
        displayColorNetwork(weightMatrix');
    end
end

% Visualize the learned bases
displayColorNetwork(weightMatrix');
% The version below (commented out) is the pure orthonormal ICA cost:
% function [cost, grad] = orthonormalICACost(theta, visibleSize, numFeatures, patches, epsilon)
% %orthonormalICACost - Compute the cost and gradients for orthonormal ICA
% %                     (i.e. compute the cost ||Wx||_1 and its gradient)
%
%     weightMatrix = reshape(theta, numFeatures, visibleSize);
%
%     cost = 0;
%     grad = zeros(numFeatures, visibleSize);
%
%     % -------------------- YOUR CODE HERE --------------------
%     % Instructions:
%     %   Write code to compute the cost and gradient with respect to the
%     %   weights given in weightMatrix.
%     % -------------------- YOUR CODE HERE --------------------
%
%     num_samples = size(patches, 2); % number of samples
%     aux1 = sqrt((weightMatrix * patches).^2 + epsilon);
%     cost = sum(aux1(:)) / num_samples;
%     grad = ((weightMatrix * patches) ./ aux1) * patches' / num_samples;
%     grad = grad(:);
% end

% The version actually used adds a reconstruction term weighted by lambda:
function [cost, grad] = orthonormalICACost(theta, visibleSize, numFeatures, patches, epsilon)
%orthonormalICACost - Compute the cost and gradients for orthonormal ICA
%                     (i.e. compute the cost ||Wx||_1 and its gradient)

    weightMatrix = reshape(theta, numFeatures, visibleSize);

    cost = 0;
    grad = zeros(numFeatures, visibleSize);

    % -------------------- YOUR CODE HERE --------------------
    lambda = 8e-6; % 0.5e-4
    num_samples = size(patches, 2);
    cost_part1 = sum(sum((weightMatrix' * weightMatrix * patches - patches).^2)) ./ num_samples; % reconstruction term
    cost_part2 = sum(sum(sqrt((weightMatrix * patches).^2 + epsilon))) * lambda;                 % smoothed L1 sparsity term
    cost = cost_part1 + cost_part2;
    grad = (2 * weightMatrix * (weightMatrix' * weightMatrix * patches - patches) * patches' + ...
            2 * weightMatrix * patches * (weightMatrix' * weightMatrix * patches - patches)') ./ num_samples + ...
           (weightMatrix * patches ./ sqrt((weightMatrix * patches).^2 + epsilon)) * patches' * lambda;
    grad = grad(:);
    fprintf('%11s%16s\n', 'cost_part1', 'cost_part2');
    fprintf('%14.6f  %14.6f\n', cost_part1, cost_part2);
end
Note: when computing the gradient, take care with the derivative of the reconstruction term $\|W^TWx - x\|^2$ with respect to $W$; a short derivation follows.
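For completeness, here is that derivation using standard matrix calculus (it matches the grad expression in the code above); write $r = W^TWx - x$:

\begin{aligned}
f(W) &= \|W^TWx - x\|^2 = \|r\|^2 \\
df &= 2r^T\,d(W^TWx) = 2r^T\big(dW^T\,Wx + W^T\,dW\,x\big) \\
   &= 2\,\mathrm{tr}\big((Wx\,r^T)^T dW\big) + 2\,\mathrm{tr}\big((Wr\,x^T)^T dW\big) \\
\nabla_W f &= 2\,Wx\,r^T + 2\,Wr\,x^T = 2Wx\,(W^TWx-x)^T + 2W(W^TWx-x)\,x^T
\end{aligned}

Summing over the data columns gives exactly the two terms divided by num_samples in the code.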
function numgrad = computeNumericalGradient(J, theta)
% numgrad = computeNumericalGradient(J, theta)
% theta: a vector of parameters
% J: a function that outputs a real number. Calling y = J(theta) will
% return the function value at theta.

% Initialize numgrad with zeros
numgrad = zeros(size(theta));

%% ---------- YOUR CODE HERE --------------------------------------
% Instructions:
% Implement numerical gradient checking, and return the result in numgrad.
% (See Section 2.3 of the lecture notes.)
% You should write code so that numgrad(i) is (the numerical approximation
% to) the partial derivative of J with respect to the i-th input argument,
% evaluated at theta.
% i.e., numgrad(i) should be (approximately) the partial derivative of J
% with respect to theta(i).
%
% Hint: You will probably want to compute the elements of numgrad one at a time.

epsilon = 1e-8;
n = size(theta, 1);
E = eye(n);
for i = 1:n
    delta = E(:, i) * epsilon;
    % central-difference approximation of the i-th partial derivative
    numgrad(i) = (J(theta + delta) - J(theta - delta)) / (epsilon * 2.0);
end
%% ---------------------------------------------------------------
end
function displayColorNetwork(A)
% Display receptive field(s) or basis vector(s) for image patches.
% A: the basis, with patches as column vectors.

% In case the midpoint is not set at 0, we shift it dynamically
if min(A(:)) >= 0
    A = A - mean(A(:));
end

cols = round(sqrt(size(A, 2)));

channel_size = size(A, 1) / 3;
dim = sqrt(channel_size);
dimp = dim + 1;
rows = ceil(size(A, 2) / cols);

% Split the three color channels and normalize each
B = A(1:channel_size, :);
C = A(channel_size+1:channel_size*2, :);
D = A(2*channel_size+1:channel_size*3, :);
B = B ./ (ones(size(B, 1), 1) * max(abs(B)));
C = C ./ (ones(size(C, 1), 1) * max(abs(C)));
D = D ./ (ones(size(D, 1), 1) * max(abs(D)));

% Initialization of the image
I = ones(dim*rows + rows - 1, dim*cols + cols - 1, 3);

% Transfer features to this image matrix
for i = 0:rows-1
    for j = 0:cols-1
        if i*cols + j + 1 > size(B, 2)
            break
        end

        % This sets the patch
        I(i*dimp+1:i*dimp+dim, j*dimp+1:j*dimp+dim, 1) = ...
            reshape(B(:, i*cols+j+1), [dim dim]);
        I(i*dimp+1:i*dimp+dim, j*dimp+1:j*dimp+dim, 2) = ...
            reshape(C(:, i*cols+j+1), [dim dim]);
        I(i*dimp+1:i*dimp+dim, j*dimp+1:j*dimp+dim, 3) = ...
            reshape(D(:, i*cols+j+1), [dim dim]);
    end
end

I = I + 1;
I = I / 2;
imagesc(I);
axis equal
axis off
end
The first 100 sampled patches are shown below:
The complete orthonormal basis learned after 10,000 iterations of training:
References:
http://www.cnblogs.com/dmzhuo/p/5009274.html
http://deeplearning.stanford.edu/wiki/index.php/Exercise:Independent_Component_Analysis