Partial Least Squares MATLAB program
Function:
function [T, P, U, Q, B, W] = pls(X, Y, tol2)
% PLS  Partial least squares regression
%
% [T, P, U, Q, B, W] = pls(X, Y, tol) performs partial least squares regression
% between the independent variables X and the dependent variables Y, so that
% X = T * P' + E;
% Y = U * Q' + F = T * B * Q' + F1;
%
% Inputs:
% X    data matrix of independent variables
% Y    data matrix of dependent variables
% tol  tolerance of convergence (default 1e-10)
%
% Outputs:
% T    score matrix of X
% P    loading matrix of X
% U    score matrix of Y
% Q    loading matrix of Y
% B    matrix of regression coefficients
% W    weight matrix of X
%
% Using the PLS model, for new X1, Y1 can be predicted as
% Y1 = (X1 * P) * B * Q' = X1 * (P * B * Q')
% or
% Y1 = X1 * (W * inv(P' * W) * inv(T' * T) * T' * Y)
%
% If Y is not provided, the function returns the principal components, with
% X = T * P' + E
%
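% For example, a minimal PCA-only sketch (random data; the variable names
% here are placeholders, not part of the Geladi example below):
%{
Xp = randn(20, 6);
[T, P] = pls(Xp);          % with no Y, the function decomposes X alone
E = Xp - T*P';             % residual of the principal component model
%}
%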
% Example: taken from Geladi, P. and Kowalski, B.R., "An example of 2-block
% predictive partial least-squares regression with simulated data",
% Analytica Chimica Acta, 185 (1986) 19--32.
%{
x = [4 9 6 7 7 8 3 2; 6 15 10 15 17 22 9 4; 8 21 14 23 27 36 15 6;
    10 21 14 13 11 10 3 4; 12 27 18 21 21 24 9 6; 14 33 22 29 31 38 15 8;
    16 33 22 19 15 12 3 6; 18 39 26 27 25 26 9 8; 20 45 30 35 35 40 15 10];
y = [1 1; 3 1; 5 1; 1 3; 3 3; 5 3; 1 5; 3 5; 5 5];
% Leave the last sample for test
N = size(x, 1);
x1 = x(1:N-1, :);
y1 = y(1:N-1, :);
x2 = x(N, :);
y2 = y(N, :);
% Normalization
xmean = mean(x1);
xstd = std(x1);
ymean = mean(y1);
ystd = std(y1);
X = (x1 - xmean(ones(N-1,1), :)) ./ xstd(ones(N-1,1), :);
Y = (y1 - ymean(ones(N-1,1), :)) ./ ystd(ones(N-1,1), :);
% PLS Model
[T, P, U, Q, B, W] = pls(X, Y);
% Prediction and error
yp = (x2 - xmean)./xstd * (P*B*Q');
fprintf('Prediction error: %g\n', norm(yp - (y2 - ymean)./ystd));
%}
%
% By Yi Cao at Cranfield University on 2nd February 2008
%
% Reference:
% Geladi, P. and Kowalski, B.R., "Partial least-squares regression: a
% tutorial", Analytica Chimica Acta, 185 (1986) 1--17.
%
% Input check
error(nargchk(1, 3, nargin));
error(nargoutchk(0, 6, nargout));
if nargin < 2
    Y = X;
end
tol = 1e-10;
if nargin < 3
    tol2 = 1e-10;
end
% Size of X and Y
[rX, cX] = size(X);
[rY, cY] = size(Y);
assert(rX == rY, 'Sizes of X and Y mismatch.');
% Allocate memory to the maximum size
n = max(cX, cY);
T = zeros(rX, n);
P = zeros(cX, n);
U = zeros(rY, n);
Q = zeros(cY, n);
B = zeros(n, n);
W = P;
k = 0;
% Iteration loop while the residual is larger than the specified tolerance
while norm(Y) > tol2 && k < n
    % Choose the column of X with the largest sum of squares as t.
    % Choose the column of Y with the largest sum of squares as u.
    [dummy, tidx] = max(sum(X.*X));
    [dummy, uidx] = max(sum(Y.*Y));
    t1 = X(:, tidx);
    u = Y(:, uidx);
    t = zeros(rX, 1);
    % Iteration for outer modelling until convergence
    while norm(t1 - t) > tol
        w = X'*u;
        w = w/norm(w);
        t = t1;
        t1 = X*w;
        q = Y'*t1;
        q = q/norm(q);
        u = Y*q;
    end
    % Update p based on t
    t = t1;
    p = X'*t/(t'*t);
    pnorm = norm(p);
    p = p/pnorm;
    t = t*pnorm;
    w = w*pnorm;
    % Regression and residuals
    b = u'*t/(t'*t);
    X = X - t*p';
    Y = Y - b*t*q';
    % Save iteration results to outputs:
    k = k + 1;
    T(:, k) = t;
    P(:, k) = p;
    U(:, k) = u;
    Q(:, k) = q;
    W(:, k) = w;
    B(k, k) = b;
    % Uncomment the following line if you want to see the convergence
    % disp(norm(Y))
end
T(:, k+1:end) = [];
P(:, k+1:end) = [];
U(:, k+1:end) = [];
Q(:, k+1:end) = [];
W(:, k+1:end) = [];
B = B(1:k, 1:k);
Example: ----------------------------------------------------------
% Principal Component Analysis and Partial Least Squares
% Principal Component Analysis (PCA) and Partial Least Squares (PLS) are
% widely used tools. This code shows their relationship through the
% Nonlinear Iterative Partial Least Squares (NIPALS) algorithm.
% The eigenvalue and power method
% The NIPALS algorithm can be derived from the power method for solving the
% eigenvalue problem. Let x be an eigenvector of a square matrix A,
% corresponding to the eigenvalue s:
%
% $$Ax = sx$$
%
% Multiplying both sides by A iteratively leads to
%
% $$A^k x = s^k x$$
%
% Now, consider another vector y, which can be represented as a linear
% combination of all eigenvectors:
%
% $$y = \sum_i^n b_i x_i = Xb$$
%
% where
%
% $$X = \left[ x_1 \, \cdots \, x_n \right]$$
%
% and
%
% $$b = \left[ b_1 \, \cdots \, b_n \right]^T$$
%
% Multiplying y by A gives
%
% $$Ay = AXb = XSb$$
%
% where S is a diagonal matrix consisting of all eigenvalues. Therefore, for
% a large enough k,
%
% $$A^k y = X S^k b \approx \alpha x_1$$
%
% That is, the iteration will converge to the direction of x_1, which is the
% eigenvector corresponding to the eigenvalue with the largest modulus.
% This leads to the following power method to solve the eigenvalue problem.
A = randn(10, 5);
% Symmetric matrix to ensure real eigenvalues
B = A'*A;
% Find the column which has the maximum norm
[dum, idx] = max(sum(A.*A));
x = A(:, idx);
% Storage to judge convergence
x0 = x - x;
% Convergence tolerance
tol = 1e-6;
% Iterate until converged
while norm(x - x0) > tol
    % Iteration to approach the eigenvector direction
    y = A'*x;
    % Normalize the vector
    y = y/norm(y);
    % Save previous x
    x0 = x;
    % x is a product of eigenvalue and eigenvector
    x = A*y;
end
% The eigenvector corresponding to the largest eigenvalue is y
s = x'*x;
% Compare it with the result obtained with eig
[V, D] = eig(B);
[d, idx] = max(diag(D));
v = V(:, idx);
disp(d - s)
% v and y may differ in sign
disp(min(norm(v - y), norm(v + y)))
% The NIPALS algorithm for PCA
% PCA is a dimension reduction technique based on the
% following decomposition:
%
% $$X = TP^T + E$$
%
% where X is the data matrix (m x n) to be analyzed, T is the so-called
% score matrix (m x a), P the loading matrix (n x a) and E the residual.
% For a given tolerance on the residual, the number of principal components, a,
% can be much smaller than the original variable dimension, n.
% The above power algorithm can be extended to get T and P by iteratively
% subtracting A (in this case, X) by x*y' (in this case, t*p') until the
% given tolerance is satisfied. This is the so-called NIPALS algorithm.
% The data matrix with normalization
A = randn(10, 5);
meanx = mean(A);
stdx = std(A);
X = (A - meanx(ones(10,1), :)) ./ stdx(ones(10,1), :);
B = X'*X;
% Allocate T and P
T = zeros(10, 5);
P = zeros(5);
% Tolerance for convergence
tol = 1e-6;
% Tolerance for retaining 95 percent of the variance
tol2 = (1-0.95)*5*(10-1);
for k = 1:5
    % Find the column which has the maximum norm
    [dum, idx] = max(sum(X.*X));
    t = A(:, idx);
    % Storage to judge convergence
    t0 = t - t;
    % Iterate until converged
    while norm(t - t0) > tol
        % Iteration to approach the eigenvector direction
        p = X'*t;
        % Normalize the vector
        p = p/norm(p);
        % Save previous t
        t0 = t;
        % t is a product of eigenvalue and eigenvector
        t = X*p;
    end
    % Subtract the PC identified
    X = X - t*p';
    T(:, k) = t;
    P(:, k) = p;
    if norm(X) < tol2
        break
    end
end
T(:, k+1:5) = [];
P(:, k+1:5) = [];
S = diag(T'*T);
% Compare with the results obtained with eig
[V, D] = eig(B);
[D, idx] = sort(diag(D), 'descend');
D = D(1:k);
V = V(:, idx(1:k));
fprintf('The number of PCs: %i\n', k);
fprintf('Norm of score difference between eig and NIPALS: %g\n', norm(D - S));
fprintf('Norm of loading difference between eig and NIPALS: %g\n', norm(abs(V) - abs(P)));
% The NIPALS algorithm for PLS
% For PLS, we have two sets of data: the independent X and the dependent
% Y. The NIPALS algorithm can be used to decompose both X and Y so that
%
% $$X = TP^T + E, \quad Y = UQ^T + F, \quad U = TB$$
%
% The regression, U = TB, is solved through least squares, whilst the
% decomposition may not include all components. That is why the approach is
% called partial least squares. This algorithm is implemented in the pls
% function above.
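% A minimal sketch of this decomposition on made-up data (Xd, Yd and the
% tolerance here are illustrative choices, not part of the published example):
Xd = randn(30, 6);
Yd = Xd(:, 1:2)*[1 0.5; -0.2 2] + 0.01*randn(30, 2);
[Td, Pd, Ud, Qd, Bd] = pls(Xd, Yd, 1e-4);
fprintf('Components used: %i, Y fit residual: %g\n', ...
    size(Bd, 1), norm(Yd - Td*Bd*Qd'));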
% Example: Discriminant PLS using the NIPALS algorithm
% From Chiang, Y.Q., Zhuang, Y.M. and Yang, J.Y., "Optimal Fisher
% discriminant analysis using the rank decomposition", Pattern Recognition,
% 25 (1992), 101--111.
% Three classes of data, each with 50 samples and 4 variables.
X1 = [5.1 3.5 1.4 0.2; 4.9 3.0 1.4 0.2; 4.7 3.2 1.3 0.2; 4.6 3.1 1.5 0.2 ;...
5.0 3.6 1.4 0.2; 5.4 3.9 1.7 0.4; 4.6 3.4 1.4 0.3; 5.0 3.4 1.5 0.2 ;...
4.4 2.9 1.4 0.2; 4.9 3.1 1.5 0.1; 5.4 3.7 1.5 0.2; 4.8 3.4 1.6 0.2 ;...
4.8 3.0 1.4 0.1; 4.3 3.0 1.1 0.1; 5.8 4.0 1.2 0.2; 5.7 4.4 1.5 0.4 ;...
5.4 3.9 1.3 0.4; 5.1 3.5 1.4 0.3; 5.7 3.8 1.7 0.3; 5.1 3.8 1.5 0.3 ;...
5.4 3.4 1.7 0.2; 5.1 3.7 1.5 0.4; 4.6 3.6 1.0 0.2; 5.1 3.3 1.7 0.5 ;...
4.8 3.4 1.9 0.2; 5.0 3.0 1.6 0.2; 5.0 3.4 1.6 0.4; 5.2 3.5 1.5 0.2 ;...
5.2 3.4 1.4 0.2; 4.7 3.2 1.6 0.2; 4.8 3.1 1.6 0.2; 5.4 3.4 1.5 0.4 ;...
5.2 4.1 1.5 0.1; 5.5 4.2 1.4 0.2; 4.9 3.1 1.5 0.2; 5.0 3.2 1.2 0.2 ;...
5.5 3.5 1.3 0.2; 4.9 3.6 1.4 0.1; 4.4 3.0 1.3 0.2; 5.1 3.4 1.5 0.2 ;...
5.0 3.5 1.3 0.3; 4.5 2.3 1.3 0.3; 4.4 3.2 1.3 0.2; 5.0 3.5 1.6 0.6 ;...
5.1 3.8 1.9 0.4; 4.8 3.0 1.4 0.3; 5.1 3.8 1.6 0.2; 4.6 3.2 1.4 0.2 ;...
5.3 3.7 1.5 0.2; 5.0 3.3 1.4 0.2];
X2 = [7.0 3.2 4.7 1.4; 6.4 3.2 4.5 1.5; 6.9 3.1 4.9 1.5; 5.5 2.3 4.0 1.3 ;...
6.5 2.8 4.6 1.5; 5.7 2.8 4.5 1.3; 6.3 3.3 4.7 1.6; 4.9 2.4 3.3 1.0 ;...
6.6 2.9 4.6 1.3; 5.2 2.7 3.9 1.4; 5.0 2.0 3.5 1.0; 5.9 3.0 4.2 1.5 ;...
6.0 2.2 4.0 1.0; 6.1 2.9 4.7 1.4; 5.6 2.9 3.9 1.3; 6.7 3.1 4.4 1.4 ;...
5.6 3.0 4.5 1.5; 5.8 2.7 4.1 1.0; 6.2 2.2 4.5 1.5; 5.6 2.5 3.9 1.1 ;...
5.9 3.2 4.8 1.8; 6.1 2.8 4.0 1.3; 6.3 2.5 4.9 1.5; 6.1 2.8 4.7 1.2 ;...
6.4 2.9 4.3 1.3; 6.6 3.0 4.4 1.4; 6.8 2.8 4.8 1.4; 6.7 3.0 5.0 1.7 ;...
6.0 2.9 4.5 1.5; 5.7 2.6 3.5 1.0; 5.5 2.4 3.8 1.1; 5.5 2.4 3.7 1.0 ;...
5.8 2.7 3.9 1.2; 6.0 2.7 5.1 1.6; 5.4 3.0 4.5 1.5; 6.0 3.4 4.5 1.6 ;...
6.7 3.1 4.7 1.5; 6.3 2.3 4.4 1.3; 5.6 3.0 4.1 1.3; 5.5 2.5 5.0 1.3 ;...
5.5 2.6 4.4 1.2; 6.1 3.0 4.6 1.4; 5.8 2.6 4.0 1.2; 5.0 2.3 3.3 1.0 ;...
5.6 2.7 4.2 1.3; 5.7 3.0 4.2 1.2; 5.7 2.9 4.2 1.3; 6.2 2.9 4.3 1.3 ;...
5.1 2.5 3.0 1.1; 5.7 2.8 4.1 1.3];
X3 = [6.3 3.3 6.0 2.5; 5.8 2.7 5.1 1.9; 7.1 3.0 5.9 2.1; 6.3 2.9 5.6 1.8 ;...
6.5 3.0 5.8 2.2; 7.6 3.0 6.6 2.1; 4.9 2.5 4.5 1.7; 7.3 2.9 6.3 1.8 ;...
6.7 2.5 5.8 1.8; 7.2 3.6 6.1 2.5; 6.5 3.2 5.1 2.0; 6.4 2.7 5.3 1.9 ;...
6.8 3.0 5.5 2.1; 5.7 2.5 5.0 2.0; 5.8 2.8 5.1 2.4; 6.4 3.2 5.3 2.3 ;...
6.5 3.0 5.5 1.8; 7.7 3.8 6.7 2.2; 7.7 2.6 6.9 2.3; 6.0 2.2 5.0 1.5 ;...
6.9 3.2 5.7 2.3; 5.6 2.8 4.9 2.0; 7.7 2.8 6.7 2.0; 6.3 2.7 4.9 1.8 ;...
6.7 3.3 5.7 2.1; 7.2 3.2 6.0 1.8; 6.2 2.8 4.8 1.8; 6.1 3.0 4.9 1.8 ;...
6.4 2.8 5.6 2.1; 7.2 3.0 5.8 1.6; 7.4 2.8 6.1 1.9; 7.9 3.8 6.4 2.0 ;...
6.4 2.8 5.6 2.2; 6.3 2.8 5.1 1.5; 6.1 2.6 5.6 1.4; 7.7 3.0 6.1 2.3 ;...
6.3 3.4 5.6 2.4; 6.4 3.1 5.5 1.8; 6.0 3.0 4.8 1.8; 6.9 3.1 5.4 2.1 ;...
6.7 3.1 5.6 2.4; 6.9 3.1 5.1 2.3; 5.8 2.7 5.1 1.9; 6.8 3.2 5.9 2.3 ;...
6.7 3.3 5.7 2.5; 6.7 3.0 5.2 2.3; 6.3 2.5 5.0 1.9; 6.5 3.0 5.2 2.0 ;...
6.2 3.4 5.4 2.3; 5.9 3.0 5.1 1.8];
% Split the data set into training (1:25) and testing (26:50) samples
idxtrain = 1:25;
idxtest = 26:50;
% Combine training data with normalization
X = [X1(idxtrain, :); X2(idxtrain, :); X3(idxtrain, :)];
% Define the class indicator as Y
Y = kron(eye(3), ones(25, 1));
% Normalization
xmean = mean(X);
xstd = std(X);
ymean = mean(Y);
ystd = std(Y);
X = (X - xmean(ones(75,1), :)) ./ xstd(ones(75,1), :);
Y = (Y - ymean(ones(75,1), :)) ./ ystd(ones(75,1), :);
% Tolerance for 90 percent score
tol = (1-0.9)*25*4;
% Perform PLS
[T, P, U, Q, B] = pls(X, Y, tol);
% Results
fprintf('Number of components retained: %i\n', size(B, 1))
% Predicted classes for the test samples
X1 = (X1(idxtest, :) - xmean(ones(25,1), :)) ./ xstd(ones(25,1), :);
X2 = (X2(idxtest, :) - xmean(ones(25,1), :)) ./ xstd(ones(25,1), :);
X3 = (X3(idxtest, :) - xmean(ones(25,1), :)) ./ xstd(ones(25,1), :);
Y1 = X1*(P*B*Q');
Y2 = X2*(P*B*Q');
Y3 = X3*(P*B*Q');
Y1 = Y1.*ystd(ones(25,1), :) + ymean(ones(25,1), :);
Y2 = Y2.*ystd(ones(25,1), :) + ymean(ones(25,1), :);
Y3 = Y3.*ystd(ones(25,1), :) + ymean(ones(25,1), :);
% The class is determined from the column which is closest to 1
[dum, classid1] = min(abs(Y1 - 1), [], 2);
[dum, classid2] = min(abs(Y2 - 1), [], 2);
[dum, classid3] = min(abs(Y3 - 1), [], 2);
bar(1:25, classid1, 'b');
hold on
bar(26:50, classid2, 'r');
bar(51:75, classid3, 'g');
hold off
% The results show that most samples are classified correctly.
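% A quick way to quantify "most": count the misclassified test samples
% (a small added check; trueid and predid are new, illustrative names).
trueid = [ones(25,1); 2*ones(25,1); 3*ones(25,1)];
predid = [classid1; classid2; classid3];
fprintf('Misclassified test samples: %i of %i\n', ...
    sum(predid ~= trueid), numel(trueid));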