Algorithmic Essays - SVD, PCA and KPCA




SVD


Definition


Suppose \(A\) is an \(m\times n\) matrix. Then there exist an \(m\times m\) orthogonal matrix \(U=[u_1,u_2,\cdots,u_m]\), an \(n\times n\) orthogonal matrix \(V=[v_1,v_2,\cdots,v_n]\) and an \(m\times n\) diagonal matrix \(\Sigma=\mathrm{diag}(\sigma_1,\sigma_2,\cdots,\sigma_p)\) with \(p=\min(m,n)\), such that \(A=U\Sigma V^T\). This factorization is called the singular value decomposition (SVD) of \(A\), and the \(\sigma_i\) are called the singular values of \(A\).
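
As a concrete check of this definition (not part of the original derivation), here is a minimal NumPy sketch; note that numpy.linalg.svd returns \(V^T\) rather than \(V\), and the example matrix is arbitrary.

    import numpy as np

    m, n = 5, 3
    A = np.random.randn(m, n)                 # arbitrary example matrix

    # full_matrices=True gives U: m x m, Vt: n x n, s: the min(m, n) singular values
    U, s, Vt = np.linalg.svd(A, full_matrices=True)

    # Embed the singular values into an m x n "diagonal" matrix Sigma
    Sigma = np.zeros((m, n))
    Sigma[:len(s), :len(s)] = np.diag(s)

    # A equals U Sigma V^T up to floating-point error
    assert np.allclose(A, U @ Sigma @ Vt)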


Explanation


The \(m\times n\) matrix \(A\) can be viewed as a vector in an \(m\times n\)-dimensional linear space, so we can always choose \(m\times n\) basis vectors to expand it. The simplest choice of basis is the set of matrices \(E(i,j)\), where \(E(i,j)\) has a 1 in row \(i\), column \(j\) and 0 everywhere else. With this basis, the expansion coefficients of an arbitrary matrix are exactly its matrix elements.


In principle, any complete set of basis vectors can be used to expand every vector of the linear space, but the complexity of the expansion coefficients can differ greatly from case to case, so we generally prefer a basis suited to the situation at hand, just as we switch between rectangular, spherical and cylindrical coordinates in three-dimensional space. SVD provides a way to select a suitable basis from a given matrix \(A\), and it also gives the expansion coefficients of \(A\) in that basis.


In addition, we know that the \(m\times n\)-dimensional space can be obtained as the tensor product of an \(m\)-dimensional space and an \(n\)-dimensional space, and its basis vectors are the tensor products of the basis vectors of those two spaces. Because \(U\) and \(V\) are orthogonal, their column vectors form bases of the \(m\)- and \(n\)-dimensional spaces. Therefore the matrices \(u_i v_j^T\) (note the two subscripts need not be equal) form a basis of the \(m\times n\)-dimensional space, and \(\Sigma\) collects the coefficients obtained by expanding \(A\) in this basis; the singular values can be regarded as the weights of the basis vectors.

At most \(p\) of these coefficients are non-zero, and the corresponding basis vectors are \(u_i v_i^T\), which is far simpler than the expansion in terms of \(E(i,j)\). From the rules of matrix multiplication it is easy to see that \(A=\sum_{i=1}^{r}\sigma_i u_i v_i^T\), where \(r\) is the number of non-zero singular values. Given \(U\) and \(V\), \(A\) is completely determined by the singular values \(\sigma_i\), so we can discard some of the small singular values to compress the data.
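
The rank-one expansion can be tried directly; the following sketch reuses A, U, s and Vt from the previous snippet, and the choice \(r=2\) is purely illustrative. The printed error equals the square root of the sum of the squared discarded singular values, which is why dropping small \(\sigma_i\) loses little information.

    # Keep only the r largest singular values: A_r = sum_{i<r} s_i u_i v_i^T
    r = 2
    A_r = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))

    # Frobenius error of the truncated expansion vs. the discarded singular values
    print(np.linalg.norm(A - A_r), np.sqrt(np.sum(s[r:] ** 2)))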

Application

The most important application of SVD is data compression: keep only the most important part of the data. From \(A=\sum_{i=1}^{r}\sigma_i u_i v_i^T\) we see that storing only \(r\) column vectors of \(U\) and of \(V\) together with the \(r\) singular values is enough to recover \(A\) exactly. If \(r\) is still too large, we can keep only the largest few singular values and their corresponding columns of \(U\) and \(V\), because the size of the singular value \(\sigma_i\) measures the weight of the basis vector \(u_i v_i^T\).

In addition, from the equation \(A=U\Sigma V^T\) we can derive two matrices, \(A_1=AV=U\Sigma\), whose \(i\)-th column is \(\sigma_i u_i\), and \(A_2=U^TA=\Sigma V^T\), whose \(i\)-th row is \(\sigma_i v_i^T\). Keeping only the first \(r\) columns of \(A_1\) and the first \(r\) rows of \(A_2\) gives \(m\times r\) and \(r\times n\) matrices that compress the columns/rows of the original matrix \(A\). This can be used instead of the PCA described below to process data; for example, the PCA implementation in scikit-learn is based on SVD.
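
A sketch of this column/row compression, again reusing A, U, s and Vt from the snippets above (with an illustrative \(r=2\)):

    r = 2
    V = Vt.T
    A1 = A @ V[:, :r]          # m x r: the i-th column is sigma_i * u_i
    A2 = U[:, :r].T @ A        # r x n: the i-th row is sigma_i * v_i^T

    assert np.allclose(A1, U[:, :r] * s[:r])
    assert np.allclose(A2, s[:r, None] * Vt[:r, :])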


PCA


Given an \(m\times n\) data matrix \(X=[x_1,x_2,\cdots,x_n]\), where each \(x_i\) is a column vector of length \(m\), we want to compress the data while preserving its main information by finding a linear transformation \(V\) and computing \(Y=V^TX\); the transformed data \(Y\) should characterize the information contained in each data point clearly enough that some components can be deleted. PCA is a method for finding such a \(V\). As with SVD, the idea of PCA is to compute the covariance matrix of \(X\) and diagonalize it, looking for its principal axes (eigenvectors). Along the axes with large lengths (i.e. large eigenvalues) the variance of the data is large, so these directions carry a lot of information; axes with small lengths can be ignored without affecting the data too much. The procedure is as follows:

Assume \(X\) has already been centered; then the covariance matrix of \(X\) is \(C=XX^T\) or \(C=X^TX\) (up to a constant factor \(1/n\), which does not affect the eigenvectors). The two forms compress different dimensions of \(X\): \(XX^T\) compresses the number of features and \(X^TX\) compresses the number of data points. Taking the former as an example, \(C\) is an \(m\times m\) matrix.

Next, diagonalize \(C\) by a similarity transformation: find its \(m\) eigenvalues \(\lambda_1 \sim \lambda_m\) and the corresponding eigenvectors \(v_1 \sim v_m\), where we assume the eigenvalues (and their eigenvectors) have been sorted from large to small. Let \(V=[v_1,v_2,\cdots,v_m]\) and compute \(Y=V^TX=[y_1,y_2,\cdots,y_n]\), where \(y_i=[v_1^Tx_i,v_2^Tx_i,\cdots,v_m^Tx_i]^T\). \(Y\) is the new representation of the data that we want, and it satisfies the following properties:

    • The rows of \(Y\) are mutually uncorrelated, i.e. \(YY^T\) is a diagonal matrix: \(YY^T=(V^TX)(V^TX)^T=V^T(XX^T)V\), which is exactly the similarity diagonalization of \(C\).
    • The variance of the \(i\)-th row of \(Y\) equals the eigenvalue \(\lambda_i\), so if \(i<j\) then \(\mathrm{Var}(y_i)\ge\mathrm{Var}(y_j)\), since the eigenvalues have been sorted from large to small.



The computation of \(Y\) can be viewed as projecting the original data onto each principal axis, with the axis length measuring the weight of that axis. Indicators with small weights can then be ignored. Concretely, we delete from \(Y\) the rows corresponding to the eigenvectors with smaller eigenvalues: if only \(r\) eigenvectors are kept, each resulting \(y_i\) has length \(r\) and the matrix \(Y\) is \(r\times n\), so compared with \(X\) the first dimension has been compressed. Also note that since the covariance matrix is symmetric, \(V\) is an orthogonal matrix, so the original data can be recovered directly as \(X=VY=VV^TX\). When some eigenvectors have been removed from \(V\), the recovered data becomes a correspondingly simplified version of the original.
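
A minimal NumPy sketch of this procedure (the random data matrix and the choice \(r=1\) are made up for illustration; the \(1/n\) factor in the covariance is omitted, as in the text, since it does not change the eigenvectors):

    import numpy as np

    m, n, r = 3, 100, 1
    X = np.random.randn(m, n)                   # columns are data points, as in the text

    Xc = X - X.mean(axis=1, keepdims=True)      # de-center each feature (row)

    C = Xc @ Xc.T                               # m x m covariance matrix (up to 1/n)
    lam, V = np.linalg.eigh(C)                  # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]               # re-sort from large to small
    lam, V = lam[order], V[:, order]

    Y = V[:, :r].T @ Xc                         # r x n compressed representation
    X_rec = V[:, :r] @ Y                        # approximate recovery V V^T X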

As mentioned earlier, if you need to compress the other dimension of \(X\), simply compute the covariance matrix \(X^TX\) instead.


The difference between SVD and PCA


SVD decomposes the data matrix directly and does not require it to be square, whereas PCA first forms the covariance matrix of the data (which is square) and then performs an eigendecomposition. In addition, a single SVD can compress the rows and the columns at the same time, corresponding to compressing the data features and the data volume respectively, while PCA needs the two covariance matrices \(XX^T\) and \(X^TX\) to compress the rows and the columns.
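
The connection can be checked numerically; the following sketch reuses Xc, V and lam from the PCA snippet above. Up to sign, the left singular vectors of the centered data coincide with the eigenvectors of its covariance matrix, and the squared singular values equal the eigenvalues.

    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    assert np.allclose(s ** 2, lam)                          # eigenvalues = squared singular values
    assert np.allclose(np.abs(U[:, 0]), np.abs(V[:, 0]))     # same leading direction up to sign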


KPCA


KPCA was developed to handle the case where the structure of the original data is complicated, so that the data are not linearly separable. Like SVM, it uses a kernel function to map the known low-dimensional data into a high-dimensional space and then performs PCA there. The whole computation in the high-dimensional space only involves inner products, which can be evaluated in the original space through the kernel function.

To compress the data \(X\) above, we choose a kernel function \(Z:R^m\times R^m \mapsto R\) such that \(Z(x_i,x_j)=<\phi(x_i),\phi(x_j)>\) holds for all \(x_i,x_j\in R^m\). Here \(\phi:R^m \mapsto F\) maps an element \(x\) of \(R^m\) to an element \(\phi(x)\) of a (possibly infinite-dimensional) Hilbert space \(F\) equipped with the inner product \(<\cdot,\cdot>\); \(F\) is also called the feature space. The next step is to show that PCA can still be carried out on the data, regardless of the dimension of \(F\).
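
The text does not fix a particular kernel. As one common choice (an assumption here, not prescribed by the article), the Gaussian/RBF kernel \(Z(x_i,x_j)=\exp(-\|x_i-x_j\|^2/2\sigma^2)\) can be used; the matrix \(K\) introduced below is simply this function evaluated on every pair of data points.

    import numpy as np

    def rbf_kernel_matrix(X, sigma=1.0):
        """K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)); columns of X are data points."""
        sq = np.sum(X ** 2, axis=0)                          # squared norm of each column
        d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)     # pairwise squared distances
        return np.exp(-d2 / (2.0 * sigma ** 2))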

Suppose we again intend to compress the first dimension of the data, i.e. to reduce the data to \(r\times n\). Following the PCA method, the covariance matrix of the data \(X\) in the feature space \(F\) can be expressed as:


\(\bar{C}=\phi(X)\phi(X)^T=\sum_{j=1}^{n}\phi(x_j)\phi(x_j)^T\)


Note that \(\bar{C}\) is no longer an \(m\times m\) matrix; it may even be infinite-dimensional. Direct diagonalization therefore does not work, and an alternative method is needed.

An eigenvector \(v\) of \(\bar{C}\) satisfies the equation:


\(\bar{C}v=\lambda v\)


Substituting \(\bar{C}=\sum_{j=1}^{n}\phi(x_j)\phi(x_j)^T\) into the above gives:


\(v=\frac{1}{\lambda}\sum_{j=1}^{n}\phi(x_j)<\phi(x_j),v>\)


This shows that the eigenvector \(v\) lies in the subspace spanned by the vectors \(\phi(x_j)\), so there are at most \(n\) linearly independent eigenvectors.

Consider instead the equivalent equation, obtained by taking the inner product of the eigenvalue equation with each \(\phi(x_j)\):


\(\lambda<\phi(x_j),v>=<\phi(x_j),\bar{C}v>\)


Then represent \(v\) as an expansion in the \(\phi(x_j)\):


\(v=\sum_{j=1}^{n}\alpha_j\phi(x_j)\)


Substituting this expansion into the equation above yields the equivalent eigenvalue equation:


\(\lambda K\alpha=K^2\alpha\)


where \(K\) is the \(n\times n\) matrix satisfying \(K_{ij}=<\phi(x_i),\phi(x_j)>\), and \(\alpha=[\alpha_1,\alpha_2,\cdots,\alpha_n]^T\).
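
To see where this comes from (a step the text skips), substitute the expansion \(v=\sum_{k=1}^{n}\alpha_k\phi(x_k)\) into both sides of the previous equation. For every \(i\),

\(<\phi(x_i),\bar{C}v>=\sum_{j=1}^{n}<\phi(x_i),\phi(x_j)><\phi(x_j),v>=\sum_{j,k=1}^{n}K_{ij}K_{jk}\alpha_k=(K^2\alpha)_i\)

\(\lambda<\phi(x_i),v>=\lambda\sum_{k=1}^{n}\alpha_k<\phi(x_i),\phi(x_k)>=\lambda(K\alpha)_i\)

Equating the two expressions for every \(i\) gives exactly \(\lambda K\alpha=K^2\alpha\).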

Therefore we can find \(\lambda\) and \(\alpha\) by solving the simpler eigenvalue equation \(\lambda\alpha=K\alpha\). Obviously any solution of \(\lambda\alpha=K\alpha\) also satisfies \(\lambda K\alpha=K^2\alpha\), and it can be shown that the additional solutions of the latter have no effect on the expansion \(v=\sum_{j=1}^{n}\alpha_j\phi(x_j)\). Moreover, the coefficients \(\alpha_j\) of the expansion are, up to the factor \(\lambda\), exactly the projections of the data mapped into the high-dimensional space \(F\) onto the principal axis direction, i.e. the compressed data. In this way, the original problem of finding the eigenvectors \(v\) of \(\bar{C}\) is transformed into finding the eigenvectors \(\alpha\) of the equivalent matrix \(K\), from which the compressed data are obtained. What needs to be clarified, however, is that the eigenvectors of \(K\) are not the principal axes; the principal axes are still the \(v\), and \(K\) is only a tool that helps us compute the projections of the feature-space data \(\phi(X)\) onto \(v\).

By requiring \(<v,v>=1\), we can normalize the corresponding eigenvector \(\alpha\):


\(1=<v,v>=\sum_{i,j=1}^{n}\alpha_i\alpha_j<\phi(x_i),\phi(x_j)>=<\alpha,K\alpha>=\lambda<\alpha,\alpha>\)


For any new data point \(x'\), the projection of \(\phi(x')\) onto the principal axis \(v\) is:


\(<v,\phi(x')>=\sum_{j=1}^n\alpha_j<\phi(x_j),\phi(x')>\)


It can be seen that PCA in the feature space \(F\) only involves computing the inner products \(<\phi(x_i),\phi(x_j)>\), which can be evaluated in the original space with the kernel function, so KPCA is feasible.

Finally, since the above derivation assumes that the data are already centered in the feature space \(F\), \(\phi(X)\) should in principle be de-centered first. Because the explicit form of \(\phi(X)\) is unknown, this is hard to carry out directly; the workaround is to center \(K\) instead:

The centered data are \(\phi^c(x)=\phi(x)-\frac{1}{n}\sum_{i=1}^n\phi(x_i)\), and the matrix to be computed after centering is \(K^c_{ij}=<\phi^c(x_i),\phi^c(x_j)>\).

Substituting the expression for \(\phi^c(x)\) and using the linearity of the inner product, we finally obtain:


\(K^c=K-1_nK-K1_n+1_nK1_n\)


where \(1_n\) is the \(n\times n\) matrix in which every element is \(1/n\).
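
Putting the pieces together, here is a minimal NumPy sketch of the whole procedure under the assumptions above (RBF kernel with an arbitrary width, \(r\) components kept; none of these choices are prescribed by the text):

    import numpy as np

    def kpca(X, r, sigma=1.0):
        """Kernel PCA for an m x n matrix X whose columns are data points.
        Returns the r x n matrix of projections of phi(x_i) onto the top r axes."""
        n = X.shape[1]

        # Kernel matrix K_ij = <phi(x_i), phi(x_j)> via the RBF kernel
        sq = np.sum(X ** 2, axis=0)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
        K = np.exp(-d2 / (2.0 * sigma ** 2))

        # Center in feature space: K^c = K - 1_n K - K 1_n + 1_n K 1_n
        one_n = np.full((n, n), 1.0 / n)
        Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n

        # Solve lambda * alpha = K^c alpha and sort from large to small
        lam, alpha = np.linalg.eigh(Kc)
        order = np.argsort(lam)[::-1][:r]
        lam, alpha = lam[order], alpha[:, order]

        # Normalize so that <v, v> = lambda <alpha, alpha> = 1
        alpha = alpha / np.sqrt(lam)

        # Projection of phi(x_i) onto v_k is sum_j alpha_{jk} Kc_{ij} = (Kc alpha)_{ik}
        return (Kc @ alpha).T                    # r x n compressed data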

Personally, I feel the KPCA method hinges on two key points: first, the eigenvector \(v\) of the covariance matrix is represented as a linear combination of the data \(\phi(X)\); second, the compressed data in PCA are not the eigenvectors \(v\) themselves but the projections of the data onto \(v\), i.e. \(<v,\phi(x)>\). Unlike PCA, although KPCA in principle still uses the principal axes of the covariance matrix to filter the data, it never decomposes that matrix explicitly; instead it exploits the two points above and works with the matrix \(K\). Note that the covariance matrix of the original space and \(K\) have different dimensions: one is \(m\times m\) and the other is \(n\times n\). After mapping to the high-dimensional space the covariance matrix may become infinite-dimensional, and the data compression is actually carried out on the dimension corresponding to \(K\). For example, suppose there are 100 two-dimensional data points that we want to compress to 1 dimension: PCA computes the eigenvalues of the \(2\times 2\) covariance matrix, while KPCA computes the eigenvalues of the \(100\times 100\) matrix \(K\) and then discards 99 of the eigenvectors, keeping only the single longest principal axis.
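
For reference, the same 100-point example can be run with the scikit-learn estimators mentioned earlier; note that scikit-learn stores data points as rows, the transpose of this article's convention, and the RBF kernel with default parameters is an arbitrary choice here.

    import numpy as np
    from sklearn.decomposition import PCA, KernelPCA

    X = np.random.randn(100, 2)        # 100 two-dimensional points, one per row

    y_pca = PCA(n_components=1).fit_transform(X)                         # 2 x 2 eigenproblem (via SVD)
    y_kpca = KernelPCA(n_components=1, kernel='rbf').fit_transform(X)    # 100 x 100 eigenproblem

    print(y_pca.shape, y_kpca.shape)   # both (100, 1)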


