PCA: an unsupervised dimensionality reduction algorithm

Source: Internet
Author: User

PCA (Principal Component Analysis) is an unsupervised learning algorithm that can effectively reduce the dimensionality of data while preserving most of the useful information.

It is mainly used in the following three areas:

1. Speeding up learning algorithms

2. Compressing data to reduce memory and disk consumption

3. Visualizing data by mapping high-dimensional data to 2-D or 3-D

In short, what PCA does is compute a mapping that transforms the original n-dimensional data into k-dimensional data, where k < n.

Its core algorithm is as follows:

1. Normalize the data

X' = (X - mean(X)) / range(X)
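The article's snippets use MATLAB/Octave notation; as a minimal sketch in NumPy instead, the normalization step looks like the following (the data matrix here is made up purely for illustration):

```python
import numpy as np

# Hypothetical data matrix: m = 5 samples (rows), n = 3 features (columns).
X = np.array([[1.0, 10.0, 100.0],
              [2.0, 20.0, 110.0],
              [3.0, 30.0, 120.0],
              [4.0, 40.0, 130.0],
              [5.0, 50.0, 140.0]])

# Mean normalization with range scaling: X' = (X - mean(X)) / range(X),
# applied per feature (per column).
X_norm = (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))
```

After this step every feature has zero mean and a comparable scale, so no single feature dominates the covariance matrix in the next step.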

2. Compute the covariance matrix

That is: Sigma = (1/m) * X' * X

(where ' denotes the transpose in MATLAB/Octave notation, and m is the number of samples)
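In NumPy the same computation is a single matrix product; the small normalized matrix below is a made-up example, not data from the article:

```python
import numpy as np

# Hypothetical normalized data matrix: m = 4 samples, n = 2 features.
X = np.array([[ 0.5, -0.5],
              [-0.5,  0.5],
              [ 0.5,  0.5],
              [-0.5, -0.5]])
m = X.shape[0]

# Sigma = (1/m) * X^T X; the result is an n-by-n symmetric matrix.
Sigma = (X.T @ X) / m
```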

3. Perform an SVD decomposition to obtain the eigenvectors

[U, S, V] = svd(Sigma)

Selecting the first k columns of U gives the mapping formula.

That is:

Ureduce = U(:, 1:k);

z = Ureduce' * x;

z is the feature matrix after dimensionality reduction.
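The three steps above can be sketched end to end in NumPy (the random rank-2 data is a made-up example; note that with row-major data the per-sample formula z = Ureduce' * x becomes Z = X @ Ureduce for the whole matrix):

```python
import numpy as np

np.random.seed(0)
# Hypothetical data: 100 samples in 3-D that actually lie in a 2-D subspace.
X = np.random.randn(100, 2) @ np.random.randn(2, 3)

m, n = X.shape
Sigma = (X.T @ X) / m            # covariance matrix (data assumed centered)

# SVD of Sigma; the columns of U are the principal directions.
U, S, Vt = np.linalg.svd(Sigma)

k = 2
U_reduce = U[:, :k]              # Ureduce = U(:, 1:k)
Z = X @ U_reduce                 # z = Ureduce' * x, applied to every row
```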

As for how to choose k, it depends on how much of the variation range (variance) of the original information we decide to keep. Suppose we want to retain 99% of the original variance:

That is: the smallest k such that the sum of the first k diagonal elements of S is at least 99% of the sum of all the diagonal elements of S is the k we should choose.

Where there is compression, there is naturally a corresponding reconstruction. However, PCA compression is inherently lossy and cannot restore values exactly identical to the originals (unless, of course, k = n).

We can only obtain an approximate reconstruction of the original feature vectors. The formula is:

That is: Xapprox = Ureduce * z
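A small NumPy sketch of the reconstruction (made-up data; for row-major data the per-sample formula Xapprox = Ureduce * z becomes Z @ Ureduce'):

```python
import numpy as np

# Hypothetical data matrix: m = 3 samples, n = 2 features.
X = np.array([[3.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
m = X.shape[0]
U, S, Vt = np.linalg.svd((X.T @ X) / m)

k = 1
U_reduce = U[:, :k]
Z = X @ U_reduce

# Approximate reconstruction: Xapprox = Ureduce * z per sample,
# i.e. Z @ U_reduce.T for the whole matrix. Lossy unless k = n.
X_approx = Z @ U_reduce.T
```

With k = n the projection uses the full orthogonal matrix U, and the reconstruction is exact; for k < n it is only the closest approximation within the retained subspace.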

There are a few things to note when using PCA:

1. When building a machine learning system, do not reach for PCA first; generally speaking, using the original features directly works better. PCA is only necessary when the original algorithm is too slow, or when memory or disk space is insufficient to support the computation.

2. Do not use PCA to reduce overfitting; regularization is the more reasonable way to address it. PCA looks only at the feature matrix when deciding how to reduce the number of features, while regularization looks at both the feature matrix and the corresponding labels when reducing overfitting.

