Error representation in principal component analysis (PCA)

Source: Internet
Author: User

Given n samples $x^{(1)}, x^{(2)}, \dots, x^{(n)}$, each of dimension m, suppose our goal is to reduce them from m dimensions to k dimensions while losing as little important information as possible. In other words, we want to project the n sample points from the m-dimensional space into a k-dimensional space. For each sample point, this projection can be written as:

$$z = A^T x \tag{1}$$

where $x$ is an m-dimensional sample point, $z$ is the k-dimensional point obtained after projection, and $A$ is an $m \times k$ matrix.

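Recall that when we use principal component analysis (PCA) for dimensionality reduction, we first compute the mean of the samples:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x^{(i)}$$

Then we compute the scatter matrix:

$$S = \sum_{i=1}^{n} \left(x^{(i)} - \bar{x}\right)\left(x^{(i)} - \bar{x}\right)^T$$

Next we obtain the eigenvectors $s_1, s_2, \dots, s_k$ of the scatter matrix that belong to its k largest eigenvalues, and normalize them to unit length so that $\|s_1\| = \|s_2\| = \dots = \|s_k\| = 1$. Finally, these k vectors form the columns of the matrix $A$ in formula (1):

$$A = [s_1, s_2, \dots, s_k] \tag{2}$$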
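The steps above (mean, scatter matrix, unit eigenvectors, matrix A) can be sketched with NumPy. This is only a minimal illustration: the function name `pca_projection_matrix` and the random data are assumptions made for the example, and the eigenvectors are taken for the k largest eigenvalues, as is usual in PCA.

```python
import numpy as np

def pca_projection_matrix(X, k):
    """X has shape (n, m), one sample per row. Returns A with shape (m, k)."""
    mean = X.mean(axis=0)                        # sample mean
    centered = X - mean
    scatter = centered.T @ centered              # (m, m) scatter matrix
    # eigh handles symmetric matrices: eigenvalues come back in ascending
    # order and the eigenvectors (columns) already have unit length.
    eigvals, eigvecs = np.linalg.eigh(scatter)
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    return eigvecs[:, order]

# Hypothetical data: n = 10 samples in m = 3 dimensions, reduced to k = 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
A = pca_projection_matrix(X, 2)
print(A.shape)
```

Because the columns of A are orthonormal, `A.T @ A` is the k-by-k identity matrix.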

Example: to understand the geometric meaning of formula (1) more intuitively, take a set of 2-dimensional data and use PCA to reduce it to 1 dimension. The eigenvectors stored in the matrix $A$ are in fact the new axes after dimensionality reduction; in this example we get a single new 1-dimensional axis, given by one unit eigenvector $s = [s_1, s_2]^T$ (here $s_1$ and $s_2$ denote the two components of that vector). As shown in Figure 1, each red cross marks the point where a 2-dimensional sample lands when it is projected perpendicularly onto this new axis. For each sample point $x = [x_1, x_2]^T$ in the 2-dimensional space, substituting it into formula (1) gives its reduced representation (in this example a 1-dimensional vector, that is, a single value):

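$$z = A^T x = \begin{bmatrix} s_1 & s_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = s_1 x_1 + s_2 x_2 \tag{3}$$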

Figure 1: Representation of the 10 sample points in 2-dimensional space


The value computed by formula (3) is the signed distance of the projected point from the origin along the new axis. We can therefore draw a number line to represent this new axis and mark each value computed by formula (3) at its position on the line, as shown in Figure 2.

Figure 2: Representation of the 10 sample points after reduction to 1-dimensional space
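The 2-D to 1-D example can be reproduced numerically. The ten points below are hypothetical (they are not the data behind the figures), and the sketch follows the article in applying formula (1) directly to the samples.

```python
import numpy as np

# Ten hypothetical 2-D sample points, one per row.
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

centered = X - X.mean(axis=0)
scatter = centered.T @ centered
eigvals, eigvecs = np.linalg.eigh(scatter)   # eigenvalues in ascending order
s = eigvecs[:, -1]                           # unit eigenvector of the largest eigenvalue
A = s.reshape(2, 1)                          # A is a 2 x 1 matrix in this example

# Formula (1) applied to every sample: one value per point.
z = X @ A                                    # shape (10, 1)
print(z.ravel())                             # positions on the new 1-D axis
```

Each entry of `z` equals `s[0] * x1 + s[1] * x2` for the corresponding sample, which is the position that would be marked on the number line.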

The loss of this set of sample points after dimensionality reduction can be calculated by the following formula:

$$L = \sum_{i=1}^{n} \left\| x^{(i)} - A A^T x^{(i)} \right\|^2 \tag{4}$$
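A minimal sketch of this loss, assuming it is the sum of squared Euclidean distances between each sample and $AA^T x^{(i)}$. The function name `reconstruction_loss` and the small test points are hypothetical.

```python
import numpy as np

def reconstruction_loss(X, A):
    """Sum over samples of ||x - A A^T x||^2. X is (n, m), A is (m, k)."""
    X_approx = X @ A @ A.T            # row i holds A A^T x^(i)
    return float(np.sum((X - X_approx) ** 2))

# Hypothetical check with a unit vector s (0.36 + 0.64 = 1).
s = np.array([[0.6], [0.8]])
on_line = np.array([[3.0, 4.0], [-1.5, -2.0]])   # multiples of s
off_line = np.array([[1.0, 0.0]])                # not on the line spanned by s

loss_on = reconstruction_loss(on_line, s)
loss_off = reconstruction_loss(off_line, s)
print(loss_on, loss_off)
```

Points that already lie on the new axis lose nothing, while a point off the axis contributes the squared length of its perpendicular offset.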

To understand formula (4), we first need to understand $AA^T x^{(i)}$. Recall that $A^T x^{(i)}$ is the representation of a sample point in the low-dimensional space (see Figure 2), whereas $x^{(i)}$ is the representation of the sample point in the high-dimensional space. Points in spaces of different dimension cannot be compared directly: for example, a point $(x_1, x_2)$ in 2-dimensional space cannot be compared with a point $(y_1)$ in 1-dimensional space, because they do not live in the same space.

To compare sample points of two different dimensions, we need to place them in the same space. A reasonable approach is to map the point in the low-dimensional space back into the high-dimensional space, giving it the value 0 along the directions that were discarded. This is exactly what $AA^T x^{(i)}$ does: it maps the reduced sample point back into the high-dimensional space. In the example above, $A^T x^{(i)}$ is a cross in Figure 2, and $AA^T x^{(i)}$ is the corresponding cross on the straight line (the new axis) in Figure 1.
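A small numeric sketch of this round trip, using a hypothetical unit vector `s` as the new axis and a hypothetical sample `x`. The vector `n` is a unit vector orthogonal to `s`, i.e. the direction that was discarded.

```python
import numpy as np

s = np.array([0.6, 0.8])          # hypothetical unit eigenvector (the new axis)
n = np.array([-0.8, 0.6])         # unit vector orthogonal to s (discarded direction)
A = s.reshape(2, 1)

x = np.array([2.0, 1.0])          # a hypothetical 2-D sample
z = A.T @ x                       # 1-D representation, shape (1,)
x_approx = A @ z                  # A A^T x, back in 2-D, shape (2,)

print(z, x_approx)
print(n @ x_approx)               # component along the discarded direction
```

Here `z` is `[2.0]` and `x_approx` is `[1.2, 1.6]`; its component along `n` is 0, which is the sense in which the mapped-back point takes the value 0 in the discarded direction.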

Note that the crosses in Figure 2 and Figure 1 correspond one to one, and that their distance from the origin is the same in the high-dimensional space and in the low-dimensional space (compare the distances of the crosses from the origin in Figure 1 and Figure 2). We can show this for our example. Suppose the reduced representation of a sample point $x$ is $z = s_1 x_1 + s_2 x_2$; mapping it back from the low dimension to the high dimension (here from 1-D to 2-D) gives:

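$$x_{approx} = A z = \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} (s_1 x_1 + s_2 x_2) = \begin{bmatrix} s_1 (s_1 x_1 + s_2 x_2) \\ s_2 (s_1 x_1 + s_2 x_2) \end{bmatrix} \tag{5}$$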

We now claim that $x_{approx}$ in formula (5) is the cross in Figure 1. To prove this, we need to show two things: ① the distance from $x_{approx}$ to the origin equals the distance from $z$ to the origin, i.e. $\|x_{approx}\| = \|z\|$; ② $x_{approx}$ lies on the hyperplane of the high-dimensional space that represents the low-dimensional space (in this example the high-dimensional space is 2-dimensional, the low-dimensional space is 1-dimensional, and the hyperplane is a straight line).

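Proof of ①:

Since $s$ is a unit vector, $\|s\|^2 = s_1^2 + s_2^2 = 1$. Therefore $\|x_{approx}\|^2 = s_1^2 (s_1 x_1 + s_2 x_2)^2 + s_2^2 (s_1 x_1 + s_2 x_2)^2 = (s_1^2 + s_2^2)(s_1 x_1 + s_2 x_2)^2 = (s_1 x_1 + s_2 x_2)^2 = \|z\|^2$, which completes the proof.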

Proof of ②:

To show that $x_{approx}$ lies on the hyperplane, we first need the general equation of the hyperplane, and for that we need its normal vector $n$. In this example the normal vector satisfies $n^T s = 0$, where $s^T = [s_1, s_2]$. Assuming $s_1 \neq 0$, we can take $n = [-s_2/s_1, 1]^T$, so the general equation of the hyperplane is $(-s_2/s_1) x_1 + x_2 = 0$, where $x_1$ and $x_2$ here stand for the coordinates of an arbitrary point on it. Substituting the two components of $x_{approx}^T = [s_1 (s_1 x_1 + s_2 x_2),\ s_2 (s_1 x_1 + s_2 x_2)]$ for these coordinates gives $(-s_2/s_1) \cdot s_1 (s_1 x_1 + s_2 x_2) + s_2 (s_1 x_1 + s_2 x_2) = -s_2 (s_1 x_1 + s_2 x_2) + s_2 (s_1 x_1 + s_2 x_2) = 0$. Hence every $x_{approx}$ lies on the hyperplane, which completes the proof.
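Both claims ① and ② can also be checked numerically. The sketch below uses a hypothetical helper `check_claims` and random test data; the angle of $s$ is kept away from the case $s_1 = 0$, for which the line equation above is not defined.

```python
import numpy as np

def check_claims(s1, s2, x1, x2):
    """Return (norm_matches, on_line) for one 2-D sample and a unit vector s."""
    z = s1 * x1 + s2 * x2                        # reduced value, formula (3)
    x_approx = np.array([s1 * z, s2 * z])        # mapped back to 2-D, formula (5)
    # Claim 1: same distance from the origin in both spaces.
    norm_matches = np.isclose(np.linalg.norm(x_approx), abs(z))
    # Claim 2: x_approx satisfies the line equation (-s2/s1) * x1 + x2 = 0.
    on_line = np.isclose((-s2 / s1) * x_approx[0] + x_approx[1], 0.0)
    return bool(norm_matches), bool(on_line)

rng = np.random.default_rng(0)                   # hypothetical random test data
theta = rng.uniform(0.1, 1.4)                    # direction of s, so that s1 != 0
s1, s2 = np.cos(theta), np.sin(theta)            # cos^2 + sin^2 = 1, so s is a unit vector
results = [check_claims(s1, s2, x1, x2) for x1, x2 in rng.normal(size=(5, 2))]
print(results)
```

A numeric check like this does not replace the proofs; it only confirms them on a few sample points.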

Returning to formula (4), $L$ is the sum, over all sample points, of the squared distance between each point in the high-dimensional space and its projection onto the low-dimensional space.
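As a worked step, and assuming the loss uses the squared Euclidean norm, each term of the sum can be expanded using the fact that the columns of $A$ are orthonormal, so that $A^T A = I$:

$$\left\| x - A A^T x \right\|^2 = x^T x - 2\, x^T A A^T x + x^T A A^T A A^T x = \|x\|^2 - \left\| A^T x \right\|^2$$

In the 2-dimensional example this reads $x_1^2 + x_2^2 - (s_1 x_1 + s_2 x_2)^2$: the loss of a point is its squared length minus the squared length of its projection, which is the Pythagorean theorem applied to the right triangle formed by the origin, the sample point, and its cross on the new axis.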
