Netease Open Course: Lecture 14
Notes 10
In the factor analysis discussed earlier, the EM algorithm is used to find latent factor variables for dimensionality reduction.
This article introduces another dimensionality reduction method, principal components analysis (PCA), which is more direct than factor analysis and easier to compute.
Principal component analysis is based on the following observation: in real high-dimensional data, many dimensions are just noise, or are redundant and contribute nothing to describing the structure of the data.
For example, when describing the speed of a car we might record it in both mph and kph as two separate dimensions, when in fact only one dimension is needed.
If most of the data in a high-dimensional space, say a three-dimensional space, is concentrated near a two-dimensional plane, then we can use the two basis vectors of that plane in place of the original three coordinates, achieving dimensionality reduction while retaining as much of the original information as possible.
More generally, if the data points in an n-dimensional space are concentrated near a k-dimensional hyperplane, then the k basis vectors of that hyperplane are the principal components.
Consider Ng's example of radio-controlled helicopter pilots, where we want to characterize each pilot's ability.
x1 measures the pilot's flying skill, and x2 measures how much the pilot enjoys flying. These two dimensions are highly correlated.
In the figure, all the points lie close to the u1 axis, so we can use u1 as the principal component in place of the original x1 and x2.
In fact, u1 and u2 are simply a rotation of the x1 and x2 axes. After the rotation, the dataset lies essentially along the u1 dimension, with almost no spread along u2.
For an n-dimensional space, if after a suitable rotation the data can be described well by only k of the new coordinates, then those k directions are the principal components. The rotated coordinate axes are orthogonal, so the new features are uncorrelated; therefore the selected principal components are mutually uncorrelated as well.
Having seen the idea behind principal component analysis, the next question is: how do we find the principal components?
First, preprocess the data (a short sketch of these steps in code follows the list):
1. Compute the mean of each feature.
2. Subtract the mean from every data point (center the data).
3. Compute the variance of each feature.
4. Rescale each centered feature to unit variance. This normalization is needed because the scales of the dimensions differ, for example one dimension might be weight (around 80 kg) and another height (around 1.8 m).
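A minimal NumPy sketch of these four steps, assuming X is an m-by-n array with one example per row (the function name standardize and the zero-variance guard are my own additions, not from the course notes):

```python
import numpy as np

def standardize(X):
    """Steps 1-4: center each feature, then rescale it to unit variance."""
    mu = X.mean(axis=0)            # 1. per-feature mean
    Xc = X - mu                    # 2. subtract the mean from every example
    sigma = Xc.std(axis=0)         # 3. per-feature standard deviation
    sigma[sigma == 0] = 1.0        # guard against constant features (my addition)
    return Xc / sigma              # 4. rescale each centered feature
```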
Now, how do we find u?
One way to pose this problem is to find the unit vector u such that, when the data is projected onto the direction u, the variance of the projected data is maximized. In other words, find a unit vector that makes the projected points as spread out (scattered) as possible.
Why?
First, we want to find a subspace (hyperplane) on which the data points are concentrated as much as possible, that is, such that the distance from each point to this subspace is as small as possible.
In the left figure, the chosen direction u makes the distances from the points to the line smallest, and the variance of the projections largest. With the direction chosen in the right figure, the projection variance is smallest.
In addition, a large variance means the projected points are well spread out, which makes them easier to distinguish.
Formally:
Principal components analysis (PCA): the maximum variance interpretation
To formalize this, note that given a unit vector u and a point x, the length of the projection of x onto u is given by $x^T u$.
Since the data has been centered, the variance of the projections of all the points is

$$\frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)T}u\right)^{2}=\frac{1}{m}\sum_{i=1}^{m}u^{T}x^{(i)}x^{(i)T}u=u^{T}\left(\frac{1}{m}\sum_{i=1}^{m}x^{(i)}x^{(i)T}\right)u.$$

The matrix in the middle is the covariance matrix of the data x.
Set

$$\Sigma=\frac{1}{m}\sum_{i=1}^{m}x^{(i)}x^{(i)T}.$$

The problem is then

$$\max_{u}\;u^{T}\Sigma u\quad\text{subject to}\quad u^{T}u=1.$$

Introducing a Lagrange multiplier $\lambda$ for the constraint and setting the gradient with respect to u to zero gives

$$\Sigma u=\lambda u.$$

Multiplying both sides on the left by $u^{T}$ shows that $u^{T}\Sigma u=\lambda\,u^{T}u=\lambda$, i.e. the projected variance equals $\lambda$. This is exactly the eigenvector and eigenvalue equation.
So the goal of maximizing the projected variance comes down to finding the eigenvector u of the covariance matrix $\Sigma$ with the largest eigenvalue.
Here is a brief explanation of eigenvectors and eigenvalues.
Http://zh.wikipedia.org/wiki/%E7%89%B9%E5%BE%B5%E5%90%91%E9%87%8F
A matrix can be regarded as a linear transformation, so the equation above says that applying the transformation to the vector u yields a vector that still points in the same direction, only scaled (that is, multiplied by a scalar). Such a u is called an eigenvector of the linear transformation (or of the matrix), and the scaling factor $\lambda$ is the eigenvalue corresponding to that eigenvector.
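To make this concrete, here is a small NumPy check (my own illustration, not from the notes) that the top eigenvector of the covariance matrix satisfies the equation above and gives the largest projected variance of any unit direction:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D toy data (think x1 = skill, x2 = enjoyment), centered.
X = rng.normal(size=(500, 1)) @ np.array([[2.0, 1.5]]) + 0.3 * rng.normal(size=(500, 2))
X -= X.mean(axis=0)

Sigma = X.T @ X / len(X)                    # covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)    # eigh: Sigma is symmetric
u = eigvecs[:, -1]                          # eigenvector with the largest eigenvalue

print(np.allclose(Sigma @ u, eigvals[-1] * u))   # Sigma u = lambda u
print(np.var(X @ u), eigvals[-1])                # projected variance equals lambda

# No random unit direction beats u's projected variance.
for _ in range(1000):
    v = rng.normal(size=2)
    v /= np.linalg.norm(v)
    assert np.var(X @ v) <= np.var(X @ u) + 1e-9
```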
With this, we can see that solving PCA is actually very simple: first compute the covariance matrix of the (preprocessed) data, then find its eigenvectors. Finally, sort the eigenvectors by their eigenvalues and take the top k as the new principal component directions.
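A minimal sketch of this procedure in NumPy (the function name pca and the use of np.linalg.eigh are my own choices; in practice one would often use an SVD of the data matrix or a library implementation instead):

```python
import numpy as np

def pca(X, k):
    """Return the top-k principal directions and the k-dimensional projections.
    Assumes X (m examples by n features) has already been standardized."""
    m = X.shape[0]
    Sigma = X.T @ X / m                       # covariance matrix of the data
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # sort eigenvalues descending
    U = eigvecs[:, order[:k]]                 # top-k eigenvectors as columns
    return U, X @ U                           # principal directions, projected data
```

Working from the covariance matrix matches the derivation above; an SVD of X gives the same directions and is numerically more stable when n is large.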
PCA is widely used:
Data compression.
Visualization: high-dimensional data cannot be visualized directly, so it is reduced to two or three dimensions for plotting.
Reducing overfitting: supervised learning on high-dimensional data yields highly complex models that overfit easily; reducing the dimensionality with PCA helps prevent this.
Noise removal: for example, in face recognition a 100x100 image gives 10,000 pixel features, and PCA can reduce them to a much smaller set of principal component features.
Anomaly detection: PCA finds the hyperplane spanned by the top k principal components; if a new data point lies far from this hyperplane, it may be anomalous.
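To illustrate the anomaly detection idea, here is a hedged sketch that scores points by their squared distance (reconstruction error) to the principal subspace; the toy data, the 99th-percentile cutoff, and the function name are my own choices, not from the notes:

```python
import numpy as np

def reconstruction_error(X, U):
    """Squared distance from each row of X to the subspace spanned by the
    columns of U (the top-k principal directions)."""
    Z = X @ U                   # project onto the principal subspace
    X_hat = Z @ U.T             # map back into the original space
    return np.sum((X - X_hat) ** 2, axis=1)

# Toy data that lies close to a line in 2-D, centered.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1)) @ np.array([[1.0, 2.0]]) + 0.05 * rng.normal(size=(200, 2))
X -= X.mean(axis=0)

# Top-1 principal direction, computed as in the pca() sketch above.
eigvals, eigvecs = np.linalg.eigh(X.T @ X / len(X))
U = eigvecs[:, [-1]]                                         # shape (2, 1)

threshold = np.percentile(reconstruction_error(X, U), 99)    # cutoff: my own choice
outlier = np.array([[3.0, -3.0]])                            # far off the main direction
print(reconstruction_error(outlier, U) > threshold)          # expected: [ True]
```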