Abstract:
PCA (principal component analysis) is a multivariate statistical method. It uses a linear transformation to select a small number of important variables, so it can often extract the most important elements and structures from overly "rich" data, remove noise and redundancy, reduce the dimensionality of the original complex data, and reveal the simple structure hidden behind it. In recent years, PCA has been widely used in computing, for example in data dimensionality reduction, lossy image compression, and feature tracking.
Introduction:
The human brain perceives the outside world through many different senses, so the information it acquires is high-dimensional. If the brain did not reduce the dimensionality of this information, the efficiency and accuracy of its processing would decrease. Therefore, by the time the brain processes these sensory signals, they have all passed through complex dimensionality-reduction processing.
The PCA method is widely used in data analysis, from neuroscience to computer graphics, because it is a simple non-parametric method that can extract the relevant information from complicated data sets. The motivation for principal component analysis is to compute the most important basis of a noisy data space and use that basis to re-express the data. However, these new basis vectors are often hidden in complex data structures, and we need to filter out the noise in order to find a new basis on which to restructure the data space.
PCA is a common method, and one of its advantages is that it can reduce the dimension of data. We use PCA to find the principal components of a dataset and keep only the most important ones; the remaining dimensions are discarded, yielding dimensionality reduction and a simplified model. The data are thereby indirectly compressed while the original information is largely preserved, just as the human brain reduces dimensionality during neural processing.
Therefore, PCA is widely used in machine learning, pattern recognition, and computer vision.
In face recognition, assume the training set consists of 30 different n × n face images. When each pixel of an image is regarded as one dimension, an image is an n²-dimensional vector. Because the structure of the human face is highly similar across images, and images of the same person are even more similar, what we want is to express a face in terms of faces rather than in terms of raw pixels. We can therefore use PCA to process the 30 training images and look for the dimensions that these images share. After extracting the most important principal components, we can project the image to be recognized onto these components and compare it with the projections of the training images to measure the similarity between the two.
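As a rough illustration of this idea, the following sketch projects flattened face images onto the leading principal components and compares them by Euclidean distance. It is a minimal outline under stated assumptions, not a full recognition system: the names `train_images`, `query_image`, and `num_components` are hypothetical, and NumPy is assumed.

```python
import numpy as np

# train_images: hypothetical array of shape (30, n*n), one flattened face per row
# query_image: hypothetical flattened face of shape (n*n,)
def eigenface_match(train_images, query_image, num_components=10):
    mean_face = train_images.mean(axis=0)
    centered = train_images - mean_face

    # Principal components via SVD of the centered training matrix
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:num_components]            # (num_components, n*n)

    # Project the training faces and the query face onto the components
    train_proj = centered @ components.T        # (30, num_components)
    query_proj = (query_image - mean_face) @ components.T

    # A smaller distance in the reduced space means greater similarity
    distances = np.linalg.norm(train_proj - query_proj, axis=1)
    return int(np.argmin(distances)), distances
```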
For image compression, we can also use the PCA method, which in this context is also known as the Hotelling or Karhunen-Loève transform. We use PCA to extract the main components of the image, discard some of the minor components, and then transform back to the original image space. Because the dimension is reduced, the image is compressed to a large extent, while at the same time it retains most of the important information of the original image.
Body:
The PCA method maps the data space to a low-dimensional space through an orthogonal transformation. The basis vectors must satisfy the orthogonality condition, and the subspace they span should best capture the correlation structure of the data. After the original dataset is transformed into this space, the correlation between the components of a single data sample should be reduced to a minimum.
Figure 1: red points represent the original data points; green points represent the points mapped to the low-dimensional space; purple lines represent the mapping planes.
Variance Maximization
As mentioned above, the process of PCA is actually the process of searching for a low-dimensional subspace. So what kind of low-dimensional space meets our requirements? Because we want the correlation between the mapped data components to be as low as possible, we can find the low-dimensional space by solving the optimization problem of maximizing the variance of the data after the mapping.
Suppose we have N sample data points {x_n}, each of dimension D. We want to map the sample data to an M < D dimensional subspace while maximizing the variance of the mapped data. To simplify the problem, we first take M = 1, that is, we map to a one-dimensional space. We let the direction of the low-dimensional space be the D-dimensional unit vector u_1, which satisfies the normalization condition u_1^T u_1 = 1. Then every sample point x_n is mapped to the one-dimensional space as u_1^T x_n. We write the mean vector of the original N samples as
\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n    (1)
Then the variance of the data after the mapping is:
\frac{1}{N} \sum_{n=1}^{N} \left( u_1^T x_n - u_1^T \bar{x} \right)^2 = u_1^T S u_1    (2)
Note: here S is the covariance matrix of the original dataset:
S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T    (3)
The desired low-dimensional space is the one that maximizes the value of equation (2), that is, the variance. The problem is thus converted into finding the maximum of equation (2).
Because u_1 is constrained to be a unit vector, we introduce the method of Lagrange multipliers to solve for the maximum of equation (2), constructing the constrained objective function:
u_1^T S u_1 + \lambda_1 \left( 1 - u_1^T u_1 \right)    (4)
From calculus, we know that to find the maximum of equation (4) with respect to u_1, we only need to set the derivative of (4) with respect to u_1 equal to zero, which gives:
S u_1 = \lambda_1 u_1    (5)
From linear algebra, we can see that \lambda_1 must be an eigenvalue of the covariance matrix S, and u_1 is its corresponding eigenvector. Left-multiplying (5) by u_1^T and using u_1^T u_1 = 1 shows that the variance u_1^T S u_1 = \lambda_1, so the variance is maximized by choosing the eigenvector belonging to the largest eigenvalue.
We now extend from 1 dimension to M > 1 dimensions. The covariance matrix S then contributes the M largest eigenvalues \lambda_1, ..., \lambda_M, and the corresponding eigenvectors u_1, ..., u_M span the desired subspace.
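A minimal NumPy sketch of this variance-maximization view is given below; the data matrix `X` and the number of components `M` are illustrative assumptions. It computes the covariance matrix of equation (3), takes the eigenvectors of its largest eigenvalues as in equation (5), and projects the data onto them.

```python
import numpy as np

def pca_by_eigendecomposition(X, M):
    """X: (N, D) data matrix; M: number of principal components to keep."""
    x_bar = X.mean(axis=0)                     # mean vector, equation (1)
    centered = X - x_bar
    S = centered.T @ centered / X.shape[0]     # covariance matrix, equation (3)

    # S is symmetric, so eigh returns real eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:M]      # indices of the M largest eigenvalues
    U = eigvecs[:, order]                      # columns u_1, ..., u_M

    Z = centered @ U                           # projections u_i^T (x_n - x_bar)
    return Z, U, eigvals[order]
```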
Error Minimization
Another formulation of PCA is based on error minimization.
We introduce a complete orthonormal basis {u_i} of the D-dimensional space, that is,
u_i^T u_j = \delta_{ij}    (6)
Therefore, every data point x_n in the sample dataset can be represented linearly in this complete orthonormal basis:
x_n = \sum_{i=1}^{D} \alpha_{ni} u_i    (7)
Taking advantage of the orthonormality in equation (6), we take the inner product of equation (7) with u_j to obtain the coefficients \alpha_{nj} = x_n^T u_j, and substituting them back into equation (7) gives:
x_n = \sum_{i=1}^{D} \left( x_n^T u_i \right) u_i    (8)
We can see that the representation in equation (8) requires D-dimensional information, while our goal is to use only M < D dimensions to express x_n approximately:
\tilde{x}_n = \sum_{i=1}^{M} z_{ni} u_i + \sum_{i=M+1}^{D} b_i u_i    (9)
Here z_{ni} represents the components specific to each data point, while b_i represents the components shared by all data points. We construct the objective function:
J = \frac{1}{N} \sum_{n=1}^{N} \left\| x_n - \tilde{x}_n \right\|^2    (10)
The intuition is that we want the data points expressed in M dimensions to approximate the D-dimensional sample points as closely as possible, using the Euclidean distance to measure the similarity between the two. Our problem then becomes minimizing the objective function J. Taking derivatives with respect to z_{ni} and b_i and setting them to zero, we conclude that:
z_{nj} = x_n^T u_j, \quad j = 1, \ldots, M    (11)
b_j = \bar{x}^T u_j, \quad j = M+1, \ldots, D    (12)
Substituting these back into equation (10) gives:
J = \frac{1}{N} \sum_{n=1}^{N} \sum_{i=M+1}^{D} \left( x_n^T u_i - \bar{x}^T u_i \right)^2 = \sum_{i=M+1}^{D} u_i^T S u_i    (13)
Therefore, to minimize J we only need to discard the D − M directions u_i associated with the smallest eigenvalues of the covariance matrix S; equivalently, the retained subspace is spanned by the eigenvectors of the M largest eigenvalues, and the minimum error equals the sum of the discarded eigenvalues.
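To make this concrete, the following small NumPy check (the randomly generated data matrix `X`, its size, and the choice M = 2 are purely illustrative assumptions) confirms that the mean-squared reconstruction error from keeping M components equals the sum of the discarded eigenvalues of S:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))        # N = 500 samples, D = 6 dimensions (illustrative)
M = 2

x_bar = X.mean(axis=0)
centered = X - x_bar
S = centered.T @ centered / X.shape[0]

eigvals, eigvecs = np.linalg.eigh(S)          # ascending eigenvalues
U = eigvecs[:, ::-1][:, :M]                   # eigenvectors of the M largest eigenvalues

X_tilde = x_bar + (centered @ U) @ U.T        # reconstruction from M components
J = np.mean(np.sum((X - X_tilde) ** 2, axis=1))

discarded = eigvals[::-1][M:].sum()           # sum of the D - M smallest eigenvalues
print(J, discarded)                           # the two values agree (up to rounding)
```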
Singular Value Decomposition (SVD)
In the PCA method, the matrix can be decomposed and the principal components extracted in two ways:
1. Eigenvalue decomposition. This method has a limitation: the matrix to be decomposed must be a square matrix.
2. Singular value decomposition (SVD).
Singular value decomposition is an important matrix decomposition method in linear algebra, used in signal processing, statistics, and other fields. It factors a complex matrix into the product of several smaller and simpler matrices, and these factors describe the important features of the original matrix.
For an m × n matrix A, it can always be decomposed as:
A = U \Sigma V^T    (14)
The columns of U and V are the eigenvectors of AA^T and A^T A respectively, and the singular values on the diagonal of \Sigma are the square roots of their shared nonzero eigenvalues. In the PCA method, we select the p largest singular values and their corresponding singular vectors to approximate A:
A \approx A' = U_p \Sigma_p V_p^T    (15)
Linear algebra theory shows that A' is the closest rank-p matrix to A in the least-squares sense. The closer p is to n, the closer the approximation is to the original matrix; conversely, when the selected p is much smaller than n, far less information needs to be stored, achieving the goal of dimensionality reduction and compression.
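As a brief illustration, the NumPy sketch below forms the rank-p approximation of equation (15) via `np.linalg.svd`; the matrix `A` and the rank `p = 2` are arbitrary assumptions made only for the example.

```python
import numpy as np

def low_rank_approximation(A, p):
    """Return the rank-p approximation A' = U_p * Sigma_p * V_p^T of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :p] @ np.diag(s[:p]) @ Vt[:p, :]

A = np.random.default_rng(1).normal(size=(8, 5))   # illustrative 8 x 5 matrix
A_prime = low_rank_approximation(A, p=2)
print(np.linalg.norm(A - A_prime))                 # small when p captures most singular values
```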
Application of PCA to image compression:
Procedure:
1. Divide the input image into 4×4 blocks, each treated as a 16-dimensional vector.
2. Construct a matrix of observed sample data, with each block as one column of the 16-row data matrix.
3. Calculate the sample covariance matrix.
4. Decompose the sample covariance matrix into its eigenvalues and eigenvectors.
5. Keep the eigenvectors of the largest eigenvalues and reconstruct the original image from them.
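A condensed sketch of this procedure in NumPy is given below. It assumes a grayscale image array `img` whose height and width are multiples of 4 and an illustrative choice of `k` retained components; it follows the five steps above rather than any particular library's compression routine.

```python
import numpy as np

def pca_compress(img, k=4):
    """Compress a grayscale image by keeping k of the 16 block components."""
    h, w = img.shape                                   # assume h and w are multiples of 4
    # Steps 1-2: split into 4x4 blocks, one 16-dimensional column per block
    blocks = (img.reshape(h // 4, 4, w // 4, 4)
                 .transpose(0, 2, 1, 3)
                 .reshape(-1, 16).T)                   # shape (16, num_blocks)

    # Step 3: sample covariance matrix of the blocks
    mean = blocks.mean(axis=1, keepdims=True)
    centered = blocks - mean
    S = centered @ centered.T / blocks.shape[1]

    # Step 4: eigen-decomposition of the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)
    U = eigvecs[:, ::-1][:, :k]                        # eigenvectors of the k largest eigenvalues

    # Step 5: project onto the k components and reconstruct the blocks
    reconstructed = U @ (U.T @ centered) + mean        # shape (16, num_blocks)
    return (reconstructed.T.reshape(h // 4, w // 4, 4, 4)
                          .transpose(0, 2, 1, 3)
                          .reshape(h, w))
```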