1. Vectors and their base transformations
1.1 Vector inner product
(1) The inner product of two n-dimensional vectors is defined as
$(a_1, a_2, \dots, a_n) \cdot (b_1, b_2, \dots, b_n) = a_1 b_1 + a_2 b_2 + \dots + a_n b_n$
The inner product operation maps two vectors to a real number.
(2) Geometric meaning of the inner product
Assume a and b are two n-dimensional vectors. An n-dimensional vector can be viewed as a directed line segment starting from the origin in n-dimensional space. For ease of understanding, let a and b be two-dimensional vectors, a = (x1, y1), b = (x2, y2). Then a and b can be drawn in the two-dimensional plane as directed line segments from the origin, as in the figure:
In the figure, drop a perpendicular from the endpoint of a to the line containing b; the foot of the perpendicular is the projection of a onto b. If the angle between a and b is $\alpha$, the vector (signed) length of the projection is $|a|\cos\alpha$.
* The vector length may be negative: its absolute value is the length of the segment, and its sign depends on whether the projection points in the same direction as b or in the opposite direction. The scalar length is always greater than or equal to 0, and its value is the length of the segment.
Another way to write the inner product is
$a \cdot b = |a|\,|b|\cos\alpha$
That is, the inner product of a and b equals the projection length of a onto b multiplied by the modulus of b. In particular, if the modulus of b is 1, then the inner product of a and b equals the vector length of the projection of a onto the line through b.
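To make this concrete, here is a minimal NumPy sketch (the vectors a and b are hypothetical, with |b| = 1) that checks the componentwise inner product against the projection-times-modulus form:

import numpy as np

a = np.array([3.0, 2.0])                                  # hypothetical vector a
b = np.array([1.0, 0.0])                                  # hypothetical vector b, modulus 1

alpha = np.arctan2(a[1], a[0]) - np.arctan2(b[1], b[0])   # angle between a and b (2-D case)
proj_len = np.linalg.norm(a) * np.cos(alpha)              # signed projection length |a| cos(alpha)

print(np.dot(a, b))                                       # componentwise inner product: 3.0
print(proj_len * np.linalg.norm(b))                       # projection length times |b|: also 3.0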
1.2 Basis
(1) A two-dimensional vector corresponds to a directed line segment starting from the origin of the two-dimensional Cartesian coordinate system. Algebraically, we often use the coordinates of the segment's endpoint to represent the vector, for example (3, 2). But (3, 2) by itself does not describe the vector precisely. Analysing it: the "3" actually means that the projection value of the vector on the x-axis is 3, and the projection value on the y-axis is 2. That is, a definition has been implicitly introduced: the vectors of length 1 in the positive x and y directions are taken as the standard. More precisely, the vector (x, y) actually represents the linear combination $x\,(1,0)^{\mathsf T} + y\,(0,1)^{\mathsf T}$. Here (1, 0) and (0, 1) form a basis of the two-dimensional space.
Conclusion: to describe a vector precisely, first fix a basis, then give the vector's projection values onto the lines on which the basis vectors lie.
(2) Any two two-dimensional vectors that are linearly independent can form a basis. It is generally assumed that the basis vectors have modulus 1; for example, take the basis $(1/\sqrt{2}, 1/\sqrt{2})$ and $(-1/\sqrt{2}, 1/\sqrt{2})$. The coordinates of the point (3, 2) in this new basis are then $(5/\sqrt{2}, -1/\sqrt{2})$, which can also be read off the figure:
(3) Matrix representation of the basis transformation
The example in part (2) above, written in matrix form:
$$\begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}\begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 5/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}$$
The two rows of the matrix are the two basis vectors; multiplying it by the original vector gives exactly the new coordinates. If several two-dimensional vectors are to be converted to coordinates in the new basis, arrange these vectors as the columns of a matrix, e.g. for the points (2, 2) and (3, 3):
$$\begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}\begin{pmatrix} 2 & 3 \\ 2 & 3 \end{pmatrix} = \begin{pmatrix} 2\sqrt{2} & 3\sqrt{2} \\ 0 & 0 \end{pmatrix}$$
* Conclusion: if there are m n-dimensional vectors and we want to transform them into the new space represented by R n-dimensional basis vectors, first form the matrix A whose rows are the R basis vectors, then form the matrix B whose columns are the vectors. The product AB is the transformation result, where the m-th column of AB is the transformed m-th column of B. R determines the dimension of the transformed data; that is, n-dimensional data can be transformed into a lower-dimensional space, and the transformed dimension depends on the number of basis vectors.
The meaning of multiplying two matrices is therefore: each column vector of the right-hand matrix is transformed into the space spanned by the row vectors of the left-hand matrix.
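As a small illustration, the following NumPy sketch performs this basis change with the unit basis $(1/\sqrt{2}, 1/\sqrt{2})$, $(-1/\sqrt{2}, 1/\sqrt{2})$ used in the example above, first for the single point (3, 2) and then for several points arranged as columns:

import numpy as np

s = 1 / np.sqrt(2)
A = np.array([[ s, s],       # first basis vector as a row
              [-s, s]])      # second basis vector, orthogonal to the first

x = np.array([3.0, 2.0])     # the single point (3, 2)
print(A @ x)                 # new coordinates: [5/sqrt(2), -1/sqrt(2)]

B = np.array([[2.0, 3.0],    # the points (2, 2) and (3, 3) arranged as columns
              [2.0, 3.0]])
print(A @ B)                 # each column of A @ B is one transformed point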
2. Covariance matrix
2.1 Basic Preparation
It was discussed above that different bases give different representations of the same set of data, and if the number of basis vectors is smaller than the dimension of the vectors themselves, dimensionality reduction is achieved. How should we choose K basis vectors (a K-dimensional basis) so as to retain as much of the original information as possible?
The following example illustrates this:
Take as an example a data set with 5 records, written in matrix form:
Each column is one data record and each row is one field. For convenience of processing, first subtract from every value in a field that field's mean, so that each field has mean 0. The matrix after this transformation is:
Their corresponding positions in the coordinate system are:
Question: we now want to represent these data with a single dimension while retaining as much of the original information as possible. How should the basis be chosen?
The answer: we want the projected values to be as spread out as possible.
As the figure above suggests, if the data points are projected onto the diagonal line through the first and third quadrants, the 5 points can still be distinguished after projection.
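The original data matrix and figures are not reproduced here, so the following minimal sketch uses a hypothetical 2x5 matrix (each column a record) to walk through the same steps: center each field to mean 0, then project onto a candidate unit direction and look at how spread out the projected values are:

import numpy as np

X = np.array([[1.0, 1.0, 2.0, 4.0, 2.0],    # hypothetical field a (one row per field)
              [1.0, 3.0, 3.0, 4.0, 4.0]])   # hypothetical field b

Xc = X - X.mean(axis=1, keepdims=True)      # subtract each field's mean, so every row has mean 0

u = np.array([1.0, 1.0]) / np.sqrt(2)       # candidate unit basis: the first/third-quadrant diagonal
proj = u @ Xc                               # projected values of the 5 records on u
print(proj)                                 # the more spread out these values, the better the basis
print(proj.var())                           # dispersion measured by variance (see the next subsection)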
2.2 Variance
We hope that the values obtained after projection are as spread out as possible, and this degree of dispersion can be expressed mathematically by the variance.
The variance of a field can be seen as the mean of the squared differences between each element and the field mean, i.e.
$\mathrm{Var}(a) = \frac{1}{m}\sum_{i=1}^{m}(a_i - \mu)^2$
Because each field of the data has already been centered to mean 0 above, the variance can be computed directly as the sum of the squares of the elements divided by the number of elements, i.e.
$\mathrm{Var}(a) = \frac{1}{m}\sum_{i=1}^{m} a_i^2$
So the problem above is formally expressed as: find a basis vector such that, when all the data are transformed into coordinates in this basis, the variance is maximized.
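A quick check of the simplified variance formula on a hypothetical centered field:

import numpy as np

a = np.array([-1.0, -1.0, 0.0, 2.0, 0.0])   # a hypothetical field that is already centered (mean 0)
m = len(a)

var_a = np.sum(a ** 2) / m                  # Var(a) = (1/m) * sum(a_i^2), valid because mean(a) = 0
print(var_a, np.var(a))                     # identical results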
2.3 Covariance
For the problem of reducing two dimensions to one, it is enough to find the direction that maximizes the variance. For higher dimensions more needs to be considered. For example, to reduce three dimensions to two, we first, as before, find a direction that maximizes the variance of the projections; that fixes the first direction, and we then need to choose a second projection direction.
If we again simply chose the direction of maximum variance, this direction would nearly coincide with the first one, which is obviously useless. Intuitively, for the two fields to carry as much of the original information as possible, we want them to be linearly uncorrelated, because correlation means the two fields are not completely independent and some information is represented twice.
Mathematically, the covariance of two fields is used to express their correlation. Since the mean of each field has already been set to 0, the covariance formula becomes
$\mathrm{Cov}(a, b) = \frac{1}{m}\sum_{i=1}^{m} a_i b_i$
It follows that, when the field means are 0, the covariance of two fields can be expressed concisely as their inner product divided by the number of elements m.
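And the corresponding check for the covariance of two hypothetical centered fields:

import numpy as np

a = np.array([-1.0, -1.0, 0.0, 2.0, 0.0])   # hypothetical centered field a
b = np.array([-2.0,  0.0, 0.0, 1.0, 1.0])   # hypothetical centered field b
m = len(a)

cov_ab = np.dot(a, b) / m                   # Cov(a, b) = (1/m) * sum(a_i * b_i) when the means are 0
print(cov_ab)
print(np.cov(a, b, bias=True)[0, 1])        # NumPy's biased (divisor m) covariance agrees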
When the covariance is 0, the two fields are completely uncorrelated. For the covariance to be 0, the second basis vector can only be chosen among the directions orthogonal to the first. Therefore the directions finally chosen must be orthogonal to each other.
Dimensionality reduction optimization objective: to reduce a set of n-dimensional vectors to K dimensions, the goal is to choose K unit (modulus 1) orthogonal basis vectors such that, after the original data are transformed into this basis, the pairwise covariances are 0 and the variance of each field is as large as possible (under the orthogonality constraint, take the K directions with the largest variances).
2.4 Covariance matrix
The final objective is closely related to the variance within fields and the covariance between fields, so what we want to do now is to express the two uniformly.
Suppose we have two fields a and b and m records, arranged by rows into the matrix X:
$$X = \begin{pmatrix} a_1 & a_2 & \cdots & a_m \\ b_1 & b_2 & \cdots & b_m \end{pmatrix}$$
Then multiply X by its transpose and divide by the number of records:
$$\frac{1}{m} X X^{\mathsf T} = \begin{pmatrix} \frac{1}{m}\sum_{i=1}^{m} a_i^2 & \frac{1}{m}\sum_{i=1}^{m} a_i b_i \\ \frac{1}{m}\sum_{i=1}^{m} a_i b_i & \frac{1}{m}\sum_{i=1}^{m} b_i^2 \end{pmatrix}$$
It can be seen that the two elements on the diagonal of this matrix are the variances of the two fields, and the other elements are the covariance of a and b. That is, variance and covariance are unified into one matrix.
General conclusion: if there are m n-dimensional data records, arranged by columns into an n×m matrix X, and we set $C = \frac{1}{m} X X^{\mathsf T}$, then C is a symmetric matrix whose diagonal elements are the variances of the corresponding fields, and whose element in row i, column j (equal to the element in row j, column i) is the covariance of fields i and j.
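A minimal sketch of this conclusion on a hypothetical 2x5 centered matrix X:

import numpy as np

X = np.array([[-1.0, -1.0, 0.0, 2.0, 0.0],  # hypothetical centered field a as row 0
              [-2.0,  0.0, 0.0, 1.0, 1.0]]) # hypothetical centered field b as row 1
m = X.shape[1]

C = (X @ X.T) / m                           # C = (1/m) X X^T
print(C)                                    # diagonal: variances of a and b; off-diagonal: Cov(a, b)
print(np.cov(X, bias=True))                 # agrees, since X is centered and bias=True divides by m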
2.5 Diagonalization of the covariance matrix
To make the variance within each field as large as possible and the covariance between fields equal to 0, we need to diagonalize the covariance matrix: make all elements except those on the diagonal equal to 0, and arrange the diagonal elements from large to small. This achieves the optimization objective.
Let C be the covariance matrix corresponding to the original data matrix X, let P be a matrix whose rows are a set of basis vectors, and let Y = PX, the data obtained by transforming X into the basis P. Let D be the covariance matrix of Y. The relationship between D and C is derived as follows:
$$D = \frac{1}{m} Y Y^{\mathsf T} = \frac{1}{m}(PX)(PX)^{\mathsf T} = P\left(\frac{1}{m} X X^{\mathsf T}\right) P^{\mathsf T} = P C P^{\mathsf T}$$
Clearly, the covariance matrix D of the transformed data Y should have all elements equal to 0 except those on the diagonal. So the P we are looking for is the P that diagonalizes the original covariance matrix C.
That is, the optimization objective becomes: find a matrix P such that $P C P^{\mathsf T}$ is a diagonal matrix with its diagonal elements arranged from large to small. The first K rows of P are then the basis we are looking for: multiplying X by the matrix formed by the first K rows of P reduces X from n dimensions to K dimensions while satisfying the optimization conditions above.
2.6 Covariance Diagonalization
The covariance matrix C is a real symmetric matrix, and real symmetric matrices have good properties: an n×n real symmetric matrix always has n unit eigenvectors that are mutually orthogonal. Arranging these eigenvectors as the columns of a matrix E gives $E^{\mathsf T} C E = \Lambda$, a diagonal matrix whose diagonal elements are the eigenvalues of C. The matrix we are looking for is therefore $P = E^{\mathsf T}$, with the rows of P (the eigenvectors) ordered by decreasing eigenvalue.
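A sketch of this diagonalization in NumPy, using np.linalg.eigh (the eigensolver for symmetric matrices) on a hypothetical 2x2 covariance matrix:

import numpy as np

C = np.array([[1.2, 0.8],                   # hypothetical 2x2 covariance matrix (real symmetric)
              [0.8, 1.2]])

eigvals, E = np.linalg.eigh(C)              # eigenvalues (ascending) and orthonormal eigenvector columns
order = np.argsort(eigvals)[::-1]           # sort eigenvalues from large to small
eigvals, E = eigvals[order], E[:, order]    # reorder eigenvalues and matching eigenvector columns

P = E.T                                     # rows of P are orthonormal basis vectors
D = P @ C @ P.T                             # P C P^T is diagonal (up to rounding error)
print(np.round(D, 10))                      # diagonal entries are the eigenvalues, in decreasing order

k = 1                                       # keeping the first k rows of P reduces n dimensions to k
P_k = P[:k, :]                              # P_k @ X would then give the k-dimensional data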
3. Dimensionality reduction methods
3.1 Dimensionality reduction purposes:
- Make data sets easier to use
- Reduce the computational overhead of many algorithms
- Noise removal
- Make the results understandable
3.2 Three methods of dimensionality reduction
(1) Principal component analysis (PCA)
In PCA, the data are converted from the original coordinate system to a new coordinate system, and the choice of the new coordinate system is determined by the data themselves. The first new axis is chosen as the direction of largest variance in the original data; the second new axis is chosen to be orthogonal to the first and to have the largest variance among such directions. This process is repeated as many times as there are features in the original data.
(2) Factor analysis
In factor analysis, it is assumed that some hidden (latent) variables, which are not observed, are involved in generating the observed data; the observed data are assumed to be linear combinations of these latent variables plus some noise.
(3) Independent component analysis (ICA)
In ICA, the data are assumed to be generated from N sources; that is, the data are a mixed observation of multiple data sources that are statistically independent of one another, whereas PCA only assumes the data are uncorrelated. As with factor analysis, if the number of data sources is smaller than the number of observed features, dimensionality reduction can be achieved.
4. PCA (Principal Component Analysis)
4.1 Performance evaluation
Pros: reduces the complexity of the data and identifies the most important features
Cons: not always necessary, and useful information may be lost
4.2 PCA Implementation
The pseudocode for transforming the data onto the first n principal components is as follows:
Remove the mean (subtract from each dimension of the data its average)
Compute the covariance matrix of the data
Compute the eigenvalues and eigenvectors of the covariance matrix
Sort the eigenvalues from largest to smallest
Keep the eigenvectors corresponding to the top n eigenvalues
Transform the data into the new space constructed from these n eigenvectors
The implementation code is as follows:
from numpy import *

def PCA(dataMat, topNfeat=999999):
    # Each row of dataMat is a data point; take the mean of each column (i.e. of each feature)
    meanVal = mean(dataMat, axis=0)
    # Center the data by subtracting the mean
    meanData = dataMat - meanVal
    # Covariance matrix of the data
    covMat = cov(meanData, rowvar=0)
    # Eigenvalues and eigenvectors of the covariance matrix
    eigVal, eigVec = linalg.eig(mat(covMat))
    # Sort the eigenvalues and keep the indices of the topNfeat largest ones
    eigValIndex = argsort(eigVal)
    eigValIndex = eigValIndex[:-(topNfeat + 1):-1]
    redEigVec = eigVec[:, eigValIndex]
    # Project the centered data onto the retained eigenvectors to get the reduced data
    lowDataMat = meanData * redEigVec
    # Reconstruct the original data (for checking/debugging)
    reconMat = (lowDataMat * redEigVec.T) + meanVal
    return lowDataMat, reconMat
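The interactive check in the next subsection assumes a NumPy matrix datas (in Machine Learning in Action it is loaded from a text file, which is not shown here). A minimal, purely hypothetical way to build such a matrix and exercise the function above:

from numpy import *

# Hypothetical 2-D test data: points scattered around a line, stored as a matrix
# so that the .A attribute used in the plotting code below works.
random.seed(0)
x = random.uniform(0, 10, 100)
datas = mat(column_stack((x, 0.5 * x + random.normal(0, 1, 100))))

lowData, recon = PCA(datas, 1)   # reduce to 1 dimension with the function defined above
print(lowData.shape, recon.shape)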
4.3 Reconstruction
Reconstruction is the process of mapping the reduced sample points back to obtain their estimated positions in the original space.
Check: enter the following code at the command line to plot the reduced (reconstructed) data together with the original data. The resulting figure is as follows:
>>> lowData, recon = PCA.PCA(datas, 1)
>>> import matplotlib.pyplot as plt
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> ax.scatter(datas[:, 0].flatten().A[0], datas[:, 1].flatten().A[0], marker='^', s=90)
<matplotlib.collections.PathCollection object at 0x7f8b84d70590>
>>> ax.scatter(recon[:, 0].flatten().A[0], recon[:, 1].flatten().A[0], marker='o', s=50, c='red')
<matplotlib.collections.PathCollection object at 0x7f8b84d40290>
>>> plt.show()
The red points show the original data projected onto a single line, i.e. the first new axis.
References:
(1) PCA Mathematical principle: http://www.360doc.com/content/13/1124/02/9482_331688889.shtml
(2) The PCA chapter of Machine Learning in Action