It has been a while since my last post; I was busy with a project and only recently wrapped it up, so it's time to write again. I've forgotten some of the basics, so this post also serves as a review for me. I'll try to write out the key steps of the derivation, but I suggest you still work through the formulas by hand to deepen your understanding.
Linear Discriminant Analysis (also known as Fisher's Linear Discriminant) is a supervised linear dimensionality reduction algorithm. Unlike PCA, which tries to preserve as much of the data's information as possible, LDA aims to make the projected data points as easy to separate by class as possible.
Assume the original data is represented as X, an m*n matrix, where m is the dimension and n is the number of samples.
Since the method is linear, we want to find a mapping vector a such that the projected data points a'x satisfy the following two properties:
1. Data points of the same class are as close together as possible (within-class)
2. Data points of different classes are as far apart as possible (between-class)
Here is the same figure I used in the PCA post. If the two clusters of points in the figure are two classes, then we want to project them onto axis 1 (PCA would pick axis 2), so that they remain easy to separate even in one-dimensional space.
Next comes the derivation. Since typing formulas here is inconvenient, I quote a small section from a slide by Deng Cai:
The idea is still very clear: the objective function is J(a) in the last line. μ̃ (mu with a tilde) is the projected class center and is used to measure the distance between classes, while s̃ (s with a tilde) is the sum of distances from the projected points to their projected class center and is used to measure the within-class scatter. J(a) is exactly the formalization of the two properties above.
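Since the slide image is not reproduced here, the following is my reconstruction of the two-class criterion in the usual textbook notation, using the μ̃ and s̃ described above (the exact notation on the slide may differ slightly):

\tilde{\mu}_i = \frac{1}{n_i} \sum_{x \in \omega_i} a^\top x,
\qquad
\tilde{s}_i^2 = \sum_{x \in \omega_i} \left( a^\top x - \tilde{\mu}_i \right)^2,
\qquad
J(a) = \frac{\left| \tilde{\mu}_1 - \tilde{\mu}_2 \right|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}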
Therefore, in the two-class case, adding the condition a'a = 1 (similar to PCA), we arrive at the matrix form below.
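The formula image for this step is also not shown, so here is the standard matrix form the derivation arrives at (a reconstruction, not a copy of the slide): substituting the definitions of μ̃ and s̃ turns J(a) into a Rayleigh quotient of the between-class and within-class scatter matrices, and maximizing it (e.g. via a Lagrange multiplier on the constraint) gives a generalized eigenvalue problem:

S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^\top,
\qquad
S_W = \sum_{i=1}^{2} \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^\top

J(a) = \frac{a^\top S_B a}{a^\top S_W a}
\;\;\Rightarrow\;\;
S_B\, a = \lambda\, S_W\, a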
This can be extended to multiple classes:
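For the multi-class case (again written in standard notation, since the original image is not shown), the scatter matrices become sums over all c classes, and the columns of the projection matrix A are the leading solutions of the same generalized eigenvalue problem:

S_B = \sum_{i=1}^{c} n_i (\mu_i - \mu)(\mu_i - \mu)^\top,
\qquad
S_W = \sum_{i=1}^{c} \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^\top

S_B\, a_k = \lambda_k\, S_W\, a_k, \qquad A = [a_1, a_2, \ldots, a_k]

Note that S_B has rank at most c-1, so at most c-1 meaningful projection directions can be extracted.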
The detailed derivation of the formulas above can be found in the corresponding section of the book Pattern Classification, in the part on the Fisher Discriminant.
OK, computing the mapping vector a thus amounts to finding the largest eigenvector; we can also take the first k largest eigenvectors to form the matrix A = [a1, a2, ..., ak], and new points can then be reduced in dimension:
y = A'X
(One of the benefits of a linear method is ease of computation.)
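To make this concrete, here is a minimal NumPy/SciPy sketch of the whole procedure (the function and variable names are my own, not from the original post): it builds S_B and S_W, solves the generalized eigenvalue problem, and projects the data with y = A'X.

import numpy as np
from scipy.linalg import eigh

def lda_fit(X, labels, k):
    # X: m x n data matrix (m = dimension, n = number of samples), as in the post
    # labels: length-n array of class labels
    m, n = X.shape
    mu = X.mean(axis=1, keepdims=True)          # overall mean
    S_B = np.zeros((m, m))                      # between-class scatter
    S_W = np.zeros((m, m))                      # within-class scatter
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mu_c = Xc.mean(axis=1, keepdims=True)
        S_B += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T
        S_W += (Xc - mu_c) @ (Xc - mu_c).T
    S_W += 1e-6 * np.eye(m)                     # small ridge so S_W is invertible (an assumption for numerical stability)
    # generalized eigenproblem S_B a = lambda S_W a; eigh returns eigenvalues in ascending order
    eigvals, eigvecs = eigh(S_B, S_W)
    A = eigvecs[:, ::-1][:, :k]                 # top-k eigenvectors as columns of A
    return A

# usage (hypothetical data): A = lda_fit(X, labels, k=1); Y = A.T @ X   # i.e. y = A'X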
It turns out that LDA is ultimately reduced to a matrix eigenvector problem, much like PCA. In fact, many other algorithms fall into this class as well, commonly referred to as spectral methods.
Among linear dimensionality reduction algorithms, I think the most important are PCA and LDA; next I will introduce some nonlinear methods.