Linear Discriminant Analysis (LDA)

1. What is LDA?

Linear Discriminant Analysis (LDA), also known as Fisher's Linear Discriminant (FLD), is a classic algorithm for pattern recognition. It was introduced into the fields of pattern recognition and artificial intelligence by Belhumeur in 1996.

The basic idea is to project high-dimensional pattern samples onto an optimal discriminant vector space, so as to extract classification information and compress the dimensionality of the feature space. After projection, the pattern samples have the largest between-class distance and the smallest within-class distance in the new subspace; in other words, the patterns have the best separability in that space.

LDA, like the PCA discussed earlier, is a common dimensionality reduction technique. PCA mainly looks for a good projection from the perspective of feature covariance, while LDA pays more attention to the labels: after projection, data points of different classes should be as far apart as possible, and data points of the same class should be as compact as possible.

The following example illustrates the goal of LDA.

There are two classes, one green and one red. The left figure shows the original data of the two classes; we want to reduce the data from two dimensions to one. If the data is projected directly onto the x1 or x2 axis, the two classes overlap, and the classification performance degrades. The line in the right figure is the one computed by LDA: after mapping onto it, the distance between the red and green classes is the largest, and the scatter of the points within each class is the smallest (that is, each class is the most compact).

Here is an LDA example:

Here is an example of the difference between LDA and PCA:

The points of class 1 are drawn as circles and the points of class 2 as crosses. There are two straight lines in the figure: the line with a slope of around 1 is the projection direction chosen by PCA, and the line with a slope of around -1 is the projection direction chosen by LDA. The remaining points not on these two lines are the original data points. Because LDA takes the class information (that is, the labels) into account, the points of class 1 and class 2 are well separated after the mapping.
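
As a rough numerical counterpart to the figure (a sketch only, not from the original article; it assumes NumPy and scikit-learn are available, and the cluster parameters are made up for illustration), the directions chosen by PCA and LDA on the same labeled data can be compared directly:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.RandomState(0)
    # Two elongated 2-D Gaussian clouds, standing in for class 1 (circles) and class 2 (crosses).
    cov = [[3.0, 2.0], [2.0, 3.0]]
    class1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
    class2 = rng.multivariate_normal([4.0, 0.0], cov, size=200)
    X = np.vstack([class1, class2])
    y = np.array([0] * 200 + [1] * 200)

    pca_dir = PCA(n_components=1).fit(X).components_[0]                              # largest-variance direction
    lda_dir = LinearDiscriminantAnalysis(n_components=1).fit(X, y).scalings_[:, 0]   # most discriminative direction

    print("PCA direction:", pca_dir / np.linalg.norm(pca_dir))
    print("LDA direction:", lda_dir / np.linalg.norm(lda_dir))

With clouds like these, PCA follows the direction of largest overall variance, while LDA follows the direction that best separates the two class means.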

2. Notes on LDA

First, what is the dimension after dimensionality reduction?

The output dimensionality of PCA is tied directly to the dimensionality of the data: if the original data is n-dimensional, after PCA you may keep anywhere from 1 dimension up to n dimensions as needed (keeping, of course, the directions with the largest eigenvalues). The output dimensionality of LDA is tied directly to the number of classes and has nothing to do with the dimensionality of the data: if the original data is n-dimensional and there are C classes in total, then after LDA you can generally keep from 1 up to C-1 dimensions (again, those with the largest eigenvalues). For example, in image classification with two classes (positive and negative), where each image has a 10000-dimensional feature vector, only one dimension is left after LDA, and that single dimension has the best classification ability.
PS: in many two-class problems, LDA therefore reduces the data to one dimension, and it then seems possible to find the best threshold on that dimension for classification.
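
A quick check of this C-1 rule (a sketch assuming scikit-learn; the sample count, feature size, and number of classes are arbitrary illustration values):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.RandomState(0)
    X = rng.randn(300, 50)              # 300 samples with 50-dimensional features
    y = rng.randint(0, 3, size=300)     # C = 3 classes

    Z = LinearDiscriminantAnalysis().fit(X, y).transform(X)
    print(Z.shape)                      # (300, 2): at most C - 1 = 2 dimensions, whatever the input dimension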

Second, is the projected coordinate system orthogonal?

The coordinate system produced by PCA projection is orthogonal. LDA, however, chooses its directions purely for their discriminative power according to the class labels, so the projected coordinate system is not guaranteed to be orthogonal (and generally is not).
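
A small numerical check of this point, again a sketch on made-up synthetic data assuming scikit-learn: the PCA axes come out orthogonal, while the LDA discriminant directions generally do not.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.RandomState(1)
    # Three classes of correlated 5-D features with distinct means.
    means = rng.randn(3, 5) * 2
    X = np.vstack([rng.randn(150, 5) @ rng.randn(5, 5) + means[c] for c in range(3)])
    y = np.repeat([0, 1, 2], 150)

    pca_axes = PCA(n_components=2).fit(X).components_                                # rows are unit directions
    lda_axes = LinearDiscriminantAnalysis(n_components=2).fit(X, y).scalings_[:, :2].T
    lda_axes /= np.linalg.norm(lda_axes, axis=1, keepdims=True)                      # normalize for comparison

    print("PCA axes dot product:", pca_axes[0] @ pca_axes[1])   # ~0: orthogonal by construction
    print("LDA axes dot product:", lda_axes[0] @ lda_axes[1])   # generally nonzero: not orthogonal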

3. LDA calculation process (two classes)

This section discusses the LDA computation for two classes only. We want to find a vector w and project the data x onto it to obtain the new data y. First, to make the two classes as far apart as possible after projection, the absolute value of the difference between the two projected class means is used as the measure. Second, to make the data points within each class as compact as possible after projection, the variance of each class after projection is used as the measure.

Mean of class i:

    m_i = (1 / n_i) * sum_{x in class i} x

Mean of class i after projection (which is in fact just the projection of m_i):

    m'_i = (1 / n_i) * sum_{y in class i} y = w_t * m_i

Absolute value of the mean difference after projection:

    |m'_1 - m'_2| = |w_t * (m_1 - m_2)|

Variance after projection (here y is the projected data of class i, that is, y = w_t * x):

    s'_i^2 = sum_{y in class i} (y - m'_i)^2

The objective function to optimize is:

    J(w) = |m'_1 - m'_2|^2 / (s'_1^2 + s'_2^2)

Expanding m' and s' leads to the definitions of S_B and S_W:

    S_B = (m_1 - m_2) * (m_1 - m_2)_t
    S_W = sum_{i=1,2} sum_{x in class i} (x - m_i) * (x - m_i)_t

The optimization objective J(w) is then rewritten in the following form, which makes it easier to derive w:

    J(w) = (w_t * S_B * w) / (w_t * S_W * w)

The derivation itself is omitted; the result is:

    w ∝ S_W^{-1} * (m_1 - m_2)

Assume the data has n-dimensional features, there are m samples, and the number of classes is 2. S_W is the sum of the scatter matrices of the two classes; each class's scatter matrix is n * n, so S_W is n * n, and m_1 - m_2 is n * 1. The computed w is therefore n * 1, that is, w maps the n-dimensional features to one dimension.

PS: there is no need to puzzle over the covariance-matrix form of S_W; it is simply what falls out when w_t and w are split off. After all, w_t * S_W * w is still a single number, namely the sum of the projected variances of the two classes.
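
The whole two-class computation fits in a few lines of NumPy. The following is only a minimal sketch, not the author's code; the synthetic data is made up, and a small ridge term is added to S_W in case it is singular:

    import numpy as np

    rng = np.random.RandomState(0)
    cov = [[1.0, 0.5], [0.5, 1.0]]
    X1 = rng.multivariate_normal([0.0, 0.0], cov, size=100)   # class 1
    X2 = rng.multivariate_normal([3.0, 2.0], cov, size=100)   # class 2

    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)                 # class means (n-dimensional)

    # Within-class scatter S_W: sum of the two classes' scatter matrices (n x n).
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

    # Fisher's solution: w is proportional to S_W^{-1} (m1 - m2), an n x 1 vector.
    w = np.linalg.solve(S_W + 1e-6 * np.eye(S_W.shape[0]), m1 - m2)
    w /= np.linalg.norm(w)

    # Project each sample onto w: every point becomes a single number y = w_t * x.
    y1, y2 = X1 @ w, X2 @ w
    print("projected means:", y1.mean(), y2.mean())
    print("projected variances:", y1.var(), y2.var())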

4. LDA calculation process (multiple classes)

For S_W, the sum of the scatter matrices of two classes simply becomes the sum of the scatter matrices of all the classes.

For S_B, in the two-class case it was built from the difference between the two class means. How should this be generalized to multiple classes? One option is to sum the differences between the means of every pair of classes, but for N classes that requires C(N, 2) terms. That would be a possible method; instead, LDA measures the difference between each class mean and the overall mean, weighted by the number of samples in each class. In the following formula, m is the overall mean of all samples, m_i is the mean of class i, and n_i is the number of samples in class i:

    S_B = sum_i n_i * (m_i - m) * (m_i - m)_t

For n-dimensional features, C classes, and m samples, LDA maps the n-dimensional data to C-1 dimensions; that is, the requested w is an n * (C-1) matrix. S_W is an n * n matrix (the sum of the classes' scatter matrices, not divided by the number of samples), and S_B is also an n * n matrix. The rank of S_B is at most C-1, because the C vectors n_i * (m_i - m) are linearly dependent: since the overall mean satisfies sum_i n_i * m_i = (sum_i n_i) * m, these C vectors sum to zero. As a result, the solution w is composed of at most C-1 discriminant vectors, which form its columns.

The specific calculation is omitted here (see the sketch below).
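
One standard way to carry out the omitted computation is to take the eigenvectors of S_W^{-1} * S_B with the largest eigenvalues as the columns of w (only C-1 of them are meaningful, since the rank of S_B is at most C-1). A minimal NumPy sketch on made-up synthetic data:

    import numpy as np

    rng = np.random.RandomState(0)
    n, C = 4, 3                                                     # n-dimensional features, C classes
    Xs = [rng.randn(80, n) + rng.randn(n) * 3 for _ in range(C)]    # one array of samples per class
    X = np.vstack(Xs)
    m = X.mean(axis=0)                                              # overall mean of all samples

    S_W = np.zeros((n, n))
    S_B = np.zeros((n, n))
    for Xi in Xs:
        m_i, n_i = Xi.mean(axis=0), len(Xi)
        S_W += (Xi - m_i).T @ (Xi - m_i)                            # within-class scatter (n x n)
        d = (m_i - m).reshape(-1, 1)
        S_B += n_i * (d @ d.T)                                      # between-class scatter, weighted by n_i

    # Eigenvectors of S_W^{-1} S_B with the largest eigenvalues give the projection matrix.
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs.real[:, order[:C - 1]]                              # n x (C-1): rank of S_B is at most C-1

    Z = (X - m) @ W                                                 # data mapped from n dimensions to C-1
    print(W.shape, Z.shape)                                         # (4, 2) (240, 2)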

The following is an example with three classes:

5. Other LDA variants

If the original data cannot be separated by a linear projection, Kernel LDA is one solution.

The computational cost of LDA depends on the dimensionality of the data; 2DLDA can greatly reduce that cost.

6. LDA Problems

First, LDA projects onto at most C-1 dimensions. If more features are needed, other methods must be used.

Second, LDA assumes that the data in each class follows a unimodal Gaussian distribution, so it does not handle more complex data structures such as the one shown below.

