Fisher Linear Discriminant Analysis (LDA)


From: http://blog.csdn.net/warmyellow/article/details/5454943

I. LDA Algorithm Overview

Linear Discriminant Analysis (LDA), also known as the Fisher linear discriminant, is a classic algorithm for pattern recognition; it was introduced into the fields of pattern recognition and artificial intelligence by Belhumeur in 1996. The basic idea of linear discriminant analysis is to project high-dimensional pattern samples onto the optimal discriminant vector space, extracting classification information while compressing the dimensionality of the feature space. After projection, the pattern samples have the largest between-class distance and the smallest within-class distance in the new subspace, that is, the patterns have the best separability in that space. LDA is therefore an effective feature extraction method: it maximizes the between-class scatter of the projected samples while minimizing their within-class scatter, which guarantees the largest between-class distance and the smallest within-class distance in the new space, i.e. the best separability.

II. LDA Assumptions and Symbol Descriptions

Assume there are $m$ samples $x_1, x_2, \ldots, x_m$ in the space, where each $x_k$ is an $n$-dimensional column vector (a matrix with $n$ rows and one column). Suppose there are $c$ classes, denoted $\omega_1, \omega_2, \ldots, \omega_c$, and $N_i$ denotes the number of samples belonging to class $\omega_i$, so that $N_1 + N_2 + \cdots + N_c = m$.

$S_b$: the between-class scatter matrix

$S_w$: the within-class scatter matrix

$N_i$: the number of samples belonging to class $\omega_i$

$x_k$: sample $k$

$\mu$: the mean of all samples

$\mu_i$: the sample mean of class $\omega_i$

III. Formula Derivation and Formal Description of the Algorithm

Using the notation above, the sample mean of class $\omega_i$ is:

$$\mu_i = \frac{1}{N_i}\sum_{x_k \in \omega_i} x_k \quad\quad (1)$$

Similarly, the mean of all the samples is:

$$\mu = \frac{1}{m}\sum_{k=1}^{m} x_k \quad\quad (2)$$

According to the definitions of the between-class and within-class scatter matrices, we obtain:

$$S_b = \sum_{i=1}^{c} N_i\,(\mu_i - \mu)(\mu_i - \mu)^T \quad\quad (3)$$

$$S_w = \sum_{i=1}^{c} \sum_{x_k \in \omega_i} (x_k - \mu_i)(x_k - \mu_i)^T \quad\quad (4)$$

Of course, the within-class scatter matrix can also be written in another way:

$$S_w = \sum_{i=1}^{c} P_i \cdot \frac{1}{N_i}\sum_{x_k \in \omega_i} (x_k - \mu_i)(x_k - \mu_i)^T$$

and correspondingly

$$S_b = \sum_{i=1}^{c} P_i\,(\mu_i - \mu)(\mu_i - \mu)^T$$

Here $P_i$ is the prior probability of class $\omega_i$, that is, the proportion of class-$\omega_i$ samples in the whole data set ($P_i = N_i / m$). Substituting this into the second group of formulas, we find that the first group of formulas differs from the second only by a missing factor of $1/m$. As we will see later, whether or not we multiply by this $1/m$ has no effect on the algorithm itself. Let us now analyse the idea behind the algorithm.

We can see that each scatter matrix is essentially a covariance matrix. The between-class scatter matrix describes the relationship between the classes and the overall population: the elements on its diagonal are the variances (the dispersion) of the class means relative to the overall mean, and the off-diagonal elements are the covariances (the correlation, or redundancy) between the class means and the overall mean along different dimensions. Formula (3) therefore sums, over the classes, the covariance matrices between each class and the overall population; it describes the dispersion and redundancy between all the classes and the population. Likewise, formula (4) sums the covariance matrices between the samples and the class they belong to; it describes the dispersion of the samples within their class (the class here being represented by the mean of its samples). In fact, both the within-class sample mean and the overall sample mean act as intermediaries: the within-class and between-class scatter matrices describe, at a macroscopic level, the dispersion of the samples within each class and the dispersion between the classes.
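As a concrete illustration of formulas (1) through (4), here is a minimal NumPy sketch (the function and variable names are my own, not from the original post) that computes the class means, the overall mean, and the two scatter matrices from a labelled data set:

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class (Sb) and within-class (Sw) scatter matrices,
    following formulas (3) and (4), for data X (m x n) with labels y."""
    n = X.shape[1]
    mu = X.mean(axis=0)                      # overall mean, formula (2)
    Sb = np.zeros((n, n))
    Sw = np.zeros((n, n))
    for c in np.unique(y):
        Xc = X[y == c]                       # samples of class c
        mu_c = Xc.mean(axis=0)               # class mean, formula (1)
        d = (mu_c - mu).reshape(-1, 1)
        Sb += len(Xc) * d @ d.T              # formula (3)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)    # formula (4)
    return Sb, Sw
```

Dividing both matrices by $m$ gives the prior-probability form discussed above; as noted, this constant factor does not affect the result.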

As a classification algorithm, LDA naturally requires low coupling between classes and high cohesion within each class: the entries of the within-class scatter matrix should be small and the entries of the between-class scatter matrix should be large, which gives a good classification result.

Here we introduce the Fisher discriminant criterion:

$$J_{fisher}(w) = \frac{w^T S_b\, w}{w^T S_w\, w} \quad\quad (5)$$

where $w$ is an arbitrary $n$-dimensional column vector. Fisher linear discriminant analysis chooses the vector $w$ that maximizes (5) as the projection direction. Its physical meaning is that the projected samples have the largest between-class scatter and the smallest within-class scatter.

We can substitute formulas (3) and (4) into formula (5) to obtain:

$$J_{fisher}(w) = \frac{\sum_{i=1}^{c} N_i\,\bigl(w^T(\mu_i - \mu)\bigr)^2}{\sum_{i=1}^{c}\sum_{x_k \in \omega_i} \bigl(w^T(x_k - \mu_i)\bigr)^2}$$

We can regard $w$ as defining a low-dimensional space (a hyperplane) onto which the samples are projected: when $x$ is a column vector, $w^T x$ is its projection, and a term such as $\bigl(w^T(\mu_i - \mu)\bigr)^2$ is a squared geometric distance in that space. The numerator of the Fisher criterion is therefore the weighted sum of squared distances between the projected class means and the projected overall mean, and the denominator is the sum of squared distances between the projected samples and their projected class means. The classification problem thus becomes: find a low-dimensional space such that, after the samples are projected into it, the ratio of the between-class distances to the within-class distances is as large as possible, which gives the best separation.
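To make formula (5) concrete, here is a small sketch (my own illustration, not from the original post) that evaluates the Fisher criterion for a candidate projection direction; it can be fed the scatter matrices computed by the `scatter_matrices` sketch above:

```python
import numpy as np

def fisher_criterion(w, Sb, Sw):
    """Fisher criterion J(w) = (w^T Sb w) / (w^T Sw w), formula (5)."""
    w = np.asarray(w, dtype=float).reshape(-1, 1)
    return float((w.T @ Sb @ w) / (w.T @ Sw @ w))
```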

Therefore, based on this idea, we can find the projection matrix made up of the optimal discriminant vectors by maximizing the following criterion function (here we can also see that the factor $1/m$ cancels between the numerator and the denominator, which is why the first group of formulas above has the same effect as the second group):

$$W_{opt} = \arg\max_{W} \frac{\left|W^T S_b\, W\right|}{\left|W^T S_w\, W\right|} \quad\quad (6)$$

It can be shown that, when $S_w$ is non-singular (when the LDA algorithm is implemented, PCA is usually applied to the samples first to reduce their dimensionality and remove redundancy, which guarantees that $S_w$ is non-singular; the singular case can also be handled, but we do not discuss it here), the column vectors of the optimal projection matrix are exactly the solutions of the generalized eigenvalue equation

$$S_b\, w_i = \lambda_i\, S_w\, w_i \quad\quad (7)$$

corresponding to the $d$ largest eigenvalues (the $w_i$ are the eigenvectors of the matrix $S_w^{-1} S_b$), and the number of optimal projection axes satisfies $d \le c - 1$.

From formula (7), we can derive

$$S_w^{-1} S_b\, w_i = \lambda_i\, w_i \quad\quad (8)$$

because $S_w$ is invertible.

The verification is as follows: substituting a solution $w_i$ of equation (7) into the criterion gives

$$J_{fisher}(w_i) = \frac{w_i^T S_b\, w_i}{w_i^T S_w\, w_i} = \frac{\lambda_i\, w_i^T S_w\, w_i}{w_i^T S_w\, w_i} = \lambda_i$$

so the criterion value attained by each generalized eigenvector equals its eigenvalue, and the criterion is maximized by choosing the eigenvectors with the largest eigenvalues.
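A minimal sketch of this step (my own code, assuming $S_w$ is non-singular): solve the eigenproblem of formula (8) and keep the eigenvectors with the $d \le c - 1$ largest eigenvalues as the columns of the projection matrix.

```python
import numpy as np

def lda_projection(Sb, Sw, d):
    """Solve Sw^{-1} Sb w = lambda w (formula (8)) and return the d
    eigenvectors with the largest eigenvalues as columns of W."""
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1]    # sort eigenvalues in decreasing order
    return eigvecs.real[:, order[:d]]         # optimal projection matrix W_opt
```

For a $c$-class problem $S_b$ has rank at most $c - 1$, so at most $c - 1$ eigenvalues are non-zero; this is where the bound $d \le c - 1$ comes from.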

 

 

 

IV. Physical Meaning and Discussion of the Algorithm

4.1 An example illustrating the spatial meaning of the LDA algorithm

The following is a classification problem solved with LDA. Suppose whether a product is qualified is determined by two parameters.

We assume the two parameters and the samples are:

Parameter A    Parameter B    Qualified?
2.95           6.63           Qualified
2.53           7.79           Qualified
3.57           5.65           Qualified
3.16           5.47           Qualified
2.58           4.46           Unqualified
2.16           6.22           Unqualified
3.27           3.52           Unqualified

Experimental Data source: http://people.revoledu.com/kardi/tutorial/LDA/Numerical%20Example.html

Based on the table, we can divide the samples into two classes, qualified and unqualified, and build two data sets:

Cls1_data =

2.9500 6.6300

2.5300 7.7900

3.5700 5.6500

3.1600 5.4700

 

Cls2_data =

2.5800 4.4600

2.1600 6.2200

3.2700 3.5200

 

Here cls1_data holds the qualified samples and cls2_data the unqualified samples. Using formulas (1) and (2), we can compute the mean of the qualified samples, the mean of the unqualified samples, and the mean of all samples:

E_cls1 =

3.0525 6.3850

E_cls2 =

2.6700 4.7333

E_all =

2.8886 5.6771
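These means are easy to reproduce; a short NumPy sketch (the variable names are mine, chosen to mirror the output above):

```python
import numpy as np

cls1_data = np.array([[2.95, 6.63], [2.53, 7.79], [3.57, 5.65], [3.16, 5.47]])  # qualified
cls2_data = np.array([[2.58, 4.46], [2.16, 6.22], [3.27, 3.52]])                # unqualified

E_cls1 = cls1_data.mean(axis=0)                          # -> [3.0525, 6.3850]
E_cls2 = cls2_data.mean(axis=0)                          # -> [2.6700, 4.7333]
E_all = np.vstack([cls1_data, cls2_data]).mean(axis=0)   # -> [2.8886, 5.6771]
```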

We can plot the positions of the current sample points:

Figure 1

The blue '*' points are the unqualified samples and the red points the qualified samples. The sky-blue inverted triangle marks the overall mean, the blue triangle the mean of the unqualified samples, and the red triangle the mean of the qualified samples. Looking along the x and y axes alone, the qualified and unqualified samples are not well separated.

Based on formulas (3) and (4), we can compute the between-class and within-class scatter matrices:

SB =

0.0358 0.1547

0.1547 0.6681

Sw =

0.5909 -1.3338

-1.3338 3.5596

Based on formulas (7) and (8), we can compute the eigenvalues and the corresponding eigenvectors:

L =

0.0000 0

0 2.8837

The eigenvalues lie on the diagonal. The first eigenvalue is so small that the computer treats it as approximately 0.

The corresponding eigenvectors are

V =

-0.9742 -0.9230

0.2256 -0.3848
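This step can be reproduced with a few lines of NumPy (my own sketch; because the printed matrices are rounded, the last digits may differ slightly):

```python
import numpy as np

SB = np.array([[0.0358, 0.1547], [0.1547, 0.6681]])
Sw = np.array([[0.5909, -1.3338], [-1.3338, 3.5596]])

eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, SB))  # formula (8)
# One eigenvalue is essentially 0 and the other is about 2.88; the eigenvector
# belonging to the large eigenvalue is, up to sign, about (0.923, 0.385),
# i.e. the direction (-0.9230, -0.3848) reported above.
```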

Taking the eigenvector corresponding to the largest eigenvalue, (-0.9230, -0.3848): this vector spans the required subspace. We can project the original samples onto this vector to obtain a new representation (a projection from two dimensions down to one, so each sample becomes a single number):

New_cls1_data =

 

-5.2741

-5.3328

-5.4693

-5.0216

These are the projected values of the qualified samples.

New_cls2_data =

-4.0976

-4.3872

-4.3727

These are the projected values of the unqualified samples. We can see that after projection the two classes separate clearly and the samples within each class cluster tightly. We plot the result again to make the classification effect more intuitive.
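The projection itself is just a dot product with the chosen eigenvector; a sketch (repeating the data arrays defined above, with my own variable names; because the eigenvector is rounded to four decimals, the last digit can differ slightly from the values above):

```python
import numpy as np

cls1_data = np.array([[2.95, 6.63], [2.53, 7.79], [3.57, 5.65], [3.16, 5.47]])  # qualified
cls2_data = np.array([[2.58, 4.46], [2.16, 6.22], [3.27, 3.52]])                # unqualified
w = np.array([-0.9230, -0.3848])        # eigenvector of the largest eigenvalue

new_cls1_data = cls1_data @ w           # -> approx. [-5.2741, -5.3328, -5.4692, -5.0215]
new_cls2_data = cls2_data @ w           # -> approx. [-4.0975, -4.3871, -4.3727]
```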

Figure 2

In the plot, the blue line is the eigenvector corresponding to the small eigenvalue and the sky-blue line the eigenvector corresponding to the large eigenvalue. The blue circles are the unqualified samples projected onto that eigenvector, and the red '*' points are the projected qualified samples. The classification effect is clearly better (although, because the x and y axes use different scales, the projection does not look as intuitive in the figure).

We now use the eigenvector we obtained to classify a new sample and see which class it belongs to. Take the sample

(2.81, 5.46).

Projecting it onto the eigenvector gives: result = -4.6947, so it should belong to the unqualified class.
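One simple way to turn this into an explicit decision rule (my own illustration; the post itself only inspects the projected value) is to project the new sample and assign it to the class whose projected mean is nearer:

```python
import numpy as np

w = np.array([-0.9230, -0.3848])                 # projection direction
proj_ok  = np.array([3.0525, 6.3850]) @ w        # projected mean of qualified class, approx. -5.274
proj_bad = np.array([2.6700, 4.7333]) @ w        # projected mean of unqualified class, approx. -4.286

x_new = np.array([2.81, 5.46])
p = x_new @ w                                    # approx. -4.695
label = "qualified" if abs(p - proj_ok) < abs(p - proj_bad) else "unqualified"
print(p, label)                                  # p is closer to the unqualified mean -> "unqualified"
```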

4.2 LDA algorithm and PCA algorithm

In the traditional eigenface method, researchers noticed that the directions associated with the largest eigenvalues (the eigenfaces) are not necessarily the directions with the best classification performance: with the K-L transform, variations in the images caused by external factors cannot be separated from variations of the faces themselves, and the eigenfaces largely reflect differences in illumination. Studies show that the recognition rate drops sharply as illumination, viewing angle, face size, and other factors change, so this approach still has a theoretical defect for face recognition. The set of discriminant vectors extracted by linear discriminant analysis emphasizes the differences between different faces rather than changes in expression, illumination, and other conditions, which helps improve the recognition performance.
