"Reprint" Linear discriminant analysis (Linear discriminant analyses) (ii)

Source: Internet
Author: User

4. Example

As an example, consider spherical sample points in 3-dimensional space projected onto two dimensions: the projection direction W1 achieves better separation than W2.

Comparison of dimensionality reduction by PCA and LDA:

PCA projects the sample points onto the direction of maximum variance, whereas LDA chooses the projection direction with the best classification performance.
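To make the contrast concrete, here is a minimal sketch (my own illustration, not from the original post); the synthetic dataset and its parameters are arbitrary choices:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two classes: large variance along axis 0, class means differing along axis 1.
X1 = rng.normal([0.0, 0.0], [5.0, 0.5], size=(200, 2))
X2 = rng.normal([0.0, 2.0], [5.0, 0.5], size=(200, 2))
X = np.vstack([X1, X2])
y = np.repeat([0, 1], 200)

# PCA picks the direction of maximum total variance (axis 0 here),
# ignoring the labels; LDA picks the direction that separates the classes.
z_pca = PCA(n_components=1).fit_transform(X)
z_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)

# Class separation along each 1-D projection: gap between the projected
# class means relative to the overall spread.
for name, z in [("PCA", z_pca), ("LDA", z_lda)]:
    gap = abs(z[y == 1].mean() - z[y == 0].mean())
    print(f"{name}: mean gap / std = {gap / z.std():.2f}")

By construction, PCA here projects onto the high-variance axis and mixes the two classes together, while LDA projects onto the axis along which the class means differ.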

Since LDA is called linear discriminant analysis, it should have some predictive function: given a new sample x, how do we determine its category?

In the binary case, we can project x onto the learned line to get y, and check whether y exceeds some threshold y0: if it does, x belongs to one class, otherwise to the other. But how do we find this y0?

By the central limit theorem, a sum of independent, identically distributed random variables approximately follows a Gaussian distribution, so we may assume that the projected values y within each class are Gaussian, and use maximum likelihood estimation to fit the mean and variance of each class.

Then the formulas of decision theory give the best y0; see PRML for details.
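As a concrete sketch of this procedure (my own illustration; the helper function and its details are not from the original post), fit a Gaussian to each class's projected values by maximum likelihood, then solve for the point where the two weighted densities are equal:

import numpy as np

def find_y0(z1, z2):
    # z1, z2: projected values y = w^T x for class 1 and class 2.
    # MLE for a 1-D Gaussian: sample mean and (biased) sample variance.
    m1, v1 = z1.mean(), z1.var()
    m2, v2 = z2.mean(), z2.var()
    p1 = len(z1) / (len(z1) + len(z2))    # class prior P(c1)
    p2 = 1.0 - p1
    # Setting p1*N(y; m1, v1) = p2*N(y; m2, v2) and taking logs yields
    # a quadratic a*y^2 + b*y + c = 0 in the threshold y.
    a = 1.0 / v2 - 1.0 / v1
    b = 2.0 * m1 / v1 - 2.0 * m2 / v2
    c = (m2**2 / v2 - m1**2 / v1
         - 2.0 * np.log(p2 / p1) - np.log(v1 / v2))
    if abs(a) < 1e-12:                    # equal variances: linear equation
        return -c / b
    roots = np.roots([a, b, c]).real
    # Take the root closest to the midpoint of the two class means.
    return roots[np.argmin(np.abs(roots - (m1 + m2) / 2.0))]

With equal priors and equal variances the quadratic degenerates and y0 = (m1 + m2)/2, the midpoint of the projected class means.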

This is a feasible but cumbersome selection method; see Section 7 (Some issues) for a simpler answer.

5. Some limitations of using LDA

1. LDA can generate a subspace of at most C-1 dimensions

The dimension after LDA reduction lies in [1, C-1], independent of the original feature dimension n; for binary classification, the projection is at most 1-dimensional. The reason is that the between-class scatter matrix S_B is a sum of C rank-one matrices whose weighted mean deviations sum to zero, so rank(S_B) is at most C-1 (a numerical check appears after this list).

2. LDA is not suitable for dimensionality reduction of non-Gaussian samples.

In the figure, the red area represents one class of samples and the blue area represents the other. Since this is a two-class problem, the projection is at most 1-dimensional, but no matter how the points are projected onto a line, it is difficult to make the red points and the blue points each cluster together while keeping the two classes separated.

3. LDA is not effective when the classification information of the samples lies in the variance rather than the mean.

As the figure shows, the sample points are distinguished by variance information rather than mean information. LDA cannot classify them effectively because it relies heavily on the class means.

4. LDA may over-fit the data.
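Here is the numerical check of limitation 1 promised above (my own sketch): build the between-class scatter matrix S_B for C = 4 classes in 10 dimensions and confirm that its rank is C-1.

import numpy as np

rng = np.random.default_rng(0)
n_per_class, n_features, C = 50, 10, 4

# C Gaussian classes with different means.
X = np.vstack([rng.normal(loc=i, size=(n_per_class, n_features))
               for i in range(C)])
y = np.repeat(np.arange(C), n_per_class)

# Between-class scatter: S_B = sum_i n_i (mu_i - mu)(mu_i - mu)^T.
mu = X.mean(axis=0)
S_B = sum(n_per_class * np.outer(X[y == i].mean(axis=0) - mu,
                                 X[y == i].mean(axis=0) - mu)
          for i in range(C))

# Each term is rank 1, and the C weighted deviations n_i (mu_i - mu)
# sum to zero, so at most C - 1 of them are independent.
print(np.linalg.matrix_rank(S_B))    # expected: 3, i.e. C - 1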

6. Some variants of LDA

1. Non-parametric LDA

Non-parametric LDA computes the scatter matrices using local information and the K nearest neighbors of the sample points. This makes S_B full-rank, so that we can extract eigenvectors beyond the C-1 limit. Moreover, the separation after projection is better.

2. Orthogonal LDA

First find the best eigenvector; then, among vectors orthogonal to those already found, find the one that maximizes the Fisher criterion; and so on. This method can also break the C-1 limit.
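A compact sketch of this idea (my own illustration, in the style of Foley-Sammon orthogonal discriminant vectors, assuming S_W is invertible): after each discriminant vector is found, restrict the search to the orthogonal complement and solve the Fisher problem there again.

import numpy as np
from scipy.linalg import eigh, null_space

def orthogonal_discriminants(S_B, S_W, k):
    # Returns k mutually orthogonal discriminant vectors (columns).
    d = S_B.shape[0]
    W = np.zeros((d, 0))
    for _ in range(k):
        # Orthonormal basis Q of the subspace orthogonal to the
        # vectors found so far (all of R^d on the first pass).
        Q = np.eye(d) if W.shape[1] == 0 else null_space(W.T)
        # Maximize the Fisher criterion restricted to span(Q): solve the
        # generalized eigenproblem (Q^T S_B Q) v = lam (Q^T S_W Q) v.
        vals, vecs = eigh(Q.T @ S_B @ Q, Q.T @ S_W @ Q)
        w = Q @ vecs[:, -1]           # eigenvector of the largest eigenvalue
        W = np.hstack([W, (w / np.linalg.norm(w))[:, None]])
    return W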

3. Generalized LDA

Introduces Bayesian risk theory, etc.

4. Kernel LDA

Uses a kernel function to compute the features, so that the discriminant is found in a nonlinear feature space.
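A compact sketch of the two-class kernel Fisher discriminant (in the style of Mika et al.; my own illustration, using an RBF kernel and a small ridge term eps for numerical stability; labels are assumed to be 0/1):

import numpy as np

def rbf(A, B, gamma=1.0):
    # Pairwise RBF kernel k(a, b) = exp(-gamma * ||a - b||^2).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_fda(X, y, gamma=1.0, eps=1e-3):
    # All computation is done on the n x n kernel matrix, never on
    # explicit feature vectors.
    K = rbf(X, X, gamma)
    n = len(y)
    M, N = [], np.zeros((n, n))
    for c in (0, 1):
        Kc = K[:, y == c]                   # kernel columns for class c
        nc = Kc.shape[1]
        M.append(Kc.mean(axis=1))           # kernelized class mean
        # Within-class scatter contribution in the feature space.
        N += Kc @ (np.eye(nc) - np.ones((nc, nc)) / nc) @ Kc.T
    # Fisher direction in the feature space: alpha is proportional
    # to N^{-1} (M0 - M1).
    alpha = np.linalg.solve(N + eps * np.eye(n), M[0] - M[1])

    def project(X_new):
        # Projection of new points: y(x) = sum_j alpha_j k(x_j, x).
        return rbf(X_new, X, gamma) @ alpha
    return project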

7. Some issues

The formula used above in multi-class classification,

S_B = Σ_i n_i (μ1_i − μ)(μ_i − μ)^T with the sum over the C classes, i.e. S_B = Σ_i n_i (μ_i − μ)(μ_i − μ)^T,

is the weighted scatter matrix of the class centers around the overall sample center. Plugging C = 2 (i.e., binary classification) into this formula does not directly yield the formula used in the binary case, S_B = (μ1 − μ2)(μ1 − μ2)^T.

Therefore, the S_B obtained for binary and for multi-class classification differ, but their meaning is consistent.
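To make "consistent" concrete, here is a short check (my own derivation, not in the original). With μ = (n1 μ1 + n2 μ2)/n, we have μ1 − μ = (n2/n)(μ1 − μ2) and μ2 − μ = −(n1/n)(μ1 − μ2), so

S_B = n1 (n2/n)^2 (μ1 − μ2)(μ1 − μ2)^T + n2 (n1/n)^2 (μ1 − μ2)(μ1 − μ2)^T = (n1 n2 / n)(μ1 − μ2)(μ1 − μ2)^T.

For C = 2, the multi-class S_B is the binary S_B scaled by n1 n2 / n, and a scalar factor does not change the optimal projection direction.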

For the binary classification problem, it turns out, surprisingly, that least squares and Fisher linear discriminant analysis are consistent.

Here we prove this conclusion, which also settles the question of how to choose y0 raised in Section 4.

Recall linear regression: we are given n training samples x_i with D-dimensional features (i from 1 to n), each with a corresponding class label y_i. Previously we let y = 0 denote one class and y = 1 the other; now we need to change the labels in order to establish the relationship between least squares and LDA.

Specifically, we replace the 0/1 labels with new target values: y_i = n/n1 for samples of class 1 and y_i = −n/n2 for samples of class 2, where n1 and n2 are the numbers of samples in the two classes.

We write down the least-squares objective,

J = (1/2) Σ_i (w^T x_i + w0 − y_i)^2,

where w and w0 are the weight parameters to be fitted.

Setting the derivatives of J with respect to w0 and w to zero gives, respectively,

Σ_i (w^T x_i + w0 − y_i) = 0 and Σ_i (w^T x_i + w0 − y_i) x_i = 0.

From the first equation, since the new labels satisfy Σ_i y_i = n1 (n/n1) − n2 (n/n2) = 0, we get

w0 = −w^T μ,

where μ is the mean of all the samples. Eliminating w0 by substituting this into the second equation,

it can be shown that the expanded second equation is equivalent to the following:

(S_W + (n1 n2 / n) S_B) w = n (μ1 − μ2),

where S_W and S_B are the same within-class and between-class scatter matrices used in the binary-classification case.

Because S_B w = (μ1 − μ2)(μ1 − μ2)^T w always points in the direction of μ1 − μ2, the final result is still

w ∝ S_W^{-1} (μ1 − μ2).

In geometric terms, this means that after the deformation (redefining the class labels), the direction of the line fitted by linear regression is exactly the projection direction that LDA obtains for binary classification.

From the definition of the new labels, we can see that samples with y > 0 belong to one class and samples with y < 0 to the other. So we can choose y0 = 0: if w^T (x − μ) > 0, x belongs to the first class; otherwise it belongs to the second.
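A quick numerical check of this equivalence (my own sketch; the synthetic data are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal([0.0, 0.0], 1.0, size=(40, 2))    # class 1
X2 = rng.normal([3.0, 1.0], 1.0, size=(60, 2))    # class 2
X = np.vstack([X1, X2])
n1, n2, n = len(X1), len(X2), len(X1) + len(X2)

# Least squares with the modified targets n/n1 and -n/n2, plus a bias.
t = np.concatenate([np.full(n1, n / n1), np.full(n2, -n / n2)])
A = np.hstack([X, np.ones((n, 1))])
w_ls = np.linalg.lstsq(A, t, rcond=None)[0][:2]   # drop the bias w0

# Fisher direction S_W^{-1} (mu1 - mu2) from the scatter matrices.
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
w_fisher = np.linalg.solve(S_W, mu1 - mu2)

# The two directions agree up to scale (cosine close to 1).
cos = w_ls @ w_fisher / (np.linalg.norm(w_ls) * np.linalg.norm(w_fisher))
print(f"cosine similarity: {cos:.6f}")

The printed cosine similarity should be essentially 1, confirming that the least-squares direction and the Fisher direction coincide up to scale.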

I have written quite a lot, and it is rather miscellaneous. Note that there is also a topic model called LDA, whose full name is Latent Dirichlet Allocation; its second author is the well-known Andrew Ng, and the last author is his advisor, Michael Jordan. Once I have read that paper, I will write another summary post.

"Reprint" Linear discriminant analysis (Linear discriminant analyses) (ii)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.