Discriminant analysis is a classification technique: it uses a set of training samples with known class labels to establish a discriminant rule, and then classifies unlabeled observations based on their predictor variables. There are three main approaches to discriminant analysis: Fisher discriminant analysis, Bayes discriminant analysis, and distance discriminant analysis.
- The idea behind the Fisher discriminant is projection for dimensionality reduction, so that a multidimensional problem can be reduced to a one-dimensional one. An appropriate projection axis is chosen and all sample points are projected onto it to obtain projected values. The direction of the axis is chosen so that the dispersion of the projected values within each group is as small as possible, while the separation of the projected values between different groups is as large as possible (see the sketch after this list).
- The Bayes discriminant starts from prior probabilities, derives posterior probabilities, and makes statistical inferences based on the posterior probability distribution.
- The distance discriminant computes the centroid of each class from the data with known class labels; for an observation of unknown class, its distance to each centroid is computed and it is assigned to the class whose centroid is nearest.
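As a minimal sketch of the Fisher projection and nearest-centroid (distance) ideas above, the following R code computes the Fisher direction by hand for two iris classes. All object names here are illustrative, not part of any package API:

```r
# Two-class subset: versicolor vs virginica, two predictors for simplicity
x <- iris[iris$Species != "setosa", c("Petal.Length", "Petal.Width")]
y <- droplevels(iris$Species[iris$Species != "setosa"])

# Class means and (unweighted) within-class scatter
m1 <- colMeans(x[y == "versicolor", ])
m2 <- colMeans(x[y == "virginica", ])
Sw <- cov(x[y == "versicolor", ]) + cov(x[y == "virginica", ])

# Fisher direction: proportional to Sw^{-1} (m1 - m2); projecting onto w
# maximizes between-class separation relative to within-class spread
w <- solve(Sw) %*% (m1 - m2)
scores <- drop(as.matrix(x) %*% w)

# Distance-discriminant step: assign each sample to the class whose
# projected centroid is closer
c1 <- sum(m1 * w); c2 <- sum(m2 * w)
pred <- ifelse(abs(scores - c1) < abs(scores - c2), "versicolor", "virginica")
table(y, pred)
```

In practice this is not done by hand: the lda function from the MASS package, used later in this post, performs the equivalent computation for any number of classes.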
Linear discriminant analysis (LDA) is a classical algorithm for pattern recognition, introduced into the fields of pattern recognition and artificial intelligence by Belhumeur in 1996. The basic idea of LDA is to project high-dimensional pattern samples onto the optimal discriminant vector space, extracting classification information while compressing the dimensionality of the feature space. In the new subspace the pattern samples have the largest between-class distance and the smallest within-class distance, i.e. the best possible separability.
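Formally (notation introduced here for clarity), let $S_B$ denote the between-class scatter matrix and $S_W$ the within-class scatter matrix. LDA seeks the projection $w$ that maximizes the Fisher criterion

$$ J(w) = \frac{w^{\mathsf T} S_B\, w}{w^{\mathsf T} S_W\, w} $$

The resulting discriminant directions are the leading eigenvectors of $S_W^{-1} S_B$; since $S_B$ has rank at most $C-1$ for $C$ classes, at most $C-1$ useful directions exist, which is the dimension limit discussed below.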
Feature selection (i.e. dimensionality reduction) is a very important step in data preprocessing. For classification, it picks out the features most relevant to the classification task from a large pool of features, removing noise from the original data. Principal component analysis (PCA) and linear discriminant analysis (LDA) are two of the most commonly used dimensionality reduction algorithms, but their goals are essentially opposite. The differences between LDA and PCA are listed below.
- Different starting points. PCA looks at the problem from the perspective of the feature covariance: it seeks the projection in which the projected sample points have the largest variance. LDA makes more use of the class label information: it seeks the projection in which data points from different classes are as far apart as possible while data points from the same class are as close together as possible, i.e. the projection with the best classification performance.
- Different learning paradigms. PCA is unsupervised, so in most scenarios it is only one step in the data processing pipeline and must be combined with other algorithms, e.g. PCA followed by clustering, discriminant analysis, or regression analysis. LDA is a supervised method that can be used for prediction as well as dimensionality reduction, so it can be combined with other models or used on its own.
- Different numbers of dimensions available after reduction. LDA can produce at most C - 1 discriminant dimensions, where C is the number of class labels, so the result depends only on the number of classes, not on the number of original features. PCA can produce up to n dimensions, where n is the number of original features, so all of them can be retained if desired (see the short check after this list).
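The dimension difference is easy to verify on iris with the standard prcomp and MASS::lda functions (a quick sketch; object names are ours):

```r
library(MASS)

# PCA: as many components as original features are available (4 for iris)
pca <- prcomp(iris[, 1:4], scale. = TRUE)
ncol(pca$x)              # 4 principal components

# LDA: at most C - 1 discriminants; iris has 3 classes, so 2
fit <- lda(Species ~ ., data = iris)
ncol(predict(fit)$x)     # 2 linear discriminants (LD1, LD2)
```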
Applied to the same example, the two dimensionality reduction methods give intuitively quite different results.
Because of its simplicity and effectiveness, the LDA algorithm has been widely used in many fields and is a classic, popular algorithm in machine learning and data mining. However, the algorithm itself still has some limitations:
- When the number of samples is much smaller than the feature dimension, distances between samples become large and the distance metric loses its meaning, so the within-class and between-class scatter matrices in LDA become singular and the optimal projection direction cannot be obtained. This is especially common in face recognition.
- LDA is not suitable for reducing the dimensionality of samples that are not Gaussian distributed
- LDA does not work well when the class information lies in the variance rather than in the mean
- LDA may overfit the data
Application Scenarios for LDA:
- Dimensionality reduction and pattern recognition in face recognition
- Economic forecasts based on the macroeconomic characteristics of the market
- Market research based on the different attributes of markets or users
- Predicting medical conditions based on patient case characteristics
MASS::lda
In R, linear discriminant analysis is performed with the lda function from the MASS package. The lda function is based on Bayes discriminant theory; when there are only two classes and the population follows a multivariate normal distribution, the Bayes discriminant is equivalent to the Fisher discriminant and the distance discriminant. Code example:
```r
> if (require(MASS) == FALSE) {
+   install.packages("MASS")
+ }
> model1 <- lda(Species ~ ., data = iris)
> table <- table(iris$Species, predict(model1)$class)
> table
             setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         48         2
  virginica       0          1        49
> sum(diag(prop.table(table)))  ### correct classification rate
[1] 0.98
```
As the result shows, only three of the samples are misclassified. Once the discriminant function has been established, the discriminant scores can be plotted, much like in principal component analysis.
```r
> LD <- predict(model1)$x  # discriminant scores: the samples projected onto the discriminant vectors
> DS <- cbind(iris, as.data.frame(LD))
> head(DS)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species      LD1        LD2
1          5.1         3.5          1.4         0.2  setosa 8.061800  0.3004206
2          4.9         3.0          1.4         0.2  setosa 7.128688 -0.7866604
3          4.7         3.2          1.3         0.2  setosa 7.489828 -0.2653845
4          4.6         3.1          1.5         0.2  setosa 6.813201 -0.6706311
5          5.0         3.6          1.4         0.2  setosa 8.132309  0.5144625
6          5.4         3.9          1.7         0.4  setosa 7.701947  1.4617210
> library(ggplot2)
> p <- ggplot(DS, mapping = aes(x = LD1, y = LD2))
> p + geom_point(aes(colour = Species), alpha = 0.8, size = 3)
```
Next, look at a set of predictions based on the discriminant score components themselves.
```r
> model2 <- lda(Species ~ LD1 + LD2, DS)
> table(iris$Species, predict(model2)$class)
             setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         48         2
  virginica       0          1        49
```
When the covariance matrices of the different classes are not equal, quadratic discriminant analysis (QDA) should be used. Note that when using the lda and qda functions, the assumption is that the population follows a multivariate normal distribution; if the equal-covariance assumption of LDA is not satisfied, use quadratic discriminant analysis.
```r
> iris.qda <- qda(Species ~ ., data = iris, CV = TRUE)
> table <- table(iris$Species, iris.qda$class)  # with CV = TRUE the cross-validated classes are returned directly
> table
             setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         48         2
  virginica       0          1        49
> sum(diag(prop.table(table)))  ### correct classification rate
[1] 0.98
```
Setting the CV argument to TRUE performs leave-one-out cross-validation, and the cross-validated predictions are generated automatically; the confusion matrix obtained this way is more reliable. The posterior probabilities can also be extracted, with predict(model)$posterior for a model fitted without CV, or directly from model$posterior when CV = TRUE.
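For example, a small sketch showing both ways of obtaining the posterior probabilities (object names are ours):

```r
library(MASS)

# Leave-one-out cross-validated LDA: classifications are returned directly
fit.cv <- lda(Species ~ ., data = iris, CV = TRUE)
table(iris$Species, fit.cv$class)   # LOO-CV confusion matrix
head(fit.cv$posterior)              # LOO-CV posterior probabilities

# Without CV, posterior probabilities come from predict()
fit <- lda(Species ~ ., data = iris)
head(predict(fit)$posterior)
```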