Original: http://sebastianraschka.com/Articles/2014_python_lda.html
Translated and compiled by: Asher Li
Translator's note: the linear-algebra terms "eigenvalue" and "eigenvector" have two common renderings in Chinese textbooks; to avoid confusion with the word "feature", the Chinese version adopted the alternative rendering, while this English text simply uses "eigenvalue" and "eigenvector" throughout. Phrases such as "the $d \times k$-dimensional eigenvector matrix" denote a matrix of that size whose columns are eigenvectors.

Introduction
In practical pattern classification and machine learning work, linear discriminant analysis (LDA) is often used as a dimensionality-reduction step during data preprocessing. LDA projects the dataset onto a lower-dimensional space while preserving good class separability, which reduces computational cost and helps avoid overfitting (the "curse of dimensionality").
Ronald Fisher formulated the original linear discriminant in 1936 ("The Use of Multiple Measurements in Taxonomic Problems"), and it is sometimes also used to solve classification problems. The original linear discriminant was described for two-class problems; it was later generalized to "multi-class linear discriminant analysis" or "multiple discriminant analysis" by C. R. Rao in 1948 ("The Utilization of Multiple Measurements in Problems of Biological Classification").
In general, LDA is very similar to principal component analysis (PCA). But in contrast to PCA, which looks for the component axes that maximize the variance of the whole dataset, LDA is interested in the axes that maximize the separation between multiple classes.
For more information on PCA, see Implementing a Principal Component Analysis (PCA) in Python, step by step.
In a nutshell, LDA projects the feature space (the $n$-dimensional samples of the dataset) onto a smaller $k$-dimensional subspace (where $k \le n-1$) while maintaining the class-discriminatory information.
In general, dimensionality reduction not only reduces the computational cost of the classification task, but also reduces the error of parameter estimation, thereby helping to avoid overfitting.

Principal component analysis vs. linear discriminant analysis
Both linear discriminant analysis (LDA) and principal component analysis (PCA) are linear transformation methods commonly used for dimensionality reduction. PCA is an "unsupervised" algorithm: it ignores class labels and looks for the directions (the "principal components") that maximize the variance in the dataset. LDA, by contrast, is "supervised": it computes another set of specific directions (the "linear discriminants") that represent the axes maximizing the separation between classes.
Although it may sound intuitive that LDA should outperform PCA on multi-class classification tasks where the class labels are known, this is not always the case.
For example, when the classification accuracy of an image-recognition task after PCA or LDA is compared, PCA tends to outperform LDA if the number of samples per class is relatively small ("PCA vs. LDA", A. M. Martinez et al., 2001). It is also not uncommon to use LDA and PCA in combination, for example by first applying PCA for dimensionality reduction and then LDA.
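As a quick, hedged illustration of that contrast: the article builds LDA from scratch below, but assuming scikit-learn is installed, the two projections can be sketched in a few lines on the same Iris data used throughout this article.

```python
# Sketch only: scikit-learn is not used in the step-by-step derivation below;
# this is just to make the PCA-vs-LDA contrast concrete.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA: unsupervised, ignores the labels y, maximizes overall variance.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised, uses the labels y, maximizes class separability.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # both (150, 2)
```

Plotting the two 2-dimensional projections side by side is the usual way to see that the PCA axes track overall variance while the LDA axes track class separation.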
What is a "good" feature subspace?
Suppose our goal is to reduce the dimensionality of a $d$-dimensional dataset by projecting it onto a $k$-dimensional subspace (where $k < d$). How do we know what size to choose for $k$, and how do we know whether the feature subspace represents our data "well"?
Later, we will compute the eigenvectors (the components) of the dataset and collect them in so-called scatter matrices (the between-class scatter matrix and the within-class scatter matrix). Each of these eigenvectors is associated with an eigenvalue, which tells us about the "length" or "magnitude" of that eigenvector.
If we observe that all eigenvalues have a similar magnitude, this may be a good indicator that our data is already projected onto a "good" feature space.
If, on the other hand, some eigenvalues are much larger than the others, we may want to keep only the eigenvectors with the largest eigenvalues, since they contain more information about the distribution of the data. Conversely, eigenvalues close to 0 carry little information, and we consider discarding them when constructing the new feature subspace.
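As a small illustration of how eigenvalue magnitudes guide that decision, here is a hedged NumPy sketch; the 4x4 matrix M is purely hypothetical and only stands in for the scatter-matrix combination whose eigenpairs we actually compute later in the article.

```python
import numpy as np

# Hypothetical 4x4 matrix, standing in for the scatter-matrix combination
# whose eigenpairs the article actually computes later on.
M = np.array([[4.0, 1.0, 0.5, 0.2],
              [1.0, 3.0, 0.3, 0.1],
              [0.5, 0.3, 0.1, 0.0],
              [0.2, 0.1, 0.0, 0.05]])

eig_vals, _ = np.linalg.eig(M)
eig_vals = np.abs(eig_vals)            # magnitudes only
for i in np.argsort(eig_vals)[::-1]:   # largest first
    share = eig_vals[i] / eig_vals.sum()
    print(f"eigenvalue {eig_vals[i]:.3f} carries {share:.1%} of the total")
```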
Five steps of LDA

Listed below are the five general steps for performing a linear discriminant analysis; we will explain each of them in more detail later. A compact code sketch of the whole pipeline follows the list.
1. Compute the $d$-dimensional mean vectors for the different classes in the dataset.
2. Compute the scatter matrices: the between-class scatter matrix and the within-class scatter matrix.
3. Compute the eigenvectors $e_1, e_2, \ldots, e_d$ and corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_d$ of the scatter matrices.
4. Sort the eigenvectors by decreasing eigenvalue and choose the $k$ eigenvectors with the largest eigenvalues to form a $d \times k$-dimensional matrix $W$, in which every column represents an eigenvector.
5. Use this $d \times k$ eigenvector matrix $W$ to transform the samples onto the new subspace. This can be written as the matrix multiplication $Y = X \times W$, where $X$ is an $n \times d$-dimensional matrix representing the $n$ samples and $Y$ is the resulting $n \times k$-dimensional matrix of the samples in the new subspace.
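Before going through each step in detail, here is the compact NumPy sketch promised above. It is only a sketch under stated assumptions: the class labels in y are already encoded as integers, k is chosen by hand, and the eigenproblem is solved for $S_W^{-1} S_B$, one common formulation of the step-3 eigenproblem.

```python
import numpy as np

def lda_project(X, y, k):
    """Minimal sketch of the five LDA steps.
    X: (n, d) array of samples, y: integer class labels, k: target dimension."""
    d = X.shape[1]
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)

    # Step 1: d-dimensional mean vector for each class.
    class_means = {c: X[y == c].mean(axis=0) for c in classes}

    # Step 2: within-class (S_W) and between-class (S_B) scatter matrices.
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in classes:
        X_c = X[y == c]
        centered = X_c - class_means[c]
        S_W += centered.T @ centered
        mean_diff = (class_means[c] - overall_mean).reshape(d, 1)
        S_B += X_c.shape[0] * (mean_diff @ mean_diff.T)

    # Step 3: eigenvalues and eigenvectors of S_W^{-1} S_B.
    eig_vals, eig_vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)

    # Step 4: keep the k eigenvectors with the largest eigenvalues (d x k matrix W).
    top = np.argsort(np.abs(eig_vals))[::-1][:k]
    W = np.real(eig_vecs[:, top])

    # Step 5: project the samples onto the new subspace, Y = X * W.
    return X @ W
```

Once the Iris data has been loaded and its class labels encoded as integers, calling lda_project(X, y, k=2) should produce the same kind of two-dimensional projection that the remainder of the article derives step by step.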
Preparing the sample dataset: the Iris dataset

Next, we will work with the famous "Iris" dataset. The Iris dataset can be downloaded from the UCI Machine Learning Repository:
https://archive.ics.uci.edu/ml/datasets/Iris
Reference: Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.
The dataset contains measurements of 150 iris flowers from three different species.
The three categories include:
1. Iris setosa (n=50)
2. Iris versicolor (n=50)
3. Iris virginica (n=50)
Four features include:
1. Sepal length (cm)
2. Sepal width (cm)
3. Petal length (cm)
4. Petal width (cm)
feature_dict = {i: label for i, label in zip(
    range(4),
    ('sepal length in cm',
     'sepal width in cm',
     'petal length in cm',
     'petal width in cm',))}
Reading the dataset
import pandas as pd

df = pd.io.parsers.read_csv(
    filepath_or_buffer='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
    header=None,
    sep=',',
    )

df.columns = [l for i, l in sorted(feature_dict.items())] + ['class label']

df.dropna(how="all", inplace=True)  # to drop the empty line at file-end

df.tail()
|     | sepal length in cm | sepal width in cm | petal length in cm | petal width in cm | class label    |
|-----|--------------------|-------------------|--------------------|-------------------|----------------|
| 145 | 6.7                | 3.0               | 5.2                | 2.3               | Iris-virginica |
| 146 | 6.3                | 2.5               | 5.0                | 1.9               | Iris-virginica |
| 147 | 6.5                | 3.0               | 5.2                | 2.0               | Iris-virginica |
| 148 | 6.2                |