Feature extraction and feature selection are both methods of dimensionality reduction, but they have similarities as well as differences:
1. Concept:
Feature extraction (Feature Extraction): creating a set of new features by combining the existing features. In other words, each feature obtained after feature extraction is a mapping of the original features.
Feature selection (Feature Selection): choosing a subset of all the features (the more informative ones). In other words, the selected features are a subset of the original features.
2. Similarities and differences: Feature selection and feature extraction have the same effect: both try to reduce the number of attributes (features) in the dataset. But the two methods work differently. Feature extraction exploits the relationships between attributes, for example combining different attributes into new ones, so the original feature space is changed. Feature selection picks a subset from the original feature set; it is a containment relationship and does not change the original feature space.
3. Feature extraction:
Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two of the main classical methods for feature extraction.
3.1, PCA vs. LDA
For feature extraction, there are two categories:
(1) Signal representation: the goal of the feature extraction mapping is to represent the samples accurately in a low-dimensional space. In other words, the extracted features should represent the sample information accurately, so that very little information is lost. The corresponding method is PCA.
(2) Signal classification: the goal of the feature extraction mapping is to enhance the class-discriminatory information in a low-dimensional space. That is to say, the extracted features should keep classification accuracy high, not noticeably lower than the accuracy achievable with the original features. In the linear case, the corresponding method is LDA; nonlinear methods are not considered here.
It can be seen that PCA and LDA have different objectives, and therefore their methods differ. The projection directions found by PCA are the eigenvectors of the covariance matrix, while LDA seeks a transformation W that maximizes the distance between the projected class means while minimizing the within-class variance (that is, maximizing the between-class distance and minimizing the within-class distance); the transformation W gives the projection directions of the features.
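To make the contrast concrete, here is a minimal sketch of the two projections using scikit-learn; the iris dataset and the choice of 2 components are illustrative assumptions of mine, not from the post:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA: unsupervised; projection directions are eigenvectors of the
# covariance matrix, chosen to preserve variance (signal representation).
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised; the transformation W maximizes between-class scatter
# while minimizing within-class scatter (signal classification).
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```

Note that LDA needs the class labels y, while PCA ignores them; this is exactly the signal-representation vs. signal-classification distinction above.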
4. Feature selection:
A good mathematical model should be simple in form. The purpose of building a machine learning model is to learn the structure and nature of the problem from the original feature data, so the selected features should explain the problem well. The goals of feature selection are therefore:
Improve prediction accuracy
Build faster, lower-cost predictive models
Better understand and interpret the model
There are three main methods:
4.1.1, Filter method
The main idea is to "score" each feature dimension, that is, to assign each feature a weight representing its importance, and then rank the features by weight.
The main methods are:
Chi-squared test
Information gain
Correlation coefficient scores
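A minimal filter-method sketch: score every feature with the chi-squared statistic and keep the top k. The dataset and k=2 are illustrative choices of mine:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Score each feature independently, then keep the k highest-scoring ones.
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print(selector.scores_)        # per-feature "importance" weights
print(selector.get_support())  # boolean mask of the selected features
```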
4.1.2, Wrapper method
The main idea is to treat subset selection as a search optimization problem: generate different combinations of features, evaluate each combination, and compare it with the others. Viewed this way, subset selection is an optimization problem that many optimization algorithms can solve, in particular heuristic algorithms such as GA, PSO, DE, and ABC.
The main method is: Recursive Feature Elimination (RFE)
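A minimal wrapper-method sketch using RFE: a model is fit repeatedly, and the weakest feature is dropped each round. The estimator and the number of features to keep are illustrative assumptions of mine:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Each round: fit the model, rank features by coefficient magnitude,
# eliminate the weakest one, and repeat until 2 features remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=2)
rfe.fit(X, y)

print(rfe.support_)  # which features survived
print(rfe.ranking_)  # 1 = selected; higher = eliminated earlier
```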
4.1.3, Embedded method
The main idea is to learn which attributes best improve the model's accuracy while the model itself is being built. In other words, during the process of fitting the model, the attributes that are important to the model's training are selected.
The main method is regularization. For example, ridge regression is ordinary linear regression with a regularization term added to the loss function.
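The post cites ridge (an L2 penalty); for selecting features specifically, an L1 penalty (Lasso) is the more common embedded choice, since it drives unimportant coefficients exactly to zero. A minimal sketch, with the dataset and alpha=0.1 as illustrative assumptions:

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

# The L1 penalty shrinks some coefficients to exactly zero during
# training, so feature selection happens as part of the model fit.
lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)  # zeroed coefficients = features dropped

# Keep only the features the trained model deems important.
X_selected = SelectFromModel(lasso, prefit=True).transform(X)
print(X_selected.shape)
```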
5. Summary
Feature selection differs from feature extraction in that the selected features and the model are inseparable: different features lead to different models. Under the framework of machine learning = model + strategy + algorithm, feature selection is part of model selection and cannot be separated from it.
As for whether to split the data (for cross-validation) first or do feature selection first, the answer is to split first, because the purpose of cross-validation is model selection, and since feature selection is part of model selection, the split should rightfully come first. If feature selection is done first, the features are selected over the whole dataset, so information from the validation data leaks into the selection and the resulting subset is biased.
We can take regularization as an example: regularization is a constraint on the weights, so the constraint is applied during the training of the model on each fold, rather than being determined in advance on the full dataset and then cross-validated. A sketch of doing feature selection inside each fold follows.
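A minimal sketch of "split first, then select": wrapping the selector and the model in a Pipeline makes cross_val_score re-run feature selection inside every training fold, so no information leaks from the held-out fold. The dataset, scorer, and k=2 are illustrative choices of mine:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=2)),  # fit on the training fold only
    ("clf", LogisticRegression(max_iter=1000)),
])

# Selection is refit inside each of the 5 training folds, never on the
# held-out fold, so the estimate of accuracy is unbiased by leakage.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```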