Feature extraction is preparatory work for machine learning.
1. Features are broadly divided into several categories

- High-level and low-level features: a high-level feature is more generalized, while a low-level feature is relatively specific.
- Specific features, primitive (raw) features, and abstract features.
Overall, low-level features are more targeted: each individual feature covers only a small slice of the data (few samples contain it), and the total number of such features (the dimensionality) is very large. High-level features are more generalized: each individual feature covers a large share of the data (many samples contain it), and the number of such features is small. Predictions for long-tail samples are driven mainly by high-level features, while predictions for high-frequency samples are driven mainly by low-level features.
Characteristics of non-linear models
1) They can rely mainly on high-level features; because their computational cost is high, very high-dimensional feature sets are impractical.
2) They can fit the target well through non-linear mappings of high-level features.
Characteristics of linear models
1) The feature set should be as comprehensive as possible, covering both high-level and low-level features.
2) High-level features can be converted into low-level ones (for example, by discretizing a continuous feature into indicator features) to strengthen the model's fitting capability, as sketched below.
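A minimal sketch of one such conversion, assuming a hypothetical continuous age feature; the feature name and bucket edges are illustrative assumptions, not anything prescribed above:

```python
import numpy as np

# Hypothetical high-level (continuous, generalized) feature: user age.
ages = np.array([18, 25, 34, 47, 62])

# Discretize into buckets; these edges are arbitrary illustrative choices.
edges = [0, 20, 30, 40, 50, 100]
bucket_ids = np.digitize(ages, edges)  # bucket index for each sample

# Expand each bucket id into low-level 0/1 indicator features,
# which a linear model can then weight independently.
one_hot = np.eye(len(edges) + 1)[bucket_ids]
print(one_hot)
```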
2. Feature normalization

After feature extraction, if the value ranges of different features differ greatly, it is usually best to normalize the features to get better results. Common normalization methods are as follows (a small numeric sketch follows the list):
Standardization:
x' = (x - μ) / σ, where μ is the mean of the distribution of x and σ is its standard deviation.
Scaling to unit length:
x' = x / ||x||, i.e., each feature vector is divided by its norm so that it becomes a unit-length vector.
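Here is the numeric sketch promised above, showing both methods with NumPy; the toy matrix X is made up purely for illustration:

```python
import numpy as np

# Toy data whose two features have very different value ranges.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Standardization: x' = (x - mean) / std, computed per feature (column).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Scaling to unit length: divide each sample (row) by its L2 norm,
# so every row of X_unit has norm 1.
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)

print(X_std)
print(X_unit)
```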
3. Feature selection

After feature extraction and normalization, if there are so many features that the model cannot be trained, or that it overfits easily, you need to perform feature selection and keep only the valuable features.
Filter:
Assume that a feature subset's influence on the model's predictions is independent of the model itself. Select a subset of features and analyze the relationship between the subset and the data labels; if they are strongly correlated, the subset is considered valuable. There are many measures of the relationship between a feature subset and the labels, such as the chi-square statistic and information gain.
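As a minimal sketch of a filter method, the snippet below scores features against the labels with the chi-square statistic using scikit-learn; the iris dataset and k=2 are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)  # 4 features, all non-negative (chi2 requires this)

# Score each feature against the labels with the chi-square statistic
# and keep only the 2 highest-scoring features.
selector = SelectKBest(chi2, k=2)
X_selected = selector.fit_transform(X, y)

print(selector.scores_)   # chi-square score of each original feature
print(X_selected.shape)   # (150, 2): only the selected features remain
```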
Wrapper:
Select a candidate feature subset, add it to the original feature set, train the model, and compare performance with and without the subset; if performance improves, the subset is considered valuable, otherwise it is discarded.
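A minimal wrapper-style sketch, assuming an arbitrary split of the breast-cancer dataset's columns into a base set and a candidate subset; the model choice and column indices are illustrative only:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
base = list(range(10))           # current feature set (column indices)
candidate = list(range(10, 15))  # candidate subset to evaluate

model = LogisticRegression(max_iter=5000)

# Train and score the model with and without the candidate subset;
# keep the subset only if cross-validated accuracy improves.
score_before = cross_val_score(model, X[:, base], y, cv=5).mean()
score_after = cross_val_score(model, X[:, base + candidate], y, cv=5).mean()

print(score_before, score_after)
if score_after > score_before:
    print("candidate subset is kept")
```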
Embedded:
Combine feature selection with model training itself, for example by adding an L1 norm or L2 norm penalty to the loss function.
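A minimal embedded-method sketch: an L1-penalized logistic regression in scikit-learn, where the penalty strength C=0.1 is an arbitrary illustrative choice:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# The L1 norm added to the loss drives some coefficients exactly to
# zero, so feature selection happens during training itself.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X, y)

kept = np.flatnonzero(model.coef_[0])
print(f"{kept.size} of {X.shape[1]} features kept:", kept)
```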