Reference: http://scikit-learn.org/stable/modules/unsupervised_reduction.html
For high-dimensional features, it is often useful to apply an unsupervised dimensionality reduction step before the supervised step.
4.4.1. PCA: principal component analysis
decomposition.PCA looks for a combination of features that captures well the variance of the original features. See Decomposing signals in components (matrix factorization problems). Translated reference: http://blog.csdn.net/mmc2015/article/details/46867597.
Examples
- Faces Recognition example using Eigenfaces and SVMs
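A minimal sketch of this step (not from the original article): the data, the number of samples, and the choice of n_components=10 are all hypothetical, chosen only to show how decomposition.PCA reduces the feature dimension before a supervised step.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(200, 64)          # hypothetical high-dimensional feature matrix

pca = PCA(n_components=10)     # keep the 10 directions of largest variance
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (200, 10)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```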
4.4.2. Random projections
The random_projection module provides several tools for data reduction by random projections. See the relevant section of the documentation: Random Projection.
Examples
- The Johnson-Lindenstrauss bound for embedding with random projections
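A minimal sketch (not from the original article): the data size and the distortion eps=0.5 are hypothetical, chosen only to show the Johnson-Lindenstrauss lower bound on the target dimension and a sparse random projection with that bound.

```python
import numpy as np
from sklearn.random_projection import SparseRandomProjection, johnson_lindenstrauss_min_dim

# Number of components needed to preserve pairwise distances within eps=0.5
print(johnson_lindenstrauss_min_dim(n_samples=100, eps=0.5))

rng = np.random.RandomState(0)
X = rng.rand(100, 10000)       # hypothetical very high-dimensional data

# n_components='auto' picks the Johnson-Lindenstrauss bound for the given eps
transformer = SparseRandomProjection(eps=0.5, random_state=0)
X_new = transformer.fit_transform(X)
print(X_new.shape)             # far fewer columns than the original 10000
```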
4.4.3. Feature agglomeration (feature aggregation)
cluster.FeatureAgglomeration applies hierarchical clustering to group together features that behave similarly.
Examples
- Feature agglomeration vs. Univariate selection
- Feature agglomeration
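A minimal sketch (not from the original article): the digits dataset and n_clusters=16 are illustrative assumptions, used only to show cluster.FeatureAgglomeration merging similar features into a smaller set.

```python
from sklearn.datasets import load_digits
from sklearn.cluster import FeatureAgglomeration

X, y = load_digits(return_X_y=True)          # X has shape (1797, 64)

agglo = FeatureAgglomeration(n_clusters=16)  # group similar pixel features together
X_reduced = agglo.fit_transform(X)
print(X_reduced.shape)                       # (1797, 16)
```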
Feature Scaling
Note that if features have very different scaling or statistical properties, cluster.FeatureAgglomeration may not be able to capture the links between related features. Using a preprocessing.StandardScaler can be useful in these settings.
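A minimal sketch (not from the original article): the synthetic data with widely different feature scales and the choice of n_clusters=5 are assumptions, used only to show standardizing with preprocessing.StandardScaler before agglomeration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import FeatureAgglomeration
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
X = rng.rand(100, 30) * np.logspace(0, 5, 30)   # hypothetical features on very different scales

# Standardize first so agglomeration is not dominated by large-scale features
reducer = make_pipeline(StandardScaler(), FeatureAgglomeration(n_clusters=5))
X_reduced = reducer.fit_transform(X)
print(X_reduced.shape)                          # (100, 5)
```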
Pipelining: the unsupervised data reduction and the supervised estimator can be chained in one step. See Pipeline: chaining estimators.
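A minimal sketch (not from the original article): the digits dataset, PCA with n_components=20, and the SVC classifier are illustrative choices, used only to show chaining an unsupervised reduction with a supervised estimator in a single Pipeline.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

clf = Pipeline([
    ("reduce_dim", PCA(n_components=20)),   # unsupervised dimensionality reduction
    ("classify", SVC(gamma="scale")),       # supervised estimator
])
clf.fit(X, y)
print(clf.score(X, y))
```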
scikit-learn 4.4: Unsupervised dimensionality reduction