Both feature extraction and feature selection aim to obtain the most effective features from the original features: features that are invariant across similar samples, discriminative between different samples, and robust to noise.
Feature Extraction: transform the original features into a group of features with clear physical meaning (Gabor, geometric features [corner points, invariants], texture [HSV, HOG]) or statistical meaning.
Feature Selection: select a group of the most statistically significant features from the feature set to reduce the dimensionality. The benefits: 1. reduce data storage and input bandwidth 2. reduce redundancy 3. improve classification in low dimensions 4. discover more meaningful latent variables, helping to gain a deeper understanding of the data
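As a toy illustration of the feature-selection side, the sketch below keeps the k columns with the highest sample variance. The variance criterion and the `select_top_k_by_variance` helper are illustrative stand-ins for a statistical significance criterion, not a method from the text:

```python
import numpy as np

def select_top_k_by_variance(X, k):
    """Toy feature selection: keep the k original columns with the
    highest sample variance (a stand-in for a statistical criterion)."""
    scores = X.var(axis=0)
    idx = np.sort(np.argsort(scores)[::-1][:k])  # indices of the k highest-variance features
    return idx, X[:, idx]

rng = np.random.default_rng(1)
# six features; columns 1 and 3 are deliberately given large variance
X = rng.normal(size=(50, 6)) * np.array([0.1, 5.0, 0.1, 3.0, 0.1, 0.1])
idx, X_sel = select_top_k_by_variance(X, 2)
print(idx, X_sel.shape)  # [1 3] (50, 2)
```

Unlike feature extraction, the selected columns remain original features, so their physical meaning is preserved.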
Linear Feature Extraction
PCA - Principal Component Analysis: find the optimal subspace that best represents the data distribution (dimensionality reduction; the resulting components are uncorrelated). In practice, the projection matrix is formed by the eigenvectors corresponding to the first s largest eigenvalues of the covariance matrix. The attached article describes this very intuitively and in detail.
Principal Component Analysis (PCA): Theoretical Analysis and Application.doc (561.5 KB)
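The eigendecomposition view of PCA described above can be sketched in a few lines of NumPy (the `pca` helper and random data are illustrative):

```python
import numpy as np

def pca(X, s):
    """Project X (n_samples x d) onto the s eigenvectors of the
    covariance matrix with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)                  # center the data
    C = np.cov(Xc, rowvar=False)             # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns ascending eigenvalues
    W = eigvecs[:, ::-1][:, :s]              # top-s eigenvectors = projection matrix
    return Xc @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z = pca(X, 2)
print(Z.shape)  # (100, 2)
```

Because the columns of the projection matrix are eigenvectors of the covariance matrix, the covariance of the projected data is diagonal, i.e. the new components are uncorrelated.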
LDA - Linear Discriminant Analysis: find the subspace that maximizes the discriminant criterion. The Fisher idea is to find a projection vector such that, after dimensionality reduction, intra-class scatter is minimized and inter-class scatter is maximized. In practice, the projection matrix is formed by the eigenvectors corresponding to the first s eigenvalues of Sw^-1 Sb. Page 96 of the DHS Pattern Classification book has a detailed, easy-to-follow derivation. Reference [1]
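The Sw^-1 Sb recipe can be sketched directly (a toy two-class Fisher LDA; the data and the `lda` helper are assumptions for illustration):

```python
import numpy as np

def lda(X, y, s):
    """Fisher LDA sketch: projection matrix from the top-s eigenvectors
    of inv(Sw) @ Sb (within- and between-class scatter matrices)."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)          # within-class scatter
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * diff @ diff.T          # between-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:s]].real             # projection matrix
    return X @ W

rng = np.random.default_rng(0)
X0 = rng.normal(size=(30, 3))                          # class 0
X1 = rng.normal(size=(30, 3)) + np.array([4, 4, 0])    # class 1, shifted
X = np.vstack([X0, X1])
y = np.array([0] * 30 + [1] * 30)
Z = lda(X, y, 1)
print(Z.shape)  # (60, 1)
```

With two classes, Sb has rank 1, so only one discriminant direction is meaningful; the projected class means end up well separated along it.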
ICA - Independent Component Analysis: PCA reduces the dimensionality of the raw data and extracts uncorrelated components; ICA reduces the dimensionality of the raw data and extracts independent components. Find a linear transformation z = Wx that maximizes the independence of the components of z, measured by I(z) = E[ln(p(z) / (p(z1)...p(zD)))]. See Machine Learning: A Probabilistic Perspective for the derivation and computation. Reference [2]
Note on PCA and ICA: PCA is in fact a change of basis that gives the transformed data the largest variance. Variance describes the amount of information a variable carries. When we talk about the stability of a model, we often say we want to reduce variance: a model with large variance is unstable. For the data used in machine learning (mainly training data), however, large variance is meaningful; otherwise, if all input data were the same point, the variance would be 0, and multiple input samples would be equivalent to a single one.
ICA is used to find the mutually independent (not necessarily orthogonal) parts of a signal, corresponding to higher-order statistical analysis. ICA theory holds that the observed mixed data X is obtained by linearly weighting independent sources S through a mixing matrix A. The objective is to obtain a separation matrix W from X, so that the signal Y obtained by applying W to X is the optimal approximation of the independent sources S. The relationship can be expressed as:
Y = WX = WAS,  A = inv(W)
Compared with PCA, ICA better characterizes the higher-order statistical properties of the variables and suppresses Gaussian noise.
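The mixing/separation relationship above can be checked numerically with a hypothetical 2x2 mixing matrix, using the ideal separation matrix W = inv(A) (estimating W from X alone is what real ICA algorithms such as FastICA do; here we only verify the algebra):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 1000))          # independent non-Gaussian sources
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])               # mixing matrix (hypothetical)
X = A @ S                                # observed mixtures
W = np.linalg.inv(A)                     # ideal separation matrix, A = inv(W)
Y = W @ X                                # Y = WX = WAS recovers S
print(np.allclose(Y, S))  # True
```

In practice W must be estimated from X without knowing A, which is possible precisely because the sources are non-Gaussian and independent.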
Two-dimensional PCA: Reference [3]
CCA - Canonical Correlation Analysis: find two groups of bases such that the correlation between the projections of the two sets of data onto their respective bases is maximized. It is used to describe the linear relationship between two sets of high-dimensional variables, and can be solved with PLS (partial least squares). Reference [4]
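A minimal sketch of CCA via whitening and an SVD of the cross-covariance; the shared latent variable Z and the `first_canonical_corr` helper are illustrative assumptions:

```python
import numpy as np

def first_canonical_corr(X, Y):
    """CCA sketch: first canonical correlation between two data sets,
    via whitening both views and an SVD of the cross-covariance."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / len(X)
    Syy = Yc.T @ Yc / len(Y)
    Sxy = Xc.T @ Yc / len(X)
    def inv_sqrt(S):                     # whitening transform S^(-1/2)
        vals, vecs = np.linalg.eigh(S)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T
    K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(K, compute_uv=False)[0]

rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 1))                            # shared latent variable
X = np.hstack([Z, rng.normal(size=(500, 2))])            # view 1
Y = np.hstack([Z + 0.1 * rng.normal(size=(500, 1)),
               rng.normal(size=(500, 2))])               # view 2, correlated via Z
rho = first_canonical_corr(X, Y)
print(round(rho, 3))
```

Because the two views share the latent variable Z almost exactly, the first canonical correlation comes out close to 1.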
Nonlinear Feature Extraction
Kernel PCA: Reference [5]
Kernel FDA: Reference [6]
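A rough sketch of kernel PCA with an RBF kernel, assuming the standard centered-kernel-matrix formulation (the `rbf_kernel_pca` helper and its parameters are illustrative):

```python
import numpy as np

def rbf_kernel_pca(X, s, gamma=1.0):
    """Kernel PCA sketch (RBF kernel): eigendecompose the centered
    kernel matrix and return the top-s nonlinear components."""
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    K = np.exp(-gamma * D)                        # RBF kernel matrix
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                                # center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:s]
    alphas = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
    return Kc @ alphas                            # projections of training points

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
Z = rbf_kernel_pca(X, 2)
print(Z.shape)  # (40, 2)
```

The key point is that the eigenproblem is solved on the n x n kernel matrix rather than the covariance matrix, so the nonlinear feature space is never represented explicitly.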
Manifold Learning: use the local structure of the manifold to find low-dimensional coordinates on the manifold for dimensionality reduction. Methods: Isomap, LLE, Laplacian Eigenmaps, LPP. References [7]-[10]
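Of these methods, Laplacian Eigenmaps is simple enough to sketch directly: build a kNN graph from the local structure and embed with the low eigenvectors of the graph Laplacian (the spiral data and parameters below are illustrative):

```python
import numpy as np

def laplacian_eigenmap(X, n_neighbors, s):
    """Laplacian Eigenmaps sketch: kNN adjacency graph, unnormalized
    graph Laplacian L = D - W, embedding from the eigenvectors of the
    smallest nonzero eigenvalues."""
    n = len(X)
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:n_neighbors + 1]  # skip self
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                             # symmetrize the graph
    L = np.diag(W.sum(axis=1)) - W                     # unnormalized Laplacian
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:s + 1]                            # drop the constant eigenvector

rng = np.random.default_rng(0)
t = np.linspace(0, 3 * np.pi, 60)
X = np.column_stack([t * np.cos(t), t * np.sin(t),
                     rng.normal(scale=0.05, size=60)])  # noisy spiral
Y = laplacian_eigenmap(X, n_neighbors=5, s=2)
print(Y.shape)  # (60, 2)
```

Only the kNN graph enters the computation, which is exactly the "local structure of the manifold" idea shared by Isomap, LLE, and LPP.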
Summary of criteria
The criteria fall into three types:
1. Criteria based on Euclidean distance (scatter matrices)
2. Criteria based on probability distance
3. Criteria based on entropy
References
[1] Hua Yu and Jie Yang, "A Direct LDA Algorithm for High-Dimensional Data with Application to Face Recognition," Pattern Recognition, vol. 34, no. 10, pp. 2067-2070, Oct. 2001.
[2] A. Hyvärinen and E. Oja, "Independent Component Analysis: Algorithms and Applications," Neural Networks, 13(4-5): 411-430, 2000.
[3] J. Yang, D. Zhang, A. F. Frangi, and J. Y. Yang, "Two-Dimensional PCA: A New Approach to Appearance-Based Face Representation and Recognition," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131-137, Jan. 2004.
[4] D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, "Canonical Correlation Analysis: An Overview with Application to Learning Methods," Technical Report CSD-TR-03-02, 2003.
[5] B. Schölkopf, A. Smola, and K. R. Müller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem," Neural Computation, 10(5): 1299-1319, 1998.
[6] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K. R. Müller, "Fisher Discriminant Analysis with Kernels," Neural Networks for Signal Processing IX, Proceedings of the IEEE Signal Processing Society Workshop, pp. 41-48, 1999.
[7] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction," Science, 290, pp. 2319-2323, 2000.
[8] Sam T. Roweis and Lawrence K. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding," Science, 22 December 2000.
[9] Mikhail Belkin and Partha Niyogi, "Laplacian Eigenmaps for Dimensionality Reduction and Data Representation," Neural Computation, 2003.
[10] Xiaofei He and Partha Niyogi, "Locality Preserving Projections," Advances in Neural Information Processing Systems 16 (NIPS 2003), Vancouver, Canada, 2003.