Python Scikit-learn Machine Learning Toolkit Learning Note: feature_selection module

The sklearn.feature_selection module performs feature selection, not feature extraction.

Univariate feature selection

The principle of univariate feature selection is to compute a statistical score for each variable separately, decide from that score which variables are important, and eliminate the unimportant ones. The module provides the following methods. SelectKBest and SelectPercentile are similar: the former keeps the k highest-scoring variables, the latter keeps the variables ranked in the top n%. Which score do they use to rank the variables? That has to be supplied separately: for regression problems you can use f_regression; for classification problems you can use chi2 or f_classif. Example of use:

    from sklearn.feature_selection import SelectPercentile, f_classif
    selector = SelectPercentile(f_classif, percentile=10)

Several other methods select variables using common univariate statistical tests for each feature: false positive rate (SelectFpr), false discovery rate (SelectFdr), or family-wise error (SelectFwe). The documentation says that for a sparse matrix only the chi2 score is usable, and everything else has to be converted to a dense matrix; in practice, however, I found that f_classif can also be used with sparse matrices.

Recursive feature elimination

Rather than examining the score of each variable individually, this approach tests features together as a group. The naive idea is: for a set of D features there are 2^D - 1 non-empty subsets, so specify an external learning algorithm (such as an SVM), compute the validation error on every subset, and keep the subset with the smallest error. That exhaustive search is pure brute force, so RFE takes a greedy shortcut instead: it repeatedly fits the estimator, ranks the features by importance (for a linear SVM, by its coefficients), and prunes the least important ones until the desired number remains. It is implemented by sklearn.feature_selection.RFE and sklearn.feature_selection.RFECV; the latter chooses the number of features to keep by cross-validation.

L1-based feature selection

The principle of this idea is that a linear model fitted with an L1 penalty often yields a sparse solution, meaning the coefficients in front of many variables are exactly 0 or close to 0. Those variables are evidently unimportant, so they can be removed.

Tree-based feature selection

Feature selection based on decision tree algorithms: tree ensembles compute a per-feature importance during training, and features with low importance can be discarded.

Minimal usage sketches for each of these four approaches follow below.
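A minimal sketch of univariate selection, here with SelectKBest and the chi2 score on a sparse matrix (which chi2 supports directly); the random count data and k=3 are illustrative assumptions, not from the original note:

    import numpy as np
    from scipy.sparse import csr_matrix
    from sklearn.feature_selection import SelectKBest, chi2

    rng = np.random.RandomState(0)
    # chi2 requires non-negative feature values, so use counts here.
    X = csr_matrix(rng.poisson(1.0, size=(30, 10)).astype(float))
    y = rng.randint(0, 2, size=30)

    selector = SelectKBest(chi2, k=3)            # keep the 3 highest-scoring features
    X_new = selector.fit_transform(X, y)
    print(X_new.shape)                           # (30, 3)
    print(selector.get_support(indices=True))    # indices of the kept columns

SelectPercentile works the same way, except that, for example, percentile=10 keeps the top 10% of features instead of a fixed count.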
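A minimal RFE sketch; the synthetic data and the choice of a linear SVC are illustrative assumptions:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=100, n_features=20,
                               n_informative=5, random_state=0)

    # A linear kernel exposes coef_, which RFE uses to rank features.
    estimator = SVC(kernel="linear")
    rfe = RFE(estimator, n_features_to_select=5, step=1)  # drop 1 feature per round
    rfe.fit(X, y)
    print(rfe.support_)    # boolean mask of the selected features
    print(rfe.ranking_)    # rank 1 marks a selected feature

RFECV has the same interface but takes a cv argument and picks the number of features that gives the best cross-validated score.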
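A minimal L1-based selection sketch. The original note does not name an API for this step; one way in current scikit-learn is SelectFromModel wrapped around an L1-penalized LinearSVC, with the parameters below being illustrative assumptions:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectFromModel
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=100, n_features=20,
                               n_informative=5, random_state=0)

    # The L1 penalty drives many coefficients to zero; SelectFromModel
    # then keeps only the features with non-zero weights.
    lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
    model = SelectFromModel(lsvc, prefit=True)
    X_new = model.transform(X)
    print(X_new.shape)     # fewer than 20 columns remain

A smaller C means a stronger penalty and therefore fewer selected features.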
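A minimal tree-based selection sketch, again via SelectFromModel; the ExtraTreesClassifier and its settings are illustrative assumptions:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import ExtraTreesClassifier
    from sklearn.feature_selection import SelectFromModel

    X, y = make_classification(n_samples=100, n_features=20,
                               n_informative=5, random_state=0)

    clf = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)
    print(clf.feature_importances_)   # impurity-based importances, summing to 1

    # Keep features whose importance exceeds the threshold (default: the mean).
    model = SelectFromModel(clf, prefit=True)
    X_new = model.transform(X)
    print(X_new.shape)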
