Steps for solving a feature selection problem

Reference: Isabelle Guyon and André Elisseeff, "An Introduction to Variable and Feature Selection", Journal of Machine Learning Research (JMLR), 2003.


We summarize in a checklist the steps that may be taken to solve a feature selection problem (section numbers refer to the JMLR paper cited above):


1. Do you have domain knowledge? If yes, construct a better set of "ad hoc" features.


2. Are your features commensurate (measured in the same units or on comparable scales)? If no, consider normalizing them.


3. Do you suspect interdependence of features? If yes, expand your feature set by constructing conjunctive features or products of features (that is, features built from several variables at once, or higher-level features), as much as your computer resources allow (see the example of use in Section 4.4).


4. Do you need to prune the input variables (e.g. for cost, speed or data understanding reasons)? If no, construct disjunctive features or weighted sums of features (e.g. by clustering or matrix factorization, see Section 5).


5. Do you need to assess features individually (e.g. to understand their influence on the system, or because their number is so large that you need to do a first filtering)? If yes, use a variable ranking method (Sections 2 and 7.2); else, do it anyway to get baseline results.


6. Do you need a predictor? If no, stop.


7. Do you suspect your data is "dirty" (has a few meaningless input patterns and/or noisy outputs or wrong class labels)? If yes, detect the outlier examples using the top-ranking variables obtained in step 5 as representation; check and/or discard those examples (not the features).


8. Do you know what to try first? If no, use a linear predictor. Use a forward selection method (Section 4.2) with the "probe" method as a stopping criterion (Section 6), or use the ℓ0-norm embedded method (Section 4.3). For comparison, following the ranking of step 5, construct a sequence of predictors of the same nature using increasing subsets of features. Can you match or improve performance with a smaller subset? If yes, try a non-linear predictor with that subset.


9. Do you have new ideas, time, computational resources, and enough examples? If yes, compare several feature selection methods, including your new idea, correlation coefficients, backward selection and embedded methods (Section 4). Use linear and non-linear predictors. Select the best approach with model selection (Section 6).


10. Do you want a stable solution (to improve performance and/or understanding)? If yes, subsample your data and redo your analysis for several "bootstraps" (Section 7.1).
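
Step 2 of the checklist recommends normalizing features that are not commensurate. A minimal sketch, assuming samples are stored as plain Python lists of numeric rows, is z-score standardization: each column is shifted to mean 0 and scaled to standard deviation 1. The function name `standardize` is hypothetical, and z-scoring is only one possible normalization (min-max scaling is another).

```python
import statistics


def standardize(rows):
    """Rescale each feature (column) to mean 0 and standard deviation 1.

    rows: list of samples, each a list of numeric feature values.
    Constant columns are only centered, to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else x - m for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Height in centimetres and weight in kilograms are on different scales.
data = [[170.0, 65.0], [180.0, 80.0], [160.0, 50.0]]
scaled = standardize(data)
print(scaled[0])  # [0.0, 0.0]
```

After this transform both columns have unit spread, so distance- or weight-based methods no longer favour the feature with the larger raw unit.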
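
For step 3, the simplest conjunctive features are products of pairs of features; for binary 0/1 features the product x_i * x_j is exactly the logical AND of the two. The sketch below is illustrative (the helper `add_pairwise_products` is hypothetical) and also shows why the checklist ties this step to your computer resources: d features grow to d + d(d-1)/2.

```python
from itertools import combinations


def add_pairwise_products(rows):
    """Append the product x_i * x_j of every pair of distinct features (i < j).

    rows: list of samples, each a list of numeric feature values.
    With d input features, each output row has d + d * (d - 1) / 2 values.
    """
    expanded = []
    for row in rows:
        products = [a * b for a, b in combinations(row, 2)]
        expanded.append(list(row) + products)
    return expanded


print(add_pairwise_products([[1.0, 2.0, 3.0]]))
# [[1.0, 2.0, 3.0, 2.0, 3.0, 6.0]]

# For binary features each product is the logical AND of the pair.
print(add_pairwise_products([[1, 0, 1]]))
# [[1, 0, 1, 0, 1, 0]]
```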
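
Step 4 suggests replacing groups of variables by weighted sums, for example by clustering. The sketch below is one simple, hypothetical way to do this, not a method prescribed by the paper: features are greedily grouped when their Pearson correlation with a group's first member reaches an assumed threshold of 0.9, and each group is replaced by the plain average of its members (a weighted sum with equal weights). It assumes the features were already normalized as in step 2, so that averaging them is meaningful; `pearson` and `cluster_and_average` are illustrative names.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cluster_and_average(rows, threshold=0.9):
    """Greedily group correlated features and replace each group by its mean.

    A feature joins the first group whose first member it correlates with at
    `threshold` or more; otherwise it starts a new group.
    Returns (groups, new_rows), where groups lists the original column indices.
    """
    columns = [list(col) for col in zip(*rows)]
    groups = []
    for j, col in enumerate(columns):
        for group in groups:
            if pearson(columns[group[0]], col) >= threshold:
                group.append(j)
                break
        else:
            groups.append([j])
    new_rows = [
        [sum(row[j] for j in group) / len(group) for group in groups]
        for row in rows
    ]
    return groups, new_rows


# Columns 0 and 1 move together on similar scales; column 2 does not.
data = [[1.0, 1.1, 5.0], [2.0, 1.9, 3.0], [3.0, 3.2, 4.0], [4.0, 3.8, 1.0]]
groups, reduced = cluster_and_average(data)
print(groups)  # [[0, 1], [2]]
```

Matrix factorization (for example a truncated SVD) is the other option the checklist names; it learns unequal weights but needs a linear algebra library, so it is not shown here.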
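
For step 5, a common variable ranking criterion is the absolute Pearson correlation between each feature and the target. The helpers below (`pearson`, `rank_features`) are hypothetical names for a stdlib-only sketch; note that correlation ranking only captures linear, one-feature-at-a-time dependence.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_features(rows, target):
    """Rank features by |Pearson correlation| with the target, best first.

    Returns (order, scores): order is a list of column indices.
    """
    columns = list(zip(*rows))
    scores = [abs(pearson(col, target)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order, scores


target = [1.0, 2.0, 3.0, 4.0]
rows = [[2.0, 10.0, 3.0], [1.0, 20.0, 1.0], [2.0, 30.0, 4.0], [1.0, 40.0, 2.0]]
order, scores = rank_features(rows, target)
print(order)  # [1, 0, 2]
```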
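
Step 7 detects outlier examples using only the top-ranked variables from step 5. As an assumption, the sketch below uses the modified z-score 0.6745 * (x - median) / MAD (MAD = median absolute deviation) with a cut-off of 3.5, a commonly used rule of thumb; the function `flag_outliers` and its parameters are hypothetical. It only returns row indices, leaving the decision to check or discard those examples to you.

```python
import statistics


def flag_outliers(rows, top_features, threshold=3.5):
    """Return sorted indices of examples that look anomalous on the given features.

    For each feature index in top_features, computes the modified z-score
    0.6745 * (x - median) / MAD and flags examples whose absolute score
    exceeds threshold. Features with zero MAD are skipped.
    """
    flagged = set()
    for j in top_features:
        col = [row[j] for row in rows]
        med = statistics.median(col)
        mad = statistics.median([abs(x - med) for x in col])
        if mad == 0:
            continue  # no spread around the median, so the score is undefined
        for i, x in enumerate(col):
            if abs(0.6745 * (x - med) / mad) > threshold:
                flagged.add(i)
    return sorted(flagged)


# Feature 0 is assumed to be the top-ranked variable from step 5.
rows = [[10, 1], [11, 2], [9, 3], [10, 4], [12, 5], [10, 6], [50, 7]]
print(flag_outliers(rows, top_features=[0]))  # [6]
```

Median-based statistics are used instead of mean and standard deviation because a single extreme example would otherwise inflate the spread and partly hide itself.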
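
Step 8 pairs forward selection with the "probe" stopping criterion: random probe features are added to the data, and selection stops once a probe would be chosen, since a real feature ranked below a random one is unlikely to be useful. A minimal sketch under stated assumptions: forward selection is done by Gram-Schmidt orthogonalization (at each step, pick the candidate that explains most of the current residual of a centered linear fit, then project it out of the residual and of the remaining candidates), and one probe per real feature is made by shuffling a copy of that feature. The function names, the shuffling scheme, and the tolerance are hypothetical choices for illustration.

```python
import math
import random


def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def _project_out(v, unit):
    """Remove from v its component along the unit-length vector `unit`."""
    dot = sum(a * b for a, b in zip(v, unit))
    return [a - dot * b for a, b in zip(v, unit)]


def forward_select_with_probes(columns, y, seed=0):
    """Gram-Schmidt forward selection that stops when a probe would be picked next.

    columns: list of feature columns (each a list of n numbers); y: n targets.
    One probe per real feature is built by shuffling a copy of that feature,
    which keeps its value distribution but breaks any link to y.
    Returns the indices of the selected real features, in selection order.
    """
    rng = random.Random(seed)
    n_real = len(columns)
    probes = []
    for col in columns:
        probe = list(col)
        rng.shuffle(probe)
        probes.append(probe)
    # Indices >= n_real are probes; everything is centered (fit with intercept).
    candidates = {j: _center(c) for j, c in enumerate(list(columns) + probes)}
    residual = _center(y)
    tol = 1e-12 * sum(a * a for a in residual)
    selected = []

    def explained(j):
        # Squared length of the projection of the residual onto candidate j.
        col = candidates[j]
        norm2 = sum(a * a for a in col)
        if norm2 <= 1e-12:
            return 0.0  # candidate is (numerically) already spanned by the selection
        dot = sum(a * b for a, b in zip(col, residual))
        return dot * dot / norm2

    while candidates:
        best = max(candidates, key=explained)
        if best >= n_real or explained(best) <= tol:
            break  # a probe beats every real feature, or nothing is left to explain
        selected.append(best)
        col = candidates.pop(best)
        norm = math.sqrt(sum(a * a for a in col))
        unit = [a / norm for a in col]
        residual = _project_out(residual, unit)
        for j in candidates:
            candidates[j] = _project_out(candidates[j], unit)
    return selected


rng = random.Random(42)
n = 50
f0 = [float(i) for i in range(n)]
f1 = [float((7 * i) % 11) for i in range(n)]
f2 = [float((5 * i) % 13) for i in range(n)]  # irrelevant: not used to build y
y = [2 * a + 3 * b + rng.gauss(0, 0.1) for a, b in zip(f0, f1)]
selected = forward_select_with_probes([f0, f1, f2], y)
# Features 0 and 1 come first; whether 2 follows depends on the random probes.
print(selected)
```

Using only one probe per feature keeps the sketch short; with several shuffles per feature the stopping rule would be less sensitive to a single lucky probe.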
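
The last item of the checklist asks for a stable solution by redoing the analysis on bootstrap resamples. A minimal sketch, assuming correlation ranking as the selection method being checked: resample the examples with replacement, redo the ranking on each resample, and report how often each feature lands in the top k. Features selected in nearly every resample form a stable subset. The names `bootstrap_selection_frequency`, `k`, and `n_boot` are hypothetical.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def top_k_by_correlation(rows, target, k):
    columns = list(zip(*rows))
    scores = [abs(pearson(col, target)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:k]


def bootstrap_selection_frequency(rows, target, k=1, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each feature is ranked in the top k."""
    rng = random.Random(seed)
    n = len(rows)
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample examples with replacement
        sample_rows = [rows[i] for i in idx]
        sample_target = [target[i] for i in idx]
        counts.update(top_k_by_correlation(sample_rows, sample_target, k))
    return {j: counts[j] / n_boot for j in range(len(rows[0]))}


rng = random.Random(1)
target = [float(i % 7) + rng.gauss(0, 0.5) for i in range(40)]
# Feature 0 tracks the target; features 1 and 2 are pure noise.
rows = [[t + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1)] for t in target]
freq = bootstrap_selection_frequency(rows, target, k=1)
print(freq)  # feature 0 should be picked in (nearly) every resample
```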




Copyright notice: This is the blogger's original article and may not be reproduced without the blogger's permission.
