A survey of common algorithms for feature selection


The general process of feature selection:

1. Subset generation: search the feature space and provide candidate feature subsets to the evaluation function

2. Evaluation function: evaluate the quality of a given feature subset

3. Stopping criterion: tied to the evaluation function, it is generally a threshold; the search stops once the evaluation function reaches that standard

4. Validation: verify the validity of the selected feature subset on a validation data set

1. Subset generation

Search algorithms fall into three categories: full search, heuristic search, and random search.

(1) Full search

<1> Breadth-first search: exhaustive, with high time complexity; not practical

<2> Branch and bound: in effect, breadth-first search with a depth bound added

<3> Beam search: actually a kind of heuristic search; it limits the number of nodes expanded at each level of breadth-first search to save time and space, with the nodes to expand (the beam) chosen by a heuristic function

<4> Best-first search: breadth-first search that likewise uses a heuristic function, always expanding the most promising node.

(2) Heuristic search

<1> Sequential forward selection (SFS, Sequential Forward Selection)

The feature subset X starts from the empty set; each round, the single feature x whose addition makes the evaluation function J(X) best is added. It is essentially a greedy algorithm; the drawback is that features can only be added, never removed.
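A minimal sketch of SFS in Python; the evaluation function J is passed in as an arbitrary score(subset) callable, and all names here are illustrative rather than taken from the original post:

```python
def sfs(all_features, score, k):
    """Sequential forward selection: greedily add the feature whose
    addition maximizes score(subset) until k features are selected."""
    selected = []
    while len(selected) < k:
        remaining = [f for f in all_features if f not in selected]
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
    return selected

# e.g. sfs(range(n_features), my_score, k=10), where my_score evaluates J
```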

<2> Sequential backward selection (SBS, Sequential Backward Selection)

In contrast to SFS, it starts from the full feature set; each round, the feature x whose removal makes the evaluation function J(X) best is removed. It is also greedy; the drawback is that features can only be removed, never added back.

<3> Bidirectional search (BDS, Bidirectional Search)

SFS and SBS are started at the same time, and the search stops when both arrive at the same feature subset.

<4> Plus-L minus-R selection (LRS, Plus-L Minus-R Selection)

Form one: starting from the empty set, add L features and then remove R features in each round, so that J is optimal.

Form two: starting from the full set, remove R features and then add L features in each round, so that J is optimal.
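A sketch of form one under the same assumptions as the SFS sketch above (score is an arbitrary callable; form one needs L > R so the subset grows over time):

```python
def lrs(all_features, score, k, L=3, R=2):
    """Plus-L minus-R selection, form one: each round greedily add the L
    best features, then greedily remove the R least useful ones (L > R)."""
    selected = []
    while len(selected) < k:
        for _ in range(L):   # add the feature whose addition helps J most
            remaining = [f for f in all_features if f not in selected]
            selected.append(max(remaining, key=lambda f: score(selected + [f])))
        for _ in range(R):   # drop the feature whose removal hurts J least
            selected.remove(max(selected,
                                key=lambda f: score([g for g in selected if g != f])))
    return selected[:k]
```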

<5> Sequential floating selection (Sequential Floating Selection)

This algorithm is developed from LRS; the difference is that L and R are not fixed but change from round to round. It combines the characteristics of sequential forward/backward selection and LRS, and compensates for their shortcomings.

① Sequential floating forward selection (SFFS, Sequential Floating Forward Selection)

Starting from the empty set, each round a subset x is added so that J becomes optimal, and then a subset z is removed so that J becomes optimal.

② Sequential floating backward selection (SFBS, Sequential Floating Backward Selection)

The opposite of ①: starting from the full set, features are first removed and then added.

<6> Decision tree method (DTM, Decision Tree Method)

Information gain is generally used as the evaluation function: a decision tree is grown on the training data and then pruned, and the features remaining in the final tree constitute the feature subset.
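A sketch of the idea, assuming scikit-learn is available; the subset is taken as the features that actually appear at split nodes of a pruned tree (the pruning strength ccp_alpha is an arbitrary illustrative value):

```python
from sklearn.tree import DecisionTreeClassifier

def dtm_select(X, y):
    """Fit a cost-complexity pruned decision tree and keep the features
    used at its internal split nodes."""
    tree = DecisionTreeClassifier(ccp_alpha=0.01).fit(X, y)
    used = tree.tree_.feature          # leaf nodes are marked with -2
    return sorted(set(int(f) for f in used if f >= 0))
```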

(3) Random search

<1> Random generation plus sequential selection (RGSS, Random Generation plus Sequential Selection)

A feature subset is generated at random, and SFS or SBS is then run starting from it. It can serve as a supplement to SFS and SBS, helping them jump out of local optima.
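A sketch reusing the greedy forward step from the SFS sketch above, but seeded with a random subset (the starting size is an arbitrary illustrative parameter):

```python
import random

def rgss(all_features, score, k, start_size=3):
    """Random generation plus sequential selection: seed the greedy
    forward search with a random feature subset."""
    selected = random.sample(list(all_features), start_size)
    while len(selected) < k:
        remaining = [f for f in all_features if f not in selected]
        selected.append(max(remaining, key=lambda f: score(selected + [f])))
    return selected
```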

<2> Simulated annealing (SA, Simulated Annealing)

Simulated annealing can avoid local optima to some extent, but if the region containing the optimal solution is very small, it may still have difficulty finding it.
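A compact simulated-annealing sketch over boolean feature masks; here score takes a mask rather than an index list, and the neighbourhood move, cooling schedule, and constants are illustrative assumptions:

```python
import math
import random

def sa_select(n_features, score, steps=1000, t0=1.0, cooling=0.995):
    """Simulated annealing over binary masks: flip one feature per step and
    accept worse solutions with probability exp(delta / T)."""
    mask = [random.random() < 0.5 for _ in range(n_features)]
    best, best_score, t = list(mask), score(mask), t0
    for _ in range(steps):
        cand = list(mask)
        cand[random.randrange(n_features)] ^= True   # flip one feature in or out
        delta = score(cand) - score(mask)
        if delta > 0 or random.random() < math.exp(delta / t):
            mask = cand
            if score(mask) > best_score:
                best, best_score = list(mask), score(mask)
        t *= cooling
    return best
```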

<3> Genetic algorithm (GA, Genetic Algorithms)

First, a batch of feature subsets is generated at random and scored by the evaluation function; the next generation of feature subsets is then bred through crossover, mutation, and so on, with higher-scoring subsets having a higher probability of being selected for breeding. After N generations of breeding and survival of the fittest, the population may contain the feature subset with the highest evaluation-function value.
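A minimal GA sketch over boolean masks; the population size, mutation rate, and roulette-wheel selection are illustrative choices, and the score is assumed non-negative so it can be used directly as a selection weight:

```python
import random

def ga_select(n_features, score, pop_size=20, generations=50, p_mut=0.05):
    """Genetic algorithm: score-proportional parent selection,
    one-point crossover, bit-flip mutation."""
    pop = [[random.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [score(ind) for ind in pop]
        children = []
        while len(children) < pop_size:
            # roulette-wheel selection of two parents
            a, b = random.choices(pop, weights=fitness, k=2)
            cut = random.randrange(1, n_features)        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < p_mut) for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=score)
```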

2. Evaluation function

Feature-subset evaluation approaches are mainly divided into filter and wrapper methods.

(1) Filter

The filter approach is essentially preprocessing: feature subsets are filtered out using characteristics of the training data and only then handed to the classifier for learning, so the selection is independent of the choice of classifier.

(2) Wrapper

The wrapper approach classifies the training set using the selected feature subset, and the resulting classification accuracy is used as the criterion for measuring the quality of that subset.
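A sketch of a wrapper-style evaluation function, assuming scikit-learn is available and X is a NumPy array; the subset's score is the cross-validated accuracy of a classifier trained only on those columns (the classifier and fold count are arbitrary choices). Such a function can be plugged in as the score callable of the search sketches above:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def wrapper_score(X, y, feature_idx):
    """Score a feature subset by 5-fold cross-validated accuracy."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, list(feature_idx)], y, cv=5).mean()

# e.g. sfs(range(X.shape[1]), lambda s: wrapper_score(X, y, s), k=10)
```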

The common evaluation functions are:

(1) Correlation

Based on the hypothesis that a good feature subset contains features that are highly correlated with the classification and only weakly correlated with one another.

A common measure is the linear (Pearson) correlation coefficient.
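A sketch of filter-style scoring with the Pearson correlation coefficient, using NumPy; each feature column is scored by the absolute value of its correlation with the label:

```python
import numpy as np

def pearson_scores(X, y):
    """Absolute Pearson correlation of each feature column with the label."""
    return np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                     for j in range(X.shape[1])])

# top_k = np.argsort(pearson_scores(X, y))[::-1][:k]
```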

(2) Distance/similarity

Based on the assumption that a good feature subset should make samples of the same class as close to one another as possible, and samples of different classes as far apart as possible.

Common distance or similarity measures (such as Euclidean distance) can be used for the calculation.
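One common way to turn this assumption into a score is a scatter ratio: average between-class spread divided by average within-class spread on the candidate columns. A rough NumPy sketch, not taken from the post:

```python
import numpy as np

def distance_score(X, y, feature_idx):
    """Ratio of between-class centroid spread to mean within-class spread."""
    Z = X[:, list(feature_idx)]
    classes = np.unique(y)
    centroids = np.array([Z[y == c].mean(axis=0) for c in classes])
    within = np.mean([np.linalg.norm(Z[y == c] - Z[y == c].mean(axis=0),
                                     axis=1).mean() for c in classes])
    between = np.linalg.norm(centroids - centroids.mean(axis=0), axis=1).mean()
    return between / within
```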

(3) Information gain

As mentioned earlier, information gain reflects how much a feature subset increases the amount of information the system has about the class.
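A sketch of information gain for a single discrete feature (NumPy arrays assumed): the entropy of the class labels minus the conditional entropy after splitting on the feature's values:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def info_gain(feature, labels):
    """IG = H(labels) - sum_v P(feature == v) * H(labels | feature == v)."""
    values, counts = np.unique(feature, return_counts=True)
    cond = sum((c / len(labels)) * entropy(labels[feature == v])
               for v, c in zip(values, counts))
    return entropy(labels) - cond
```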

(4) Consistency

If sample 1 and sample 2 belong to different classes but take exactly the same values on features A and B, then the feature subset {A, B} should not be selected as the final feature set.
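A sketch of a consistency check in plain NumPy: count the samples whose projection onto the candidate columns also occurs with a different class label (zero means the subset is fully consistent; the helper name is hypothetical):

```python
import numpy as np

def inconsistency_count(X, y, feature_idx):
    """Number of samples whose values on feature_idx also occur with a
    different class label (0 means the subset is fully consistent)."""
    Z = X[:, list(feature_idx)]
    groups = {}
    for row, label in zip(Z, y):
        groups.setdefault(tuple(row), set()).add(label)
    return int(sum(len(groups[tuple(row)]) > 1 for row in Z))
```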

(5) Classifier error rate

The classification accuracy of a classifier trained on the feature subset is used directly as the criterion.

For text feature extraction, the filter method is used in most cases: whether the representation is a word-frequency-based VSM or a semantics-based method, each feature term is usually scored by some formula, and the K highest-scoring terms are selected to form the feature subset. The wrapper method is rarely used; I think this is because text has a very large number of features, so repeatedly generating and evaluating feature subsets as described above is too slow, and the wrapper method is abandoned because of this inefficiency.
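A sketch of the filter-style text case: assume scores already holds a per-term score from whatever formula is used (chi-square, information gain, a TF-IDF weight, ...); the feature subset is simply the K highest-scoring terms:

```python
import numpy as np

def top_k_terms(terms, scores, k):
    """Keep the k terms with the highest scores."""
    order = np.argsort(scores)[::-1][:k]
    return [terms[i] for i in order]
```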

Note: The above survey of common feature selection algorithms is excerpted from the blog http://www.cnblogs.com/heaad/archive/2011/01/02/1924088.html
