Study of the R Language caret Package (II): Feature Selection


In data mining we usually do not build a model on all of the available independent variables; instead we select some of the most important ones, a process called feature selection. This article introduces feature selection with the rfe() function from the caret package.

One such algorithm is backward selection: all variables are included in the model at first; a performance measure (such as error or prediction accuracy) and a variable-importance ranking are computed; only the most important variables are retained; the performance is computed again; and so on iteratively until a suitable number of predictors is found. One drawback of this algorithm is that it can overfit, so an outer resampling loop must be wrapped around it. The rfe() command in the caret package accomplishes this task.
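To make the idea concrete, here is a minimal base-R sketch of the backward-elimination loop (an illustration only, not caret's implementation; the function name backward_select and the use of absolute t-statistics as the importance measure are assumptions):

```r
# Illustrative backward feature elimination: repeatedly fit a linear
# model, rank the predictors by |t-statistic|, and keep the top s of them.
backward_select <- function(x, y, sizes) {
  vars <- colnames(x)
  kept <- list()
  for (s in sort(sizes, decreasing = TRUE)) {
    dat <- cbind(x[, vars, drop = FALSE], y = y)
    fit <- lm(y ~ ., data = dat)
    imp <- abs(coef(summary(fit))[-1, "t value"])  # drop the intercept row
    vars <- names(sort(imp, decreasing = TRUE))[seq_len(min(s, length(imp)))]
    kept[[as.character(s)]] <- vars                # variables kept at size s
  }
  kept
}
```

A proper version would also estimate out-of-sample performance for every subset size inside an outer resampling loop, which is exactly what rfe() adds.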

rfe(x, y, sizes = 2^(2:4),
    metric = ifelse(is.factor(y), "Accuracy", "RMSE"),
    maximize = ifelse(metric == "RMSE", FALSE, TRUE),
    rfeControl = rfeControl(), ...)
    • x: matrix or data frame of training-set predictors; note that column names must be unique
    • y: outcome vector of the training set (numeric or factor)
    • sizes: a numeric vector giving the numbers of features that should be retained
    • metric: the summary measure used to select the optimal model. By default, "RMSE" and "Rsquared" for regression, "Accuracy" and "Kappa" for classification
    • maximize: logical; whether the metric should be maximized
    • rfeControl: a list of control options, including the functions used for fitting and prediction. Predefined sets of functions exist for several models: linear regression (in the object lmFuncs), random forests (rfFuncs), naive Bayes (nbFuncs), bagged trees (treebagFuncs), and functions that can be used with caret's train() function (caretFuncs). The latter is useful when the model has tuning parameters that must be determined in each iteration.
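For reference (assuming caret is installed), each predefined object such as lmFuncs is simply a named list of helper functions that rfe() calls internally at each step:

```r
library(caret)
# Inspect the helper functions bundled in lmFuncs; rfe() calls these to
# summarize, fit, predict with, and rank variables for a linear model.
names(lmFuncs)  # includes "summary", "fit", "pred", "rank", ...
```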
rfeControl(functions = NULL, rerank = FALSE, method = "boot",
           saveDetails = FALSE,
           number = ifelse(method %in% c("cv", "repeatedcv"), 10, 25),
           repeats = ifelse(method %in% c("cv", "repeatedcv"), 1, number),
           verbose = FALSE, returnResamp = "final", p = 0.75,
           index = NULL, indexOut = NULL, timingSamps = 0,
           seeds = NA, allowParallel = TRUE)

    • functions: the set of functions used for model fitting, prediction, and variable ranking
    • method: the resampling method to use; the default is "boot" (bootstrap), with "cv" (cross-validation), "repeatedcv", and "LOOCV" (leave-one-out cross-validation) also available
    • number: the number of folds or resampling iterations
    • seeds: the random seeds set at each resampling iteration
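The original post's data set and printed output are not reproduced here, so the following is a self-contained sketch on simulated data (the variable names, the sizes vector, and the 5-fold CV settings are assumptions) showing how an lmProfile object is produced:

```r
library(caret)

set.seed(10)                                    # reproducible resampling
n <- 100; p <- 20
x <- as.data.frame(matrix(rnorm(n * p), ncol = p))
colnames(x) <- paste0("var", seq_len(p))
y <- 2 * x$var1 - 3 * x$var2 + rnorm(n)         # only var1/var2 matter

ctrl <- rfeControl(functions = lmFuncs,         # linear-regression helpers
                   method = "cv", number = 5)   # 5-fold cross-validation
lmProfile <- rfe(x, y, sizes = c(2, 5, 10, 15), rfeControl = ctrl)

print(lmProfile)        # an asterisk marks the best subset size
predictors(lmProfile)   # the variables retained in the final model
```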

In the printed output, an asterisk after a row marks the subset size with the best performance; here it appears after 17 variables, indicating that prediction accuracy is highest when 17 variables are selected.

The same result can be seen graphically with plot(lmProfile), which plots the performance metric against the number of variables.

The variables retained in the final model can be returned with predictors(lmProfile).

Reference:

http://topepo.github.io/caret/recursive-feature-elimination.html

http://blog.csdn.net/jiabiao1602/article/details/44975741

