R language: Dummy variable processing using the dummyVars function in the caret package

Source: Internet
Author: User

dummyVars function: dummyVars creates a full set of dummy variables (i.e., a less-than-full-rank parameterization), meaning one indicator column for every level of each factor.
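The phrase "less than full rank" can be made concrete. Suppose a factor x has k levels l_1, ..., l_k. The full set contains one indicator d_ij per level, equal to 1 when observation i has level l_j and 0 otherwise. Every observation has exactly one level, so the k indicators sum to 1 in every row. That sum is identical to the intercept column of a model matrix, which makes the columns linearly dependent. Dropping one level (the reference level) removes the dependency:

```latex
d_{ij} =
\begin{cases}
  1 & \text{if } x_i = \ell_j \\
  0 & \text{otherwise}
\end{cases}
\qquad\Longrightarrow\qquad
\sum_{j=1}^{k} d_{ij} = 1 \quad \text{for every observation } i
```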

Here is a simple example:

survey <- data.frame(service = c("very unhappy", "unhappy", "neutral", "happy", "very happy"))
survey
##        service
## 1 very unhappy
## 2      unhappy
## 3      neutral
## 4        happy
## 5   very happy


# We can add a rank column directly, using numbers to represent the levels of satisfaction

survey <- data.frame(service = c("very unhappy", "unhappy", "neutral", "happy", "very happy"),
                     rank = c(1, 2, 3, 4, 5))
survey
##        service rank
## 1 very unhappy    1
## 2      unhappy    2
## 3      neutral    3
## 4        happy    4
## 5   very happy    5
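Instead of typing the rank values by hand, you can derive the same numeric coding from a factor whose levels are listed in order. Calling as.integer() on that factor returns each value's position in the level order. Below is a minimal base R sketch. The level order in satisfaction_levels is an assumption chosen to match the example:

```r
# Satisfaction levels, listed from lowest to highest (assumed order)
satisfaction_levels <- c("very unhappy", "unhappy", "neutral", "happy", "very happy")

survey <- data.frame(service = c("very unhappy", "unhappy", "neutral", "happy", "very happy"))

# An ordered factor records the intended ranking of the levels
survey$service <- factor(survey$service, levels = satisfaction_levels, ordered = TRUE)

# as.integer() on a factor returns the position of each value's level
survey$rank <- as.integer(survey$service)
print(survey)
```

This approach also keeps the ranking consistent if the rows arrive in a different order, because the rank comes from the level order and not from the row position.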


This is easy for a single variable. When a data set contains many factor variables, however, creating dummy variables by hand becomes time-consuming.
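To show what the manual approach involves, here is a hypothetical base R helper, make_dummies(). It is not part of caret or any other package. It loops over every factor column and adds one 0/1 column per level, using the column.level naming style that dummyVars uses. Treat it as a sketch for comparison only:

```r
# Hypothetical helper (not from caret): expand every factor column of a
# data frame into one 0/1 indicator column per level.
make_dummies <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  out <- df[!is_fac]                      # keep non-factor columns unchanged
  for (col in names(df)[is_fac]) {
    for (lvl in levels(df[[col]])) {
      # 1 where the row has this level, 0 otherwise
      out[[paste(col, lvl, sep = ".")]] <- as.integer(df[[col]] == lvl)
    }
  }
  out
}

customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  gender = factor(c("male", "female", "female", "male", "female")),
  mood = factor(c("happy", "sad", "happy", "sad", "happy"))
)
print(make_dummies(customers))
```

Even this short helper must manage column selection, level order, and naming. These are the details that dummyVars handles for you.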


The dummyVars function in the caret package automates this by creating dummy variables from factor variables.

library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
# stringsAsFactors = TRUE keeps gender and mood as factors (the default became FALSE in R 4.0.0)
customers <- data.frame(id = c(10, 20, 30, 40, 50),
                        gender = c("male", "female", "female", "male", "female"),
                        mood = c("happy", "sad", "happy", "sad", "happy"),
                        outcome = c(1, 1, 0, 0, 0),
                        stringsAsFactors = TRUE)
customers
##   id gender  mood outcome
## 1 10   male happy       1
## 2 20 female   sad       1
## 3 30 female happy       0
## 4 40   male   sad       0
## 5 50 female happy       0

# Use dummyVars to build dummy variables for the customers data

dmy <- dummyVars(~ ., data = customers)


# Apply the encoding to the data with predict() and convert the result to a data.frame

trsf <- data.frame(predict(dmy, newdata = customers))
trsf
##   id gender.female gender.male mood.happy mood.sad outcome
## 1 10             0           1          1        0       1
## 2 20             1           0          0        1       1
## 3 30             1           0          1        0       0
## 4 40             0           1          0        1       0
## 5 50             1           0          1        0       0

The result shows that outcome was not converted into dummy variables.


Let's check the column types of customers:

str(customers)
## 'data.frame':    5 obs. of  4 variables:
##  $ id     : num  10 20 30 40 50
##  $ gender : Factor w/ 2 levels "female","male": 2 1 1 2 1
##  $ mood   : Factor w/ 2 levels "happy","sad": 1 2 1 2 1
##  $ outcome: num  1 1 0 0 0


As the output shows, outcome is stored as numeric. dummyVars only expands factor columns and passes numeric columns through unchanged, which is not what we want here. Next, convert outcome to a factor:

customers$outcome <- as.factor(customers$outcome)
str(customers)
## 'data.frame':    5 obs. of  4 variables:
##  $ id     : num  10 20 30 40 50
##  $ gender : Factor w/ 2 levels "female","male": 2 1 1 2 1
##  $ mood   : Factor w/ 2 levels "happy","sad": 1 2 1 2 1
##  $ outcome: Factor w/ 2 levels "0","1": 2 2 1 1 1
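When several numeric-coded columns need the same conversion, lapply() can convert them all in one step. Below is a minimal sketch. The segment column and the names in to_convert are hypothetical and only illustrate the pattern:

```r
customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  outcome = c(1, 1, 0, 0, 0),
  segment = c(2, 1, 2, 3, 1)   # hypothetical extra numeric-coded column
)

# Columns that hold category codes rather than quantities (assumed)
to_convert <- c("outcome", "segment")
customers[to_convert] <- lapply(customers[to_convert], factor)

str(customers)
```

The id column is left as numeric. After this step, dummyVars would expand only outcome and segment.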


The earlier dmy was built while outcome was still numeric, and dmy records the factor levels of the data it was created from. So after the conversion, we rebuild dmy and apply it again to see the final result.

dmy <- dummyVars(~ ., data = customers)
trsf <- data.frame(predict(dmy, newdata = customers))
trsf
##   id gender.female gender.male mood.happy mood.sad outcome.0 outcome.1
## 1 10             0           1          1        0         0         1
## 2 20             1           0          0        1         0         1
## 3 30             1           0          1        0         1         0
## 4 40             0           1          0        1         1         0
## 5 50             1           0          1        0         1         0


outcome has now also been converted into dummy variables (outcome.0 and outcome.1).
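For comparison, base R's model.matrix() can produce a similar full set of indicators when contrasts are switched off for every factor with contrasts(x, contrasts = FALSE). This is a swapped-in base R technique, not a description of how dummyVars works internally. The column names come out without a separator (for example genderfemale), and the intercept column must be dropped explicitly:

```r
customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  gender = factor(c("male", "female", "female", "male", "female")),
  mood = factor(c("happy", "sad", "happy", "sad", "happy")),
  outcome = factor(c(1, 1, 0, 0, 0))
)

# Identity "contrasts" for every factor keep one column per level
is_fac <- vapply(customers, is.factor, logical(1))
full_contrasts <- lapply(customers[is_fac], contrasts, contrasts = FALSE)

mm <- model.matrix(~ ., data = customers, contrasts.arg = full_contrasts)
mm <- mm[, colnames(mm) != "(Intercept)"]   # drop the intercept column
print(mm)
```

Unlike the data.frame built from predict() above, the result is a numeric matrix. Wrap it in data.frame() if you need a data frame.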


You can also create dummy variables for just one variable in the data. For example, to encode only gender in customers:

dmy <- dummyVars(~ gender, data = customers)
trfs <- data.frame(predict(dmy, newdata = customers))
trfs
##   gender.female gender.male
## 1             0           1
## 2             1           0
## 3             1           0
## 4             0           1
## 5             1           0
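In base R, you can get the same two columns for a single factor by removing the intercept from the formula with - 1. model.matrix() then keeps every level of the first factor. Below is a sketch of this alternative. As with other model.matrix() output, the names have no "." separator:

```r
customers <- data.frame(
  gender = factor(c("male", "female", "female", "male", "female"))
)

# "- 1" drops the intercept, so both levels of gender get their own column
gender_dummies <- model.matrix(~ gender - 1, data = customers)
print(gender_dummies)
```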

For a two-level factor, the two resulting columns (for example gender.female and gender.male) carry the same information, because one is always 1 minus the other, so we may not need both. In this case, set the fullRank argument of dummyVars to TRUE, which drops one level of each factor:

dmy <- dummyVars(~ ., data = customers, fullRank = TRUE)
trfs <- data.frame(predict(dmy, newdata = customers))
trfs
##   id gender.male mood.sad outcome.1
## 1 10           1        0         1
## 2 20           0        1         1
## 3 30           0        0         0
## 4 40           1        1         0
## 5 50           0        0         0
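With fullRank = TRUE, the first level of each factor becomes the reference and gets no column. Base R's default treatment contrasts in model.matrix() produce the same reduced set of columns for this data. Below is a sketch for comparison, again using base R rather than caret. The column names lack the "." separator:

```r
customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  gender = factor(c("male", "female", "female", "male", "female")),
  mood = factor(c("happy", "sad", "happy", "sad", "happy")),
  outcome = factor(c(1, 1, 0, 0, 0))
)

# Default treatment contrasts: k - 1 indicator columns per k-level factor
mm <- model.matrix(~ ., data = customers)[, -1]   # drop "(Intercept)"
print(mm)
```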

