R language: Dummy variable processing using the dummyVars function in the caret package

Source: Internet
Author: User

The dummyVars function: dummyVars creates a full set of dummy variables (i.e. a less than full rank parameterization), with one column for every level of each factor.

Here is a simple example:
survey <- data.frame(service = c("very unhappy", "unhappy", "neutral", "happy", "very happy"))
survey
##        service
## 1 very unhappy
## 2      unhappy
## 3      neutral
## 4        happy
## 5   very happy
# We can add a rank column directly and use numbers to represent the different levels of satisfaction
survey <- data.frame(service = c("very unhappy", "unhappy", "neutral", "happy", "very happy"), rank = c(1, 2, 3, 4, 5))
survey
##        service rank
## 1 very unhappy    1
## 2      unhappy    2
## 3      neutral    3
## 4        happy    4
## 5   very happy    5
Obviously, this is easy to do by hand for a single variable, but encoding variables manually becomes time-consuming when the data contains many factor variables.
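As a small sketch of the single-variable case, the rank can be derived from an ordered factor instead of being typed by hand. The names `levels_in_order` and `survey2` are illustrative and not part of the original example, and the level order is an assumption about how the answers should be ranked.

```r
# Assumed ordering of the answers, from worst to best (illustrative).
levels_in_order <- c("very unhappy", "unhappy", "neutral", "happy", "very happy")

# A hypothetical survey with answers in arbitrary order.
survey2 <- data.frame(
  service = c("happy", "very unhappy", "neutral", "happy"),
  stringsAsFactors = FALSE
)

# Convert to an ordered factor, then use the integer codes as the rank.
survey2$service <- factor(survey2$service, levels = levels_in_order, ordered = TRUE)
survey2$rank <- as.integer(survey2$service)

print(survey2)
```

Because the rank comes from the position of each answer in `levels_in_order`, it stays correct even when the rows are not sorted.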


The dummyVars function in the caret package creates dummy variables for factor variables.


library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
# stringsAsFactors = TRUE keeps gender and mood as factors
# (since R 4.0.0, strings are no longer converted to factors by default)
customers <- data.frame(id = c(10, 20, 30, 40, 50), gender = c("male", "female", "female", "male", "female"),
                        mood = c("happy", "sad", "happy", "sad", "happy"), outcome = c(1, 1, 0, 0, 0),
                        stringsAsFactors = TRUE)
customers
##   id gender  mood outcome
## 1 10   male happy       1
## 2 20 female   sad       1
## 3 30 female happy       0
## 4 40   male   sad       0
## 5 50 female happy       0
# Use dummyVars to build a dummy-variable encoder for the customers data
dmy <- dummyVars(~ ., data = customers)
# Apply the encoder with predict and convert the result to a data.frame
trsf <- data.frame(predict(dmy, newdata = customers))
trsf
##   id gender.female gender.male mood.happy mood.sad outcome
## 1 10             0           1          1        0       1
## 2 20             1           0          0        1       1
## 3 30             1           0          1        0       0
## 4 40             0           1          0        1       0
## 5 50             1           0          1        0       0
The result shows that outcome was not converted into dummy variables.


Let's look at the column types of customers:


str(customers)
## 'data.frame':    5 obs. of  4 variables:
##  $ id     : num  10 20 30 40 50
##  $ gender : Factor w/ 2 levels "female","male": 2 1 1 2 1
##  $ mood   : Factor w/ 2 levels "happy","sad": 1 2 1 2 1
##  $ outcome: num  1 1 0 0 0
As we can see, outcome is stored as numeric by default, which is why it was left unchanged. That is not what we want here, so next we convert outcome to a factor.


customers$outcome <- as.factor(customers$outcome)
str(customers)
## 'data.frame':    5 obs. of  4 variables:
##  $ id     : num  10 20 30 40 50
##  $ gender : Factor w/ 2 levels "female","male": 2 1 1 2 1
##  $ mood   : Factor w/ 2 levels "happy","sad": 1 2 1 2 1
##  $ outcome: Factor w/ 2 levels "0","1": 2 2 1 1 1
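When a data set has many columns, the same check and conversion can be done programmatically. The sketch below uses base R only. The rule "numeric column containing only 0 and 1" is a heuristic assumed for this illustration, and `binary_cols` is a made-up name.

```r
# Same customers data as in the article, built with factors for the strings.
customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  gender = c("male", "female", "female", "male", "female"),
  mood = c("happy", "sad", "happy", "sad", "happy"),
  outcome = c(1, 1, 0, 0, 0),
  stringsAsFactors = TRUE
)

# Heuristic (an assumption of this sketch): treat numeric columns that
# contain only 0 and 1 as categorical.
binary_cols <- names(customers)[sapply(customers, function(col) {
  is.numeric(col) && all(col %in% c(0, 1))
})]

# Convert every selected column to a factor in one step.
customers[binary_cols] <- lapply(customers[binary_cols], as.factor)

str(customers)
```

The id column is numeric too, but it holds values other than 0 and 1, so the heuristic leaves it alone.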
With outcome converted to a factor, we apply dmy to the data again and look at the final result.


trsf <- data.frame(predict(dmy, newdata = customers))
trsf
##   id gender.female gender.male mood.happy mood.sad outcome0 outcome1
## 1 10             0           1          1        0        0        1
## 2 20             1           0          0        1        0        1
## 3 30             1           0          1        0        1        0
## 4 40             0           1          0        1        1        0
## 5 50             1           0          1        0        1        0
As we can see, outcome has now been converted into dummy variables as well.


Of course, dummy variables can also be created for just one of the variables in the data. To create dummy variables only for gender in the customers data, do the following:


dmy <- dummyVars(~ gender, data = customers)
trfs <- data.frame(predict(dmy, newdata = customers))
trfs
##   gender.female gender.male
## 1             0           1
## 2             1           0
## 3             1           0
## 4             0           1
## 5             1           0
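The same formula interface should extend to a chosen subset of variables. The following is a minimal sketch that assumes the caret package is installed. It encodes gender and mood together and leaves the other columns out of the result.

```r
library(caret)

# Same customers data as in the article.
customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  gender = c("male", "female", "female", "male", "female"),
  mood = c("happy", "sad", "happy", "sad", "happy"),
  outcome = c(1, 1, 0, 0, 0),
  stringsAsFactors = TRUE
)

# Only the variables named on the right-hand side of the formula are encoded;
# id and outcome do not appear in the result.
dmy <- dummyVars(~ gender + mood, data = customers)
trsf <- data.frame(predict(dmy, newdata = customers))

print(colnames(trsf))
print(trsf)
```

If the untouched columns are still needed, they can be joined back with `cbind(customers["id"], trsf)`.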
For a two-level factor, we may not want two columns that carry the same information after dummy encoding (for example, gender.female and gender.male). In that case, set the fullRank parameter of dummyVars to TRUE.


dmy <- dummyVars(~ ., data = customers, fullRank = TRUE)
trfs <- data.frame(predict(dmy, newdata = customers))
trfs
##   id gender.male mood.sad outcome.1
## 1 10           1        0         1
## 2 20           0        1         1
## 3 30           0        0         0
## 4 40           1        1         0
## 5 50           0        0         0
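For comparison, base R's `model.matrix` produces a similar full-rank encoding without caret. This is only a cross-check sketch, not the method used above. It assumes the default treatment contrasts are in effect, and the resulting column names differ from those of dummyVars (no "." separator).

```r
# The two factor columns from the customers data.
customers <- data.frame(
  gender = c("male", "female", "female", "male", "female"),
  mood = c("happy", "sad", "happy", "sad", "happy"),
  stringsAsFactors = TRUE
)

# With default treatment contrasts, the first level of each factor is
# dropped and absorbed into the intercept column.
mm <- model.matrix(~ gender + mood, data = customers)

# Remove the intercept so only the dummy columns remain.
mm <- mm[, colnames(mm) != "(Intercept)", drop = FALSE]

print(mm)
```

Each factor contributes one column fewer than it has levels, which is why a two-level factor such as gender ends up as a single 0/1 column.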
