dummyVars function: dummyVars creates a full set of dummy variables (i.e. a less than full rank parameterization).
Here is a simple example:
survey <- data.frame(service = c("very unhappy", "unhappy", "neutral", "happy", "very happy"))
survey
##        service
## 1 very unhappy
## 2      unhappy
## 3      neutral
## 4        happy
## 5   very happy
# we can add a rank column directly, using numbers to represent the different moods
survey <- data.frame(service = c("very unhappy", "unhappy", "neutral", "happy", "very happy"), rank = c(1, 2, 3, 4, 5))
survey
##        service rank
## 1 very unhappy    1
## 2      unhappy    2
## 3      neutral    3
## 4        happy    4
## 5   very happy    5
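The hand-assigned ranks above can also be derived from an ordered factor. The following is a minimal base R sketch, assuming the five labels run from worst to best; the name levels_in_order and the three sample responses are illustrative and not part of the original example.

```r
# Satisfaction labels in the order they should be ranked (assumed worst to best)
levels_in_order <- c("very unhappy", "unhappy", "neutral", "happy", "very happy")

# A few sample responses, in arbitrary order (illustrative data)
survey <- data.frame(service = c("happy", "very unhappy", "neutral"))

# An ordered factor stores each label as an integer code that follows the level order
survey$service <- factor(survey$service, levels = levels_in_order, ordered = TRUE)

# The integer code doubles as the rank
survey$rank <- as.integer(survey$service)

survey
```

Listing the levels explicitly matters here: without the levels argument, R sorts the labels alphabetically and the codes would no longer reflect the intended ordering.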
Obviously, this is not difficult to do for a single variable, but building dummy variables by hand becomes time-consuming when there are several factor variables.
The dummyVars function in the caret package creates dummy variables from factor variables.
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
# stringsAsFactors = TRUE keeps gender and mood as factors (R >= 4.0 no longer converts strings by default)
customers <- data.frame(id = c(10, 20, 30, 40, 50), gender = c("male", "female", "female", "male", "female"), mood = c("happy", "sad", "happy", "sad", "happy"), outcome = c(1, 1, 0, 0, 0), stringsAsFactors = TRUE)
customers
##   id gender  mood outcome
## 1 10   male happy       1
## 2 20 female   sad       1
## 3 30 female happy       0
## 4 40   male   sad       0
## 5 50 female happy       0
# use the dummyVars function to build the dummy-variable encoding for the customers data
dmy <- dummyVars(~ ., data = customers)
# apply the encoding to the data and convert the result to a data.frame
trsf <- data.frame(predict(dmy, newdata = customers))
trsf
##   id gender.female gender.male mood.happy mood.sad outcome
## 1 10             0           1          1        0       1
## 2 20             1           0          0        1       1
## 3 30             1           0          1        0       0
## 4 40             0           1          0        1       0
## 5 50             1           0          1        0       0
The result shows that outcome was not converted into dummy variables.
Let's look at the data types in customers:
str(customers)
## 'data.frame': 5 obs. of 4 variables:
##  $ id     : num 10 20 30 40 50
##  $ gender : Factor w/ 2 levels "female","male": 2 1 1 2 1
##  $ mood   : Factor w/ 2 levels "happy","sad": 1 2 1 2 1
##  $ outcome: num 1 1 0 0 0
As can be seen, the default type of outcome is numeric, which is not what we want now. Next, convert the variable outcome to the factor type.
customers$outcome <- as.factor(customers$outcome)
str(customers)
## 'data.frame': 5 obs. of 4 variables:
##  $ id     : num 10 20 30 40 50
##  $ gender : Factor w/ 2 levels "female","male": 2 1 1 2 1
##  $ mood   : Factor w/ 2 levels "happy","sad": 1 2 1 2 1
##  $ outcome: Factor w/ 2 levels "0","1": 2 2 1 1 1
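When a data set has many columns, reading the str output by eye becomes error-prone. The following base R sketch shows one way to list which columns are factors and to convert several numeric columns at once; the vector to_factor is a hypothetical name, and which columns belong in it is an assumption about the data rather than something R can decide.

```r
# Same customers data as above, rebuilt so the example is self-contained
customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  gender = c("male", "female", "female", "male", "female"),
  mood = c("happy", "sad", "happy", "sad", "happy"),
  outcome = c(1, 1, 0, 0, 0),
  stringsAsFactors = TRUE
)

# TRUE for every column that is already stored as a factor
is_fac <- sapply(customers, is.factor)

# Numeric columns that should be treated as categorical (hypothetical choice)
to_factor <- c("outcome")
customers[to_factor] <- lapply(customers[to_factor], as.factor)

# Check again after the conversion
is_fac_after <- sapply(customers, is.factor)
is_fac_after
```

Note that id stays numeric: an identifier is not a category to encode, so it is deliberately left out of to_factor.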
After converting outcome to a factor, we use dmy to predict on the data again and look at the final result.
trsf <- data.frame(predict(dmy, newdata = customers))
trsf
##   id gender.female gender.male mood.happy mood.sad outcome0 outcome1
## 1 10             0           1          1        0        0        1
## 2 20             1           0          0        1        0        1
## 3 30             1           0          1        0        1        0
## 4 40             0           1          0        1        1        0
## 5 50             1           0          1        0        1        0
As can be seen, outcome has now been converted into dummy variables as well.
Of course, dummy variables can also be created for just one of the variables in the data. To create dummy variables for the variable gender in the customers data, do the following:
dmy <- dummyVars(~ gender, data = customers)
trfs <- data.frame(predict(dmy, newdata = customers))
trfs
##   gender.female gender.male
## 1             0           1
## 2             1           0
## 3             1           0
## 4             0           1
## 5             1           0
For a factor with only two levels, we may not want two columns that carry the same information after dummy encoding (for example, gender.female and gender.male). In that case, set the fullRank parameter of the dummyVars function to TRUE.
dmy <- dummyVars(~ ., data = customers, fullRank = TRUE)
trfs <- data.frame(predict(dmy, newdata = customers))
trfs
##   id gender.male mood.sad outcome.1
## 1 10           1        0         1
## 2 20           0        1         1
## 3 30           0        0         0
## 4 40           1        1         0
## 5 50           0        0         0
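The difference between the two encodings can also be seen without caret. The sketch below uses base R's model.matrix on a reduced copy of the customers data, assuming R's default treatment contrasts are in effect; the names one_hot and full_rank are illustrative. It is a comparison for intuition, not a description of how dummyVars is implemented.

```r
# Reduced copy of the customers data with explicit factor columns
customers <- data.frame(
  gender = factor(c("male", "female", "female", "male", "female")),
  mood = factor(c("happy", "sad", "happy", "sad", "happy"))
)

# One column per level of gender: "- 1" removes the intercept,
# so both genderfemale and gendermale are kept
one_hot <- model.matrix(~ gender - 1, data = customers)

# With an intercept and treatment contrasts, the first level of each factor
# becomes the baseline and only the remaining levels get a column
full_rank <- model.matrix(~ gender + mood, data = customers)

colnames(one_hot)
colnames(full_rank)
```

In one_hot the two gender columns always sum to 1, which is the redundancy that fullRank = TRUE removes. In full_rank only gendermale and moodsad remain next to the intercept, mirroring the gender.male and mood.sad columns in the output above.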
R language: creating dummy variables with the dummyVars function in the caret package