R language: using the dummyVars function in the caret package to create dummy variables
The dummyVars function: dummyVars creates a full set of dummy variables (i.e. a less than full rank parameterization), with one indicator column for every level of each factor.
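To see what "less than full rank" means, here is a small base-R sketch. It does not use caret, and the gender vector is a made-up example. When an intercept column is present, the indicator columns of a factor always add up to it, so the design matrix loses one rank:

```r
# Hypothetical binary factor (same values as the gender column used later)
gender <- factor(c("male", "female", "female", "male", "female"))

# Full set of dummy columns (one per level) next to an intercept column
X <- cbind(
  intercept = 1,
  female    = as.numeric(gender == "female"),
  male      = as.numeric(gender == "male")
)

# female + male equals the intercept in every row, so one column is redundant
qr(X)$rank   # 2, although X has 3 columns
```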
Here is a simple example:
survey <- data.frame(service = c("very unhappy", "unhappy", "neutral", "happy", "very happy"))
survey
#        service
#1 very unhappy
#2 unhappy
#3 neutral
#4 happy
#5 very happy
# We can simply add a rank column that encodes each level of satisfaction as a number.
survey <- data.frame(service = c("very unhappy", "unhappy", "neutral", "happy", "very happy"), rank = c(1, 2, 3, 4, 5))
survey
#        service rank
#1 very unhappy 1
#2 unhappy 2
#3 neutral 3
#4 happy 4
#5 very happy 5
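Instead of typing the rank values by hand, a base-R sketch can derive them from the labels themselves. It assumes the order very unhappy < unhappy < neutral < happy < very happy; service_levels is an illustrative name:

```r
# Assumed order of the satisfaction labels, from lowest to highest
service_levels <- c("very unhappy", "unhappy", "neutral", "happy", "very happy")

survey <- data.frame(service = c("very unhappy", "unhappy", "neutral", "happy", "very happy"))

# factor() with explicit levels fixes the order; as.integer() returns each level's position
survey$rank <- as.integer(factor(survey$service, levels = service_levels))
survey$rank   # 1 2 3 4 5
```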
Handling a single variable this way is easy, but when the data contain many factor variables, encoding each one by hand quickly becomes tedious.
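For a single factor, base R's model.matrix() can already produce one 0/1 column per level if the intercept is dropped with `- 1`. This sketch is plain base R, not caret; note that the columns follow the alphabetical order of the factor levels:

```r
survey <- data.frame(
  service = factor(c("very unhappy", "unhappy", "neutral", "happy", "very happy"))
)

# "- 1" removes the intercept, so every level gets its own 0/1 column
onehot <- model.matrix(~ service - 1, data = survey)
onehot
```

Repeating this for every factor, and keeping track of the resulting column names, is the bookkeeping that dummyVars takes over.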
The dummyVars function in the caret package automates this.
library(caret)
# Loading required package: lattice
# Loading required package: ggplot2
customers <- data.frame(id = c(10, 20, 30, 40, 50),
                        gender = c("male", "female", "female", "male", "female"),
                        mood = c("happy", "sad", "happy", "sad", "happy"),
                        outcome = c(1, 1, 0, 0, 0),
                        stringsAsFactors = TRUE)  # needed since R 4.0.0 to get factor columns
customers
#   id gender  mood outcome
#1 10 male happy 1
#2 20 female sad 1
#3 30 female happy 0
#4 40 male sad 0
#5 50 female happy 0
# Use dummyVars to build a dummy-variable encoding for all columns of customers
dmy <- dummyVars(~ ., data = customers)
# Apply the encoding to the same data with predict() and convert the result to a data.frame
trsf <- data.frame(predict(dmy, newdata = customers))
trsf
#   id gender.female gender.male mood.happy mood.sad outcome
#1 10 0 1 1 0 1
#2 20 1 0 0 1 1
#3 30 1 0 1 0 0
#4 40 0 1 0 1 0
#5 50 1 0 1 0 0
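For comparison, a base-R sketch can reproduce the same less-than-full-rank encoding without caret. It relies on the commonly used idiom of passing `contrasts(f, contrasts = FALSE)` for each factor to model.matrix(); unlike dummyVars, the column names have no "." separator (genderfemale rather than gender.female):

```r
customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  gender = c("male", "female", "female", "male", "female"),
  mood = c("happy", "sad", "happy", "sad", "happy"),
  outcome = c(1, 1, 0, 0, 0),
  stringsAsFactors = TRUE
)

# contrasts(f, contrasts = FALSE) is an identity matrix, so every level keeps its own column
fac_cols <- customers[sapply(customers, is.factor)]
full_contrasts <- lapply(fac_cols, contrasts, contrasts = FALSE)

mm <- model.matrix(~ ., data = customers, contrasts.arg = full_contrasts)
mm <- mm[, colnames(mm) != "(Intercept)"]   # drop the intercept column
mm
```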
The numeric outcome column is passed through unchanged: dummyVars expands the factor variables and leaves numeric ones as they are. We can check the column types of customers:
str(customers)
# 'data.frame': 5 obs. of 4 variables:
#  $ id     : num  10 20 30 40 50
#  $ gender : Factor w/ 2 levels "female","male": 2 1 1 2 1
#  $ mood   : Factor w/ 2 levels "happy","sad": 1 2 1 2 1
#  $ outcome: num  1 1 0 0 0
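A quick way to list the columns stored as factors, and therefore the ones that get expanded in this example, is a base-R check like the following (fac_names is an illustrative name):

```r
customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  gender = c("male", "female", "female", "male", "female"),
  mood = c("happy", "sad", "happy", "sad", "happy"),
  outcome = c(1, 1, 0, 0, 0),
  stringsAsFactors = TRUE
)

# sapply() returns TRUE for every column that is a factor
fac_names <- names(customers)[sapply(customers, is.factor)]
fac_names   # "gender" "mood"
```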
The output shows that outcome is stored as numeric, which is not what we want here. Next, convert outcome to a factor.
customers$outcome <- as.factor(customers$outcome)
str(customers)
# 'data.frame': 5 obs. of 4 variables:
#  $ id     : num  10 20 30 40 50
#  $ gender : Factor w/ 2 levels "female","male": 2 1 1 2 1
#  $ mood   : Factor w/ 2 levels "happy","sad": 1 2 1 2 1
#  $ outcome: Factor w/ 2 levels "0","1": 2 2 1 1 1
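When several numeric columns need the same conversion, lapply() can convert them in one step. This is a base-R sketch; to_factor is a hypothetical vector of column names, here containing only outcome:

```r
customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  gender = c("male", "female", "female", "male", "female"),
  mood = c("happy", "sad", "happy", "sad", "happy"),
  outcome = c(1, 1, 0, 0, 0),
  stringsAsFactors = TRUE
)

# Hypothetical list of numeric columns that should be treated as categorical
to_factor <- c("outcome")
customers[to_factor] <- lapply(customers[to_factor], factor)
levels(customers$outcome)   # "0" "1"
```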
With outcome converted, we apply dmy to customers again and look at the result.
trsf <- data.frame(predict(dmy, newdata = customers))
trsf
#   id gender.female gender.male mood.happy mood.sad outcome0 outcome1
#1 10 0 1 1 0 0 1
#2 20 1 0 0 1 0 1
#3 30 1 0 1 0 1 0
#4 40 0 1 0 1 1 0
#5 50 1 0 1 0 1 0
As you can see, outcome has now been converted into dummy variables as well.
You can also create dummy variables for just one variable in the data. For example, to convert only the gender variable in customers:
dmy <- dummyVars(~ gender, data = customers)
trfs <- data.frame(predict(dmy, newdata = customers))
trfs
#   gender.female gender.male
#1 0 1
#2 1 0
#3 1 0
#4 0 1
#5 1 0
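In base R, the same two columns for a single factor can be sketched with model.matrix() and a formula without an intercept; the column names differ from caret's (genderfemale instead of gender.female):

```r
customers <- data.frame(
  gender = c("male", "female", "female", "male", "female"),
  stringsAsFactors = TRUE
)

# No intercept, so both levels of gender get their own column
gender_dummies <- model.matrix(~ gender - 1, data = customers)
colnames(gender_dummies)   # "genderfemale" "gendermale"
```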
For a binary factor, the two resulting columns (for example gender.female and gender.male) carry the same information, because one is always 1 minus the other. To avoid this redundancy, set the fullRank argument of dummyVars to TRUE. As the output shows, the first level of each factor is then dropped, so every factor with k levels is encoded with k - 1 columns.
dmy <- dummyVars(~ ., data = customers, fullRank = TRUE)
trfs <- data.frame(predict(dmy, newdata = customers))
trfs
#   id gender.male mood.sad outcome.1
#1 10 1 0 1
#2 20 0 1 1
#3 30 0 0 0
#4 40 1 1 0
#5 50 0 0 0
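This full-rank result matches what base R's default treatment contrasts produce, assuming options("contrasts") is still at its default. The sketch below uses model.matrix() rather than caret, and also checks the redundancy that motivates fullRank:

```r
# Same data as above, with outcome already converted to a factor
customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  gender = c("male", "female", "female", "male", "female"),
  mood = c("happy", "sad", "happy", "sad", "happy"),
  outcome = factor(c(1, 1, 0, 0, 0)),
  stringsAsFactors = TRUE
)

# Treatment contrasts (the default) drop the first level of each factor
fr <- model.matrix(~ ., data = customers)
fr <- fr[, colnames(fr) != "(Intercept)"]
colnames(fr)   # "id" "gendermale" "moodsad" "outcome1"

# The dropped column is redundant: a "female" indicator is just 1 - gendermale
female <- as.numeric(customers$gender == "female")
all(female == 1 - fr[, "gendermale"])   # TRUE
```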