R language: Using the dummyVars function in the caret package to create dummy variables

Source: Internet
Author: User

The dummyVars function: dummyVars creates a full set of dummy variables (i.e., a less-than-full-rank parameterization).

Here is a simple example:
survey <- data.frame(service = c("very unhappy", "unhappy", "neutral", "happy", "very happy"))
survey
#        service
# 1 very unhappy
# 2      unhappy
# 3      neutral
# 4        happy
# 5   very happy
# We can directly add a rank column, which uses numbers to represent the different sentiments.
survey <- data.frame(service = c("very unhappy", "unhappy", "neutral", "happy", "very happy"), rank = c(1, 2, 3, 4, 5))
survey
#        service rank
# 1 very unhappy    1
# 2      unhappy    2
# 3      neutral    3
# 4        happy    4
# 5   very happy    5
Clearly, encoding a single variable by hand as above is not difficult, but hand-coding dummy variables becomes time-consuming when the data contains many factor variables.
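The hand-typed rank column can also be derived from the labels themselves. The following is a minimal base R sketch, assuming the five answers are ordered from worst to best; the name `service_levels` is illustrative and not part of the original example.

```r
# Ordered levels, worst to best (this ordering is an assumption made explicit here).
service_levels <- c("very unhappy", "unhappy", "neutral", "happy", "very happy")

survey <- data.frame(
  service = c("very unhappy", "unhappy", "neutral", "happy", "very happy")
)

# Convert the labels to an ordered factor, then use the integer codes as the rank.
survey$rank <- as.integer(factor(survey$service, levels = service_levels, ordered = TRUE))
print(survey)
```

Because the rank comes from the position of each label in `service_levels`, rows may appear in any order and still receive the right number.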


The dummyVars function in the caret package is used to process dummy variables.


library(caret)
# Loading required package: lattice
# Loading required package: ggplot2
# stringsAsFactors = TRUE keeps the text columns as factors (this was the default before R 4.0)
customers <- data.frame(id = c(10, 20, 30, 40, 50), gender = c("male", "female", "female", "male", "female"),
                        mood = c("happy", "sad", "happy", "sad", "happy"), outcome = c(1, 1, 0, 0, 0),
                        stringsAsFactors = TRUE)
customers
#   id gender  mood outcome
# 1 10   male happy       1
# 2 20 female   sad       1
# 3 30 female happy       0
# 4 40   male   sad       0
# 5 50 female happy       0
# Use the dummyVars function to create dummy variables for the customers data
dmy <- dummyVars(~ ., data = customers)
# Apply it to the same data with predict and convert the result to a data.frame
trsf <- data.frame(predict(dmy, newdata = customers))
trsf
#   id gender.female gender.male mood.happy mood.sad outcome
# 1 10             0           1          1        0       1
# 2 20             1           0          0        1       1
# 3 30             1           0          1        0       0
# 4 40             0           1          0        1       0
# 5 50             1           0          1        0       0
Note that the outcome variable was not converted to dummy variables.
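For comparison, a similar full set of indicator columns can be built with base R's `model.matrix`. This is only a sketch of the idea, not a description of how dummyVars is implemented, and the column names it produces differ (for example `gendermale` rather than `gender.male`). The helper names `is_fac`, `full_contrasts`, and `onehot` are illustrative.

```r
customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  gender = factor(c("male", "female", "female", "male", "female")),
  mood = factor(c("happy", "sad", "happy", "sad", "happy")),
  outcome = c(1, 1, 0, 0, 0)
)

# Find the factor columns and request identity (non-reduced) contrasts for each,
# so that every level gets its own indicator column.
is_fac <- vapply(customers, is.factor, logical(1))
full_contrasts <- lapply(customers[is_fac], contrasts, contrasts = FALSE)

# "- 1" drops the intercept so that only the data columns remain.
onehot <- model.matrix(~ . - 1, data = customers, contrasts.arg = full_contrasts)
print(onehot)
```

As in the dummyVars output, the numeric columns id and outcome pass through unchanged, while each factor contributes one column per level.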


We can inspect the column types of the customers data.


str(customers)
# 'data.frame':	5 obs. of  4 variables:
#  $ id     : num  10 20 30 40 50
#  $ gender : Factor w/ 2 levels "female","male": 2 1 1 2 1
#  $ mood   : Factor w/ 2 levels "happy","sad": 1 2 1 2 1
#  $ outcome: num  1 1 0 0 0
As shown, outcome is numeric by default, which is not what we want here. Next, convert the outcome variable to a factor.
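The same check can be done programmatically. The sketch below, in base R, rebuilds the example data so that it is self-contained and lists which columns are factors; `is_fac` is an illustrative name.

```r
customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  gender = c("male", "female", "female", "male", "female"),
  mood = c("happy", "sad", "happy", "sad", "happy"),
  outcome = c(1, 1, 0, 0, 0),
  stringsAsFactors = TRUE  # keep text columns as factors, matching the str() output
)

# TRUE for factor columns, FALSE for everything else.
is_fac <- vapply(customers, is.factor, logical(1))
print(is_fac)

# Columns that are not factors; in the output above these passed through unchanged.
print(names(customers)[!is_fac])
```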


customers$outcome <- as.factor(customers$outcome)
str(customers)
# 'data.frame':	5 obs. of  4 variables:
#  $ id     : num  10 20 30 40 50
#  $ gender : Factor w/ 2 levels "female","male": 2 1 1 2 1
#  $ mood   : Factor w/ 2 levels "happy","sad": 1 2 1 2 1
#  $ outcome: Factor w/ 2 levels "0","1": 2 2 1 1 1
After the outcome variable in customers has been converted to a factor, we use dmy to predict on the data again and view the final result.


trsf <- data.frame(predict(dmy, newdata = customers))
trsf
#   id gender.female gender.male mood.happy mood.sad outcome0 outcome1
# 1 10             0           1          1        0        0        1
# 2 20             1           0          0        1        0        1
# 3 30             1           0          1        0        1        0
# 4 40             0           1          0        1        1        0
# 5 50             1           0          1        0        1        0
As you can see, outcome has now been converted to dummy variables as well.


Of course, you can also create dummy variables for a single variable in the data. For example, to convert only the gender variable in the customers data to dummy variables, do the following:


dmy <- dummyVars(~ gender, data = customers)
trsf <- data.frame(predict(dmy, newdata = customers))
trsf
#   gender.female gender.male
# 1             0           1
# 2             1           0
# 3             1           0
# 4             0           1
# 5             1           0
For binary (two-level) factor variables, we may not need two columns that carry the same information after dummy coding (for example, gender.female and gender.male). In this case, set the fullRank parameter of the dummyVars function to TRUE.


dmy <- dummyVars(~ ., data = customers, fullRank = TRUE)
trsf <- data.frame(predict(dmy, newdata = customers))
trsf
#   id gender.male mood.sad outcome.1
# 1 10           1        0         1
# 2 20           0        1         1
# 3 30           0        0         0
# 4 40           1        1         0
# 5 50           0        0         0
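To see why the two gender columns are redundant, the base R sketch below compares the rank of a design matrix that holds the full set of indicators plus an intercept against one with the first level dropped. The intercept column is an assumption added for illustration (it is what a typical linear model would include); the matrices `full` and `reduced` are not part of the original example.

```r
gender <- factor(c("male", "female", "female", "male", "female"))

# Full set of indicator columns plus an intercept column.
full <- cbind(
  intercept = 1,
  female = as.numeric(gender == "female"),
  male = as.numeric(gender == "male")
)

# Reduced set: keep only the "male" indicator, as in the fullRank = TRUE output above.
reduced <- full[, c("intercept", "male")]

qr(full)$rank     # 2, although the matrix has 3 columns
qr(reduced)$rank  # 2, equal to the number of columns
```

Since female + male always equals the intercept column, one of the three columns adds no information, which is why dropping one level per factor restores full rank.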
