Description
We have explored several algorithms so far, each with its own pros and cons, so when tackling a specific problem we have to evaluate the different predictive models against each other. To simplify this process, we can use the caret package to generate the different models and compare their performance.

Operation
Load the required packages and set the training control to 10-fold cross-validation repeated 3 times:
library(ROCR)
library(e1071)
library(pROC)
library(caret)
control = trainControl(method = "repeatedcv",
                       number = 10,
                       repeats = 3,
                       classProbs = TRUE,
                       summaryFunction = twoClassSummary)
Use glm to train a classifier on the training dataset:
glm.model = train(churn ~ .,
                  data = trainset,
                  method = "glm",
                  metric = "ROC",
                  trControl = control)
Train a classifier on the training dataset using svm:
svm.model = train(churn ~ .,
                  data = trainset,
                  method = "svmRadial",
                  metric = "ROC",
                  trControl = control)
Use the train function to see how rpart performs on the training dataset:
rpart.model = train(churn ~ .,
                    data = trainset,
                    method = "rpart",
                    metric = "ROC",
                    trControl = control)
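Because all three models share the same trainControl settings, their cross-validated performance can also be compared numerically with caret's resamples function. A minimal sketch, assuming the three model objects trained above:

```r
# Collect the resampling results of the three fitted models
cv.values = resamples(list(glm = glm.model, svm = svm.model, rpart = rpart.model))

# Summarize the cross-validated ROC, sensitivity, and specificity of each model
summary(cv.values)

# Visual comparison of the cross-validated ROC distributions
dotplot(cv.values, metric = "ROC")
```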
Use the trained models to predict on the test dataset:
glm.probs = predict(glm.model, testset[, !names(testset) %in% c("churn")], type = "prob")
svm.probs = predict(svm.model, testset[, !names(testset) %in% c("churn")], type = "prob")
rpart.probs = predict(rpart.model, testset[, !names(testset) %in% c("churn")], type = "prob")
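With type = "prob", predict returns a data frame with one predicted-probability column per class level, here "yes" and "no" for churn; the "yes" column is what feeds the ROC curves below. A quick check, assuming the predictions above:

```r
# Inspect the class-probability columns returned by predict
head(glm.probs)

# The "yes" and "no" probabilities in each row should sum to 1
summary(rowSums(glm.probs))
```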
Generate the ROC curve of each model and draw them in one figure:
glm.ROC = roc(response = testset[, c("churn")],
              predictor = glm.probs$yes,
              levels = levels(testset[, c("churn")]))
plot(glm.ROC, type = "S", col = "red")
svm.ROC = roc(response = testset[, c("churn")],
              predictor = svm.probs$yes,
              levels = levels(testset[, c("churn")]))
plot(svm.ROC, add = TRUE, col = "green")
rpart.ROC = roc(response = testset[, c("churn")],
                predictor = rpart.probs$yes,
                levels = levels(testset[, c("churn")]))
plot(rpart.ROC, add = TRUE, col = "blue")
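To make the combined figure easier to read, a legend can be added, and pROC's auc function reports the area under each curve numerically. A sketch, assuming the roc objects and colors above:

```r
# Label the three curves in the combined plot
legend("bottomright", legend = c("glm", "svm", "rpart"),
       col = c("red", "green", "blue"), lty = 1)

# Report the area under each ROC curve
auc(glm.ROC)
auc(svm.ROC)
auc(rpart.ROC)
```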
ROC curves of the three classifiers
This recipe compares the ROC curves of the different classification models in a single graph. The control parameters of the training process specify 10-fold cross-validation repeated three times, model performance is evaluated with twoClassSummary, and classification models are then built using three different methods: glm, svm, and rpart.
It can be seen from the graph that the (untuned) SVM gives the best prediction results on the training set among the three classification algorithms.