The following shows how to use the ctree function in the party package to build a decision tree on the iris dataset.
Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width in the iris dataset are used to predict the species of iris.
The ctree function in the party package builds the decision tree, and the predict function is used to make predictions on new data.
Before modeling, the iris dataset is split into two subsets: 70% of the data is used for training and the remaining 30% for testing. To make the results reproducible, the random seed is set to a fixed value.
str(iris)
set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[ind == 1, ]
testData <- iris[ind == 2, ]
Next, load the party package, build a decision tree, and view the prediction results.
Several parameters of ctree control how the decision tree is trained, including minsplit, minbucket, maxsurrogate, and maxdepth.
Here, the tree is built using the default settings of these parameters.
In the code below, myFormula specifies that Species is the target variable and all remaining variables are predictors.
library(party)
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
iris_ctree <- ctree(myFormula, data = trainData)
# check the prediction
table(predict(iris_ctree), trainData$Species)
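The training parameters mentioned above are left at their defaults in this example. To override them, a controls argument built with ctree_control can be passed to ctree. A minimal sketch (the specific values here are illustrative assumptions, not recommendations):

```r
library(party)

# Illustrative values only -- tune for your own data.
myControl <- ctree_control(
  minsplit     = 20,  # minimum number of instances in a node to attempt a split
  minbucket    = 7,   # minimum number of instances in a terminal node
  maxsurrogate = 2,   # number of surrogate splits to evaluate per node
  maxdepth     = 3    # maximum depth of the tree (0 = unlimited)
)
iris_ctree2 <- ctree(Species ~ ., data = trainData, controls = myControl)
```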
Then print the rules and plot the decision tree that has been built.
print(iris_ctree)
plot(iris_ctree)
Decision Tree Chart
The bar chart at each leaf node shows the probability of an instance falling into each class.
Simplified Decision Tree
plot(iris_ctree, type = "simple")
In this plot, each leaf node is labeled with "n", the number of instances it contains, and "y", the vector of class probabilities.
For example, the label "n = 40, y = (1, 0, 0)" in node 2 indicates that the node contains 40 training instances, all of which belong to the class "setosa".
Testing the Decision Tree
Test the built decision tree using the test data.
# predict on test data
testPred <- predict(iris_ctree, newdata = testData)
table(testPred, testData$Species)
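Beyond the confusion table, the overall test-set accuracy can be computed from the same predictions (a small addition not in the original text):

```r
# proportion of correctly classified test instances
accuracy <- mean(testPred == testData$Species)
accuracy
```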
Problems with the ctree Algorithm
The current version of ctree does not handle missing values well: an instance with missing values may be sent to either the left or the right subtree, as determined by the surrogate split rules.
Another problem: even if a variable from the training set ends up unused after ctree builds the tree, that variable must still be present in the test set when predicting; otherwise the call to predict will fail.
In addition, if the factor levels of a categorical variable differ between the training set and the test set, prediction on the test set will fail.
Workaround
The workaround is to build a decision tree on the training set, then call ctree again using only the variables that appear in the first tree to build a new tree, and explicitly set the factor levels of the categorical variables in the training data to match those in the test set.
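A sketch of this workaround in R (which predictors the first tree actually uses depends on your data; Petal.Length and Petal.Width are assumed here for illustration):

```r
# 1. Build a first tree on all predictors and inspect which ones it uses.
iris_ctree <- ctree(Species ~ ., data = trainData)
print(iris_ctree)

# 2. Align the factor levels of categorical variables in the training
#    data with those in the test data (here, the target Species).
trainData$Species <- factor(trainData$Species,
                            levels = levels(testData$Species))

# 3. Rebuild the tree using only the variables that appeared in the
#    splits of the first tree (assumed: Petal.Length, Petal.Width).
iris_ctree2 <- ctree(Species ~ Petal.Length + Petal.Width, data = trainData)
testPred2 <- predict(iris_ctree2, newdata = testData)
```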
This article is from the "CAS Computer Training" blog; please do not reprint.