R language topic: how to build a decision tree with the party package

Source: Internet
Author: User
Tags: random seed

The following shows how to use the ctree function in the party package to build a decision tree on the iris dataset.


Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width in the iris dataset are used to predict the species of iris.

The ctree function in the party package builds the decision tree, and the predict function makes predictions on new data.


Before modeling, the iris dataset is divided into two subsets: about 70% of the data is used for training and the remaining 30% for testing. To make the results reproducible, the random seed is set to a fixed value.


str(iris)
set.seed(1234)
# assign each row to subset 1 (training) or 2 (test) with probabilities 0.7 and 0.3
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[ind == 1, ]
testData <- iris[ind == 2, ]
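
Because sample with prob assigns each row independently, the split above is only approximately 70/30. If exact subset sizes are wanted, one alternative (not the approach used in this article) is to draw row indices without replacement. A minimal sketch, where train_idx, trainExact, and testExact are illustrative names:

```r
set.seed(1234)
n <- nrow(iris)
# draw exactly 70% of the row indices without replacement
train_idx <- sample(n, size = round(0.7 * n))
trainExact <- iris[train_idx, ]
testExact <- iris[-train_idx, ]
# the subset sizes are now fixed regardless of the seed
c(train = nrow(trainExact), test = nrow(testExact))
```

The rest of the article keeps the original probabilistic split.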

Next, load the party package, build a decision tree, and check the predictions.

Several parameters control how ctree trains the decision tree, including minsplit, minbucket, maxsurrogate, and maxdepth.


The decision tree here is built with the default settings of these parameters.


In the code, myFormula specifies that Species is the target variable and all remaining variables are predictors.

library(party)
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
iris_ctree <- ctree(myFormula, data = trainData)
# check the predictions on the training data
table(predict(iris_ctree), trainData$Species)
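
The control parameters mentioned earlier are passed to ctree through ctree_control and the controls argument. The sketch below uses non-default values chosen purely for illustration; they are not recommendations from the original article, and small_ctree and ctrl are illustrative names.

```r
library(party)

set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[ind == 1, ]

# illustrative, non-default settings: a node needs at least 30 instances to
# be considered for splitting, each leaf at least 10, and the tree is
# limited to a depth of 2
ctrl <- ctree_control(minsplit = 30, minbucket = 10,
                      maxsurrogate = 0, maxdepth = 2)
small_ctree <- ctree(Species ~ ., data = trainData, controls = ctrl)
print(small_ctree)
```

With maxdepth = 2 the tree can have at most four leaf nodes, which makes the effect of the setting easy to see in the printed output.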

Then print the rules of the fitted tree and plot it for inspection.

print(iris_ctree)
plot(iris_ctree)


Decision Tree Chart


The bar plot in each leaf node shows the probability that an instance falling into that node belongs to each of the three species.

[Figure: decision tree plot, "Untitled 1.jpg", https://s2.51cto.com/wyfs02/M02/A5/0E/wKioL1m3PCajNRzvAADIqaUslpQ331.jpg-wh_500x0-wm_3-wmp_4-s_2451926648.jpg]


Simple Decision Tree Chart


plot(iris_ctree, type = "simple")

In this figure, the class probabilities appear as "y" in the leaf nodes.

[Figure: simple decision tree plot, "Untitled 2.jpg", https://s4.51cto.com/wyfs02/M01/06/5D/wKiom1m3PFvAvB3IAABqE0S0174881.jpg-wh_500x0-wm_3-wmp_4-s_927237793.jpg]

For example, the label "n=40, y=(1, 0, 0)" on node 2 indicates that the node contains 40 training instances, all of which belong to the first class, "setosa".
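
The "n" and "y" values in the leaf labels can also be recomputed from the fitted tree. The sketch below uses where() from the party package to get the leaf node of each training instance; node_id and node_counts are illustrative names. The exact counts depend on the random split, which can differ between R versions even with the same seed, so they may not match the figures quoted above.

```r
library(party)

set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[ind == 1, ]
iris_ctree <- ctree(Species ~ ., data = trainData)

# leaf node ID for every training instance
node_id <- where(iris_ctree)
# class counts per leaf; each row sum corresponds to "n" in the plot label
node_counts <- table(node = node_id, species = trainData$Species)
node_counts
# row-wise proportions correspond to the "y" vectors
round(prop.table(node_counts, margin = 1), 3)
```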



Testing the Decision Tree


Test the fitted decision tree on the test data.

# predict on the test data
testPred <- predict(iris_ctree, newdata = testData)
table(testPred, testData$Species)
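
The confusion table can be summarized as a single accuracy figure. A minimal sketch, assuming the same split and model as above; conf_mat and accuracy are illustrative names, and no particular accuracy value is claimed here because the split depends on the R version:

```r
library(party)

set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[ind == 1, ]
testData <- iris[ind == 2, ]
iris_ctree <- ctree(Species ~ ., data = trainData)

testPred <- predict(iris_ctree, newdata = testData)
# rows are predicted classes, columns are actual classes
conf_mat <- table(predicted = testPred, actual = testData$Species)
# share of test instances on the diagonal, i.e. classified correctly
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```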


Problems with the Decision Tree Algorithm


The version of ctree available at the time of writing does not handle missing values well: an instance with a missing value may be sent to the left subtree in some cases and to the right subtree in others, depending on the surrogate rules.

Another problem is that if a variable in the training set is passed to ctree but does not appear in the fitted tree, the test set must still contain that variable; otherwise the call to predict will fail.

In addition, if the levels of a categorical variable in the test set differ from those in the training set, prediction on the test set will also fail.
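
As a precaution against the missing-value problem, the data can be checked for incomplete rows before calling ctree. This check is an addition for illustration, not a step from the original article, and iris_complete is an illustrative name.

```r
# count missing values per column before fitting
colSums(is.na(iris))
# keep only complete rows (a no-op for iris, which has no missing values)
iris_complete <- iris[complete.cases(iris), ]
nrow(iris_complete)
```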


Workaround


One way around these problems is to build a first decision tree on the training set, then call ctree again on data that contains only the variables appearing in that first tree, and to explicitly set the levels of the categorical variables in the test set to the levels of the corresponding variables in the training set.
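
The data-preparation part of this workaround can be sketched in base R. Everything in the example is hypothetical: the train and test data frames, the color, size, and extra columns, and the align_to_training helper are made up for illustration, and vars stands in for the variables that appear in the first tree.

```r
# hypothetical training and test data with a categorical predictor "color"
train <- data.frame(
  color = factor(c("red", "green", "blue", "red")),
  size  = c(1.0, 2.5, 3.1, 0.7)
)
test <- data.frame(
  color = factor(c("green", "red")),  # "blue" never occurs here
  size  = c(2.0, 1.2),
  extra = c(10, 20)                   # column not used by the model
)

# hypothetical helper: keep only the given variables and give every factor
# in the test data the same levels as the matching training column
# (values unseen in training become NA)
align_to_training <- function(test, train, vars) {
  out <- test[, vars, drop = FALSE]
  for (v in vars) {
    if (is.factor(train[[v]])) {
      out[[v]] <- factor(as.character(out[[v]]), levels = levels(train[[v]]))
    }
  }
  out
}

test_aligned <- align_to_training(test, train, vars = c("color", "size"))
levels(test_aligned$color)
```

After this step the test data has the same columns and factor levels as the training data, so it can be passed to predict through the newdata argument.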


This article is from the "CAS Computer Training" blog; reprinting is not permitted.
