The following shows how to use the ctree function in the party package to build a decision tree on the iris dataset.
Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width in the iris dataset are used to predict the species of iris.
The ctree function in the party package builds the decision tree, and the predict function is used to make predictions on new data.
Before modeling, the iris dataset is split into two subsets: 70% of the data is used for training and the remaining 30% for testing. To make the results reproducible, the random seed is set to a fixed value.
str(iris)
set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[ind == 1, ]
testData <- iris[ind == 2, ]
Next, load the party package, build a decision tree, and view the prediction results.
Several parameters of ctree control how the decision tree is trained, including minsplit, minbucket, maxsurrogate, and maxdepth.
Here, the tree is built using the default settings of these parameters.
In the code below, myFormula specifies that Species is the target variable and all remaining variables are predictors.
library(party)
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
iris_ctree <- ctree(myFormula, data = trainData)
# check the prediction
table(predict(iris_ctree), trainData$Species)
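The training parameters mentioned above are left at their defaults in this example. To override them, a controls argument built with ctree_control can be passed to ctree. A minimal sketch (the specific values here are illustrative assumptions, not recommendations):

```r
library(party)

# Illustrative values only -- tune for your own data.
myControl <- ctree_control(
  minsplit     = 20,  # minimum number of instances in a node to attempt a split
  minbucket    = 7,   # minimum number of instances in a terminal node
  maxsurrogate = 2,   # number of surrogate splits to evaluate per node
  maxdepth     = 3    # maximum depth of the tree (0 = unlimited)
)
iris_ctree2 <- ctree(Species ~ ., data = trainData, controls = myControl)
```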
Then print the rules and plot the decision tree that has been built.
print(iris_ctree)
plot(iris_ctree)
Decision Tree Chart
The bar chart at each leaf node shows the probability of an instance falling into each class.
Simplified Decision Tree
plot(iris_ctree, type = "simple")
In this plot, each leaf node is labeled with "n", the number of instances it contains, and "y", the vector of class probabilities.
For example, the label "n = 40, y = (1, 0, 0)" in node 2 indicates that the node contains 40 training instances, all of which belong to the class "setosa".
Testing the Decision Tree
Test the built decision tree using the test data.
# predict on test data
testPred <- predict(iris_ctree, newdata = testData)
table(testPred, testData$Species)
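Beyond the confusion table, the overall test-set accuracy can be computed from the same predictions (a small addition not in the original text):

```r
# proportion of correctly classified test instances
accuracy <- mean(testPred == testData$Species)
accuracy
```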
Problems with the ctree Algorithm
The current version of ctree does not handle missing values well: an instance with missing values may be sent to either the left or the right subtree, as determined by the surrogate split rules.
Another problem: even if a variable from the training set ends up unused after ctree builds the tree, that variable must still be present in the test set when predicting; otherwise the call to predict will fail.
In addition, if the factor levels of a categorical variable differ between the training set and the test set, prediction on the test set will fail.
Workaround
The workaround is to build a decision tree on the training set, then call ctree again using only the variables that appear in the first tree to build a new tree, and explicitly set the factor levels of the categorical variables in the training data to match those in the test set.
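A sketch of this workaround in R (which predictors the first tree actually uses depends on your data; Petal.Length and Petal.Width are assumed here for illustration):

```r
# 1. Build a first tree on all predictors and inspect which ones it uses.
iris_ctree <- ctree(Species ~ ., data = trainData)
print(iris_ctree)

# 2. Align the factor levels of categorical variables in the training
#    data with those in the test data (here, the target Species).
trainData$Species <- factor(trainData$Species,
                            levels = levels(testData$Species))

# 3. Rebuild the tree using only the variables that appeared in the
#    splits of the first tree (assumed: Petal.Length, Petal.Width).
iris_ctree2 <- ctree(Species ~ Petal.Length + Petal.Width, data = trainData)
testPred2 <- predict(iris_ctree2, newdata = testData)
```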
This article is from the "CAS Computer Training" blog; please do not reprint.