The rpart package implements regression trees. Building a regression tree usually involves two steps: 1. grow a large tree; 2. prune the tree by removing nodes according to statistical estimates.
Basic regression tree implementation
library(rpart)
rm <- rpart(y ~ ., data = data1)
The formula and data arguments of rpart() take the same form as those of lm().
Graphical display:
plot(rm)
text(rm)
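As a minimal, self-contained sketch of the steps above (the mtcars data set and the response mpg are illustrative choices, not from the text):

```r
library(rpart)

# Fit a regression tree; the formula interface works like lm()
rm <- rpart(mpg ~ ., data = mtcars)

plot(rm)  # draw the tree skeleton
text(rm)  # add split labels and leaf values
```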
The rpart() function stops growing the tree when any of the following conditions is met:
1. the decrease in deviance is smaller than a given threshold;
2. the number of samples in a node is smaller than a given threshold;
3. the depth of the tree exceeds a given threshold.
These three thresholds are controlled by the cp, minsplit and maxdepth parameters of the rpart() function, with default values 0.01, 20 and 30 respectively.
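The three thresholds are passed through rpart.control(); a sketch setting them explicitly to their documented defaults (mtcars/mpg are illustrative data):

```r
library(rpart)

# cp: minimum improvement for a split; minsplit: minimum node size to
# attempt a split; maxdepth: maximum tree depth. These are the defaults.
ctrl <- rpart.control(cp = 0.01, minsplit = 20, maxdepth = 30)
rm <- rpart(mpg ~ ., data = mtcars, control = ctrl)
```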
Pruning method
The rpart package implements cost-complexity pruning.
This method uses the cp value that R computes at each tree node; the pruning procedure tries to estimate the cp value that gives the best compromise between predictive accuracy and tree size.
The printcp() function can be used to list the subtrees of the regression tree and to estimate their performance, and plotcp(rm) displays the same information graphically.
The regression tree built by rpart() is the last tree in this list (tree 9). Its cp value is 0.01 (the default), it contains nine tests, and its relative error (compared with the root node) is 0.354.
Internally, R applies 10-fold cross-validation, which estimates the tree's average relative error as 0.70241 ± 0.11523.
Based on these more reliable performance estimates, overfitting can be avoided.
As the table shows, tree 8 has the smallest cross-validated relative error (0.67733).
Criteria for choosing a good regression tree:
1. use the estimated cp values as the criterion;
2. the 1-SE rule, which examines the cross-validated error estimates (xerror column) together with their standard errors (xstd column).
In this case the 1-SE tree is the smallest tree whose cross-validated error is below 0.67733 + 0.10892 = 0.78625; that is tree 2, with a single test and an estimated error of 0.73358.
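The 1-SE rule can be applied programmatically from the cp table; a sketch under the assumption that mtcars/mpg stand in for the text's data (cross-validation results are random, hence the seed):

```r
library(rpart)

set.seed(1)  # xerror comes from cross-validation, so fix the seed
rm <- rpart(mpg ~ ., data = mtcars)

cp_tab <- rm$cptable
best <- which.min(cp_tab[, "xerror"])
# threshold = minimum xerror plus one standard error
threshold <- cp_tab[best, "xerror"] + cp_tab[best, "xstd"]
# 1-SE rule: the smallest tree whose xerror is below the threshold
chosen <- min(which(cp_tab[, "xerror"] <= threshold))
rm_pruned <- prune(rm, cp = cp_tab[chosen, "CP"])
</imports>
```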
If you want to select a tree other than the one R recommends, you can build it with a different cp value:
rm2 <- prune(rm, cp = 0.08)
A tree can also be pruned interactively with snip.rpart().
This function can be used in two ways:
1. supply the numbers of the nodes to prune (node numbers can be obtained by printing the tree object); the function returns the pruned tree;
2. first draw the regression tree in a graphics window, then call the function without the second argument; clicking on a node prunes the tree at that node, and this can be repeated until a right-click ends the session.
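A sketch of the first, non-interactive way (the choice of node 2 and the mtcars data are illustrative; any split's node number from the printed tree would do):

```r
library(rpart)

rm <- rpart(mpg ~ ., data = mtcars)
print(rm)  # node numbers appear at the start of each printed line

# snip.rpart's toss argument names the nodes whose subtrees are removed;
# here the subtree rooted at node 2 is collapsed into a leaf
rm_snipped <- snip.rpart(rm, toss = 2)
```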