Regression Trees in R


The rpart package implements regression trees. Building a regression tree usually involves two steps: 1. grow a large tree; 2. prune it by removing nodes, using statistical estimates to decide where to cut.

Basic regression tree implementation

library(rpart)

rpart(y ~ ., data = data1) uses the same formula interface as the lm() function.
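A minimal sketch of a fit, assuming a data frame data1 whose numeric response column is y (both names are placeholders):

library(rpart)
# Fit a regression tree; data1 and y are hypothetical placeholder names.
rm <- rpart(y ~ ., data = data1)
print(rm)  # text summary of the splits and leaf values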

Graphical display:

plot(rm)  # draw the tree structure
text(rm)  # add split labels and leaf values

When rpart() builds a tree, growth stops as soon as one of the following conditions is met:

1. the decrease in deviance falls below a given threshold;

2. the number of samples in a node falls below a given threshold;

3. the depth of the tree exceeds a given threshold.

These three thresholds are controlled by the cp, minsplit, and maxdepth parameters of rpart(), whose default values are 0.01, 20, and 30 respectively; they can be changed as in the sketch below.
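For instance, a minimal sketch of growing a larger tree by loosening these defaults (data1 and y are hypothetical, as above):

# Relax the default stopping rules via rpart.control().
rm_big <- rpart(y ~ ., data = data1,
                control = rpart.control(cp = 0.001, minsplit = 10, maxdepth = 30))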

Pruning method

The rpart package implements cost-complexity pruning.

This method uses the cp value that R computes for each node of the tree, and the pruning procedure tries to estimate the cp value that gives the best compromise between predictive accuracy and tree size.

The printcp() function lists the sequence of subtrees of the regression tree together with performance estimates for each of them:

printcp(rm)

[printcp() output table not reproduced in the source]

The regression tree returned by rpart() is the last tree in this list (tree 9). Its cp value is 0.01 (the default), it contains nine tests, and its relative error (measured against the root node) is 0.354.

Internally, R also runs 10-fold cross-validation, which estimates the average relative error of this tree as 0.70241 ± 0.11523.

These more robust performance estimates help you avoid overfitting.

As you can see, tree 8 has the smallest cross-validated relative error (0.67733).

Criteria for choosing a good regression tree:

1. use the estimated cp value as the criterion, picking the tree with the lowest cross-validated error;

2. the 1-SE rule, which examines the cross-validated error estimates (the xerror column) and their standard errors (the xstd column).

In this case, the 1-SE rule selects the smallest tree whose error is below 0.67733 + 0.10892 = 0.78625; that is the tree with a single test (tree 2), whose estimated error is 0.73358.
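A sketch of applying the 1-SE rule programmatically from the fitted tree's cp table (this assumes the model rm from above; rpart stores the table in rm$cptable):

cptab <- rm$cptable
best <- which.min(cptab[, "xerror"])                      # tree with the lowest cross-validated error
threshold <- cptab[best, "xerror"] + cptab[best, "xstd"]  # 1-SE threshold
smallest <- which(cptab[, "xerror"] < threshold)[1]       # smallest tree under the threshold
rm.1se <- prune(rm, cp = cptab[smallest, "CP"])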

If you want a tree other than the one R recommends, you can obtain it by pruning with a different cp value:

rm2 <- prune(rm, cp = 0.08)
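The manually pruned tree can then be inspected graphically, just as before:

plot(rm2)
text(rm2)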

  

Trimming a tree interactively: snip.rpart()

It can be used in two different ways:

1. pass the numbers of the nodes at which to trim (node numbers can be seen by printing the tree object); the function returns the pruned tree object (see the sketch after this list);

2. first draw the regression tree in a graphics window, then call the function without the second argument; clicking a node trims the tree at that node, and this can be repeated until a right-click ends the session.
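A sketch of both modes (the node numbers in the first call are purely illustrative):

# Mode 1: trim the subtrees rooted at nodes 4 and 7 and return the pruned tree.
rm3 <- snip.rpart(rm, c(4, 7))

# Mode 2: plot first, then click nodes to trim; right-click to finish.
plot(rm)
text(rm)
rm4 <- snip.rpart(rm)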
