Machine Learning-CART Decision Tree

Source: Internet
Author: User

Previously, I read extensively about Random Forests and decision trees. Now I have implemented one specific decision tree algorithm: CART (Classification and Regression Tree).

CART is a decision tree algorithm proposed by Breiman, Friedman, Olshen, and Stone in 1984. Although it was not the first decision tree algorithm in machine learning, it was the first to come with rigorous statistical and probabilistic guarantees (this claim is rather academic; see reference [2]).

CART is a binary decision tree: each internal node (decision node) has exactly two branches. Because the ID3 and C4.5 algorithms were introduced in previous blog posts, this post covers only two aspects of CART: determining the best splitting attribute, and pruning.

1. Determine the best splitting attribute (and the best split point)

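We only consider continuous attributes here. For each input attribute, every midpoint between two adjacent values (in sorted order) is a candidate split point. A candidate split divides the data $D$ into two partitions, $D_1$ and $D_2$. For each partition we compute the Gini index:

$Gini(D) = 1 - \sum_k p_k^2$

where $p_k$ is the proportion of samples in $D$ that belong to class $k$. We then take the weighted sum of the Gini indices of the two partitions:

$Gini_{split} = \frac{|D_1|}{|D|} Gini(D_1) + \frac{|D_2|}{|D|} Gini(D_2)$

We split on the attribute and split point with the smallest weighted Gini index.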
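The search for a split point on a single continuous attribute can be sketched as follows. This is a minimal illustration, not the author's implementation; the function names `gini` and `best_split` are hypothetical, and the weights are assumed to be each partition's share of the samples.

```python
from collections import Counter

def gini(labels):
    """Gini index of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Best binary split on one continuous attribute.

    Returns (threshold, weighted_gini), or (None, gini(labels)) when all
    values are identical and no split is possible.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated
        threshold = (lo + hi) / 2.0  # midpoint of two adjacent values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Weighted sum of the Gini indices of the two partitions
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best_threshold is None or score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

print(best_split([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]))  # (2.5, 0.0)
```

In a full tree builder this search would be repeated for every input attribute, keeping the attribute whose best split has the smallest weighted Gini index.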

Following this rule, grow the decision tree fully, without any stopping condition.
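Growing the full tree can be sketched as a short recursion. This is only an illustration under stated assumptions: the tree is stored as nested dictionaries, the names `grow` and `predict` are hypothetical, and recursion ends only when a node is pure or its samples can no longer be separated (which is not a tuning-style stop condition, just the point where no split exists).

```python
from collections import Counter

def gini(labels):
    """Gini index of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow(rows, labels):
    """Grow a full binary tree. rows: list of numeric lists; labels: classes."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return {"label": majority}  # pure node becomes a leaf
    n = len(rows)
    best = None  # (weighted Gini, attribute index, threshold)
    for j in range(len(rows[0])):
        distinct = sorted(set(r[j] for r in rows))
        for lo, hi in zip(distinct, distinct[1:]):
            t = (lo + hi) / 2.0  # midpoint of two adjacent values
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # guard against degenerate floating-point midpoints
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        return {"label": majority}  # identical inputs, nothing left to split
    _, j, t = best
    li = [i for i, r in enumerate(rows) if r[j] <= t]
    ri = [i for i, r in enumerate(rows) if r[j] > t]
    return {
        "attr": j, "threshold": t, "label": majority,
        "left": grow([rows[i] for i in li], [labels[i] for i in li]),
        "right": grow([rows[i] for i in ri], [labels[i] for i in ri]),
    }

def predict(node, row):
    """Follow the splits from the root down to a leaf."""
    while "left" in node:
        node = node["left"] if row[node["attr"]] <= node["threshold"] else node["right"]
    return node["label"]

rows = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [4.0, 6.0]]
labels = ["a", "a", "b", "b"]
tree = grow(rows, labels)
print(tree["attr"], tree["threshold"])
```

Each internal node also stores its majority label, which is the label it would take if it were later replaced by a leaf during pruning.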

2. Pruning

Because the decision tree generated in the previous step does not have a stop condition, this decision tree may be very large and may overfit the training data. Therefore, it is necessary to perform post-pruning on the decision tree.

The CART algorithm uses cost-complexity pruning:

The cost complexity of a subtree $T$ is measured by

$R_\alpha(T) = R(T) + \alpha \cdot \text{number-of-leaves}(T)$

Here $R(T)$ is the error of $T$ on the training data, number-of-leaves is the number of leaf nodes of the subtree $T$, and $\alpha$ is not a constant but a number that increases from 0 to infinity. As $\alpha$ increases step by step, the subtree with the minimum $R_\alpha$ is selected each time and replaced with a leaf node, and the same procedure is then repeated on the resulting tree.
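A small numeric sketch may help show how α trades error against tree size. It assumes the usual form of the measure, R_α(T) = R(T) + α · number-of-leaves(T); the three candidate trees and their error counts are made-up illustration values, and the function names are hypothetical.

```python
def cost_complexity(error, n_leaves, alpha):
    """R_alpha(T) = R(T) + alpha * number-of-leaves(T)."""
    return error + alpha * n_leaves

def best_tree(candidates, alpha):
    """candidates: list of (name, error, n_leaves). Returns the name of the
    candidate with the smallest cost complexity for this alpha."""
    return min(candidates, key=lambda c: cost_complexity(c[1], c[2], alpha))[0]

# Made-up candidates: a full tree, a partly pruned tree, and a root-only tree
candidates = [("T0", 4, 4), ("T1", 5, 3), ("T2", 10, 1)]

for alpha in (0.0, 1.5, 3.0):
    print(alpha, best_tree(candidates, alpha))
# 0.0 T0
# 1.5 T1
# 3.0 T2
```

With α = 0 only the error counts, so the largest tree wins; as α grows, each extra leaf costs more and progressively smaller trees are preferred.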

The above process may be a little complicated. An equivalent pruning method, weakest-link pruning, gives the same result:

(1) For every subtree STi, we tentatively replace STi with a suitable leaf node and compute the ratio of the resulting increase in error, E, to the number of leaf nodes of STi. We choose the subtree STj with the smallest ratio and replace it with a suitable leaf node.

(2) Repeat the step above, replacing one subtree each time. This yields a sequence of decision trees T0, T1, ..., Tn, from the fully grown tree T0 down to the tree Tn that consists of only the root node. We then use an independent validation set (for example, hold out 1/3 of the available data as the validation set and use the remaining 2/3 as the training set) to measure the classification accuracy of each tree, and select the tree with the highest accuracy as the final CART decision tree.
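Steps (1) and (2) can be sketched on a toy tree. This is an illustration under stated assumptions: the tree is a nested dictionary, and each node's hypothetical "errors" field holds the number of training errors it would make as a leaf. The ratio follows the description in step (1) and divides by the number of leaves of the subtree; the original formulation by Breiman et al. divides by the number of leaves minus one. The validation step is not shown.

```python
import copy

def leaves_and_errors(node):
    """Return (number of leaves, total training errors) of a subtree."""
    if "left" not in node:
        return 1, node["errors"]
    left_n, left_e = leaves_and_errors(node["left"])
    right_n, right_e = leaves_and_errors(node["right"])
    return left_n + right_n, left_e + right_e

def internal_nodes(node):
    """Yield every internal (non-leaf) node of the tree."""
    if "left" in node:
        yield node
        yield from internal_nodes(node["left"])
        yield from internal_nodes(node["right"])

def weakest_link(tree):
    """Return (ratio, node) for the subtree with the smallest ratio of
    error increase to number of leaves, or None for a single-leaf tree."""
    best = None
    for node in internal_nodes(tree):
        n_leaves, subtree_errors = leaves_and_errors(node)
        ratio = (node["errors"] - subtree_errors) / n_leaves
        if best is None or ratio < best[0]:
            best = (ratio, node)
    return best

def pruning_sequence(tree):
    """Return [T0, T1, ..., Tn] as copies; the input tree is not modified."""
    current = copy.deepcopy(tree)
    sequence = [copy.deepcopy(current)]
    while "left" in current:
        _, node = weakest_link(current)
        del node["left"], node["right"]  # replace the subtree with a leaf
        sequence.append(copy.deepcopy(current))
    return sequence

# Made-up toy tree: "errors" is the error count if the node were a leaf
tree = {
    "name": "root", "errors": 10,
    "left": {"name": "A", "errors": 4,
             "left": {"errors": 1}, "right": {"errors": 2}},
    "right": {"name": "B", "errors": 5,
              "left": {"errors": 0}, "right": {"errors": 1}},
}

sequence = pruning_sequence(tree)
print([leaves_and_errors(t)[0] for t in sequence])  # [4, 3, 1]
```

In this toy tree, subtree A has the smallest ratio, (4 - 3) / 2 = 0.5, so it is collapsed first; each tree in the resulting sequence would then be scored on the validation set.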

References:

[1] CART algorithm learning and implementation

[2] Ten algorithms for Machine Learning: CART

[3] Complexity-Based Evaluation Of Rule-Based Expert Systems

 
