Ten classic algorithms for data mining (10): CART (Classification and Regression Trees)

If you had to choose a classification technique that performs well across a wide range of problems, does not require much effort from application developers, and is easy for end users to understand, the classification tree approach proposed by Breiman, Friedman, Olshen and Stone (1984) would be a strong contender. We will first discuss the classification procedure, and then in subsequent sections show how the same procedure is used to predict continuous dependent variables. Breiman and his colleagues called the programs that implement these procedures CART (Classification and Regression Trees).

Classification Trees
There are two key ideas behind classification trees. The first is to recursively partition the space of the independent variables; the second is to prune the resulting tree using validation data.
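To make the second idea concrete, here is a minimal Python sketch of pruning with validation data. It uses reduced-error pruning, a simpler scheme swapped in for illustration: CART itself builds a sequence of cost-complexity pruned subtrees and uses validation data (or cross-validation) to choose among them. The tree format and the helper names `predict`, `errors` and `prune` are hypothetical.

```python
# Hypothetical tree format for this sketch: a leaf is a class label; an
# internal node is a dict {"var": i, "split": s, "le": subtree for x_i <= s,
# "gt": subtree for x_i > s, "majority": majority training class at the node}.

def predict(tree, x):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["le"] if x[tree["var"]] <= tree["split"] else tree["gt"]
    return tree

def errors(tree, X, y):
    """Number of validation records the (sub)tree misclassifies."""
    return sum(predict(tree, x) != label for x, label in zip(X, y))

def prune(tree, X_val, y_val):
    """Reduced-error pruning (not CART's cost-complexity pruning): working
    bottom-up, collapse a node into a leaf labelled with its majority class
    whenever that does not increase errors on the validation records that
    reach the node."""
    if not isinstance(tree, dict):
        return tree  # already a leaf
    i, s = tree["var"], tree["split"]
    le = [(x, lab) for x, lab in zip(X_val, y_val) if x[i] <= s]
    gt = [(x, lab) for x, lab in zip(X_val, y_val) if x[i] > s]
    # Prune the children first, routing each validation record down its branch.
    pruned = dict(
        tree,
        le=prune(tree["le"], [x for x, _ in le], [lab for _, lab in le]),
        gt=prune(tree["gt"], [x for x, _ in gt], [lab for _, lab in gt]),
    )
    leaf = pruned["majority"]
    # With no validation records at a node, both counts are 0 and it collapses.
    if errors(leaf, X_val, y_val) <= errors(pruned, X_val, y_val):
        return leaf  # the simpler tree does at least as well on validation data
    return pruned
```

For example, with `tree = {"var": 0, "split": 5, "majority": "a", "le": "a", "gt": {"var": 0, "split": 7, "majority": "b", "le": "b", "gt": "a"}}` and validation data `X_val = [[1], [6], [8], [9]]`, `y_val = ["a", "b", "b", "b"]`, the inner split at 7 misclassifies two validation records and is collapsed to the leaf `"b"`, while the root split is kept.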

Recursive Partitioning
Let the variable Y denote the dependent (class) variable, and x1, x2, x3, ..., xp the independent variables. The method recursively divides the p-dimensional space of the x variables into non-overlapping rectangles. First, one independent variable xi is selected, together with a value si of xi. Splitting at si divides the p-dimensional space into two parts: one is a p-dimensional hyper-rectangle containing all points that satisfy xi <= si, and the other is a p-dimensional hyper-rectangle containing all points that satisfy xi > si. Next, one of these two parts is divided in the same way, by choosing a variable and a split value for that variable. This yields three rectangular regions (from here on we refer to hyper-rectangles simply as rectangles). As the process continues, we get smaller and smaller rectangles.

The idea is to divide the entire x space into rectangles such that each small rectangle is as homogeneous, or "pure", as possible. "Pure" means that all points in a rectangle belong to the same class. Ideally, every rectangle contains points of only one class. Of course, this is not always possible, because points belonging to different classes sometimes have identical values for all of the independent variables.
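The partitioning procedure can be sketched in a few lines of Python. This is a minimal illustration under assumptions not stated in the text: purity is measured with the Gini index (one common choice, and the usual criterion in CART for classification), every observed value of every variable is tried as a candidate split si, and a rectangle stops being divided once it is pure or no split lowers its impurity. The helper names `gini`, `best_split`, `grow` and `predict` and the dict-based tree format are hypothetical.

```python
from collections import Counter

# Assumption: purity is measured with the Gini index, one common choice.
def gini(labels):
    """Gini impurity of a rectangle's labels: 0.0 when all share one class."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Try each variable i and each observed value s of x_i; return the (i, s)
    whose two parts (x_i <= s, x_i > s) have the lowest weighted impurity,
    or None if no split lowers impurity."""
    best, best_score = None, gini(y)
    for i in range(len(X[0])):
        for s in sorted({row[i] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[i] <= s]
            right = [lab for row, lab in zip(X, y) if row[i] > s]
            if not left or not right:
                continue  # both rectangles must contain points
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (i, s), score
    return best

def grow(X, y):
    """Recursively partition the rectangle holding (X, y). A leaf is the
    majority class label; an internal node is a dict describing the split."""
    split = best_split(X, y) if gini(y) > 0.0 else None
    if split is None:
        # Pure rectangle, or no split lowers impurity (this includes points
        # with identical x values but different classes).
        return Counter(y).most_common(1)[0][0]
    i, s = split
    le = [(row, lab) for row, lab in zip(X, y) if row[i] <= s]
    gt = [(row, lab) for row, lab in zip(X, y) if row[i] > s]
    return {
        "var": i,
        "split": s,
        "le": grow([r for r, _ in le], [lab for _, lab in le]),
        "gt": grow([r for r, _ in gt], [lab for _, lab in gt]),
    }

def predict(tree, x):
    """Descend to the rectangle containing x and return its class."""
    while isinstance(tree, dict):
        tree = tree["le"] if x[tree["var"]] <= tree["split"] else tree["gt"]
    return tree
```

For instance, `grow([[1, 1], [1, 3], [3, 1], [3, 3]], ["a", "b", "b", "b"])` first splits on x1 <= 1 (index 0 in the code) and then splits only the left part on x2 <= 1, which produces exactly the three rectangles described above. Stopping as soon as no split lowers impurity is a simplification of this sketch; CART itself grows a large tree and then prunes it back.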

 

For more information, see:

http://www.core.org.cn/NR/rdonlyres/Sloan-School-of-Management/15-062Data-MiningSpring2003/338F02AD-0DD8-4199-8727-35FCF5A15B57/0/L3ClassTrees.pdf

http://www.cqvip.com/onlineread/onlineread.asp?Id=28180864

 
