Decision Tree Algorithm

1. Summary

In the previous two articles, the naive Bayes and Bayesian network classification algorithms were introduced and discussed. Both are based on Bayes' theorem and can be used to infer probabilities for classification and decision problems. In this article, we discuss another widely used classification algorithm: the decision tree. Compared with Bayesian algorithms, a decision tree requires no domain knowledge or parameter setting during construction, so in practice decision trees are better suited to exploratory knowledge discovery.

2. An Intuitive Introduction to Decision Trees

Informally, the idea of decision tree classification is similar to finding a partner. Imagine a girl whose mother wants to introduce a boyfriend to her, and the following conversation takes place:

Daughter: How old is he?

Mother: 26.

Daughter: Is he handsome?

Mother: Very handsome.

Daughter: Does he have a high income?

Mother: Not very high. Moderate.

Daughter: Is he a civil servant?

Mother: Yes, he works at the tax bureau.

Daughter: OK, I'll meet him.

This girl's decision-making process is a typical classification tree decision. It is equivalent to dividing men into two categories, "meet" and "do not meet", according to age, looks, income, and whether he is a civil servant. Suppose the girl's requirements for a man are: under 30 years old, of medium or better looks, and either a high earner or a civil servant with at least a medium income. Then her decision logic can be represented by the decision tree below. (Disclaimer: this decision tree is purely invented for the sake of this article; it has no empirical basis and does not represent any girl's actual preferences in choosing a partner, so please don't take it too seriously ^_^)

The figure fully expresses the girl's strategy for deciding whether to meet a date. The green nodes represent judgment conditions, the orange nodes represent decision results, and the arrows represent the decision paths taken under different conditions. The red arrows indicate the girl's decision process in the example above.

This figure can more or less be regarded as a decision tree. We say "more or less" because the judgment conditions in the figure are not quantified (for example, what exactly counts as high, medium, or low income), so it is not a decision tree in the strict sense. If all the conditions were quantified, it would become a true decision tree.

With the intuitive understanding above, we can formally define the decision tree:

A decision tree is a tree structure (binary or non-binary). Each non-leaf node represents a test on a feature attribute, each branch represents the output of that test over a range of values of the attribute, and each leaf node stores a category. Using a decision tree to make a decision starts at the root node: the corresponding feature attribute of the item to be classified is tested, the output branch is selected according to its value, and this process is repeated until a leaf node is reached; the category stored at that leaf node is the decision result.
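To make the definition concrete, here is a minimal sketch (not from the original article; the node layout and the example tree are my own assumptions, loosely and simplistically modeled on the dating example above) of how an item is classified by walking a decision tree from the root to a leaf:

```python
# A non-leaf node is represented as a dict:
#   {"attribute": <feature name>, "branches": {<attribute value>: <child node>}}
# A leaf node is represented as a plain class label (string).

def classify(node, sample):
    """Walk the tree from the root until a leaf (class label) is reached."""
    while isinstance(node, dict):             # non-leaf: keep testing
        value = sample[node["attribute"]]     # test the feature attribute
        node = node["branches"][value]        # follow the matching output branch
    return node                               # the leaf stores the category

# Hypothetical tree (simplified: income is omitted); all names are made up.
tree = {
    "attribute": "age",
    "branches": {
        "under_30": {
            "attribute": "looks",
            "branches": {
                "medium_or_better": {
                    "attribute": "civil_servant",
                    "branches": {"yes": "meet", "no": "do not meet"},
                },
                "below_medium": "do not meet",
            },
        },
        "30_or_over": "do not meet",
    },
}

print(classify(tree, {"age": "under_30", "looks": "medium_or_better",
                      "civil_servant": "yes"}))   # -> "meet"
```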

As we can see, the decision process of a decision tree is very intuitive and easy to understand. Decision trees have been successfully applied in many fields, such as medicine, manufacturing, astronomy, biology, and business. Having covered the definition of a decision tree and how it is applied, we now describe how a decision tree is constructed.

3. Decision Tree Construction

Unlike Bayesian algorithms, the construction of a decision tree does not rely on domain knowledge; it uses an attribute selection measure to choose the attribute that best partitions the tuples into distinct classes. Constructing a decision tree is essentially the process of applying an attribute selection measure to determine the topological structure among the feature attributes.

The key step in constructing a decision tree is splitting on attributes. Splitting on an attribute means building different branches at a node according to the different values of some feature attribute, with the goal of making each resulting subset as "pure" as possible. As "pure" as possible means that the items in a split subset should, as far as possible, belong to the same category. Attribute splits fall into three cases (a partition sketch follows the list):

1. The attribute is discrete and a binary decision tree is not required. In this case, each distinct value of the attribute forms its own branch.

2. The attribute is discrete and a binary decision tree must be generated. In this case, a subset of the attribute's values is used for the test, and two branches are formed according to "belongs to this subset" and "does not belong to this subset".

3. The attribute is continuous. In this case, a value is chosen as the split point split_point, and two branches are generated according to > split_point and <= split_point.
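As a rough illustration of the three cases (my own sketch, assuming the training tuples are stored as a list of dicts keyed by attribute name), the splits might be implemented like this:

```python
from collections import defaultdict

def split_discrete_multiway(rows, attr):
    """Case 1: one branch per distinct value of a discrete attribute."""
    branches = defaultdict(list)
    for row in rows:
        branches[row[attr]].append(row)
    return dict(branches)

def split_discrete_binary(rows, attr, value_subset):
    """Case 2: two branches, 'in the value subset' vs. 'not in the value subset'."""
    inside = [r for r in rows if r[attr] in value_subset]
    outside = [r for r in rows if r[attr] not in value_subset]
    return inside, outside

def split_continuous(rows, attr, split_point):
    """Case 3: two branches, > split_point vs. <= split_point."""
    greater = [r for r in rows if r[attr] > split_point]
    less_equal = [r for r in rows if r[attr] <= split_point]
    return greater, less_equal
```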

The key to constructing a decision tree is the attribute selection measure. An attribute selection measure is a splitting criterion: a heuristic for partitioning the data of a class-labeled training set into individual classes as "best" as possible. It determines both the topology of the tree and the choice of split points.

There are many attribute selection measures. Most algorithms use a top-down recursive splitting approach with a greedy strategy and no backtracking. Here we introduce two common algorithms, ID3 and C4.5.

3.1. ID3 algorithm

From information theory, we know that the smaller the expected information, the larger the information gain, and thus the higher the purity. Therefore, the core idea of the ID3 algorithm is to use information gain as the attribute selection measure and to split on the attribute that yields the largest information gain after splitting. We first define a few concepts that will be used.

Let D be the partition of the training tuples by class. The entropy (expected information) of D is:

info(D) = -Σ_{i=1..m} p_i log2(p_i)

where p_i is the probability that the i-th class appears in the entire training set D, which can be estimated as the number of elements of that class divided by the total number of elements in D. The practical meaning of the entropy is the average amount of information needed to identify the class label of a tuple in D.
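A minimal sketch of the entropy computation (my own, not from the article), assuming the class labels of D are given as a plain list:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """info(D) = -sum(p_i * log2(p_i)), with p_i estimated from label frequencies."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Hypothetical distribution: 7 labels of one class, 3 of the other.
print(entropy(["yes"] * 7 + ["no"] * 3))   # ~0.881
```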

Now suppose we partition the training tuples D by attribute A into v subsets {D_1, D_2, ..., D_v}. The expected information of D with respect to A is:

info_A(D) = Σ_{j=1..v} (|D_j| / |D|) × info(D_j)

The information gain is the difference between the two:

gain(A) = info(D) - info_A(D)
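Continuing the sketch, the information gain of a discrete attribute can be computed by reusing the entropy() function above (again my own illustration, with the same list-of-dicts assumption):

```python
from collections import defaultdict

def information_gain(rows, attr, label_key):
    """gain(A) = info(D) - info_A(D) for a discrete attribute `attr`."""
    labels = [r[label_key] for r in rows]
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[label_key])        # class labels per subset D_j
    expected = sum(len(g) / len(rows) * entropy(g)  # info_A(D)
                   for g in groups.values())
    return entropy(labels) - expected               # info(D) - info_A(D)
```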

Each time a split is needed, the ID3 algorithm calculates the information gain of every attribute and then splits on the attribute with the highest information gain. Next we use the example of fake account detection in an SNS community to illustrate how to construct a decision tree with the ID3 algorithm. For simplicity, assume the training set contains 10 elements:

where s, m, and l denote small, medium, and large, respectively.

Let L, F, H, and R denote log density, friend density, whether a real profile photo is used, and whether the account is real, respectively. The information gain of each attribute is calculated below.

Carrying out this calculation for the log density L gives an information gain of 0.276.

In the same way, the information gains of H and F are 0.033 and 0.553, respectively.

Because F has the largest information gain, F is selected as the splitting attribute for the first split. The result after splitting is shown below:

This method is then applied recursively to compute the splitting attribute of each child node, eventually yielding the entire decision tree.
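The recursion described above might be sketched as follows (my own illustration, reusing information_gain() from earlier; it handles only discrete attributes and produces nodes in the same format as the earlier classify() sketch):

```python
from collections import Counter

def build_id3(rows, attributes, label_key):
    labels = [r[label_key] for r in rows]
    if len(set(labels)) == 1:                    # pure subset: make a leaf
        return labels[0]
    if not attributes:                           # attributes used up: majority vote
        return Counter(labels).most_common(1)[0][0]
    # pick the attribute with the largest information gain
    best = max(attributes, key=lambda a: information_gain(rows, a, label_key))
    node = {"attribute": best, "branches": {}}
    remaining = [a for a in attributes if a != best]
    for value in set(r[best] for r in rows):     # one branch per attribute value
        subset = [r for r in rows if r[best] == value]
        node["branches"][value] = build_id3(subset, remaining, label_key)
    return node
```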

In the example above, the feature attributes were discretized for simplicity; in reality, both log density and friend density are continuous attributes. When a feature attribute is continuous, the ID3 algorithm can handle it as follows:

First, sort the elements of D by the value of the continuous attribute. The midpoint of every two adjacent values is taken as a potential split point. Starting from the first potential split point, split D into two sets at each candidate and compute the expected information of the split. The candidate with the smallest expected information is called the best split point of this attribute, and its expected information is used as the expected information of the attribute.
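A sketch of this procedure (my own, reusing entropy() from above) for a single continuous attribute:

```python
def best_split_point(rows, attr, label_key):
    """Return the candidate split point with the smallest expected information."""
    values = sorted(set(r[attr] for r in rows))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]  # midpoints
    best_point, best_info = None, float("inf")
    for point in candidates:
        left = [r[label_key] for r in rows if r[attr] <= point]
        right = [r[label_key] for r in rows if r[attr] > point]
        expected = (len(left) / len(rows)) * entropy(left) \
                 + (len(right) / len(rows)) * entropy(right)
        if expected < best_info:
            best_point, best_info = point, expected
    return best_point, best_info
```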

3.2. C4.5 algorithm

The ID3 algorithm is biased toward multi-valued attributes. For example, if the data contains a unique identifier attribute such as an ID, ID3 will select it as the splitting attribute; this makes the partition completely pure, but such a split is almost useless for classification. ID3's successor, C4.5, uses the gain ratio to overcome this bias.

The C4.5 algorithm first defines the "split information", which can be expressed as:

split_info_A(D) = -Σ_{j=1..v} (|D_j| / |D|) log2(|D_j| / |D|)

The symbols have the same meaning as in the ID3 algorithm, and the gain ratio is then defined as:

gain_ratio(A) = gain(A) / split_info_A(D)

C4.5 selects the attribute with the maximum gain ratio as the splitting attribute. Its use is otherwise similar to ID3 and is not repeated here.
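A minimal sketch of the gain ratio (my own, reusing information_gain() from the ID3 sketches above):

```python
from collections import Counter
from math import log2

def gain_ratio(rows, attr, label_key):
    """gain_ratio(A) = gain(A) / split_info_A(D) for a discrete attribute."""
    total = len(rows)
    counts = Counter(r[attr] for r in rows)                  # |D_j| per value
    split_info = -sum((c / total) * log2(c / total) for c in counts.values())
    if split_info == 0:                                      # single-valued attribute
        return 0.0
    return information_gain(rows, attr, label_key) / split_info
```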

4. Some additional information about decision trees

4.1. What should I do if the attributes are used up?

The following can happen during decision tree construction: all attributes have been used as splitting attributes, but some subsets are still not pure, i.e., their elements do not all belong to the same category. In this case, since no further information is available, a "majority vote" is usually performed on such a subset: the most frequent category in the subset is taken as the node's category, and the node becomes a leaf node.
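As a tiny illustration (my own) of the majority-vote rule, this is exactly what the attributes-used-up branch of the build_id3() sketch above does:

```python
from collections import Counter

labels = ["yes", "yes", "no"]                    # hypothetical impure subset
leaf_category = Counter(labels).most_common(1)[0][0]
print(leaf_category)                             # -> "yes"
```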

4.2. About pruning

In practice, pruning is usually needed when constructing a decision tree; it addresses the overfitting problem caused by noise and outliers in the data. There are two types of pruning:

Pre-pruning: during construction, when a node meets the pruning condition, construction of that branch is stopped immediately.

Post-pruning: first build the complete decision tree, then traverse the tree and prune it according to certain conditions.

Specific pruning algorithms are not described here. If you are interested, refer to the relevant literature.

This article is from http://www.cnblogs.com/hexinuaa/articles/2143531.html
