I. INTRODUCTION
An important task of decision tree learning is to extract the knowledge contained in the data.
Advantages of decision trees: low computational complexity, output that is easy for humans to understand, insensitivity to missing intermediate values, and the ability to handle irrelevant features.
Disadvantage: prone to overfitting.
Applicable data types: numeric and nominal.
II. General Process of Decision Trees
1. Collect data: any method can be used.
2. Prepare the data: the tree-construction algorithm applies only to nominal data, so numeric data must be discretized first.
3. Analyze the data: any method can be used; after the tree is built, check whether the resulting graph matches expectations.
4. Train the algorithm: construct the tree's data structure.
5. Test the algorithm: use the learned tree to compute the error rate.
6. Use the algorithm: this step applies to any supervised learning algorithm; decision trees help us better understand the intrinsic meaning of the data.
III. Representation of Decision Trees
A decision tree classifies an instance by sorting it from the root node down to some leaf node; the leaf node gives the category to which the instance belongs. Each internal node of the tree specifies a test of one attribute of the instance, and each branch descending from that node corresponds to one of the possible values of the attribute. To classify an instance, start at the root node of the tree, test the attribute specified by that node, and move down the branch corresponding to the instance's value for that attribute. The process is then repeated for the subtree rooted at the new node.
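The classification procedure just described can be sketched in a few lines of Python. The tree representation (leaves as label strings, internal nodes as (attribute, branch-dict) pairs) and the sample tree below are illustrative assumptions of this sketch, not structures given in the article:

```python
def classify(tree, instance):
    """Walk a decision tree from the root down to a leaf.
    Representation (an assumption of this sketch): a leaf is a label
    string; an internal node is an (attribute, {value: subtree}) pair."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[instance[attribute]]  # follow the matching branch
    return tree

# A hand-built illustrative tree in the PlayTennis style:
tree = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})
print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes
```

Each call follows exactly one root-to-leaf path, testing one attribute per level.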
Each decision tree corresponds to a logical expression: a disjunction of conjunctions of constraints on attribute values, with one conjunction per path from the root to a leaf.
IV. The Basic Decision Tree Learning Algorithm
1. The ID3 Algorithm
ID3 learns by constructing the decision tree from the top down. Construction begins with the question "Which attribute should be tested at the root node of the tree?" To answer it, each instance attribute is evaluated with a statistical test to determine how well it alone classifies the training examples. The best attribute is selected and used as the test at the root node. A branch is then created for each possible value of this attribute, and the training examples are sorted under the appropriate branch. The whole process is repeated using the training examples associated with each branch node to select the best attribute to test at that point. This forms a greedy search for an acceptable decision tree, in which the algorithm never backtracks to reconsider its earlier choices.
The version summarized below is specialized to learning Boolean-valued functions. Outline of the ID3 algorithm:
ID3(Examples, Target_attribute, Attributes)
Examples is the set of training examples. Target_attribute is the attribute whose value is to be predicted by the tree. Attributes is the list of other attributes that may be tested by the learned decision tree. Returns a decision tree that correctly classifies the given Examples.
- Create a Root node for the tree
- If all Examples are positive, return the single-node tree Root with label = +
- If all Examples are negative, return the single-node tree Root with label = -
- If Attributes is empty, return the single-node tree Root with label = the most common value of Target_attribute in Examples
- Otherwise begin
  - A ← the attribute from Attributes that best classifies Examples
  - The decision attribute for Root ← A
  - For each possible value vi of A
    - Add a new branch below Root, corresponding to the test A = vi
    - Let Examples_vi be the subset of Examples that have value vi for A
    - If Examples_vi is empty
      - Then below this new branch add a leaf node with label = the most common value of Target_attribute in Examples
      - Else below this new branch add the subtree ID3(Examples_vi, Target_attribute, Attributes - {A})
- End
- Return Root
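The outline above can be turned into a compact recursive Python sketch. The dict-of-examples interface and the (attribute, branch-dict) tree representation are conveniences of this sketch, not part of the original pseudocode; branches are created only for attribute values that actually occur, so the empty-subset case never arises here:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(examples, target, attr):
    """Expected entropy reduction from splitting examples on attr."""
    before = entropy([ex[target] for ex in examples])
    after = 0.0
    for value in {ex[attr] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attr] == value]
        after += len(subset) / len(examples) * entropy(subset)
    return before - after

def id3(examples, target, attributes):
    """Recursive ID3 sketch. Leaves are label values; internal nodes
    are (attribute, {value: subtree}) pairs."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:          # all positive or all negative
        return labels[0]
    if not attributes:                 # no attribute left: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, target, a))
    branches = {}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        branches[value] = id3(subset, target,
                              [a for a in attributes if a != best])
    return (best, branches)
```

Like the pseudocode, this is a greedy top-down search: once an attribute is placed at a node, the choice is never revisited.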
2. Which Attribute Is the Best Classifier?
Entropy: characterizes the purity (or impurity) of an arbitrary collection of examples.
Entropy specifies the minimum number of bits needed to encode the classification of an arbitrary member of S (that is, a member drawn at random with uniform probability).
If the target attribute can take on c different values, the entropy of S relative to this c-wise classification is defined as:
Entropy(S) = Σ (i = 1..c) -p_i · log2(p_i)
where p_i is the proportion of S belonging to class i.
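The entropy formula can be checked numerically with a short helper; the counts-based interface (one class size per entry) is an assumption of this sketch:

```python
import math

def entropy(counts):
    """Entropy of a collection whose per-class sizes are given in counts.
    Empty classes contribute nothing (the 0*log 0 = 0 convention)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

# The collection S = [9+, 5-] used in the worked example below:
print(round(entropy([9, 5]), 3))  # 0.94
```

A perfectly mixed set such as [7+, 7-] has entropy exactly 1.0, and a pure set has entropy 0.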
Information gain: the information gain of an attribute is the expected reduction in entropy caused by partitioning the examples according to this attribute:
Gain(S, A) = Entropy(S) - Σ (v ∈ Values(A)) (|Sv| / |S|) · Entropy(Sv)
where Values(A) is the set of all possible values of attribute A, and Sv is the subset of S for which attribute A has value v.
For example, suppose S contains 14 examples, [9+, 5-]. Of these, 6 positive and 2 negative examples have Wind = Weak, and the remainder have Wind = Strong. The information gain obtained by partitioning the 14 examples on the attribute Wind is computed as follows:
Values(Wind) = {Weak, Strong}
S = [9+, 5-]
Sweak ← [6+, 2-]
Sstrong ← [3+, 3-]
Gain(S, Wind) = Entropy(S) - (8/14)·Entropy(Sweak) - (6/14)·Entropy(Sstrong)
= 0.940 - (8/14)·0.811 - (6/14)·1.00
= 0.048
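This arithmetic can be reproduced directly from the gain formula; the counts-based interface here is a convenience of the sketch:

```python
import math

def entropy(counts):
    """Entropy of a collection with the given per-class sizes."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def info_gain(s, partitions):
    """Gain(S, A): entropy of S minus the weighted entropy of the
    subsets produced by splitting S on attribute A."""
    total = sum(s)
    return entropy(s) - sum(sum(p) / total * entropy(p) for p in partitions)

# Wind example from the text: S=[9+,5-], Sweak=[6+,2-], Sstrong=[3+,3-]
print(round(info_gain([9, 5], [[6, 2], [3, 3]]), 3))  # 0.048
```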
3. An Example
First compute the information gain of each of the four attributes:
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029
According to the information-gain criterion, the attribute Outlook provides the best prediction of the target attribute PlayTennis over the training examples.
Ssunny = {D1, D2, D8, D9, D11}
Gain(Ssunny, Humidity) = 0.970 - (3/5)·0.0 - (2/5)·0.0 = 0.970
Gain(Ssunny, Temperature) = 0.970 - (2/5)·0.0 - (2/5)·1.0 - (1/5)·0.0 = 0.570
Gain(Ssunny, Wind) = 0.970 - (2/5)·1.0 - (3/5)·0.918 = 0.019
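These three gains can also be recomputed from the examples themselves. The article lists only the day names, so the attribute values below are an assumption taken from the standard PlayTennis table in Mitchell's Machine Learning; the helper functions are conveniences of this sketch. The printed gains match the values above up to rounding (0.971, 0.571, 0.020):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(examples, target, attr):
    """Expected entropy reduction from splitting examples on attr."""
    before = entropy([ex[target] for ex in examples])
    after = 0.0
    for value in {ex[attr] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attr] == value]
        after += len(subset) / len(examples) * entropy(subset)
    return before - after

# The five Outlook=Sunny days (assumed values, Mitchell's table):
ssunny = [
    {"Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "Play": "No"},   # D1
    {"Temp": "Hot",  "Humidity": "High",   "Wind": "Strong", "Play": "No"},   # D2
    {"Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "Play": "No"},   # D8
    {"Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},  # D9
    {"Temp": "Mild", "Humidity": "Normal", "Wind": "Strong", "Play": "Yes"},  # D11
]
for attr in ("Humidity", "Temp", "Wind"):
    print(attr, round(info_gain(ssunny, "Play", attr), 3))
```

Humidity wins because it separates the two classes perfectly, so it becomes the test below the Sunny branch.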
V. Hypothesis Space Search in Decision Tree Learning
The hypothesis space searched by ID3 is the set of all decision trees, a complete space of finite discrete-valued functions of the available attributes.
As it moves through the space of decision trees, ID3 maintains only a single current hypothesis.
The basic ID3 algorithm performs no backtracking in its search.
At each step of the search, ID3 uses all of the current training examples, making statistically based decisions about how to refine its current hypothesis; this makes the search much less sensitive to errors in individual training examples.
For the C4.5 decision tree, see http://www.cnblogs.com/zhangchaoyang/articles/2842490.html
Reference: http://www.cnblogs.com/lufangtao/archive/2013/05/30/3103588.html