As the name implies, a decision tree makes decisions based on a tree structure, which mirrors the natural process humans follow when facing a decision problem. For example, when we face the question "Is this a good melon?", we usually reach the answer through a series of judgments: first we look at its color; if it is green, we then check the shape of its root stem; if that is curled up, we then listen to the sound it makes when knocked; and finally we arrive at the judgment that this is a good melon. The resulting decision tree looks roughly as shown in the figure below.
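Written as code, this chain of judgments is simply a sequence of nested attribute tests. The sketch below is only illustrative; the attribute names and the values treated as "good" are assumptions, not taken from a specific dataset.

```python
def is_good_melon(color, root, sound):
    # Each nested test corresponds to one internal node of the tree;
    # each return corresponds to a leaf node.
    if color == "green":
        if root == "curled":
            if sound == "dull":
                return True   # judged a good melon
    return False              # any failed test leads to a "not good" leaf
```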
Basic algorithm:
Input: training set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$;
       attribute set $A = \{a_1, a_2, \ldots, a_d\}$.
Process: function TreeGenerate($D$, $A$)
1: generate node node
2: if all samples in $D$ belong to the same class $C$ then
3:     mark node as a class-$C$ leaf node; return
4: end if
5: if $A = \emptyset$ OR all samples in $D$ take the same values on $A$ then
6:     mark node as a leaf node, labeled with the class that has the most samples in $D$; return
7: end if
8: select the optimal splitting attribute $a_*$ from $A$
9: for each value $a_*^v$ of $a_*$ do
10:    generate a branch for node; let $D_v$ denote the subset of samples in $D$ that take value $a_*^v$ on $a_*$
11:    if $D_v = \emptyset$ then
12:        mark the branch node as a leaf node, labeled with the class that has the most samples in $D$; return
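The procedure above maps directly onto a short recursive function. Below is a minimal runnable sketch in Python. Two points are assumptions, since this section does not specify them: the optimal attribute in step 8 is chosen by information gain (the ID3 criterion), and a `domains` argument supplies every possible value of each attribute, which is what makes the empty-branch case of steps 11-12 ($D_v = \emptyset$) reachable.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(D, a):
    """Information gain of splitting sample set D on attribute a
    (assumed criterion for step 8; the section leaves it unspecified)."""
    gain = entropy([y for _, y in D])
    for v in {x[a] for x, _ in D}:
        sub = [y for x, y in D if x[a] == v]
        gain -= len(sub) / len(D) * entropy(sub)
    return gain


def tree_generate(D, A, domains):
    """TreeGenerate(D, A): D is a list of (x, y) pairs with x a dict
    mapping attribute name -> value; A is the list of usable attributes;
    domains maps each attribute to the set of all its possible values.
    Returns a class label (leaf) or a nested dict (internal node)."""
    labels = [y for _, y in D]

    # Steps 2-4: all samples in D belong to the same class C.
    if len(set(labels)) == 1:
        return labels[0]

    # Steps 5-7: A is empty, or all samples take the same values on A.
    if not A or all(x[a] == D[0][0][a] for x, _ in D for a in A):
        return Counter(labels).most_common(1)[0][0]

    # Step 8: select the optimal splitting attribute a_*.
    a_star = max(A, key=lambda a: information_gain(D, a))

    # Steps 9-14: create one branch per possible value of a_*.
    node = {a_star: {}}
    majority = Counter(labels).most_common(1)[0][0]
    for v in domains[a_star]:
        D_v = [(x, y) for x, y in D if x[a_star] == v]
        if not D_v:
            # Steps 11-12: empty subset -> leaf with the majority class of D.
            node[a_star][v] = majority
        else:
            node[a_star][v] = tree_generate(
                D_v, [a for a in A if a != a_star], domains)
    return node


# Tiny illustrative dataset (attribute values are made up for the example).
D = [({"color": "green", "root": "curled"}, "good"),
     ({"color": "green", "root": "straight"}, "bad"),
     ({"color": "black", "root": "curled"}, "good")]
domains = {"color": {"green", "black", "white"},
           "root": {"curled", "straight"}}
print(tree_generate(D, ["color", "root"], domains))
```

Representing internal nodes as `{attribute: {value: subtree}}` dicts keeps the sketch dependency-free; a fuller implementation would typically use a node class and make the split criterion configurable.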