J48 principle: J48 is Weka's open-source Java implementation of the C4.5 decision-tree algorithm (strictly, C4.5 revision 8, informally called "C4.8"; its successor C5.0 is a commercial product). J48 follows a top-down, recursive divide-and-conquer strategy: select an attribute to place at the root node, create one branch for each possible value of that attribute, split the instances into subsets (one per branch), and then recursively repeat the process on each branch. Recursion stops when all instances at a node have the same class.
The problem is: how do we choose the attribute for the root node and create the branches?
Example: weather.nominal.arff
We want a pure split, that is, a split that produces pure nodes. Ideally we find an attribute such that one child node contains only yes instances and another only no instances. This is the best case, because a mixed node will need to be split again.
We therefore quantify purity and prefer the attribute that produces the purest child nodes (the goal is the smallest decision tree). Top-down tree induction relies on heuristics, and the heuristic for producing pure nodes comes from information theory: entropy, which measures information in bits.
Information gain = entropy of the class distribution before the split minus the (weighted average) entropy of the distributions after the split; select the attribute with the greatest information gain.
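The two quantities above can be sketched in a few lines of Python (a minimal illustration; the function names are my own):

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, subsets):
    """Information gain = entropy before the split minus the
    weighted average entropy of the subsets after the split."""
    total = sum(parent_counts)
    after = sum(sum(s) / total * entropy(s) for s in subsets)
    return entropy(parent_counts) - after
```

For example, `info_gain([9, 5], [[2, 3], [4, 0], [3, 2]])` evaluates the split of 9 yes / 5 no instances into three subsets.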
Calculate the information gain for each of the four attributes:
The information gains for Outlook, windy, humidity, and temperature work out to 0.247 bits, 0.048 bits, 0.152 bits, and 0.029 bits respectively, so Outlook is selected as the root node.
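These four numbers can be reproduced from the per-value class counts of the standard 14-instance weather.nominal dataset (a small check script; the `(yes, no)` counts below are taken from that dataset):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# (yes, no) counts for each value of each attribute in weather.nominal
splits = {
    "outlook":     [(2, 3), (4, 0), (3, 2)],   # sunny, overcast, rainy
    "temperature": [(2, 2), (4, 2), (3, 1)],   # hot, mild, cool
    "humidity":    [(3, 4), (6, 1)],           # high, normal
    "windy":       [(3, 3), (6, 2)],           # true, false
}

before = entropy([9, 5])   # whole set: 9 yes, 5 no -> 0.940 bits
gains = {}
for attr, subsets in splits.items():
    after = sum(sum(s) / 14 * entropy(s) for s in subsets)
    gains[attr] = before - after
    print(f"{attr}: gain = {gains[attr]:.3f} bits")
```

Running this prints 0.247 for outlook, 0.029 for temperature, 0.152 for humidity, and 0.048 for windy, matching the figures above.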
Example of calculation
Before splitting: Info(D) = entropy(9/14, 5/14) = -(9/14)*log2(9/14) - (5/14)*log2(5/14) = 0.940286 bits
After splitting on Outlook: Info_Outlook(D) = (5/14)*Info(D1) + (4/14)*Info(D2) + (5/14)*Info(D3) = 0.693536 bits
Info(D1) = -(2/5)*log2(2/5) - (3/5)*log2(3/5) = 0.970951 (sunny: 2 yes, 3 no); Info(D2) = 0 (overcast: 4 yes, 0 no)
Info(D3) = -(3/5)*log2(3/5) - (2/5)*log2(2/5) = 0.970951 (rainy: 3 yes, 2 no)
Gain(Outlook) = Info(D) - Info_Outlook(D) = 0.940286 - 0.693536 = 0.247 bits
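The arithmetic of this worked example can be replayed step by step (a short sketch mirroring each line of the calculation; variable names are my own):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

info_before = entropy([9, 5])   # whole set: 9 yes, 5 no
info_d1 = entropy([2, 3])       # sunny:    2 yes, 3 no
info_d2 = entropy([4, 0])       # overcast: 4 yes, 0 no -> pure node, 0 bits
info_d3 = entropy([3, 2])       # rainy:    3 yes, 2 no

# Weighted average of the subset entropies after the split
info_after = 5/14 * info_d1 + 4/14 * info_d2 + 5/14 * info_d3
gain = info_before - info_after

print(f"before={info_before:.6f}  after={info_after:.6f}  gain={gain:.3f}")
```

The printed values match the derivation above: 0.940286, 0.693536, and a gain of 0.247 bits.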
Analysis of J48 algorithm in decision tree