Analysis of the J48 decision tree algorithm

Source: Internet
Author: User

J48 principle: J48 is the open-source Java implementation, in Weka, of the C4.5 decision tree algorithm. It is based on release 8 of C4.5, sometimes called C4.8, hence the name J48; the successor, C5.0, is a commercial product. J48 builds the tree with a top-down, recursive divide-and-conquer strategy: select an attribute to place at the root node and create one branch for each possible value of that attribute. This splits the instances into subsets, one per branch of the root node. The process is then repeated recursively on each branch, using only the instances that reach it. Recursion stops when all instances at a node have the same classification.
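The recursion described above can be sketched in a few lines of Python. This is a minimal illustration for nominal attributes only, not J48's actual code: the names `build_tree`, `choose_attribute` and `first_attribute` are hypothetical, the attribute-selection rule is passed in as a function, and the toy rows are made up for the demonstration.

```python
from collections import Counter


def build_tree(rows, attributes, target, choose_attribute):
    """Recursively grow a decision tree over nominal attributes.

    rows: list of dicts, attributes: names still available for splitting,
    target: name of the class column, choose_attribute: hypothetical hook
    that decides which attribute to split on.
    """
    labels = [row[target] for row in rows]
    # Stop when the node is pure (one class only) or no attributes remain;
    # the leaf then predicts the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    best = choose_attribute(rows, attributes, target)
    remaining = [a for a in attributes if a != best]
    branches = {}
    # One branch per attribute value that occurs in this subset.
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining, target, choose_attribute)
    return {best: branches}


def first_attribute(rows, attributes, target):
    # Placeholder selection rule, used only to show the recursion.
    return attributes[0]


# Made-up toy rows in the style of the weather data.
toy = [
    {"outlook": "sunny", "windy": "FALSE", "play": "no"},
    {"outlook": "sunny", "windy": "TRUE", "play": "no"},
    {"outlook": "overcast", "windy": "FALSE", "play": "yes"},
    {"outlook": "rainy", "windy": "FALSE", "play": "yes"},
    {"outlook": "rainy", "windy": "TRUE", "play": "no"},
]
tree = build_tree(toy, ["outlook", "windy"], "play", first_attribute)
print(tree)
```

The sunny and overcast branches are already pure and become leaves immediately; only the rainy branch is split again, on windy. A real implementation would replace `first_attribute` with a purity-based criterion.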

The question is: which attribute should be chosen for the root node and used to create the branches?

Example: weather.nominal.arff (the 14-instance weather dataset distributed with Weka)

We want a pure split, that is, a split into pure nodes. Ideally we would find an attribute for which one child node is all yes and another is all no. That is the best case, because a mixed node has to be split again.

We therefore choose the attribute that produces the purest child nodes, which requires quantifying purity (the goal is to obtain the smallest decision tree). Top-down tree induction relies on a heuristic for this. The heuristic for producing pure nodes comes from information theory: it uses entropy, which measures information in bits.
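To make the purity measure concrete, here is a small sketch of entropy computed from the class counts at a node. The function name `entropy` and the example counts are illustrative choices, not taken from Weka.

```python
from math import log2


def entropy(class_counts):
    """Entropy in bits of a node, given the number of instances per class."""
    total = sum(class_counts)
    # Empty classes are skipped: their contribution is taken as 0.
    return sum(c / total * log2(total / c) for c in class_counts if c > 0)


pure = entropy([4, 0])    # all instances in one class
even = entropy([7, 7])    # evenly mixed two-class node
mixed = entropy([9, 5])   # 9 yes, 5 no

print(pure, even, round(mixed, 3))
```

A pure node has entropy 0, an evenly mixed two-class node has the maximum of 1 bit, and anything in between falls between those values. The lower the entropy, the purer the node.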

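Information gain = entropy of the class distribution before the split - weighted average entropy of the class distributions in the subsets after the split. The attribute with the greatest information gain is selected. (C4.5, and therefore J48, refines this criterion into the gain ratio, but information gain remains the core of it.)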

Calculating the information gain for the four attributes outlook, windy, humidity and temperature gives 0.247 bits, 0.048 bits, 0.152 bits and 0.029 bits respectively, so outlook is selected as the root node.
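These four values can be reproduced with a short script. The sketch below embeds the 14 instances of the nominal weather data directly rather than parsing the ARFF file. The helper names `entropy` and `information_gain` are illustrative, and the script uses plain information gain as described in this article, not J48's full selection logic.

```python
from collections import Counter, defaultdict
from math import log2

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]

# The 14 weather instances; the last column is the class attribute "play".
DATA = """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
""".split()

ROWS = [line.split(",") for line in DATA]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(n / total * log2(total / n) for n in Counter(labels).values())


def information_gain(rows, column):
    """Entropy before the split minus the weighted entropy after it."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[column]].append(row[-1])  # group class labels by value
    before = entropy([row[-1] for row in rows])
    after = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return before - after


gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {gain:.3f} bits")
best = max(gains, key=gains.get)
print("root node:", best)
```

Rounded to three decimals, the script ranks the attributes as outlook, humidity, windy, temperature, so outlook comes out as the root.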

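Example calculation for Outlook (log2 is the base-2 logarithm)

Before splitting (D is the full set of 14 instances: 9 yes, 5 no):
Info(D) = -9/14*log2(9/14) - 5/14*log2(5/14) = 0.940286

Subsets after splitting on Outlook (D1 = sunny, D2 = overcast, D3 = rainy):
Info(D1) = -2/5*log2(2/5) - 3/5*log2(3/5) = 0.970951
Info(D2) = 0 (all 4 instances are yes)
Info(D3) = -3/5*log2(3/5) - 2/5*log2(2/5) = 0.970951

After splitting (weighted average over the subsets):
Info_Outlook(D) = 5/14*Info(D1) + 4/14*Info(D2) + 5/14*Info(D3) = 0.693536

Gain(Outlook) = Info(D) - Info_Outlook(D) = 0.940286 - 0.693536 = 0.247 bits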
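The arithmetic of this example can be checked step by step from the class counts alone. The following is a minimal sketch: `info` is an illustrative helper name, and the counts are the yes/no totals for the whole dataset and for the three Outlook subsets.

```python
from math import log2


def info(counts):
    """Entropy in bits for a list of per-class instance counts."""
    total = sum(counts)
    return sum(c / total * log2(total / c) for c in counts if c > 0)


whole = [9, 5]  # yes/no counts before the split
branches = {"sunny": [2, 3], "overcast": [4, 0], "rainy": [3, 2]}

info_before = info(whole)
# Weight each subset's entropy by its share of the 14 instances.
info_after = sum(sum(c) / sum(whole) * info(c) for c in branches.values())
gain = info_before - info_after

print(f"before={info_before:.6f} after={info_after:.6f} gain={gain:.3f}")
```

The overcast subset contributes nothing to the weighted sum because it is already pure, which is a large part of why Outlook scores best.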
