Decision Tree: ID3 and C4.5 algorithm

Source: Internet
Author: User
Tags: id3

1. Basic Concepts

1) Definition:

A decision tree is a predictive model; it represents a mapping between object attributes and object values, with each node in the tree corresponding to a possible value of some attribute.

2) Representation method:

An instance is classified by sorting it from the root node down to some leaf node; the leaf node gives the classification of the instance. Each internal node of the tree specifies a test of some attribute, and each branch descending from that node corresponds to one of the possible values of the attribute.


3) Problems suited to decision tree learning:

A. Instances are represented by "attribute-value" pairs

B. The target function has discrete output values

C. Disjunctive descriptions may be required

D. The training data may contain errors

E. The training data may contain instances with missing attribute values

2. ID3: a top-down greedy search through the space of possible decision trees.

1) Entropy measures the homogeneity of the sample set:

Entropy(S) = -Σ_{i=1..c} p_i * log2(p_i)

S is the sample set, c is the number of classes, p_i is the proportion of examples in S belonging to class i, and the base-2 logarithm reflects that entropy measures encoding length in bits.
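The entropy formula above can be sketched directly in Python; this is a minimal illustration (the function name and label encoding are illustrative, not from the linked repositories):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i), where p_i is the
    proportion of examples in `labels` belonging to class i."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())
```

For example, a set split 50/50 between two classes has entropy 1.0 bit, while a pure set has entropy 0.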

2) Information gain measures the expected reduction in entropy:

Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) * Entropy(S_v)

A is an attribute, Values(A) is the set of all possible values of attribute A, and S_v is the subset of S for which attribute A has value v.
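The gain computation can be sketched as follows, assuming examples are dicts mapping attribute names to values (a simplified illustration, not the code from the linked repositories):

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attr):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    # Partition the labels by the value each example takes on `attr`.
    subsets = defaultdict(list)
    for ex, y in zip(examples, labels):
        subsets[ex[attr]].append(y)
    n = len(labels)
    remainder = sum(len(sub) / n * entropy(sub) for sub in subsets.values())
    return entropy(labels) - remainder
```

An attribute that perfectly separates the classes achieves the maximum possible gain, namely the entropy of the whole set.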

3) ID3 Features:

A. The hypothesis space contains all decision trees, so it is a complete space of finite discrete-valued functions

B. While searching the space, ID3 maintains only a single current hypothesis, losing the ability to represent and test all hypotheses consistent with the data.

C. ID3 performs no backtracking during the search, so it converges only to a locally optimal solution, not a globally optimal one

D. Every step of the search uses all training examples, making statistically based decisions about how to refine the current hypothesis

E. ID3's inductive bias: shorter trees are preferred over longer trees, i.e., attributes with higher information gain are placed closer to the root (Occam's Razor: prefer the simplest hypothesis that fits the data)
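The top-down greedy loop described in this section can be sketched as a short recursive function. This is an illustrative outline only (names, data layout, and base cases are assumptions, not the code from the linked repository):

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(examples, labels, attr):
    # Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)
    n = len(labels)
    groups = defaultdict(list)
    for ex, y in zip(examples, labels):
        groups[ex[attr]].append(y)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def id3(examples, labels, attrs):
    # Base cases: a pure node, or no attributes left -> majority label.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the attribute with the highest information gain.
    best = max(attrs, key=lambda a: gain(examples, labels, a))
    tree = {best: {}}
    groups = defaultdict(list)
    for ex, y in zip(examples, labels):
        groups[ex[best]].append((ex, y))
    rest = [a for a in attrs if a != best]
    for value, pairs in groups.items():
        exs, ys = zip(*pairs)
        # Recurse on each subset; no backtracking, as noted in C above.
        tree[best][value] = id3(list(exs), list(ys), rest)
    return tree
```

The resulting nested dict maps each chosen attribute to a sub-tree per value, with class labels at the leaves.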

3. C4.5

1) Improvements over ID3:

A. Attributes are selected using the information gain ratio

B. Rule post-pruning

C. Continuous-valued attributes can be handled

D. Instances with missing attribute values can be handled

2) Split information: measures how broadly and uniformly an attribute splits the data:

SplitInformation(S, A) = -Σ_{i=1..c} (|S_i| / |S|) * log2(|S_i| / |S|)

3) Information gain ratio, used as the attribute-selection criterion:

GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)
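The two C4.5 measures above can be sketched together; the zero-split guard below is an assumption for illustration (real C4.5 applies additional heuristics), and all names are illustrative:

```python
from collections import Counter
from math import log2

def split_information(examples, attr):
    """SplitInformation(S, A) = -sum_i |S_i|/|S| * log2(|S_i|/|S|),
    where S_i are the subsets of S induced by the values of A."""
    n = len(examples)
    counts = Counter(ex[attr] for ex in examples)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def gain_ratio(gain_value, split_info):
    # Guard: an attribute with a single value gives split info 0;
    # return 0 rather than dividing by zero (illustrative choice).
    return gain_value / split_info if split_info > 0 else 0.0
```

Dividing gain by split information penalizes attributes with many values (such as an ID column), which would otherwise dominate plain information gain.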

4. Source code:

1) ID3:

https://github.com/Ares08/ID3-Machine-Learning-Algorithm

2) C4.5:

https://github.com/Ares08/C4.5

  
