Learning notes on the ID3 algorithm


ID3 (examples, targetattribute, attributes)
/*
examples: the training sample set
targetattribute: the target attribute whose value is to be predicted
attributes: the set of attributes, other than the target attribute, available for learning the decision tree
*/

If the targetattribute values of all examples are the same value a, return a single-node tree whose node value is a.
Otherwise, further decisions must be made based on the other attributes:

If attributes is empty, no attribute is left to split on; in that case, return a single node whose value is the most common targetattribute value among the current examples (a reasonable default).
Otherwise,
Select the best attribute bestattr by the principle of maximum information gain; the value of the tree's root node is bestattr.
For each possible value v of bestattr, create a branch labeled v.
Let examples_v be the subset of examples whose bestattr value is v.
Recursively compute the subtree ID3(examples_v, targetattribute, attributes - {bestattr}).
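
The pseudocode above translates almost line for line into Python. The sketch below is a minimal illustration under some assumptions not stated in the original: each example is a dict mapping attribute names to values, attributes is passed as a set so that attributes - {bestattr} works, and the helper names entropy and information_gain are our own.

import math
from collections import Counter

def entropy(examples, targetattribute):
    # Entropy of the targetattribute value distribution in examples.
    counts = Counter(e[targetattribute] for e in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attr, targetattribute):
    # Entropy of examples minus the weighted entropies of the subsets
    # obtained by splitting on attr.
    total = len(examples)
    remainder = 0.0
    for v in {e[attr] for e in examples}:
        examples_v = [e for e in examples if e[attr] == v]
        remainder += (len(examples_v) / total) * entropy(examples_v, targetattribute)
    return entropy(examples, targetattribute) - remainder

def id3(examples, targetattribute, attributes):
    values = [e[targetattribute] for e in examples]
    # All examples share the same target value: single-node tree.
    if len(set(values)) == 1:
        return values[0]
    # No attributes left: return the most common target value.
    if not attributes:
        return Counter(values).most_common(1)[0][0]
    # Pick the attribute with the maximum information gain.
    bestattr = max(attributes,
                   key=lambda a: information_gain(examples, a, targetattribute))
    tree = {bestattr: {}}
    for v in {e[bestattr] for e in examples}:
        examples_v = [e for e in examples if e[bestattr] == v]
        tree[bestattr][v] = id3(examples_v, targetattribute,
                                attributes - {bestattr})
    return tree

The return value is either a leaf (a target value) or a nested dict of the form {bestattr: {value: subtree}}; for example, with hypothetical weather-style data, id3(examples, "play", {"outlook", "wind"}) might return {"outlook": {"sunny": "no", "overcast": "yes", ...}}.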

 

Information gain and entropy:
The entropy of the training sample set examples is determined by the distribution of its targetattribute values:
the more uniform the distribution, the greater the entropy, that is, the more information (uncertainty) it carries; the more concentrated the distribution, the smaller the entropy, that is, the less information.
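
Concretely, the entropy here is the standard Shannon entropy (the 9/5 split below is an illustrative count, not from the original text):

Entropy(S) = - sum_i p_i * log2(p_i)

where p_i is the proportion of examples in S whose targetattribute takes the i-th value. For example, a sample set with 9 positive and 5 negative examples has

Entropy = -(9/14) * log2(9/14) - (5/14) * log2(5/14) ≈ 0.940

while a set in which all 14 examples are positive has entropy 0, and an even 7/7 split has the maximum entropy of 1.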

How can we choose the best attribute for a decision?
We select an attribute, divide examples into subsample sets according to each of its values, and then compute the sum of the entropies of the subsample sets:
Information gain = entropy of the original examples - sum of the entropies of the subsample sets
Intuitively, the best attribute is the one that classifies the current training samples most simply and decisively. The root node of the decision tree, where no decision has yet been made, has the greatest uncertainty and therefore the largest entropy; a leaf node of the decision tree holds a definite value, so its entropy is 0, the smallest.

The decision process of the tree is a process of reducing uncertainty, i.e., entropy. Therefore, when selecting an attribute, you should select the attribute that reduces entropy the most, that is, the attribute with the maximum information gain.

Intuition might suggest choosing the attribute that divides the samples most evenly. Of course, this is not accurate: "evenness" cannot refer only to the sizes of the subsample sets; it must also take the targetattribute values within them into account. The choice is determined by computing the information gain.
Therefore, the attribute with the largest information gain is the best one. (Note: the entropy of each subsample set is computed in the same way as that of examples, and the "sum of the entropies" of the subsample sets is not a simple sum: each term is multiplied by the coefficient |subsample set| / |examples|.)
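
As a worked example of this weighted sum (the counts are illustrative, not from the original text): split the 14 examples above (9 positive, 5 negative, entropy ≈ 0.940) on a two-valued attribute that produces one subsample set of 8 examples (6 positive, 2 negative, entropy ≈ 0.811) and one of 6 examples (3 positive, 3 negative, entropy = 1.0). Then

Information gain = 0.940 - (8/14) * 0.811 - (6/14) * 1.0 ≈ 0.048

The two entropies are weighted by 8/14 and 6/14 rather than simply added.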
