Original address: http://www.cise.ufl.edu/~ddd/cap6635/Fall-97/Short-papers/2.htm
The quality of this translation is limited; it is suggested to read the original directly.

Feature Selection
The first problem to solve when constructing a decision tree is which feature of the current dataset to split on. To find the most decisive feature and produce the best split, we must evaluate each feature. Here we use information gain: the gain describes how well a given feature divides the dataset, and we select the feature with the greatest gain as the basis for the split. To define gain, we first introduce the concept of entropy.
Given a dataset S whose output is C (C may contain more than one category):

Entropy(S) = -Σi p(i) log2 p(i)

where i ranges over the categories in C, and p(i) is the proportion of members of S belonging to category i.
S is the entire dataset.

Example 1
If S has 14 members, 9 yes and 5 no:

Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

If Entropy(S) = 0, all members of S belong to the same class (consider that the log2 function passes through the point (1, 0)).

We define the information gain Gain(S, A) of feature A on dataset S as:

Gain(S, A) = Entropy(S) - Σv (|Sv| / |S|) * Entropy(Sv)

where v ranges over all possible values of feature A, Sv is the subset of S in which feature A takes the value v, |Sv| is the number of members of Sv, and |S| is the number of members of S.

Example 2
Suppose S is a dataset with 14 members, one of whose attributes is wind speed. Wind can be Weak or Strong, and the 14 members are classified as 9 yes and 5 no. For the feature Wind, suppose 8 members have Wind = Weak and 6 have Wind = Strong; of the Weak members, 6 are yes and 2 are no, while of the Strong members, 3 are yes and 3 are no. Therefore:

Entropy(Sweak) = -(6/8) log2(6/8) - (2/8) log2(2/8) = 0.811
Entropy(Sstrong) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.00
Gain(S, Wind) = Entropy(S) - (8/14) * Entropy(Sweak) - (6/14) * Entropy(Sstrong)
              = 0.940 - (8/14) * 0.811 - (6/14) * 1.00
              = 0.048
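As a quick check of these two formulas, here is a minimal Python sketch that reproduces Example 1 and Example 2. The names entropy, gain, rows, and the dict layout are my own; only the formulas come from the text.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum_i p(i) * log2(p(i))."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum_v (|Sv|/|S|) * Entropy(Sv)."""
    g = entropy([row[target] for row in rows])
    for v in set(row[attribute] for row in rows):
        sv = [row[target] for row in rows if row[attribute] == v]
        g -= (len(sv) / len(rows)) * entropy(sv)
    return g

# Example 1: 9 yes and 5 no
print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))  # -> 0.94

# Example 2: Wind = Weak (6 yes, 2 no), Wind = Strong (3 yes, 3 no)
rows = ([{"Wind": "Weak", "Play": "yes"}] * 6 + [{"Wind": "Weak", "Play": "no"}] * 2 +
        [{"Wind": "Strong", "Play": "yes"}] * 3 + [{"Wind": "Strong", "Play": "no"}] * 3)
print(round(gain(rows, "Wind", "Play"), 3))         # -> 0.048
```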
Example of ID3
Suppose we want ID3 to decide whether the weather is suitable for playing baseball. We collected two weeks of data to help the ID3 algorithm build a decision tree (Table 1). The goal of the classification is to decide, from the current weather, whether to play baseball; the result is Yes or No. The weather attributes are outlook, temperature, humidity, and wind speed. They can take the following values:

Outlook = {Sunny, Overcast, Rain}
Temperature = {Hot, Mild, Cool}
Humidity = {High, Normal}
Wind = {Weak, Strong}
Here is the dataset S:
| Day | Outlook  | Temperature | Humidity | Wind   | Play Ball |
|-----|----------|-------------|----------|--------|-----------|
| D1  | Sunny    | Hot         | High     | Weak   | No        |
| D2  | Sunny    | Hot         | High     | Strong | No        |
| D3  | Overcast | Hot         | High     | Weak   | Yes       |
| D4  | Rain     | Mild        | High     | Weak   | Yes       |
| D5  | Rain     | Cool        | Normal   | Weak   | Yes       |
| D6  | Rain     | Cool        | Normal   | Strong | No        |
| D7  | Overcast | Cool        | Normal   | Strong | Yes       |
| D8  | Sunny    | Mild        | High     | Weak   | No        |
| D9  | Sunny    | Cool        | Normal   | Weak   | Yes       |
| D10 | Rain     | Mild        | Normal   | Weak   | Yes       |
| D11 | Sunny    | Mild        | Normal   | Strong | Yes       |
| D12 | Overcast | Mild        | High     | Strong | Yes       |
| D13 | Overcast | Hot         | Normal   | Weak   | Yes       |
| D14 | Rain     | Mild        | High     | Strong | No        |

Table 1
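To make the calculations below reproducible, here is Table 1 transcribed as a Python list of dicts. The key names (such as PlayBall) are my own choice; the values are copied from the table.

```python
S = [
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayBall": "No"},   # D1
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Strong", "PlayBall": "No"},   # D2
    {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayBall": "Yes"},  # D3
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Wind": "Weak",   "PlayBall": "Yes"},  # D4
    {"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak",   "PlayBall": "Yes"},  # D5
    {"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong", "PlayBall": "No"},   # D6
    {"Outlook": "Overcast", "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong", "PlayBall": "Yes"},  # D7
    {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "High",   "Wind": "Weak",   "PlayBall": "No"},   # D8
    {"Outlook": "Sunny",    "Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak",   "PlayBall": "Yes"},  # D9
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "Normal", "Wind": "Weak",   "PlayBall": "Yes"},  # D10
    {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "Normal", "Wind": "Strong", "PlayBall": "Yes"},  # D11
    {"Outlook": "Overcast", "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong", "PlayBall": "Yes"},  # D12
    {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "Normal", "Wind": "Weak",   "PlayBall": "Yes"},  # D13
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong", "PlayBall": "No"},   # D14
]
```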
We need to calculate the gain for each of these 4 attributes; the one with the highest gain becomes the root node of the decision tree.
Gain(S, Outlook) = 0.246
Gain(S, Temperature) = 0.029
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048 (calculated in Example 2)

The Outlook attribute has the highest gain, so it is used as the root node of the decision tree.
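Using the entropy and gain helpers sketched earlier together with the S list above, the four gains can be recomputed directly:

```python
# Assumes gain() from the earlier sketch and the dataset S defined above.
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]
for a in attributes:
    print(a, round(gain(S, a, "PlayBall"), 3))
# -> Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048
# (exact values; the text truncates the first and third to 0.246 and 0.151)

best = max(attributes, key=lambda a: gain(S, a, "PlayBall"))
print(best)  # -> Outlook, so it becomes the root node
```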
Because Outlook has 3 possible values, the root node has 3 branches (Sunny, Overcast, Rain). Next we test the remaining 3 attributes on the Sunny node: Humidity, Temperature, and Wind.

Ssunny = {D1, D2, D8, D9, D11} (the members of S with Outlook = Sunny)
Gain(Ssunny, Humidity) = 0.970
Gain(Ssunny, Temperature) = 0.570
Gain(Ssunny, Wind) = 0.019

Humidity has the highest gain, so it becomes the node at this branch. We repeat the above process until every member is classified or all attributes have been tested; the sketch below puts the whole procedure together.
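Here is a short recursive sketch of that procedure, again assuming the entropy/gain helpers and the S list defined above. Tie-breaking and stopping are handled in the simplest way, which the original text does not specify.

```python
from collections import Counter

def id3(rows, attributes, target):
    """Recursively build a decision tree as nested dicts; leaves are class labels."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:        # all members belong to one class: leaf
        return labels[0]
    if not attributes:               # no attributes left to test: majority vote
        return Counter(labels).most_common(1)[0][0]
    # Split on the attribute with the highest information gain.
    best = max(attributes, key=lambda a: gain(rows, a, target))
    tree = {best: {}}
    for v in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == v]
        tree[best][v] = id3(subset, [a for a in attributes if a != best], target)
    return tree

tree = id3(S, ["Outlook", "Temperature", "Humidity", "Wind"], "PlayBall")
print(tree)  # root is Outlook; the Sunny branch splits on Humidity, as derived above
```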