Using the ID3 Algorithm to Construct a Decision Tree


Original address: http://www.cise.ufl.edu/~ddd/cap6635/Fall-97/Short-papers/2.htm
This translation is rough; readers may prefer to read the original directly.

Feature Selection

The first problem in constructing a decision tree is deciding which feature of the current dataset to split on. To find the most decisive feature and produce the best split, we must evaluate each candidate feature. We use information gain for this: the gain measures how well splitting the dataset on a given feature separates the classes, and we select the feature with the greatest gain as the basis for the split. To define information gain, we first introduce the concept of entropy.

Given a dataset S whose outputs fall into a set of categories C (C may contain more than one category), the entropy of S is defined as:

Entropy(S) = -\sum_{i \in C} p(i) \log_2 p(i)

where i ranges over the categories in C and p(i) is the proportion of members of S that belong to category i.
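To make the formula concrete, here is a minimal Python sketch (our own illustration, not part of the original article; the function name and label encoding are our own) that computes entropy from a list of class labels:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum over categories i of p(i) * log2(p(i))."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

print(entropy(["yes", "no"]))          # 1.0 bit: a 50/50 split is maximally impure
print(entropy(["yes", "yes", "yes"]))  # -0.0: a pure set has zero entropy
```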

S is the entire dataset.

Example 1: If S has 14 members, 9 yes and 5 no, then:

Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

An entropy of 0 means that all members of S belong to the same class (the curve of log2 passes through the point (1, 0), so a class with p(i) = 1 contributes nothing).

We define the information gain Gain(S, A) of a feature A on the dataset S as:

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)

where Values(A) is the set of possible values v of feature A, S_v is the subset of S whose members take value v on feature A, |S_v| is the number of members of S_v, and |S| is the number of members of S.

Example 2: Suppose S is a dataset with 14 members, one of whose attributes is Wind, with possible values Weak and Strong. The 14 members are classified as 9 yes and 5 no. For the Wind attribute, suppose 8 members have Wind = Weak (6 yes, 2 no) and 6 members have Wind = Strong (3 yes, 3 no). Then:

Entropy(S_weak) = -(6/8) log2(6/8) - (2/8) log2(2/8) = 0.811

Entropy(S_strong) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.00

Therefore:

Gain(S, Wind) = Entropy(S) - (8/14) Entropy(S_weak) - (6/14) Entropy(S_strong)
             = 0.940 - (8/14)(0.811) - (6/14)(1.00)
             = 0.048
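A short sketch (again our own code, not the paper's) that reproduces Example 2. The gain function implements the Gain(S, A) formula above over parallel lists of attribute values and class labels; the entropy helper is repeated so the block runs on its own:

```python
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def gain(values, labels):
    """Gain(S, A): entropy of S minus the weighted entropy of each subset S_v."""
    g = entropy(labels)
    for v in set(values):
        sv = [lab for val, lab in zip(values, labels) if val == v]
        g -= (len(sv) / len(labels)) * entropy(sv)
    return g

# Example 2: Wind = Weak -> 6 yes, 2 no; Wind = Strong -> 3 yes, 3 no.
wind = ["weak"] * 8 + ["strong"] * 6
play = ["yes"] * 6 + ["no"] * 2 + ["yes"] * 3 + ["no"] * 3
print(round(gain(wind, play), 3))  # 0.048
```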

An ID3 Example

Suppose we want ID3 to decide whether the weather is suitable for playing baseball. We collected two weeks of data to help the ID3 algorithm build a decision tree (Table 1). The goal is to decide from the current weather whether to play baseball, and the result is Yes or No. The weather attributes are Outlook, Temperature, Humidity, and Wind, with the following possible values:

Outlook = {Sunny, Overcast, Rain}

Temperature = {Hot, Mild, Cool}

Humidity = {High, Normal}

Wind = {Weak, Strong}

Here is the dataset S:

Day   Outlook    Temperature   Humidity   Wind     Play Ball
D1    Sunny      Hot           High       Weak     No
D2    Sunny      Hot           High       Strong   No
D3    Overcast   Hot           High       Weak     Yes
D4    Rain       Mild          High       Weak     Yes
D5    Rain       Cool          Normal     Weak     Yes
D6    Rain       Cool          Normal     Strong   No
D7    Overcast   Cool          Normal     Strong   Yes
D8    Sunny      Mild          High       Weak     No
D9    Sunny      Cool          Normal     Weak     Yes
D10   Rain       Mild          Normal     Weak     Yes
D11   Sunny      Mild          Normal     Strong   Yes
D12   Overcast   Mild          High       Strong   Yes
D13   Overcast   Hot           Normal     Weak     Yes
D14   Rain       Mild          High       Strong   No

Table 1

We calculate the gain of each of these four attributes; the attribute with the highest gain becomes the root node of the decision tree.

Gain(S, Outlook) = 0.246

Gain(S, Temperature) = 0.029

Gain(S, Humidity) = 0.151

Gain(S, Wind) = 0.048 (calculated in Example 2)

Outlook has the highest gain, so it is used as the root node of the decision tree.
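As a check, here is a self-contained sketch that recomputes these four gains; the code and the encoding of Table 1 (lower-cased values, a "play" label field, the names ROWS and ATTRS) are our own:

```python
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def gain(rows, attr):
    """Information gain of splitting `rows` on `attr`; the class label is 'play'."""
    labels = [r["play"] for r in rows]
    g = entropy(labels)
    for v in set(r[attr] for r in rows):
        sv = [r["play"] for r in rows if r[attr] == v]
        g -= (len(sv) / len(rows)) * entropy(sv)
    return g

ATTRS = ["outlook", "temperature", "humidity", "wind"]
# Table 1, rows D1..D14, encoded as dicts.
ROWS = [dict(zip(ATTRS + ["play"], v)) for v in [
    ("sunny",    "hot",  "high",   "weak",   "no"),       # D1
    ("sunny",    "hot",  "high",   "strong", "no"),       # D2
    ("overcast", "hot",  "high",   "weak",   "yes"),      # D3
    ("rain",     "mild", "high",   "weak",   "yes"),      # D4
    ("rain",     "cool", "normal", "weak",   "yes"),      # D5
    ("rain",     "cool", "normal", "strong", "no"),       # D6
    ("overcast", "cool", "normal", "strong", "yes"),      # D7
    ("sunny",    "mild", "high",   "weak",   "no"),       # D8
    ("sunny",    "cool", "normal", "weak",   "yes"),      # D9
    ("rain",     "mild", "normal", "weak",   "yes"),      # D10
    ("sunny",    "mild", "normal", "strong", "yes"),      # D11
    ("overcast", "mild", "high",   "strong", "yes"),      # D12
    ("overcast", "hot",  "normal", "weak",   "yes"),      # D13
    ("rain",     "mild", "high",   "strong", "no"),       # D14
]]

for a in ATTRS:
    print(a, round(gain(ROWS, a), 3))
# outlook 0.247, temperature 0.029, humidity 0.152, wind 0.048
# (the article's 0.246 and 0.151 are the same values truncated rather than rounded)
```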

Because Outlook has three possible values, the root node has three branches (Sunny, Overcast, Rain). The Overcast branch is already pure: every Overcast day is Yes. Next we test the remaining three attributes on the Sunny branch: Humidity, Temperature, and Wind.

S_sunny = {D1, D2, D8, D9, D11} (the members of S with Outlook = Sunny)

Gain(S_sunny, Humidity) = 0.970

Gain(S_sunny, Temperature) = 0.570

Gain(S_sunny, Wind) = 0.019

Humidity has the highest gain, so it becomes the decision node under the Sunny branch. We repeat this process until every branch ends in a pure subset or we have used all the attributes.
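To make "repeat this process" concrete, here is a minimal recursive ID3 sketch in the spirit of the procedure just described. It continues the previous block (it assumes that block's ROWS, ATTRS, Counter, and gain definitions are already in scope), and all names are our own:

```python
# Continues the previous sketch: assumes ROWS, ATTRS, Counter, and gain()
# from the block above are already defined.
def id3(rows, attrs):
    labels = [r["play"] for r in rows]
    if len(set(labels)) == 1:        # pure subset: stop with a leaf
        return labels[0]
    if not attrs:                    # attributes exhausted: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a))   # highest-gain attribute
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest)
                   for v in set(r[best] for r in rows)}}

print(id3(ROWS, ATTRS))
# {'outlook': {'overcast': 'yes',
#              'sunny': {'humidity': {'high': 'no', 'normal': 'yes'}},
#              'rain': {'wind': {'weak': 'yes', 'strong': 'no'}}}}
# (branch order may vary; the tree matches the one derived above)
```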




