Original address: http://www.cise.ufl.edu/~ddd/cap6635/Fall-97/Short-papers/2.htm
The quality of this translation is limited; it is suggested to read the original directly.

Feature Selection
The first problem to solve when constructing a decision tree is which feature of the current dataset to split on. To find the most decisive feature and produce the best split, we must evaluate each feature. Here we use information gain: the gain describes how well a given feature divides the dataset, and we select the feature with the greatest gain as the basis for the split. To define gain, we first introduce the concept of entropy.
Given a dataset S whose output is C (C may contain more than one category):

Entropy(S) = -Σi p(i) log2 p(i)

where i ranges over the categories in C, and p(i) is the proportion of members of S belonging to category i.
S is the entire dataset.

Example 1
If S has 14 members, 9 yes and 5 no:

Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

If Entropy(S) = 0, all members of S belong to the same class (consider that the log2 function passes through the point (1, 0)).

We define the information gain Gain(S, A) of feature A on dataset S as:

Gain(S, A) = Entropy(S) - Σv (|Sv| / |S|) * Entropy(Sv)

where v ranges over all possible values of feature A, Sv is the subset of S in which feature A takes the value v, |Sv| is the number of members of Sv, and |S| is the number of members of S.

Example 2
Suppose S is a dataset with 14 members, one of whose attributes is wind speed. Wind can be Weak or Strong, and the 14 members are classified as 9 yes and 5 no. For the feature Wind, suppose 8 members have Wind = Weak and 6 have Wind = Strong; of the Weak members, 6 are yes and 2 are no, while of the Strong members, 3 are yes and 3 are no. Therefore:

Entropy(Sweak) = -(6/8) log2(6/8) - (2/8) log2(2/8) = 0.811
Entropy(Sstrong) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.00
Gain(S, Wind) = Entropy(S) - (8/14) * Entropy(Sweak) - (6/14) * Entropy(Sstrong)
              = 0.940 - (8/14) * 0.811 - (6/14) * 1.00
              = 0.048
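As a quick check of these two formulas, here is a minimal Python sketch that reproduces Example 1 and Example 2. The names entropy, gain, rows, and the dict layout are my own; only the formulas come from the text.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum_i p(i) * log2(p(i))."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum_v (|Sv|/|S|) * Entropy(Sv)."""
    g = entropy([row[target] for row in rows])
    for v in set(row[attribute] for row in rows):
        sv = [row[target] for row in rows if row[attribute] == v]
        g -= (len(sv) / len(rows)) * entropy(sv)
    return g

# Example 1: 9 yes and 5 no
print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))  # -> 0.94

# Example 2: Wind = Weak (6 yes, 2 no), Wind = Strong (3 yes, 3 no)
rows = ([{"Wind": "Weak", "Play": "yes"}] * 6 + [{"Wind": "Weak", "Play": "no"}] * 2 +
        [{"Wind": "Strong", "Play": "yes"}] * 3 + [{"Wind": "Strong", "Play": "no"}] * 3)
print(round(gain(rows, "Wind", "Play"), 3))         # -> 0.048
```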
Example of ID3
Suppose we want ID3 to decide whether the weather is suitable for playing baseball. We collected two weeks of data to help the ID3 algorithm build a decision tree (Table 1). The goal of the classification is to decide, from the current weather, whether to play baseball; the result is Yes or No. The weather attributes are outlook, temperature, humidity, and wind speed. They can take the following values:

Outlook = {Sunny, Overcast, Rain}
Temperature = {Hot, Mild, Cool}
Humidity = {High, Normal}
Wind = {Weak, Strong}
Here is the dataset S:
| Day | Outlook  | Temperature | Humidity | Wind   | Play Ball |
|-----|----------|-------------|----------|--------|-----------|
| D1  | Sunny    | Hot         | High     | Weak   | No        |
| D2  | Sunny    | Hot         | High     | Strong | No        |
| D3  | Overcast | Hot         | High     | Weak   | Yes       |
| D4  | Rain     | Mild        | High     | Weak   | Yes       |
| D5  | Rain     | Cool        | Normal   | Weak   | Yes       |
| D6  | Rain     | Cool        | Normal   | Strong | No        |
| D7  | Overcast | Cool        | Normal   | Strong | Yes       |
| D8  | Sunny    | Mild        | High     | Weak   | No        |
| D9  | Sunny    | Cool        | Normal   | Weak   | Yes       |
| D10 | Rain     | Mild        | Normal   | Weak   | Yes       |
| D11 | Sunny    | Mild        | Normal   | Strong | Yes       |
| D12 | Overcast | Mild        | High     | Strong | Yes       |
| D13 | Overcast | Hot         | Normal   | Weak   | Yes       |
| D14 | Rain     | Mild        | High     | Strong | No        |

Table 1
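To make the calculations below reproducible, here is Table 1 transcribed as a Python list of dicts. The key names (such as PlayBall) are my own choice; the values are copied from the table.

```python
S = [
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayBall": "No"},   # D1
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Strong", "PlayBall": "No"},   # D2
    {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayBall": "Yes"},  # D3
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Wind": "Weak",   "PlayBall": "Yes"},  # D4
    {"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak",   "PlayBall": "Yes"},  # D5
    {"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong", "PlayBall": "No"},   # D6
    {"Outlook": "Overcast", "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong", "PlayBall": "Yes"},  # D7
    {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "High",   "Wind": "Weak",   "PlayBall": "No"},   # D8
    {"Outlook": "Sunny",    "Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak",   "PlayBall": "Yes"},  # D9
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "Normal", "Wind": "Weak",   "PlayBall": "Yes"},  # D10
    {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "Normal", "Wind": "Strong", "PlayBall": "Yes"},  # D11
    {"Outlook": "Overcast", "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong", "PlayBall": "Yes"},  # D12
    {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "Normal", "Wind": "Weak",   "PlayBall": "Yes"},  # D13
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong", "PlayBall": "No"},   # D14
]
```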
We need to calculate the gain for each of these 4 attributes; the one with the highest gain becomes the root node of the decision tree.
Gain(S, Outlook) = 0.246
Gain(S, Temperature) = 0.029
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048 (calculated in Example 2)

The Outlook attribute has the highest gain, so it is used as the root node of the decision tree.
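Using the entropy and gain helpers sketched earlier together with the S list above, the four gains can be recomputed directly:

```python
# Assumes gain() from the earlier sketch and the dataset S defined above.
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]
for a in attributes:
    print(a, round(gain(S, a, "PlayBall"), 3))
# -> Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048
# (exact values; the text truncates the first and third to 0.246 and 0.151)

best = max(attributes, key=lambda a: gain(S, a, "PlayBall"))
print(best)  # -> Outlook, so it becomes the root node
```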
Because Outlook has 3 possible values, the root node has 3 branches (Sunny, Overcast, Rain). Next we test the remaining 3 attributes on the Sunny node: Humidity, Temperature, and Wind.

Ssunny = {D1, D2, D8, D9, D11} (the members of S with Outlook = Sunny)
Gain(Ssunny, Humidity) = 0.970
Gain(Ssunny, Temperature) = 0.570
Gain(Ssunny, Wind) = 0.019

Humidity has the highest gain, so it becomes the node at this branch. We repeat the above process until every member is classified or all attributes have been tested; the sketch below puts the whole procedure together.
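Here is a short recursive sketch of that procedure, again assuming the entropy/gain helpers and the S list defined above. Tie-breaking and stopping are handled in the simplest way, which the original text does not specify.

```python
from collections import Counter

def id3(rows, attributes, target):
    """Recursively build a decision tree as nested dicts; leaves are class labels."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:        # all members belong to one class: leaf
        return labels[0]
    if not attributes:               # no attributes left to test: majority vote
        return Counter(labels).most_common(1)[0][0]
    # Split on the attribute with the highest information gain.
    best = max(attributes, key=lambda a: gain(rows, a, target))
    tree = {best: {}}
    for v in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == v]
        tree[best][v] = id3(subset, [a for a in attributes if a != best], target)
    return tree

tree = id3(S, ["Outlook", "Temperature", "Humidity", "Wind"], "PlayBall")
print(tree)  # root is Outlook; the Sunny branch splits on Humidity, as derived above
```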