Information entropy measures the average amount of information you gain once the outcome of an event becomes known. The greater the uncertainty of an event, the more information is needed to determine its outcome; in other words, the larger the information entropy, the more disordered and uncertain the data.
Calculation of information entropy:
H = -Σ p(i) * log2(p(i)), where p(i) is the probability of class i and the logarithm is taken base 2.
public static double calcEntropy(int[] p) {
    double entropy = 0;
    // sum holds the total number of samples; p[i] / sum is the probability of class i
    double sum = 0;
    int len = p.length;
    for (int i = 0; i < len; i++) {
        sum += p[i];
    }
    for (int i = 0; i < len; i++) {
        if (p[i] == 0) {
            continue; // treat 0 * log2(0) as 0 so empty classes do not produce NaN
        }
        entropy -= p[i] / sum * log2(p[i] / sum);
    }
    return entropy;
}

// base-2 logarithm used by the formula above
public static double log2(double x) {
    return Math.log(x) / Math.log(2);
}
Given an array of class counts, the first loop sums up the total number of samples, which gives the probability of each class; the second loop then applies the formula to compute the entropy.
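As a quick sanity check (illustrative, not part of the original code; it assumes the snippet lives in the same class as calcEntropy and log2), a 50/50 split gives the maximum entropy of 1 bit and a pure class gives 0:

public static void main(String[] args) {
    System.out.println(calcEntropy(new int[]{1, 1})); // 1.0: a 50/50 split has maximum uncertainty
    System.out.println(calcEntropy(new int[]{4, 0})); // 0.0: a pure class has no uncertainty
}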
Information gain is the reduction in information entropy. The attribute that makes the entropy drop fastest can be used as the root node of the decision tree, which keeps the tree short.
The information gain of an attribute A with respect to the sample set S is:
Gain(S, A) = H(S) - Σ (|Sv| / |S|) * H(Sv), summed over every value v of attribute A; that is, the entropy of S minus the weighted entropy of S once the value of A is known.
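The following is a minimal sketch of this formula, not taken from the original post: a hypothetical calcGain method that reuses calcEntropy above. valueCounts[v] is assumed to hold the class counts (yes/no) of the subset of S in which A takes its v-th value, and totalCounts holds the class counts of S itself.

// Sketch: Gain(S, A) = H(S) - sum over values v of (|Sv| / |S|) * H(Sv)
public static double calcGain(int[] totalCounts, int[][] valueCounts) {
    double total = 0;
    for (int c : totalCounts) {
        total += c; // |S|, the total number of samples
    }
    double conditional = 0; // weighted entropy of S once the value of A is known
    for (int[] counts : valueCounts) {
        double subsetSize = 0;
        for (int c : counts) {
            subsetSize += c; // |Sv|
        }
        conditional += subsetSize / total * calcEntropy(counts);
    }
    return calcEntropy(totalCounts) - conditional;
}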
Outlook | Temperature | Humidity | Windy | Play
Sunny | Hot | High | False | No
Sunny | Hot | High | True | No
Overcast | Hot | High | False | Yes
Rainy | Mild | High | False | Yes
Rainy | Cool | Normal | False | Yes
Rainy | Cool | Normal | True | No
Overcast | Cool | Normal | True | Yes
Sunny | Mild | High | False | No
Sunny | Cool | Normal | False | Yes
Rainy | Mild | Normal | False | Yes
Sunny | Mild | Normal | True | Yes
Overcast | Mild | High | True | Yes
Overcast | Hot | Normal | False | Yes
Rainy | Mild | High | True | No
Based on this data, a decision tree is constructed so that, in the future, we can decide whether to go out to play under different weather conditions.
First, compute the information entropy before any weather condition is known, by looking directly at the Play column: 9 samples are yes and 5 are no, so the formula gives
H = -9/14 * log2(9/14) - 5/14 * log2(5/14) = 0.940
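As an illustrative check (assuming the yes/no counts are passed to the calcEntropy method above):

double h = calcEntropy(new int[]{9, 5}); // ≈ 0.940, the entropy of the Play column before any split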
Next, compute the conditional information entropy of each attribute in turn. Start with the Outlook attribute and calculate the entropy when the value of Outlook is known:
1. When Outlook = Sunny, the Play column has 2 yes and 3 no, so the entropy is
H = -2/5 * log2(2/5) - 3/5 * log2(3/5) = 0.971
2. When Outlook = Overcast, the Play column has 4 yes and 0 no, so the entropy is
H = 0
3. When Outlook = Rainy, the Play column has 3 yes and 2 no, so the entropy is
H = 0.971
The probabilities of Outlook being Sunny, Overcast, and Rainy are 5/14, 4/14, and 5/14 respectively.
Therefore, when the value of Outlook is known, the weighted information entropy is 5/14 * 0.971 + 4/14 * 0 + 5/14 * 0.971 = 0.693.
The information gain of the Outlook attribute is therefore Gain = 0.940 - 0.693 = 0.247.
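The same figure can be reproduced with the hypothetical calcGain sketch given earlier, using the yes/no counts derived above:

int[] play = {9, 5};                        // yes / no over the whole data set
int[][] outlook = {{2, 3}, {4, 0}, {3, 2}}; // Sunny, Overcast, Rainy
double gain = calcGain(play, outlook);      // ≈ 0.247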
Similarly, the information gains of Temperature, Humidity, and Windy work out to 0.029, 0.152, and 0.048 respectively.
Outlook has the largest information gain, so it becomes the root node of the decision tree.
Summary of yes / no counts for each attribute value:
Outlook (Yes / No) | Temperature (Yes / No) | Humidity (Yes / No) | Windy (Yes / No) | Play (Yes / No)
Sunny: 2 / 3 | Hot: 2 / 2 | High: 3 / 4 | False: 6 / 2 | 9 / 5
Overcast: 4 / 0 | Mild: 4 / 2 | Normal: 6 / 1 | True: 3 / 3 |
Rainy: 3 / 2 | Cool: 3 / 1 | | |
All samples in the Overcast branch are positive examples, so that branch becomes a leaf node with the target category yes.
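To make the overall ID3 step concrete, here is a brief sketch (not from the original post) of how the split attribute would be chosen in code: compute the gain of every attribute with the hypothetical calcGain above and keep the largest one.

// Sketch: pick the attribute with the largest information gain as the split node.
// attributeCounts.get(name)[v] is assumed to hold the class counts for the v-th value of that attribute.
public static String chooseBestAttribute(int[] totalCounts,
                                         java.util.Map<String, int[][]> attributeCounts) {
    String best = null;
    double bestGain = Double.NEGATIVE_INFINITY;
    for (java.util.Map.Entry<String, int[][]> e : attributeCounts.entrySet()) {
        double gain = calcGain(totalCounts, e.getValue());
        if (gain > bestGain) {
            bestGain = gain;
            best = e.getKey();
        }
    }
    return best; // for the weather data this would be "Outlook" (gain 0.247)
}

ID3 then recurses on each branch with the remaining attributes, stopping when all samples in a branch share one class (a leaf, like the Overcast branch here) or no attributes are left.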