The ID3 Decision Tree Building Algorithm

Source: Internet
Author: User
Tags: id3

1. Decide on the classification (target) attribute.

2. For the current data table, create a node N.

3. If all records in the table belong to the same class, N is a leaf; mark that class on the leaf.

4. If there are no remaining attributes in the table to consider, N is also a leaf; mark it with the majority class (the minority defers to the majority).

5. Otherwise, select the optimal attribute as the test attribute of node N, according to the conditional entropy E or, equivalently, the information gain.

6. Once the node attribute is selected, for each value of that attribute:

Generate a branch from N, and gather the records taking that value into the branch's data table, deleting the column of the selected attribute. If the branch's data table is not empty, apply the algorithm above recursively.
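As an illustration, the six steps above can be sketched in Python (a minimal sketch of the generic procedure; the function and attribute names are my own, not from the article):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, attr, target):
    """Information gain of splitting `rows` (a list of dicts) on `attr`."""
    base = entropy([r[target] for r in rows])
    cond = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        cond += len(subset) / len(rows) * entropy(subset)
    return base - cond

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # step 3: pure node -> leaf
        return labels[0]
    if not attrs:                      # step 4: no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    # step 5: choose the attribute with the largest information gain
    best = max(attrs, key=lambda a: gain(rows, a, target))
    # step 6: branch on each value, dropping the chosen attribute's column
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        branch = [r for r in rows if r[best] == value]
        tree[best][value] = id3(branch, [a for a in attrs if a != best], target)
    return tree
```

Each nested dict is an internal node keyed by the test attribute; a string is a leaf carrying the class label.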


Original table:

Count  Age     Income  Student  Credit     Class (buys a computer?)
64     youth   high    no       fair       don't buy
64     youth   high    no       excellent  don't buy
128    middle  high    no       fair       buy
60     old     medium  no       fair       buy
64     old     low     yes      fair       buy
64     old     low     yes      excellent  don't buy
64     middle  low     yes      excellent  buy
128    youth   medium  no       fair       don't buy
64     youth   low     yes      fair       buy
132    old     medium  yes      fair       buy
64     youth   medium  yes      excellent  buy
32     middle  medium  no       excellent  buy
32     middle  high    yes      fair       buy
63     old     medium  no       excellent  don't buy
1      old     medium  no       excellent  buy
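As a quick check of the table (a Python sketch, not part of the original article), the class counts give the prior entropy of the class:

```python
import math

# Row counts from the table above; flag = 1 for rows whose class is "don't buy"
counts   = [64, 64, 128, 60, 64, 64, 64, 128, 64, 132, 64, 32, 32, 63, 1]
dont_buy = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0]

total = sum(counts)                                 # 1024 people in all
nb = sum(c for c, d in zip(counts, dont_buy) if d)  # 383 who don't buy
b = total - nb                                      # 641 who buy
h = -(b / total * math.log2(b / total) + nb / total * math.log2(nb / total))
print(round(h, 4))  # prior entropy of the class
```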

First, we compute the mutual information (information gain) of the Age attribute. Sorting the table by age (the last column gives the row number in the original table):

Count  Age     Income  Student  Credit     Class      Row
60     old     medium  no       fair       buy        4
64     old     low     yes      fair       buy        5
64     old     low     yes      excellent  don't buy  6
132    old     medium  yes      fair       buy        10
63     old     medium  no       excellent  don't buy  14
1      old     medium  no       excellent  buy        15
64     youth   high    no       fair       don't buy  1
64     youth   high    no       excellent  don't buy  2
128    youth   medium  no       fair       don't buy  8
64     youth   low     yes      fair       buy        9
64     youth   medium  yes      excellent  buy        11
128    middle  high    no       fair       buy        3
64     middle  low     yes      excellent  buy        7
32     middle  medium  no       excellent  buy        12
32     middle  high    yes      fair       buy        13

MATLAB code:

clear
clc
SM = [64 64 128 60 64 64 64 128 64 132 64 32 32 63 1];  % head count of each row


% Class: buy a computer / don't buy -- u1, u2
% Age A1: youth, middle-aged, old
% Income A2: low, medium, high
% Student A3: yes, no
% Credit A4: fair, excellent


% Prior entropy (of the class)
M = sum(SM);                              % total head count
BM = SM(1)+SM(2)+SM(6)+SM(8)+SM(14);      % total who do not buy
MM = M-BM;                                % total who buy
pu1 = MM/M;
pu2 = BM/M;
hu = -(pu1*log2(pu1)+pu2*log2(pu2));
%----------------------------------
% Posterior entropies (given A1): v1 = youth, v2 = middle-aged, v3 = old
Q1 = SM(1)+SM(2)+SM(8)+SM(9)+SM(11);      % total youth
Z1 = SM(3)+SM(7)+SM(12)+SM(13);           % total middle-aged
L1 = M-Q1-Z1;                             % total old
pv1 = Q1/M;
pv2 = Z1/M;
pv3 = L1/M;
% for the youth
QM = SM(9)+SM(11);                        % youth who buy a computer
BM = Q1-QM;                               % youth who do not buy
pu1v1 = QM/Q1;
pu2v1 = BM/Q1;
huv1 = -(pu1v1*log2(pu1v1)+pu2v1*log2(pu2v1));
% for the middle-aged (all of them buy)
ZM = SM(3)+SM(7)+SM(12)+SM(13);
BM = 0;
pu1v2 = 1;
pu2v2 = 0;
huv2 = -(pu1v2*log2(pu1v2)+pu2v2*log2(pu2v2+eps));
% for the old
LM = SM(4)+SM(5)+SM(10)+SM(15);           % old who buy a computer
BM = L1-LM;
pu1v3 = LM/L1;
pu2v3 = BM/L1;
huv3 = -(pu1v3*log2(pu1v3)+pu2v3*log2(pu2v3));


% Conditional entropy (given A1)
T1 = [pv1 pv2 pv3];
T = [huv1 huv2 huv3];
disp('H(computer | age):');
huv = sum(T.*T1)


% Mutual information (for A1)
Ia1 = hu-huv
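For comparison, the same age computation in Python (an independent sketch; the 1-based row numbers are those of the age-sorted table above):

```python
import math

counts = [64, 64, 128, 60, 64, 64, 64, 128, 64, 132, 64, 32, 32, 63, 1]
total = sum(counts)

def h2(p):
    """Binary entropy of a probability p, with 0*log2(0) taken as 0."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def n(rows):
    """Total head count over 1-based row numbers."""
    return sum(counts[i - 1] for i in rows)

youth, middle, old = [1, 2, 8, 9, 11], [3, 7, 12, 13], [4, 5, 6, 10, 14, 15]
youth_buy, old_buy = [9, 11], [4, 5, 10, 15]   # middle-aged: everyone buys

hu = h2(641 / total)  # prior entropy: 641 of 1024 buy
huv = (n(youth) / total) * h2(n(youth_buy) / n(youth)) \
    + (n(middle) / total) * h2(1.0) \
    + (n(old) / total) * h2(n(old_buy) / n(old))
print(round(hu - huv, 4))  # mutual information I(class; age)
```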

Next, we compute the mutual information of the Income attribute. Sorting the table by income:

Count  Age     Income  Student  Credit     Class      Row
64     old     low     yes      fair       buy        5
64     old     low     yes      excellent  don't buy  6
64     youth   low     yes      fair       buy        9
64     middle  low     yes      excellent  buy        7
64     youth   high    no       fair       don't buy  1
64     youth   high    no       excellent  don't buy  2
128    middle  high    no       fair       buy        3
32     middle  high    yes      fair       buy        13
60     old     medium  no       fair       buy        4
132    old     medium  yes      fair       buy        10
63     old     medium  no       excellent  don't buy  14
1      old     medium  no       excellent  buy        15
128    youth   medium  no       fair       don't buy  8
64     youth   medium  yes      excellent  buy        11
32     middle  medium  no       excellent  buy        12

The code for the remaining attributes is analogous.
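That "analogous" income computation can likewise be sketched in Python (row numbers refer to the income-sorted table above; not the article's own code):

```python
import math

counts = [64, 64, 128, 60, 64, 64, 64, 128, 64, 132, 64, 32, 32, 63, 1]
total = sum(counts)

def h2(p):
    """Binary entropy of a probability p, with 0*log2(0) taken as 0."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def n(rows):
    """Total head count over 1-based row numbers."""
    return sum(counts[i - 1] for i in rows)

low, high, medium = [5, 6, 9, 7], [1, 2, 3, 13], [4, 10, 14, 15, 8, 11, 12]
low_buy, high_buy, medium_buy = [5, 9, 7], [3, 13], [4, 10, 15, 11, 12]

hu = h2(641 / total)  # prior entropy of the class
huv = sum(n(g) / total * h2(n(b) / n(g))
          for g, b in [(low, low_buy), (high, high_buy), (medium, medium_buy)])
print(round(hu - huv, 4))  # mutual information I(class; income)
```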

Finally, the results are:

Gain(age)     = 0.9537 - 0.6877 = 0.2660
Gain(income)  = 0.9537 - 0.9361 = 0.0176
Gain(student) = 0.9537 - 0.7811 = 0.1726
Gain(credit)  = 0.9537 - 0.9084 = 0.0453

Since age yields the largest information gain, it is chosen as the root node. (To be continued.)
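The four gains and the choice of root can be checked end to end with one short Python sketch (dataset transcribed from the original table; this is an independent verification, not the article's code):

```python
import math
from collections import defaultdict

# (count, age, income, student, credit, buys) -- transcribed from the table
data = [
    (64,  "youth",  "high",   "no",  "fair",      "no"),
    (64,  "youth",  "high",   "no",  "excellent", "no"),
    (128, "middle", "high",   "no",  "fair",      "yes"),
    (60,  "old",    "medium", "no",  "fair",      "yes"),
    (64,  "old",    "low",    "yes", "fair",      "yes"),
    (64,  "old",    "low",    "yes", "excellent", "no"),
    (64,  "middle", "low",    "yes", "excellent", "yes"),
    (128, "youth",  "medium", "no",  "fair",      "no"),
    (64,  "youth",  "low",    "yes", "fair",      "yes"),
    (132, "old",    "medium", "yes", "fair",      "yes"),
    (64,  "youth",  "medium", "yes", "excellent", "yes"),
    (32,  "middle", "medium", "no",  "excellent", "yes"),
    (32,  "middle", "high",   "yes", "fair",      "yes"),
    (63,  "old",    "medium", "no",  "excellent", "no"),
    (1,   "old",    "medium", "no",  "excellent", "yes"),
]
ATTRS = {"age": 1, "income": 2, "student": 3, "credit": 4}

def entropy(weights):
    """Entropy of a {label: weight} histogram."""
    total = sum(weights.values())
    return -sum(w / total * math.log2(w / total) for w in weights.values() if w)

def gain(attr):
    """Information gain of the class from splitting on `attr`."""
    col = ATTRS[attr]
    classes = defaultdict(int)
    by_value = defaultdict(lambda: defaultdict(int))
    for row in data:
        classes[row[5]] += row[0]
        by_value[row[col]][row[5]] += row[0]
    total = sum(classes.values())
    cond = sum(sum(d.values()) / total * entropy(d) for d in by_value.values())
    return entropy(classes) - cond

gains = {a: gain(a) for a in ATTRS}
print({a: round(g, 4) for a, g in gains.items()})
print(max(gains, key=gains.get))  # attribute chosen as the root
```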

