I. The general idea of the ID3 algorithm
The basic ID3 algorithm learns by constructing a decision tree from the top down. The first question is where the tree should start, which comes down to choosing an attribute to test at the root. So how do we choose attributes? To answer this, a statistical test is used to evaluate how well each instance attribute, taken by itself, classifies the training examples; the attribute with the best score is selected and used as the test at the root node. A branch is then created for each possible value of the root attribute, and the training examples are sorted down to the appropriate branch. The whole process is then repeated, using the training examples associated with each branch node to select the best attribute to test at that point. This forms a greedy search for an acceptable decision tree: the algorithm never backtracks to reconsider an earlier choice.
The ID3 procedure for a two-class classification problem is outlined below; we can see that the construction of a decision tree is a recursive process.
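A minimal sketch of that recursion in Python (the names id3_sketch and choose_attribute and the (features, label) sample representation are my own, not from the original post; the concrete implementation used here appears in section III):

from collections import Counter

def id3_sketch(samples, attributes, choose_attribute):
    # samples: list of (feature_dict, label); attributes: list of attribute names;
    # choose_attribute: any rule that picks the attribute to test --
    # ID3 uses the information gain defined in the next section.
    labels = [label for _, label in samples]
    if len(set(labels)) == 1:                     # node is pure: make a leaf
        return labels[0]
    if not attributes:                            # attributes exhausted: majority-vote leaf
        return Counter(labels).most_common(1)[0][0]
    best = choose_attribute(samples, attributes)  # best attribute becomes the node's test
    tree = {best: {}}
    for value in set(feats[best] for feats, _ in samples):
        branch = [(f, l) for f, l in samples if f[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3_sketch(branch, remaining, choose_attribute)
    return tree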
II. Entropy and information gain
The core problem in ID3 is selecting which attribute to test at each node of the tree. Here we use the information gain of an attribute to measure how well it separates the training examples: the greater the information gain, the stronger the attribute's discriminating power. At every step of growing the tree, the ID3 algorithm uses the information gain criterion to select an attribute from the candidate attributes.
First we need the concept of entropy, a measure widely used in information theory that characterizes the purity of an arbitrary collection of examples.
Given a collection S containing positive and negative examples of some target concept, the entropy of S relative to this Boolean classification is:
Entropy(S) ≡ −p⊕ log₂ p⊕ − p⊖ log₂ p⊖
where p⊕ is the proportion of positive examples in S and p⊖ is the proportion of negative examples in S. If all members of S belong to the same class, the entropy of S is 0; when the set contains equal numbers of positive and negative examples, the entropy is 1; in all other cases it lies between 0 and 1.
The above covers the case where the target classification is Boolean. More generally, if the target attribute can take on c different values, the entropy of S relative to this c-wise classification is defined as:

Entropy(S) ≡ Σ_{i=1}^{c} −pᵢ log₂ pᵢ
where pᵢ is the proportion of examples in S belonging to class i.
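As a concrete check (using the class distribution of the weather data introduced in section III, which contains 9 positive and 5 negative examples), the Boolean entropy works out to about 0.940:

from math import log

# entropy of a two-class set with 9 positive and 5 negative examples
p_pos, p_neg = 9.0 / 14, 5.0 / 14
ent = -p_pos * log(p_pos, 2) - p_neg * log(p_neg, 2)
print(round(ent, 3))   # 0.940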
Using information gain to measure expected entropy reduction
With entropy as a measure of the purity of a collection of training examples, we can now define a measure of the effectiveness of an attribute in classifying the data. This measure is called information gain. Simply put, the information gain of an attribute is the expected reduction in entropy caused by partitioning the examples according to that attribute. More precisely, the information gain Gain(S, A) of an attribute A relative to a collection of examples S is defined as:

Gain(S, A) ≡ Entropy(S) − Σ_{v ∈ Values(A)} (|Sᵥ| / |S|) · Entropy(Sᵥ)
where Values(A) is the set of all possible values of attribute A, and Sᵥ is the subset of S for which attribute A has value v. The first term is the entropy of the original set S, and the second term is the expected entropy after S is partitioned using attribute A. This expected entropy is the weighted sum of the entropies of the subsets, where the weight of each subset Sᵥ is the fraction of the original examples that belong to it. Gain(S, A) is therefore the expected reduction in entropy obtained by knowing the value of attribute A.
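As an illustration (again on the weather data of section III; the helper entropy() below is my own shorthand, not part of the code that follows), splitting on Wind yields 8 'Weak' examples (6 positive, 2 negative) and 6 'Strong' examples (3 positive, 3 negative), so the gain is only about 0.048:

from math import log

def entropy(pos, neg):
    # two-class entropy computed from counts of positive and negative examples
    total = float(pos + neg)
    ent = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            ent -= p * log(p, 2)
    return ent

gain_wind = entropy(9, 5) - (8.0 / 14) * entropy(6, 2) - (6.0 / 14) * entropy(3, 3)
print(round(gain_wind, 3))   # 0.048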
III. Using Python to implement simple decision tree generation
1. Calculate the Shannon entropy of a dataset
1"""2Created on Sat 14 13:58:26 20163 4@author: MyHome5"""6"' Calculate Shannon entropy for a given dataset '7 8From MathImportLog9 Ten defCalcshannonent (DataSet): OneNumEntries = Len (dataSet) ALabelcounts = {} -For Featvec in DataSet: -CurrentLabel = Featvec[-1] theLabelcounts[currentlabel] = Labelcounts.get (currentlabel,0) + 1 -Shannonent = 0.0 -For key in Labelcounts: -Pro = float (Labelcounts[key])/numentries +shannonent =-pro * log (pro,2) - + returnShannonent
2. Create data
def createDataSet():
    dataSet = [['Sunny', 'Hot', 'High', 'Weak', 'No'],
               ['Sunny', 'Hot', 'High', 'Strong', 'No'],
               ['Overcast', 'Hot', 'High', 'Weak', 'Yes'],
               ['Rain', 'Mild', 'High', 'Weak', 'Yes'],
               ['Rain', 'Cool', 'Normal', 'Weak', 'Yes'],
               ['Rain', 'Cool', 'Normal', 'Strong', 'No'],
               ['Overcast', 'Cool', 'Normal', 'Strong', 'Yes'],
               ['Sunny', 'Mild', 'High', 'Weak', 'No'],
               ['Sunny', 'Cool', 'Normal', 'Weak', 'Yes'],
               ['Rain', 'Mild', 'Normal', 'Weak', 'Yes'],
               ['Sunny', 'Mild', 'Normal', 'Strong', 'Yes'],
               ['Overcast', 'Mild', 'High', 'Strong', 'Yes'],
               ['Overcast', 'Hot', 'Normal', 'Weak', 'Yes'],
               ['Rain', 'Mild', 'High', 'Strong', 'No']]
    labels = ['Outlook', 'Temperature', 'Humidity', 'Wind']
    return dataSet, labels
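As a quick check (assuming the two functions above live in the same module), the entropy of the full dataset, which contains 9 'Yes' and 5 'No' labels, should come out near 0.940:

dataSet, labels = createDataSet()
print(calcShannonEnt(dataSet))   # approximately 0.940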
3. Partition the dataset according to a given feature (i.e., split the dataset on the values of an attribute)
def splitDataSet(dataSet, axis, value):
    '''Return the samples whose feature at position axis equals value,
    with that feature column removed.'''
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            reducedFeatVec = featVec[:axis]
            reducedFeatVec.extend(featVec[axis + 1:])
            retDataSet.append(reducedFeatVec)
    return retDataSet
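For example, continuing with the dataSet created above, splitting on axis 0 (Outlook) with value 'Overcast' should return the four 'Overcast' rows with that column removed, all of them labelled 'Yes':

# all 'Overcast' samples, with the Outlook column (axis 0) removed
print(splitDataSet(dataSet, 0, 'Overcast'))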
4. Calculate the information gain for each attribute in the data set, and select the current best classification attribute
def chooseBestFeatureToSplit(dataSet):
    numFeatures = len(dataSet[0]) - 1          # the last column is the class label
    baseEntropy = calcShannonEnt(dataSet)
    bestInfoGain = 0.0
    bestFeature = -1
    for i in range(numFeatures):
        featList = [example[i] for example in dataSet]
        uniqueVals = set(featList)
        newEntropy = 0.0
        for value in uniqueVals:
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet) / float(len(dataSet))
            newEntropy += prob * calcShannonEnt(subDataSet)
        infoGain = baseEntropy - newEntropy    # expected entropy reduction for feature i
        if infoGain > bestInfoGain:
            bestInfoGain = infoGain
            bestFeature = i
    return bestFeature
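Run on the full weather dataset from createDataSet, this should return index 0, i.e. the Outlook column, whose information gain (about 0.246) is the largest of the four attributes:

print(chooseBestFeatureToSplit(dataSet))   # 0, i.e. 'Outlook'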
5. If all attributes have been used up but the class labels under a branch are still not unique, we need to decide how to label the leaf node; in this case we use majority voting to determine the leaf's classification
import operator

def majorityCnt(classList):
    classCount = {}
    for vote in classList:
        if vote not in classCount.keys():
            classCount[vote] = 0
        classCount[vote] += 1
    # sort classes by vote count, highest first, and return the most common class
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
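A tiny example of the voting behaviour:

print(majorityCnt(['Yes', 'No', 'Yes']))   # 'Yes'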
6. Construct the tree
def createTree(dataSet, labels):
    classList = [example[-1] for example in dataSet]
    if classList.count(classList[0]) == len(classList):
        return classList[0]                      # all samples share one class: leaf
    if len(dataSet[0]) == 1:
        return majorityCnt(classList)            # no features left: majority-vote leaf
    bestFeat = chooseBestFeatureToSplit(dataSet)
    bestFeatLabel = labels[bestFeat]
    myTree = {bestFeatLabel: {}}
    del(labels[bestFeat])
    featValues = [example[bestFeat] for example in dataSet]
    uniqueVals = set(featValues)
    for value in uniqueVals:
        subLabels = labels[:]                    # copy so recursion does not clobber labels
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels)
    return myTree
7. Running Results
dataSet, labels = createDataSet()
createTree(dataSet, labels)

Out[10]:
{'Outlook': {'Overcast': 'Yes',
             'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
             'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}}}
We can draw a decision tree based on the results.
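One lightweight way to do that, short of a real plot, is to render the nested dictionary as indented text; the helper below (printTree is my own addition, not part of the original code) is a minimal sketch:

def printTree(tree, indent=''):
    # leaves are plain class labels; internal nodes are {feature: {value: subtree}} dicts
    if not isinstance(tree, dict):
        print(indent + '-> ' + str(tree))
        return
    for feature, branches in tree.items():
        for value, subtree in branches.items():
            print(indent + feature + ' = ' + str(value))
            printTree(subtree, indent + '    ')

printTree(createTree(*createDataSet()))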
IV. Summary
We partition the dataset by repeatedly selecting the current best attribute, until either all attributes have been used or all the samples under each branch belong to the same class; in other words, growing the tree is a continual recursive process.