Programming Collective Intelligence: Decision Tree Modeling (Part 2)
1. Displaying the decision tree:
We have now built a decision tree, and the next step is to inspect it. The following function displays the tree as plain text. The output is not pretty, but for trees with only a few nodes it is a simple way to view them.
def printtree(tree,indent=''):
  # Is this a leaf node?
  if tree.results!=None:
    print str(tree.results)
  else:
    # Print the criterion for this node
    print str(tree.col)+':'+str(tree.value)+'?'
    # Print the branches
    print indent+'T->',
    printtree(tree.tb,indent+'  ')
    print indent+'F->',
    printtree(tree.fb,indent+'  ')
This is also a recursive function that accepts the tree returned by buildtree as a parameter and recursively displays the tree.
The result of calling this function is as follows:
>>> treepredict.printtree(tr)
0:google?
T-> 3:21?
  T-> {'Premium': 3}
  F-> 2:yes?
    T-> {'Basic': 1}
    F-> {'None': 1}
F-> 0:slashdot?
  T-> {'None': 3}
  F-> 2:yes?
    T-> {'Basic': 4}
    F-> 3:21?
      T-> {'Basic': 1}
      F-> {'None': 3}
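For reference, printtree assumes the node structure built in Part 1 of this series: each node carries col, value, results, and the two branches tb and fb. The sketch below restates that structure and the display function in Python 3 syntax (the book's code is Python 2), exercised on a small hand-built tree; the attribute names are the book's, but the tiny example tree here is made up for illustration.

```python
# Sketch of the decisionnode structure that printtree relies on,
# written in Python 3 syntax for illustration.
class decisionnode:
    def __init__(self, col=-1, value=None, results=None, tb=None, fb=None):
        self.col = col          # column index of the criterion tested
        self.value = value      # value the column must match for a True result
        self.results = results  # dict of outcome counts; None except on leaves
        self.tb = tb            # subtree followed when the criterion is True
        self.fb = fb            # subtree followed when the criterion is False

def printtree(tree, indent=''):
    if tree.results is not None:   # leaf node: print the outcome counts
        print(str(tree.results))
    else:                          # interior node: print the criterion
        print(str(tree.col) + ':' + str(tree.value) + '?')
        print(indent + 'T->', end=' ')
        printtree(tree.tb, indent + '  ')
        print(indent + 'F->', end=' ')
        printtree(tree.fb, indent + '  ')

# A tiny hand-built tree: one split on column 0, with two leaves.
tiny = decisionnode(col=0, value='google',
                    tb=decisionnode(results={'Premium': 3}),
                    fb=decisionnode(results={'None': 2}))
printtree(tiny)
```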
Graphical display:
For trees with only a few nodes, text display works fine. As the number of nodes grows, however, it becomes more appropriate to display the tree graphically.
To plot the tree, we also need to install the Python Imaging Library (PIL). Download the library from www.pythonware.com.
After installation, add the following import statement to the treepredict.py file:
from PIL import Image,ImageDraw
The graphical display functions:
def getwidth(tree):
  if tree.tb==None and tree.fb==None: return 1
  return getwidth(tree.tb)+getwidth(tree.fb)

def getdepth(tree):
  if tree.tb==None and tree.fb==None: return 0
  return max(getdepth(tree.tb),getdepth(tree.fb))+1

def drawtree(tree,jpeg='tree.jpg'):
  w=getwidth(tree)*100
  h=getdepth(tree)*100+120

  img=Image.new('RGB',(w,h),(255,255,255))
  draw=ImageDraw.Draw(img)

  drawnode(draw,tree,w/2,20)
  img.save(jpeg,'JPEG')

def drawnode(draw,tree,x,y):
  if tree.results==None:
    # Get the width of each branch
    w1=getwidth(tree.fb)*100
    w2=getwidth(tree.tb)*100

    # Determine the total space required by this node
    left=x-(w1+w2)/2
    right=x+(w1+w2)/2

    # Draw the condition string
    draw.text((x-20,y-10),str(tree.col)+':'+str(tree.value),(0,0,0))

    # Draw links to the branches
    draw.line((x,y,left+w1/2,y+100),fill=(255,0,0))
    draw.line((x,y,right-w2/2,y+100),fill=(255,0,0))

    # Draw the branch nodes
    drawnode(draw,tree.fb,left+w1/2,y+100)
    drawnode(draw,tree.tb,right-w2/2,y+100)
  else:
    txt=' \n'.join(['%s:%d'%v for v in tree.results.items()])
    draw.text((x-20,y),txt,(0,0,0))
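The layout logic above is easy to check in isolation: getwidth counts leaves (each leaf is given a 100-pixel column) and getdepth counts split levels (each level is given 100 pixels of height, plus 120 pixels of padding). The Python 3 sketch below restates the two helpers on a minimal made-up node type with just the tb/fb branch attributes they inspect.

```python
# Minimal stand-in for a tree node: only the branch attributes
# that getwidth and getdepth actually look at.
class Node:
    def __init__(self, tb=None, fb=None):
        self.tb, self.fb = tb, fb

def getwidth(tree):
    # A leaf occupies one column; an interior node spans both branches.
    if tree.tb is None and tree.fb is None:
        return 1
    return getwidth(tree.tb) + getwidth(tree.fb)

def getdepth(tree):
    # A leaf adds no depth; an interior node adds one level.
    if tree.tb is None and tree.fb is None:
        return 0
    return max(getdepth(tree.tb), getdepth(tree.fb)) + 1

# Three leaves: one split at the root, another on its False branch.
tree = Node(tb=Node(), fb=Node(tb=Node(), fb=Node()))
print(getwidth(tree))  # 3 leaves -> image width 3*100 px
print(getdepth(tree))  # 2 levels -> image height 2*100+120 px
```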
Now, we can try to draw the decision tree:
>>> reload(treepredict)
<module 'treepredict' from 'treepredict.pyc'>
>>> treepredict.drawtree(tr,jpeg='tree.jpg')
A file named tree.jpg will appear in the Python working directory; it contains the rendered decision tree.
2. Classifying new observations:
def classify(observation,tree):
  if tree.results!=None:
    # This is a leaf node
    return tree.results
  else:
    v=observation[tree.col]
    branch=None
    # Numeric criteria branch on >=; everything else branches on equality
    if isinstance(v,int) or isinstance(v,float):
      if v>=tree.value: branch=tree.tb
      else: branch=tree.fb
    else:
      if v==tree.value: branch=tree.tb
      else: branch=tree.fb
    return classify(observation,branch)
Now we can call the classify function to classify the new observed data:
>>> reload(treepredict)
<module 'treepredict' from 'treepredict.pyc'>
>>> treepredict.classify(['(direct)','USA','yes',5],tr)
{'Basic': 4}
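The branching logic of classify can be verified on a small hand-built tree without loading the full data set. The Python 3 sketch below restates the function (the book's code is Python 2) and runs it on a tree assembled by hand; the node attributes follow the decisionnode class from Part 1, while the example tree itself is a simplified stand-in, not the one built from the data.

```python
# Sketch of classify in Python 3, exercised on a hand-built tree.
class decisionnode:
    def __init__(self, col=-1, value=None, results=None, tb=None, fb=None):
        self.col, self.value, self.results = col, value, results
        self.tb, self.fb = tb, fb

def classify(observation, tree):
    if tree.results is not None:   # leaf node: return the outcome counts
        return tree.results
    v = observation[tree.col]
    # Numeric criteria branch on >=; everything else branches on equality.
    if isinstance(v, (int, float)):
        branch = tree.tb if v >= tree.value else tree.fb
    else:
        branch = tree.tb if v == tree.value else tree.fb
    return classify(observation, branch)

# Split on column 3 (pages viewed >= 21), then on column 2 ('yes').
tree = decisionnode(col=3, value=21,
                    tb=decisionnode(results={'Premium': 3}),
                    fb=decisionnode(col=2, value='yes',
                                    tb=decisionnode(results={'Basic': 4}),
                                    fb=decisionnode(results={'None': 3})))
print(classify(['(direct)', 'USA', 'yes', 5], tree))   # -> {'Basic': 4}
print(classify(['google', 'France', 'no', 23], tree))  # -> {'Premium': 3}
```

Note how the numeric observation value 5 fails the >= 21 test and falls through to the string comparison on column 2, while 23 reaches the Premium leaf directly.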
So far, we have constructed a complete decision tree.
Reference: Programming Collective Intelligence.