FP Tree and Python implementation

Source: Internet
Author: User

The FP-growth algorithm finds frequent itemsets efficiently, although it cannot find association rules on its own. It needs only two scans of the database, and in general it is about two orders of magnitude faster than the Apriori algorithm.
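To make "frequent itemset" concrete, here is a brute-force sketch (the function name is hypothetical and not part of the article's code). It counts the support of every candidate itemset directly, rescanning the transactions for each candidate, and stops at the first size k that has no frequent itemset, because any superset of an infrequent itemset is also infrequent. Its cost grows exponentially with the number of items, which is exactly what FP-growth avoids, but on small data it is a handy cross-check.

```python
from itertools import combinations


def frequent_itemsets_bruteforce(transactions, min_sup):
    """Return {frozenset(itemset): support} for every itemset contained in
    at least min_sup transactions (support = number of such transactions)."""
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            # one full scan of the data per candidate itemset
            support = sum(1 for t in sets if candidate <= t)
            if support >= min_sup:
                result[candidate] = support
                found = True
        if not found:
            # no frequent k-itemset, so no (k+1)-itemset can be frequent
            break
    return result
```

On the six sample transactions returned by loadSimpDat() later in this article, with a minimum support of 3, this returns 18 itemsets, the same ones mineTree() collects.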

An FP tree is shown in Figure 1:



It is an ordinary tree except for one addition: links that connect the nodes holding the same item.
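Setting the links aside, the structure is a plain prefix tree: transactions whose ordered items share a prefix share the same nodes, and each node's count records how many transactions pass through it. The sketch below uses nested dicts (a hypothetical representation chosen for brevity, not the TreeNode class) and assumes each transaction has already been filtered to frequent items and sorted.

```python
def build_prefix_tree(ordered_transactions):
    """Insert already-ordered transactions into a trie of nested dicts.
    Each node is {'count': int, 'children': {item: node}}."""
    root = {'count': 0, 'children': {}}
    for trans in ordered_transactions:
        node = root
        for item in trans:
            # reuse the child if this prefix was seen before, else create it
            child = node['children'].setdefault(item, {'count': 0, 'children': {}})
            child['count'] += 1
            node = child
    return root


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node['children'].values())


# The sample transactions, reduced to frequent items in the order z, x, y, s, r, t
ordered = [['z', 'r'], ['z', 'x', 'y', 's', 't'], ['z'],
           ['x', 's', 'r'], ['z', 'x', 'y', 'r', 't'], ['z', 'x', 'y', 's', 't']]
tree = build_prefix_tree(ordered)
print(count_nodes(tree))               # 11
print(tree['children']['z']['count'])  # 5
```

Here 21 item occurrences collapse into 11 nodes, the same number of nodes (excluding the root) as in the tree printed by myFPtree.disp() in the output below.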

Definition of the FP tree node:

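```python
class TreeNode:
    def __init__(self, nameValue, numOccur, parentNode):
        self.name = nameValue      # item stored in this node
        self.count = numOccur      # number of transactions passing through
        self.nodeLink = None       # next node holding the same item
        self.parent = parentNode   # used to walk back up to the root
        self.children = {}         # item name -> child TreeNode

    def inc(self, numOccur):
        self.count += numOccur

    def disp(self, ind=1):
        # print this node indented by its depth, then its subtrees
        print('  ' * ind, self.name, ' ', self.count)
        for child in self.children.values():
            child.disp(ind + 1)
```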

The disp() function prints the structure of the tree as indented text. The implementation also needs a header table that points to the first node of each item in the tree, as shown in Figure 2:



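The header table and the node links together form one singly linked list per item. The article's updateHeader() appends a new node by walking to the end of that list; a common variant, sketched here with hypothetical names (Node, add_to_header, chain_support), also stores the tail so that appending is O(1). Summing the counts along an item's chain gives that item's support.

```python
class Node:
    """Hypothetical minimal node: only the fields a node-link chain needs."""

    def __init__(self, name, count):
        self.name = name
        self.count = count
        self.nodeLink = None  # next node holding the same item


def add_to_header(header, node):
    """header maps item -> [head, tail]; append node to its item's chain in O(1)."""
    entry = header.setdefault(node.name, [None, None])
    if entry[0] is None:
        entry[0] = entry[1] = node   # first node seen for this item
    else:
        entry[1].nodeLink = node     # link after the current tail
        entry[1] = node


def chain_support(header, item):
    """Follow an item's node links from the head and sum the node counts."""
    total = 0
    node = header[item][0] if item in header else None
    while node is not None:
        total += node.count
        node = node.nodeLink
    return total


header = {}
first_t, second_t = Node('t', 2), Node('t', 1)
add_to_header(header, first_t)
add_to_header(header, second_t)
print(chain_support(header, 't'))  # 3
```

The counts 2 and 1 mirror the two t nodes in the sample tree; their sum, 3, is the support of t.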
Building the FP tree is the core of the algorithm. The construction code is as follows:

```python
def createTree(dataSet, minSup=1):
    """Build an FP tree from dataSet, a dict {frozenset(transaction): count}.
    Returns (tree, headerTable), or (None, None) if no item is frequent."""
    headerTable = {}
    # First pass: count how often each item occurs
    for trans in dataSet:
        for item in trans:
            headerTable[item] = headerTable.get(item, 0) + dataSet[trans]
    # Remove items below minSup (iterate over a copy of the keys)
    for k in list(headerTable.keys()):
        if headerTable[k] < minSup:
            del headerTable[k]
    freqItemSet = set(headerTable.keys())
    if len(freqItemSet) == 0:
        return None, None  # no item meets min support
    # Reformat headerTable to hold [count, head of node-link chain]
    for k in headerTable:
        headerTable[k] = [headerTable[k], None]
    retTree = TreeNode('Null Set', 1, None)
    # Second pass: insert each transaction's frequent items in order
    for tranSet, count in dataSet.items():
        localD = {}
        for item in tranSet:
            if item in freqItemSet:
                localD[item] = headerTable[item][0]
        if len(localD) > 0:
            # Descending count; ties broken by item name so that every
            # transaction uses the same global order
            orderedItems = [v[0] for v in sorted(localD.items(),
                                                 key=lambda p: (p[1], p[0]),
                                                 reverse=True)]
            updateTree(orderedItems, retTree, headerTable, count)
    return retTree, headerTable


def updateTree(items, inTree, headerTable, count):
    if items[0] in inTree.children:
        # Prefix already present: just increment its count
        inTree.children[items[0]].inc(count)
    else:
        # Add a new child and link it into the header table
        inTree.children[items[0]] = TreeNode(items[0], count, inTree)
        if headerTable[items[0]][1] is None:
            headerTable[items[0]][1] = inTree.children[items[0]]
        else:
            updateHeader(headerTable[items[0]][1], inTree.children[items[0]])
    if len(items) > 1:
        # Recurse with the remaining ordered items
        updateTree(items[1:], inTree.children[items[0]], headerTable, count)


def updateHeader(nodeToTest, targetNode):
    # Walk to the end of the node-link chain iteratively, without recursion
    while nodeToTest.nodeLink is not None:
        nodeToTest = nodeToTest.nodeLink
    nodeToTest.nodeLink = targetNode
```
Figure 2 makes the construction process easy to follow. headerTable is the header table; it is maintained so that, during mining, all nodes holding a given frequent item can be reached quickly. One implementation detail: every item is counted first, and any item that does not meet the minimum support is deleted immediately and never added to the FP tree. The updateTree() function grows the tree, and updateHeader() maintains the node-link chains that start at the headerTable entries.
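The first scan and the reordering that precede any insertion can be sketched on their own (hypothetical function name; unit transaction counts assumed). Breaking ties by item name is this sketch's own choice: any fixed total order works, but it must be the same for every transaction, otherwise an itemset's occurrences get split across prefixes in different orders.

```python
from collections import Counter


def order_frequent_items(transactions, min_sup):
    """First scan: count every item and keep those with count >= min_sup.
    Then rewrite each transaction as its frequent items sorted by descending
    count, breaking ties by item name so all transactions share one order."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_sup}
    ordered = []
    for t in transactions:
        kept = [item for item in set(t) if item in frequent]
        kept.sort(key=lambda item: (-frequent[item], item))
        ordered.append(kept)
    return frequent, ordered
```

For the sample transaction ['z', 'y', 'x', 'w', 'v', 'u', 't', 's'] with a minimum support of 3, only z, x, s, t and y survive, in that order.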

Mining frequent itemsets from an FP tree:

```python
class TreeNode:
    def __init__(self, nameValue, numOccur, parentNode):
        self.name = nameValue
        self.count = numOccur
        self.nodeLink = None
        self.parent = parentNode
        self.children = {}

    def inc(self, numOccur):
        self.count += numOccur

    def disp(self, ind=1):
        print('  ' * ind, self.name, ' ', self.count)
        for child in self.children.values():
            child.disp(ind + 1)


def createTree(dataSet, minSup=1):
    headerTable = {}
    for trans in dataSet:
        for item in trans:
            headerTable[item] = headerTable.get(item, 0) + dataSet[trans]
    for k in list(headerTable.keys()):  # copy keys: we delete while looping
        if headerTable[k] < minSup:
            del headerTable[k]
    freqItemSet = set(headerTable.keys())
    if len(freqItemSet) == 0:
        return None, None
    for k in headerTable:
        headerTable[k] = [headerTable[k], None]
    retTree = TreeNode('Null Set', 1, None)
    for tranSet, count in dataSet.items():
        localD = {}
        for item in tranSet:
            if item in freqItemSet:
                localD[item] = headerTable[item][0]
        if len(localD) > 0:
            # descending count, ties broken by name for one global order
            orderedItems = [v[0] for v in sorted(localD.items(),
                                                 key=lambda p: (p[1], p[0]),
                                                 reverse=True)]
            updateTree(orderedItems, retTree, headerTable, count)
    return retTree, headerTable


def updateTree(items, inTree, headerTable, count):
    if items[0] in inTree.children:
        inTree.children[items[0]].inc(count)
    else:
        inTree.children[items[0]] = TreeNode(items[0], count, inTree)
        if headerTable[items[0]][1] is None:
            headerTable[items[0]][1] = inTree.children[items[0]]
        else:
            updateHeader(headerTable[items[0]][1], inTree.children[items[0]])
    if len(items) > 1:
        updateTree(items[1:], inTree.children[items[0]], headerTable, count)


def updateHeader(nodeToTest, targetNode):
    while nodeToTest.nodeLink is not None:
        nodeToTest = nodeToTest.nodeLink
    nodeToTest.nodeLink = targetNode


def loadSimpDat():
    simpDat = [['r', 'z', 'h', 'j', 'p'],
               ['z', 'y', 'x', 'w', 'v', 'u', 't', 's'],
               ['z'],
               ['r', 'x', 'n', 'o', 's'],
               ['y', 'r', 'x', 'z', 'q', 't', 'p'],
               ['y', 'z', 'x', 'e', 'q', 's', 't', 'm']]
    return simpDat


def createInitSet(dataSet):
    # Map each transaction (as a frozenset) to how often it occurs
    retDict = {}
    for trans in dataSet:
        key = frozenset(trans)
        retDict[key] = retDict.get(key, 0) + 1
    return retDict


def ascendTree(leafNode, prefixPath):
    # Collect item names from leafNode up to, but excluding, the root
    if leafNode.parent is not None:
        prefixPath.append(leafNode.name)
        ascendTree(leafNode.parent, prefixPath)


def findPrefixPath(basePat, treeNode):
    # Conditional pattern base: follow basePat's node links and record each
    # prefix path (without basePat itself) with that node's count
    condPats = {}
    while treeNode is not None:
        prefixPath = []
        ascendTree(treeNode, prefixPath)
        if len(prefixPath) > 1:
            key = frozenset(prefixPath[1:])
            condPats[key] = condPats.get(key, 0) + treeNode.count
        treeNode = treeNode.nodeLink
    return condPats


def mineTree(inTree, headerTable, minSup, preFix, freqItemList):
    # Visit header items from least to most frequent
    bigL = [v[0] for v in sorted(headerTable.items(),
                                 key=lambda p: (p[1][0], p[0]))]
    for basePat in bigL:
        newFreqSet = preFix.copy()
        newFreqSet.add(basePat)
        freqItemList.append(newFreqSet)
        condPattBases = findPrefixPath(basePat, headerTable[basePat][1])
        myCondTree, myHead = createTree(condPattBases, minSup)
        if myHead is not None:
            print('conditional tree for:', newFreqSet)
            myCondTree.disp(1)
            mineTree(myCondTree, myHead, minSup, newFreqSet, freqItemList)


if __name__ == "__main__":
    simpDat = loadSimpDat()
    print(simpDat)
    initSet = createInitSet(simpDat)
    print(initSet)
    myFPtree, myHeaderTab = createTree(initSet, 3)
    myFPtree.disp()
    print(findPrefixPath('t', myHeaderTab['t'][1]))
    freqItems = []
    mineTree(myFPtree, myHeaderTab, 3, set([]), freqItems)
    print(freqItems)
```

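Output (printed by Python 2, so sets appear as set([...]); under Python 3 they print as {...}, and equally frequent items may be ordered differently, which changes the tree layout and list order but not the 18 itemsets found):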

```
[['r', 'z', 'h', 'j', 'p'], ['z', 'y', 'x', 'w', 'v', 'u', 't', 's'], ['z'], ['r', 'x', 'n', 'o', 's'], ['y', 'r', 'x', 'z', 'q', 't', 'p'], ['y', 'z', 'x', 'e', 'q', 's', 't', 'm']]
{frozenset(['e', 'm', 'q', 's', 't', 'y', 'x', 'z']): 1, frozenset(['x', 's', 'r', 'o', 'n']): 1, frozenset(['s', 'u', 't', 'w', 'v', 'y', 'x', 'z']): 1, frozenset(['q', 'p', 'r', 't', 'y', 'x', 'z']): 1, frozenset(['h', 'r', 'z', 'p', 'j']): 1, frozenset(['z']): 1}
   Null Set   1
     x   1
       s   1
         r   1
     z   5
       x   3
         y   3
           s   2
             t   2
           r   1
             t   1
       r   1
{frozenset(['y', 'x', 's', 'z']): 2, frozenset(['y', 'x', 'r', 'z']): 1}
conditional tree for: set(['y'])
   Null Set   1
     x   3
       z   3
conditional tree for: set(['y', 'z'])
   Null Set   1
     x   3
conditional tree for: set(['s'])
   Null Set   1
     x   3
conditional tree for: set(['t'])
   Null Set   1
     y   3
       x   3
         z   3
conditional tree for: set(['x', 't'])
   Null Set   1
     y   3
conditional tree for: set(['z', 't'])
   Null Set   1
     y   3
       x   3
conditional tree for: set(['x', 'z', 't'])
   Null Set   1
     y   3
conditional tree for: set(['x'])
   Null Set   1
     z   3
[set(['y']), set(['y', 'x']), set(['y', 'z']), set(['y', 'x', 'z']), set(['s']), set(['x', 's']), set(['t']), set(['y', 't']), set(['x', 't']), set(['y', 'x', 't']), set(['z', 't']), set(['y', 'z', 't']), set(['x', 'z', 't']), set(['y', 'x', 'z', 't']), set(['r']), set(['x']), set(['x', 'z']), set(['z'])]
```
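The conditional pattern base printed for 't' can also be read straight off the ordered transactions: the path from a node up to the root is exactly the part of a transaction that precedes that item in the global order. The sketch below (hypothetical function name; every transaction counted once) uses the order z, x, y, s, r, t, which is the order visible in the printed tree. With a different tie-break the bases change, but the mined itemsets do not.

```python
from collections import Counter


def conditional_pattern_base(transactions, order, item):
    """For every transaction containing `item`, take the items that precede
    it in `order` and count each distinct prefix set."""
    rank = {it: i for i, it in enumerate(order)}
    base = Counter()
    for trans in transactions:
        # keep only items listed in `order`, sorted by that order
        ordered = sorted((it for it in set(trans) if it in rank), key=rank.__getitem__)
        if item in ordered:
            prefix = ordered[:ordered.index(item)]
            if prefix:  # like findPrefixPath(), skip empty prefixes
                base[frozenset(prefix)] += 1
    return dict(base)
```

Applied to the sample data with item 't', it yields {z, x, y, s}: 2 and {z, x, y, r}: 1, matching the findPrefixPath('t', ...) line in the output.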


