The FP-growth algorithm finds frequent itemsets efficiently, but it does not by itself produce association rules. It needs to scan the database only twice, and in general it is about two orders of magnitude faster than the Apriori algorithm.
An FP-tree is shown in Figure 1:
It is no different from other trees, except that it adds links between similar nodes, that is, nodes that hold the same item.
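To make these extra links concrete, here is a minimal sketch. The `Node` class, its `node_link` attribute and the `total_count` helper are hypothetical names used only for this illustration, and the counts are illustrative values. It builds two branches that both contain the item 'x', chains the two 'x' nodes together, and follows the chain to add up their counts.

```python
class Node:
    """Hypothetical minimal tree node, used only for this illustration."""

    def __init__(self, name, count, parent=None):
        self.name = name
        self.count = count
        self.parent = parent
        self.children = {}
        self.node_link = None  # next node in the tree that holds the same item


root = Node("Null Set", 1)

# Branch 1: root -> z -> x
z = Node("z", 5, root)
root.children["z"] = z
x_under_z = Node("x", 3, z)
z.children["x"] = x_under_z

# Branch 2: root -> x
x_top = Node("x", 1, root)
root.children["x"] = x_top

# The extra link: chain the two 'x' nodes together.
x_under_z.node_link = x_top


def total_count(first_node):
    """Follow the node links and add up the counts of every node for one item."""
    total = 0
    node = first_node
    while node is not None:
        total += node.count
        node = node.node_link
    return total


print(total_count(x_under_z))  # 4
```

Without the link, finding every 'x' node would require a search of the whole tree; with it, a single pointer to the first 'x' node is enough.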
Definition of the FP-tree node:

class TreeNode:
    def __init__(self, nameValue, numOccur, parentNode):
        self.name = nameValue
        self.count = numOccur
        self.nodeLink = None  # next node holding the same item
        self.parent = parentNode
        self.children = {}

    def inc(self, numOccur):
        self.count += numOccur

    def disp(self, ind=1):
        # print the subtree as indented text
        print('  ' * ind, self.name, ' ', self.count)
        for child in self.children.values():
            child.disp(ind + 1)
The disp() function displays the structure of the tree as text. The implementation also needs a header pointer table that points to the first instance of each item type, as shown in Figure 2:
The core of the algorithm is building the FP-tree. The construction code follows:
def createTree(dataSet, minSup=1):
    # create an FP-tree from dataSet (dict: frozenset -> count) but don't mine it
    headerTable = {}
    # go over dataSet twice
    for trans in dataSet:  # first pass counts frequency of occurrence
        for item in trans:
            headerTable[item] = headerTable.get(item, 0) + dataSet[trans]
    for k in list(headerTable.keys()):  # remove items not meeting minSup
        if headerTable[k] < minSup:
            del headerTable[k]
    freqItemSet = set(headerTable.keys())
    if len(freqItemSet) == 0:
        return None, None  # if no items meet min support, get out
    for k in headerTable:
        headerTable[k] = [headerTable[k], None]  # reformat to [count, node link]
    retTree = TreeNode('Null Set', 1, None)  # create tree
    for tranSet, count in dataSet.items():  # go through dataSet a 2nd time
        localD = {}
        for item in tranSet:  # keep only the frequent items
            if item in freqItemSet:
                localD[item] = headerTable[item][0]
        if len(localD) > 0:
            # most frequent first; ties broken by item name so that every
            # transaction is inserted in the same order
            orderedItems = [v[0] for v in sorted(localD.items(),
                                                 key=lambda p: (-p[1], p[0]))]
            updateTree(orderedItems, retTree, headerTable, count)
    return retTree, headerTable  # return tree and header table


def updateTree(items, inTree, headerTable, count):
    if items[0] in inTree.children:  # check if items[0] is already a child
        inTree.children[items[0]].inc(count)  # increment count
    else:  # add items[0] to inTree.children
        inTree.children[items[0]] = TreeNode(items[0], count, inTree)
        if headerTable[items[0]][1] is None:  # update header table
            headerTable[items[0]][1] = inTree.children[items[0]]
        else:
            updateHeader(headerTable[items[0]][1], inTree.children[items[0]])
    if len(items) > 1:  # call updateTree() with the remaining ordered items
        updateTree(items[1:], inTree.children[items[0]], headerTable, count)


def updateHeader(nodeToTest, targetNode):
    # walk to the end of the linked list with a loop, not recursion
    while nodeToTest.nodeLink is not None:
        nodeToTest = nodeToTest.nodeLink
    nodeToTest.nodeLink = targetNode
Figure 2 makes the build process clear. headerTable is the header pointer table; it is maintained in preparation for the later search for frequent itemsets. One implementation detail: all items are counted first, and any item that does not meet the minimum support is dropped outright and never added to the FP-tree. The updateTree() function updates the tree, and updateHeader() maintains the chain of node links that starts in headerTable.
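The counting and filtering step described above can be sketched on its own, separately from tree construction. This is a minimal sketch rather than the article's build code: the helper name `order_transactions` and the alphabetical tie-break between equally frequent items are assumptions made for illustration. The data is the same six-transaction sample used in the full listing later in this article.

```python
from collections import Counter

transactions = [
    ['r', 'z', 'h', 'j', 'p'],
    ['z', 'y', 'x', 'w', 'v', 'u', 't', 's'],
    ['z'],
    ['r', 'x', 'n', 'o', 's'],
    ['y', 'r', 'x', 'z', 'q', 't', 'p'],
    ['y', 'z', 'x', 'e', 'q', 's', 't', 'm'],
]


def order_transactions(transactions, min_sup):
    """Hypothetical helper: count items, drop infrequent ones, order the rest."""
    # First scan: how many transactions contain each item.
    counts = Counter(item for trans in transactions for item in set(trans))
    frequent = {item: c for item, c in counts.items() if c >= min_sup}

    # Second scan: keep only frequent items, most frequent first.
    # Ties are broken alphabetically (an assumption) so the order is
    # the same for every transaction.
    ordered = []
    for trans in transactions:
        kept = [item for item in set(trans) if item in frequent]
        kept.sort(key=lambda item: (-frequent[item], item))
        ordered.append(kept)
    return frequent, ordered


frequent, ordered = order_transactions(transactions, min_sup=3)
print(frequent)
for row in ordered:
    print(row)
```

With a minimum support of 3, only z, x, r, s, t and y survive. Using one fixed order for every transaction is what lets transactions with a common prefix share a path in the tree.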
Mining frequent itemsets from an FP tree:
class TreeNode:
    def __init__(self, nameValue, numOccur, parentNode):
        self.name = nameValue
        self.count = numOccur
        self.nodeLink = None  # next node holding the same item
        self.parent = parentNode
        self.children = {}

    def inc(self, numOccur):
        self.count += numOccur

    def disp(self, ind=1):
        print('  ' * ind, self.name, ' ', self.count)
        for child in self.children.values():
            child.disp(ind + 1)


def createTree(dataSet, minSup=1):
    headerTable = {}
    for trans in dataSet:  # first pass: count each item
        for item in trans:
            headerTable[item] = headerTable.get(item, 0) + dataSet[trans]
    for k in list(headerTable.keys()):  # drop items below minSup
        if headerTable[k] < minSup:
            del headerTable[k]
    freqItemSet = set(headerTable.keys())
    if len(freqItemSet) == 0:
        return None, None
    for k in headerTable:
        headerTable[k] = [headerTable[k], None]  # [count, node link]
    retTree = TreeNode('Null Set', 1, None)
    for tranSet, count in dataSet.items():  # second pass: build the tree
        localD = {}
        for item in tranSet:
            if item in freqItemSet:
                localD[item] = headerTable[item][0]
        if len(localD) > 0:
            # most frequent first; ties broken by item name
            orderedItems = [v[0] for v in sorted(localD.items(),
                                                 key=lambda p: (-p[1], p[0]))]
            updateTree(orderedItems, retTree, headerTable, count)
    return retTree, headerTable


def updateTree(items, inTree, headerTable, count):
    if items[0] in inTree.children:
        inTree.children[items[0]].inc(count)
    else:
        inTree.children[items[0]] = TreeNode(items[0], count, inTree)
        if headerTable[items[0]][1] is None:
            headerTable[items[0]][1] = inTree.children[items[0]]
        else:
            updateHeader(headerTable[items[0]][1], inTree.children[items[0]])
    if len(items) > 1:
        updateTree(items[1:], inTree.children[items[0]], headerTable, count)


def updateHeader(nodeToTest, targetNode):
    while nodeToTest.nodeLink is not None:
        nodeToTest = nodeToTest.nodeLink
    nodeToTest.nodeLink = targetNode


def loadSimpDat():
    simpDat = [['r', 'z', 'h', 'j', 'p'],
               ['z', 'y', 'x', 'w', 'v', 'u', 't', 's'],
               ['z'],
               ['r', 'x', 'n', 'o', 's'],
               ['y', 'r', 'x', 'z', 'q', 't', 'p'],
               ['y', 'z', 'x', 'e', 'q', 's', 't', 'm']]
    return simpDat


def createInitSet(dataSet):
    retDict = {}
    for trans in dataSet:
        key = frozenset(trans)
        retDict[key] = retDict.get(key, 0) + 1  # count repeated transactions
    return retDict


def ascendTree(leafNode, prefixPath):
    # climb from a node up to the root, collecting item names
    if leafNode.parent is not None:
        prefixPath.append(leafNode.name)
        ascendTree(leafNode.parent, prefixPath)


def findPrefixPath(basePat, treeNode):
    # conditional pattern base: the prefix path of every node holding basePat
    condPats = {}
    while treeNode is not None:
        prefixPath = []
        ascendTree(treeNode, prefixPath)
        if len(prefixPath) > 1:
            condPats[frozenset(prefixPath[1:])] = treeNode.count
        treeNode = treeNode.nodeLink
    return condPats


def mineTree(inTree, headerTable, minSup, preFix, freqItemList):
    # process items from least to most frequent
    bigL = [v[0] for v in sorted(headerTable.items(), key=lambda p: p[1][0])]
    for basePat in bigL:
        newFreqSet = preFix.copy()
        newFreqSet.add(basePat)
        freqItemList.append(newFreqSet)
        condPattBases = findPrefixPath(basePat, headerTable[basePat][1])
        myCondTree, myHead = createTree(condPattBases, minSup)
        if myHead is not None:
            print('conditional tree for: ', newFreqSet)
            myCondTree.disp(1)
            mineTree(myCondTree, myHead, minSup, newFreqSet, freqItemList)


if __name__ == "__main__":
    simpDat = loadSimpDat()
    print(simpDat)
    initSet = createInitSet(simpDat)
    print(initSet)
    myFPtree, myHeaderTab = createTree(initSet, 3)
    myFPtree.disp()
    print(findPrefixPath('t', myHeaderTab['t'][1]))
    freqItems = []
    mineTree(myFPtree, myHeaderTab, 3, set([]), freqItems)
    print(freqItems)
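Output (from a Python 2 run, where sets print as set([...]); the order of items with equal counts, and therefore the exact shape of the tree, may differ under other Python versions):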
[['r', 'z', 'h', 'j', 'p'], ['z', 'y', 'x', 'w', 'v', 'u', 't', 's'], ['z'], ['r', 'x', 'n', 'o', 's'], ['y', 'r', 'x', 'z', 'q', 't', 'p'], ['y', 'z', 'x', 'e', 'q', 's', 't', 'm']]
{frozenset(['e', 'm', 'q', 's', 't', 'y', 'x', 'z']): 1, frozenset(['x', 's', 'r', 'o', 'n']): 1, frozenset(['s', 'u', 't', 'w', 'v', 'y', 'x', 'z']): 1, frozenset(['q', 'p', 'r', 't', 'y', 'x', 'z']): 1, frozenset(['h', 'r', 'z', 'p', 'j']): 1, frozenset(['z']): 1}
  Null Set   1
    x   1
      s   1
        r   1
    z   5
      x   3
        y   3
          s   2
            t   2
          r   1
            t   1
      r   1
{frozenset(['y', 'x', 's', 'z']): 2, frozenset(['y', 'x', 'r', 'z']): 1}
conditional tree for:  set(['y'])
  Null Set   1
    x   3
      z   3
conditional tree for:  set(['y', 'z'])
  Null Set   1
    x   3
conditional tree for:  set(['s'])
  Null Set   1
    x   3
conditional tree for:  set(['t'])
  Null Set   1
    y   3
      x   3
        z   3
conditional tree for:  set(['x', 't'])
  Null Set   1
    y   3
conditional tree for:  set(['z', 't'])
  Null Set   1
    y   3
      x   3
conditional tree for:  set(['x', 'z', 't'])
  Null Set   1
    y   3
conditional tree for:  set(['x'])
  Null Set   1
    z   3
[set(['y']), set(['y', 'x']), set(['y', 'z']), set(['y', 'x', 'z']), set(['s']), set(['x', 's']), set(['t']), set(['y', 't']), set(['x', 't']), set(['y', 'x', 't']), set(['z', 't']), set(['y', 'z', 't']), set(['x', 'z', 't']), set(['y', 'x', 'z', 't']), set(['r']), set(['x']), set(['x', 'z']), set(['z'])]
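One way to cross-check the final list is to count supports directly, without any tree. The sketch below is not part of the FP-growth code: the function name `brute_force_frequent` is hypothetical, and the approach enumerates every combination of frequent items, so it is only practical for tiny data such as this sample. It assumes the same six transactions and a minimum support of 3.

```python
from itertools import combinations

transactions = [
    ['r', 'z', 'h', 'j', 'p'],
    ['z', 'y', 'x', 'w', 'v', 'u', 't', 's'],
    ['z'],
    ['r', 'x', 'n', 'o', 's'],
    ['y', 'r', 'x', 'z', 'q', 't', 'p'],
    ['y', 'z', 'x', 'e', 'q', 's', 't', 'm'],
]


def brute_force_frequent(transactions, min_sup):
    """Hypothetical checker: return {itemset: support} by direct counting."""
    tx_sets = [set(t) for t in transactions]
    all_items = sorted(set().union(*tx_sets))
    # Only items that are frequent on their own can appear in a frequent itemset.
    frequent_items = [i for i in all_items
                      if sum(i in t for t in tx_sets) >= min_sup]
    result = {}
    for size in range(1, len(frequent_items) + 1):
        for combo in combinations(frequent_items, size):
            # Support = number of transactions that contain the whole combination.
            support = sum(set(combo) <= t for t in tx_sets)
            if support >= min_sup:
                result[frozenset(combo)] = support
    return result


found = brute_force_frequent(transactions, 3)
print(len(found))  # 18
for itemset, support in sorted(found.items(),
                               key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), support)
```

The 18 itemsets it reports are the same ones that appear in the mined list above, which is a quick sanity check that the tree-based mining did not lose or invent any itemset.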
FP Tree and Python implementation