Preface:
For an introduction to the FP-GROWTH algorithm, see: Introduction to the FP-GROWTH algorithm.
This paper mainly introduces the algorithm of extracting frequent itemsets from Fp-tree. See the above article for pseudo-code .
The structure of the fp-tree is shown in the structure of the fp-growth algorithm Fp-tree (python).
Text:
TRee_mINeR.Py File:
#coding =utf-8ImportTree_builderImportCopy class tree_miner(object): "" " Tree_miner class. Function: Mining "" "of the tree with frequent itemsets" "" def __init__(self, tree=none, min_sup=-1, headertable={}): Initialization of the "" " Tree_miner. The Tree is a constructed fp_tree, Min_sup is the minimum support count, Headertable is Fp_tree's head node table "" "Self.min_sup = Min_sup self.tree_mining (tree=tree, headertable=headertable) def tree_mining(self, tree, a=[], headertable={}): "" " function: recursive implementation of tree trees frequent itemsets mining. A equivalent to α,b in pseudo-code equivalent to beta " " "B = [] Allelem = {}#用来保存单个路径情况时, all nodes on the pathnode = Tree.root#node取得树的根节点 whileLen (Node.children) >0:#推断是否是单个路径 ifLen (node.children)! =1:#假设路径上的某个节点的孩子数不止一个. Then it is not a single path Breaknode = node.children.values () [0]#node取得下一个节点Allelem.setdefault (Node.data,node.count)#记录路径上的节点. If it is a single path, it will be used ifLen (Node.children) <1:#Tree仅仅包括单个路径L = Self.getl (Items=allelem, Min_sup=self.min_sup, A=a)#L即为我们要求的频繁项集Self.showresult (L)#对结果进行输出 return Else: forIteminchHeadertable:#对于头结点表中的元素, find the frequent itemsets at its end, one by one ifA:#产生项目集B forEleminchA:ifElem! = []: temp = copy.copy (elem) b.append (temp) B.append ([item]+temp)Else: B.append ([item]) pattem,counts = Self.findpattembase (item, headertable)#得到以项item结尾的所以条件模式基, counts the count of stored conditional mode basesmyheadertable = {} Conditiontree_builder = Tree_builder. Tree_builder (Routines=pattem, counts=counts, headertable=myheadertable)#新建一个Tree_builder对象, use it to construct conditions Fp-tree ifConditionTree_builder.tree.root.children:#假设构造的条件FP-The tree is not emptySelf.tree_mining (Tree=conditiontree_builder.tree, a=b, headertable=myheadertable)#递归调用B = []return def findpattembase(self, item, headertable): "" " function: Search the tree for item's conditional pattern base" "according to the tree's Head node tableItempattem = []#存放项item的全部模式基Counts = []#存放模式基的计数addresstable = Headertable[item]#头节点表中item链上所以节点的地址 forItemnodeinchAddresstable:#对头结点表表中存放的每一个item节点Iteminpattem = []#用来存放item模式基中的各项Nodeinpattem = Itemnode.parent#item模式基的项, use it to backtrack to the roots. is a pattern base ifNodeinpattem.data = =' null ':#假设父亲节点就是树根, you skip Continue whileNodeinpattem.data! =' null ':#假设还没到树根, it keeps backtracking.Iteminpattem.append (Nodeinpattem.data)#把它压进item的模式基Nodeinpattem = Nodeinpattem.parent#让当前节点跳到它的父亲节点, backtrackingIteminpattem = tuple (Iteminpattem) itempattem.append (Iteminpattem)#找完了一条item的模式基了Counts.append (Itemnode.count)#模式基的计数 returnItempattem,counts def showresult(self, result=[[]): "" function: To show the frequent itemsets to be mined "" " forEleminchResult:num = Elem.pop ()#频繁项集的计数 PrintTuple (Elem),': 'Numreturn def combiner(self, myList, n): "" " function: Arranges all the elements of the list list, generating n tuples grouped together " ""Answers = [] one = [0] * N def next_c(li = 0, ni = 0): ifNI = = n:answers.append (copy.copy (one))return forLjinchXrange (Li, Len (myList)): One[ni] = Mylist[lj] Next_c (LJ +1, Ni +1) Next_c ()returnAnswers def findminimum(self, items, elem): "" " function: Find the minimum value" "for each count in the Elem list according to the items dictionaryMinimum = items[elem[0]] forAinchElemifItems[a] < minimum:#假设某元素的计数更小, the Count of it is recordedMinimum = Items[a]returnMinimum def getl(self, items, min_sup=-1, a=[]): "" " function: Generate frequent Itemsets" "for a tree with only one pathTempresult = [] Finnalresult = [] nodes = Items.keys ()#取得items字典的键, which is all nodes on a single path forIinchRange1, Len (nodes) +1):#对nodes. That is, all nodes on the path generate various combinationsTempresult + = Self.combiner (Mylist=nodes, n=i) forEleminchtempresult[::-1]:#elem逆序对dearResult訪问, because the next element will be deleted, reverse the good operationElemminimum = self.findminimum (items, elem)#找出elem里面的最小计数 ifElemminimum < min_sup:#假设组合elem的最小计数小于最小支持度计数. is deleted.Tempresult.remove (Elem)Else:#否则把它压入结果列表中进行输出. But it is only a conditional pattern base, plus the last item to form the frequent itemsets, and at the same time the minimum count is added forAeleminchA:#A可能含有多项 ifAelem:temp = Elem Temp + = Aelem temp.append (elemmini Mum) Finnalresult.append (temp)#将挖掘出的频繁项集保存在finnalResult列表 returnFinnalresult
GenerationCodetoAddress: Fp-growth algorithm python implementation (full code).
Note: This code is written in a python2.7+eclipse environment. You can import the project in Eclipse, or you can run the "__init__.py" file with the Python command on the command line form.
Reprint please indicate the source, thank you!
(Original link: http://blog.csdn.net/bone_ace/article/details/46747791)
Mining of frequent itemsets for the fp-growth algorithm (python)