FP-growth algorithm for discovering frequent item sets (2)
The previous article described how to build an FP tree. Each path in the FP tree meets the minimum support. What remains is to find more associations along each path.
Extracting conditional pattern bases
Start with a single frequent item in the header table of the FP tree. For each item, obtain its conditional pattern base. The conditional pattern base is the set of paths that end with the item being searched for. Each of these paths is a prefix path. In short, a prefix path is everything between the item being searched for and the root node.
Consider the prefix paths of the items {s: 2} and {r: 1}:
The conditional pattern base of {s}, that is, its set of prefix paths, contains two paths: {z, x, y, t} and {x}. The conditional pattern base of {r} contains three: {z}, {z, x, y, t}, and {x, s}.
Finding a conditional pattern base amounts to walking from each node of the item up to the root node of the FP tree. The header table headTable stores a pointer to the first node of every item, and following the node links from there quickly reaches all nodes of that item. The following table lists all the conditional pattern bases of the FP tree:
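The prefix paths above can be reproduced with a minimal sketch. Everything in it is an assumption made for illustration: the Node class, the helper names build_tree and prefix_paths, and the six ordered transactions, which were chosen so that the resulting tree yields the paths listed above. A plain dictionary of nodes per item stands in for the node links of the header table.

```python
class Node:
    """Hypothetical minimal FP-tree node: item name, count, parent, children."""
    def __init__(self, name, parent):
        self.name = name
        self.count = 0
        self.parent = parent
        self.children = {}

def build_tree(ordered_transactions):
    """Insert already filtered and ordered transactions into a prefix tree."""
    root = Node(None, None)
    nodes_by_item = {}  # stands in for the header table's node links
    for items in ordered_transactions:
        node = root
        for item in items:
            if item not in node.children:
                child = Node(item, node)
                node.children[item] = child
                nodes_by_item.setdefault(item, []).append(child)
            node = node.children[item]
            node.count += 1
    return root, nodes_by_item

def prefix_paths(item, nodes_by_item):
    """Map each prefix path of `item` (root side first) to the node's count."""
    paths = {}
    for node in nodes_by_item.get(item, []):
        path = []
        parent = node.parent
        # Climb to the root, skipping the root itself (its name is None).
        while parent is not None and parent.name is not None:
            path.append(parent.name)
            parent = parent.parent
        if path:
            paths[tuple(reversed(path))] = node.count
    return paths

# Assumed ordered transactions (illustrative only).
transactions = [
    ['z', 'r'],
    ['z', 'x', 'y', 't', 's'],
    ['z'],
    ['x', 's', 'r'],
    ['z', 'x', 'y', 't', 'r'],
    ['z', 'x', 'y', 't', 's'],
]
root, nodes_by_item = build_tree(transactions)
s_paths = prefix_paths('s', nodes_by_item)
r_paths = prefix_paths('r', nodes_by_item)
print('s', s_paths)
print('r', r_paths)
```

Under these assumptions, s has two prefix paths and r has three, matching the sets listed above.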
Creating conditional FP trees
To discover more frequent item sets, create a conditional FP tree for each frequent item. Use the conditional pattern base just found as the input data and build the tree with the same tree-building code. Then recursively find frequent items, find their conditional pattern bases, and build further conditional trees.
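The recursion can be sketched without the tree structure. The following is a simplified pattern-growth sketch, not the tree-based code of this article: it recurses directly on conditional pattern bases stored as dictionaries. The function name mine, the tuple representation of paths, and the count of 1 for each prefix path of r are assumptions; the sketch also assumes that all paths list their items in the same global order.

```python
from collections import Counter

def mine(pattern_base, min_support, prefix=frozenset(), found=None):
    """Recursively grow frequent item sets from a conditional pattern base.

    pattern_base maps an ordered tuple of items to its count.
    Returns a dict mapping each frequent item set to its support.
    """
    if found is None:
        found = {}
    # Support of every item within this conditional pattern base.
    support = Counter()
    for path, count in pattern_base.items():
        for item in path:
            support[item] += count
    for item, sup in support.items():
        if sup < min_support:
            continue  # not frequent in combination with the prefix
        new_set = prefix | {item}
        found[new_set] = sup
        # Conditional pattern base of `item`: what precedes it on each path.
        cond = {}
        for path, count in pattern_base.items():
            if item in path:
                head = path[:path.index(item)]
                if head:
                    cond[head] = cond.get(head, 0) + count
        mine(cond, min_support, new_set, found)
    return found

# Prefix paths of r as listed earlier; the counts of 1 are assumed.
r_base = {('z',): 1, ('z', 'x', 'y', 't'): 1, ('x', 's'): 1}
found = mine(r_base, 2, prefix=frozenset({'r'}))
for itemset, sup in found.items():
    print(sorted(itemset), sup)
```

Because only the items in front of the current item are passed down, each item set is generated once, from the item that comes last in the global order.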
Take the frequent item r as an example and build its conditional FP tree. The three prefix paths of r are {z}, {z, x, y, t}, and {x, s}. With a minimum support of minSupport = 2, the items y, t, and s are filtered out, leaving {z}, {z, x}, and {x}. Although y, s, and t appear in the conditional pattern base, they are not part of the conditional FP tree, because they are not frequent in combination with r. As the tree shows, the global support of y → t → r and of s → r is 1, so y, t, and s are not frequent in the conditional tree of r.
The filtered conditional tree of r is as follows:
Repeat the preceding steps. The conditional pattern base of r now consists of {z, x} and {x}, and no further path meets the minimum support, so r has only this one conditional tree. Note that although {z, x} and {x} both contain x, z is the parent node of x in {z, x}. When a conditional FP tree is built, a parent node cannot be removed directly; nodes can only be removed step by step, starting from the child nodes. If the path were {x, z} instead, a conditional FP tree containing only the node {x} could be built in this round. This is exactly the point made in the previous article: the order of the items affects the final result.
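The filtering step can be checked with a short sketch. The function name filter_pattern_base and the count of 1 for each prefix path are assumptions made for illustration; only the three paths and minSupport = 2 come from the text.

```python
from collections import Counter

def filter_pattern_base(pattern_base, min_support):
    """Drop items whose support inside the pattern base is below min_support."""
    support = Counter()
    for path, count in pattern_base.items():
        for item in path:
            support[item] += count
    frequent = {item for item, sup in support.items() if sup >= min_support}
    filtered = {}
    for path, count in pattern_base.items():
        kept = tuple(item for item in path if item in frequent)
        if kept:
            # Identical filtered paths are merged by adding their counts.
            filtered[kept] = filtered.get(kept, 0) + count
    return dict(support), filtered

# Conditional pattern base of r; the counts of 1 are assumed.
r_base = {('z',): 1, ('z', 'x', 'y', 't'): 1, ('x', 's'): 1}
support, filtered = filter_pattern_base(r_base, 2)
print(support)
print(filtered)
```

Here z and x each reach a support of 2, while y, t, and s each reach only 1, so the remaining paths are {z}, {z, x}, and {x}.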
The code is as follows:
# createTree, loadSimpDat and createInitSet come from the tree-building code of the previous article.

def ascendTree(leafNode, prefixPath):
    # Walk from leafNode up to the root, collecting item names on the way.
    if leafNode.parent is not None:
        prefixPath.append(leafNode.name)
        ascendTree(leafNode.parent, prefixPath)

def findPrefixPath(basePat, headTable):
    # Return the conditional pattern base of basePat as {prefix path: count}.
    condPats = {}
    treeNode = headTable[basePat][1]
    while treeNode is not None:
        prefixPath = []
        ascendTree(treeNode, prefixPath)
        # prefixPath[0] is basePat itself, so it is skipped.
        if len(prefixPath) > 1:
            condPats[frozenset(prefixPath[1:])] = treeNode.count
        treeNode = treeNode.nodeLink
    return condPats

def mineTree(inTree, headerTable, minSup=1, preFix=None, freqItemList=None):
    # Mutable default arguments would be shared between calls, so use None.
    if preFix is None:
        preFix = set()
    if freqItemList is None:
        freqItemList = []
    # order by support asc, then item name asc
    bigL = [v[0] for v in sorted(headerTable.items(), key=lambda p: (p[1][0], p[0]))]
    for basePat in bigL:
        newFreqSet = preFix.copy()
        newFreqSet.add(basePat)
        freqItemList.append(newFreqSet)
        # Build the conditional FP tree of basePat from its conditional pattern base.
        condPattBases = findPrefixPath(basePat, headerTable)
        myCondTree, myHead = createTree(condPattBases, minSup)
        if myHead is not None:
            print('condpattbases: ', basePat, condPattBases)
            myCondTree.disp()
            print('*' * 30)

            mineTree(myCondTree, myHead, minSup, newFreqSet, freqItemList)
    return freqItemList

simpDat = loadSimpDat()
dictDat = createInitSet(simpDat)
myFPTree, myheader = createTree(dictDat, 3)
myFPTree.disp()

# Print the conditional pattern base of every frequent item.
for item in ['z', 'x', 'y', 't', 's', 'r']:
    condPats = findPrefixPath(item, myheader)
    print(item, condPats)

mineTree(myFPTree, myheader, 2)
Console output:
In this example, we can find two frequent item sets: {z, x} and {x}.
Once the frequent item sets have been obtained, association rules can be derived from them based on confidence. This step is simple; refer to the relevant content in the previous article.
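As a reminder of what this step involves, here is a minimal sketch, assuming the usual definition confidence(A → B) = support(A ∪ B) / support(A). The function name rules_from_itemset and the support counts are hypothetical values chosen for illustration, not output of the code above.

```python
from itertools import combinations

def rules_from_itemset(itemset, support, min_conf):
    """List rules antecedent -> consequent of one frequent item set.

    support maps frozenset item sets to their support counts.
    """
    items = frozenset(itemset)
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(sorted(items), size):
            antecedent = frozenset(antecedent)
            conf = support[items] / support[antecedent]
            if conf >= min_conf:
                rules.append((antecedent, items - antecedent, conf))
    return rules

# Hypothetical support counts, for illustration only.
support = {
    frozenset({'z'}): 5,
    frozenset({'r'}): 3,
    frozenset({'r', 'z'}): 2,
}
rules = rules_from_itemset({'r', 'z'}, support, min_conf=0.5)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), '->', sorted(consequent), round(conf, 3))
```

With these counts only r → z passes the threshold, because 2/3 is at least 0.5 while 2/5 is not.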
Reference: Machine Learning in Action
Author: I am 8-digit
Source: http://www.cnblogs.com/bigmonkey
This article is intended for learning, research, and sharing. If you wish to reprint it, please contact me, credit the author and source, and use it for non-commercial purposes only.