FP-growth algorithm for discovering frequent item sets (2): discovering frequent item sets


The previous article described how to build an FP tree. Every path in the FP tree meets the minimum support. What remains is to find more associations along each path.

Extracting conditional pattern bases

Start with a single frequent item in the FP tree's header table. For each item, obtain its conditional pattern base, which is the set of paths that end with the item being searched for. Each of these paths is a prefix path. In short, a prefix path is everything between the item being searched for and the root node.

[Figure: prefix paths of the element items {s: 2} and {r: 1}]

The conditional pattern base of {s}, that is, its set of prefix paths, contains two paths: {z, x, y, t} and {x}. The conditional pattern base of {r} contains three: {z}, {z, x, y, t}, and {x, s}.
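As a rough cross-check of the definition, the sketch below derives the same conditional pattern bases directly from a list of transactions, without building a tree. The six transactions are an assumption: hypothetical data, already filtered and sorted in the order z, x, y, t, s, r, chosen so that the results agree with the prefix paths quoted above. The name conditional_pattern_base is illustrative and does not come from the article's code.

```python
from collections import Counter

# Hypothetical transactions, already filtered and sorted in the
# assumed global order z, x, y, t, s, r.
ordered_transactions = [
    ["z", "r"],
    ["z", "x", "y", "t", "s"],
    ["z"],
    ["x", "s", "r"],
    ["z", "x", "y", "t", "r"],
    ["z", "x", "y", "t", "s"],
]


def conditional_pattern_base(item, transactions):
    """Map each non-empty prefix that precedes `item` to how often it occurs."""
    base = Counter()
    for transaction in transactions:
        if item in transaction:
            prefix = tuple(transaction[:transaction.index(item)])
            if prefix:  # an item directly under the root has no prefix path
                base[prefix] += 1
    return dict(base)


s_base = conditional_pattern_base("s", ordered_transactions)
r_base = conditional_pattern_base("r", ordered_transactions)
print(s_base)  # {('z', 'x', 'y', 't'): 2, ('x',): 1}
print(r_base)
```

The counts attached to each path matter later: they become the supports used when the conditional FP tree is built.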

Finding a conditional pattern base amounts to walking from each node that holds the item up to the root node of the FP tree. The header pointer table headTable gives quick access to every node of the same item by following the node links. The following table lists all the conditional pattern bases of the FP tree:

[Table: all conditional pattern bases of the FP tree]
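The walk described above can be sketched with a small stand-alone tree. This is a minimal illustration, not the createTree code from the previous article: the Node class, build_tree, prefix_paths, and the sample transactions are all assumptions made for this example, and the header table here stores only the first node of each item.

```python
class Node:
    def __init__(self, name, parent):
        self.name = name
        self.count = 0
        self.parent = parent
        self.children = {}
        self.node_link = None  # next node in the tree that holds the same item


def build_tree(ordered_transactions):
    """Insert pre-sorted transactions and chain equal items via node_link."""
    root = Node(None, None)
    header = {}  # item -> first node holding that item
    tails = {}   # item -> last node holding that item (for O(1) linking)
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                if item in tails:
                    tails[item].node_link = child
                else:
                    header[item] = child
                tails[item] = child
            child.count += 1
            node = child
    return root, header


def prefix_paths(item, header):
    """Follow the node links of `item`; climb from each node to the root."""
    paths = {}
    node = header.get(item)
    while node is not None:
        path = []
        parent = node.parent
        while parent is not None and parent.name is not None:
            path.append(parent.name)
            parent = parent.parent
        if path:
            paths[tuple(reversed(path))] = node.count
        node = node.node_link
    return paths


# Hypothetical data, sorted in the assumed order z, x, y, t, s, r.
transactions = [
    ["z", "r"], ["z", "x", "y", "t", "s"], ["z"],
    ["x", "s", "r"], ["z", "x", "y", "t", "r"], ["z", "x", "y", "t", "s"],
]
root, header = build_tree(transactions)
s_paths = prefix_paths("s", header)
r_paths = prefix_paths("r", header)
print(s_paths)  # {('z', 'x', 'y', 't'): 2, ('x',): 1}
print(r_paths)
```

Each prefix path inherits the count of the node it started from, which is why {z, x, y, t} carries a count of 2 for s.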

Creating conditional FP trees

To discover more frequent item sets, create a conditional FP tree for each frequent item. Use the conditional pattern base just found as the input data and build the tree with the same tree-building code. Then recursively find the frequent items, find their conditional pattern bases, and build further conditional trees.
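The recursion itself can be illustrated without any tree at all. The sketch below is a simplification under stated assumptions: it keeps each conditional pattern base as a plain list of (path, count) pairs instead of compressing it into a conditional FP tree, and the function name mine and the sample transactions are hypothetical. It relies on every path being sorted in the same global order.

```python
from collections import defaultdict


def mine(pattern_base, min_support, prefix=frozenset()):
    """Recursively grow item sets from a list of (ordered path, count) pairs."""
    counts = defaultdict(int)
    for path, count in pattern_base:
        for item in path:
            counts[item] += count

    found = {}
    for item, support in counts.items():
        if support < min_support:
            continue  # infrequent in this conditional base
        itemset = prefix | {item}
        found[itemset] = support
        # Conditional pattern base of `item`: whatever precedes it on each path.
        conditional = []
        for path, count in pattern_base:
            if item in path:
                head = path[:path.index(item)]
                if head:
                    conditional.append((head, count))
        found.update(mine(conditional, min_support, itemset))
    return found


# Hypothetical data, sorted in the assumed order z, x, y, t, s, r.
transactions = [
    ("z", "r"), ("z", "x", "y", "t", "s"), ("z",),
    ("x", "s", "r"), ("z", "x", "y", "t", "r"), ("z", "x", "y", "t", "s"),
]
itemsets = mine([(t, 1) for t in transactions], min_support=3)
print(len(itemsets))  # 18
```

A real FP-growth implementation follows the same recursion but stores each conditional base as a tree, so that shared prefixes are counted once rather than scanned repeatedly.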

Take the frequent item r as an example and construct its conditional FP tree. The three prefix paths of r are {z}, {z, x, y, t}, and {x, s}. If the minimum support is minSupport = 2, then y, t, and s are filtered out, leaving {z}, {z, x}, and {x}. Although y, s, and t are part of the conditional pattern base, they are not part of the conditional FP tree; that is, they are not frequent with respect to r. The global support of y → t → r and of s → r is 1, so y, t, and s are not frequent in the conditional tree of r.
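The filtering step for r can be checked with a few lines of arithmetic. This is only a sketch: the helper name filter_pattern_base is made up for illustration, and each prefix path of r is assumed to carry a count of 1.

```python
from collections import Counter


def filter_pattern_base(pattern_base, min_support):
    """Count items across all paths, then drop those below min_support."""
    counts = Counter()
    for path, count in pattern_base.items():
        for item in path:
            counts[item] += count

    filtered = []
    for path, count in pattern_base.items():
        kept = tuple(item for item in path if counts[item] >= min_support)
        if kept:
            filtered.append((kept, count))
    return dict(counts), filtered


# Prefix paths of r from the text, each assumed to occur once.
r_base = {("z",): 1, ("z", "x", "y", "t"): 1, ("x", "s"): 1}
counts, filtered = filter_pattern_base(r_base, min_support=2)
print(counts)    # {'z': 2, 'x': 2, 'y': 1, 't': 1, 's': 1}
print(filtered)  # [(('z',), 1), (('z', 'x'), 1), (('x',), 1)]
```

Only z and x reach a count of 2, which is why the three paths shrink to {z}, {z, x}, and {x}.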

[Figure: the filtered conditional tree of r]

Repeat the preceding steps. The conditional pattern base of r is now {z, x} and {x}, and no remaining path meets the minimum support, so r has only one conditional tree. Note that although {z, x} and {x} together contain x twice, z is the parent node of x in {z, x}. When constructing a conditional FP tree, a parent node cannot be removed directly; nodes can only be removed step by step, starting from the child node. If the path were {x, z} instead, this round could construct a conditional FP tree containing only the node {x}. This is exactly what was discussed in the previous article: the order of items affects the final result.

The code is as follows:

    # Depends on createTree, loadSimpDat and createInitSet from the previous article.

    def ascendTree(leafNode, prefixPath):
        # Climb from leafNode towards the root, collecting item names on the way.
        if leafNode.parent is not None:
            prefixPath.append(leafNode.name)
            ascendTree(leafNode.parent, prefixPath)

    def findPrefixPath(basePat, headTable):
        # Return the conditional pattern base of basePat as {prefix path: count}.
        condPats = {}
        treeNode = headTable[basePat][1]
        while treeNode is not None:
            prefixPath = []
            ascendTree(treeNode, prefixPath)
            # prefixPath[0] is basePat itself, so it is skipped.
            if len(prefixPath) > 1:
                condPats[frozenset(prefixPath[1:])] = treeNode.count
            treeNode = treeNode.nodeLink
        return condPats

    def mineTree(inTree, headerTable, minSup=1, preFix=None, freqItemList=None):
        # Avoid mutable default arguments, which would be shared between calls.
        if preFix is None:
            preFix = set()
        if freqItemList is None:
            freqItemList = []
        # Order by support ascending, then by item name ascending.
        bigL = [v[0] for v in sorted(headerTable.items(),
                                     key=lambda p: (p[1][0], p[0]))]
        for basePat in bigL:
            newFreqSet = preFix.copy()
            newFreqSet.add(basePat)
            freqItemList.append(newFreqSet)
            condPattBases = findPrefixPath(basePat, headerTable)
            myCondTree, myHead = createTree(condPattBases, minSup)
            if myHead is not None:
                print('condPattBases:', basePat, condPattBases)
                myCondTree.disp()
                print('*' * 30)
                # Recurse into the conditional tree.
                mineTree(myCondTree, myHead, minSup, newFreqSet, freqItemList)
        return freqItemList

    simpDat = loadSimpDat()
    dictDat = createInitSet(simpDat)
    myFPTree, myheader = createTree(dictDat, 3)
    myFPTree.disp()

    # Print the conditional pattern base of every item in the header table.
    for item in ('z', 'x', 'y', 't', 's', 'r'):
        condPats = findPrefixPath(item, myheader)
        print(item, condPats)

    mineTree(myFPTree, myheader, 2)

Console output:

In this example, we can find two frequent item sets: {z, x} and {x}.

After the frequent item sets are obtained, you can derive association rules based on confidence. This step is simple; refer to the relevant content in the previous article.

 

 

Reference: Machine Learning in Action

Author: I am 8-digit

Source: http://www.cnblogs.com/bigmonkey

This article is intended for learning, research, and sharing. If you wish to reprint it, please contact me, credit the author and source, and use it for non-commercial purposes only.

 
