Discovering frequent item sets with the FP-growth algorithm (2): discovering frequent item sets

Last Update:2017-09-08 Source: Internet

Author: User


The previous article described how to build an FP tree. Every path in the FP tree meets the minimum support, so what remains is to find more associations along each path.

Extracting conditional pattern bases

Start with a single frequent item in the header pointer table of the FP tree. For each item, obtain its conditional pattern base; the item itself serves as the key for looking it up. A conditional pattern base is the set of paths that end with the item being searched for. Each of these paths is a prefix path. In short, a prefix path is everything between the item being searched for and the root node of the tree.

Consider the prefix paths of the items {s:2} and {r:1}:

The conditional pattern base of {s}, that is, its set of prefix paths, has two entries: {z, x, y, t} and {x}. The conditional pattern base of {r} has three entries: {z}, {z, x, y, t}, and {x, s}.
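The idea of a prefix path can be sketched without building a tree at all. The snippet below is only an illustration: the transaction list is a hypothetical input, assumed to be already filtered to frequent items and sorted in the z, x, y, t, s, r order, and chosen so that the result matches the bases quoted above. The name conditional_pattern_base is also an assumption, not part of the article's code.

```python
from collections import Counter

# Hypothetical transactions (an assumption): already reduced to frequent
# items and sorted in the order z, x, y, t, s, r.
ordered_transactions = [
    ["z", "r"],
    ["z", "x", "y", "t", "s"],
    ["z"],
    ["x", "s", "r"],
    ["z", "x", "y", "t", "r"],
    ["z", "x", "y", "t", "s"],
]


def conditional_pattern_base(item, transactions):
    """Map each non-empty prefix path of `item` to the number of times it occurs."""
    base = Counter()
    for transaction in transactions:
        if item in transaction:
            # Everything that precedes the item is its prefix path.
            prefix = tuple(transaction[:transaction.index(item)])
            if prefix:  # an item directly below the root has no prefix path
                base[prefix] += 1
    return dict(base)


s_base = conditional_pattern_base("s", ordered_transactions)
r_base = conditional_pattern_base("r", ordered_transactions)
print("s:", s_base)
print("r:", r_base)
```

Because transactions that share a prefix share a path in an FP tree, counting prefixes this way gives the same paths and counts that a walk from each node of the item up to the root would give.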

Finding a conditional pattern base is really a matter of tracing back from the nodes of an item in the FP tree to the root node. Starting from the header pointer table headTable, we can follow the node links to quickly reach every node that holds a given item, and walk up to the root from each of them. The following table lists all the conditional pattern bases of the FP tree:

Creating a conditional FP tree

To discover more frequent item sets, you must create a conditional FP tree for each frequent item. You can use the conditional pattern base you just found as the input data and build these trees with the same tree-building code. Then recursively discover frequent items, their conditional pattern bases, and further conditional trees.

Take the frequent item r as an example and construct its conditional FP tree. The three prefix paths of r are {z}, {z, x, y, t}, and {x, s}. If the minimum support is minSupport = 2, then y, t, and s are filtered out, leaving {z}, {z, x}, and {x}. Although y, s, and t are part of the conditional pattern base, they are not part of the conditional FP tree; that is, they are not frequent with respect to r. In the FP tree, the global support of y → t → r and of s → r is 1, so y, t, and s are not frequent in the conditional tree of r.
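The filtering step described above can be sketched as follows. This is a minimal illustration, not the article's createTree: the function name filter_pattern_base is hypothetical, and the count of 1 attached to each prefix path is an assumption consistent with the support values quoted in the text.

```python
from collections import Counter

MIN_SUPPORT = 2

# Conditional pattern base of r; a count of 1 per path is assumed.
r_pattern_base = {
    ("z",): 1,
    ("z", "x", "y", "t"): 1,
    ("x", "s"): 1,
}


def filter_pattern_base(pattern_base, min_support):
    """Count items over all prefix paths and drop the infrequent ones."""
    item_counts = Counter()
    for path, count in pattern_base.items():
        for item in path:
            item_counts[item] += count

    frequent = {item for item, count in item_counts.items() if count >= min_support}

    filtered = Counter()
    for path, count in pattern_base.items():
        kept = tuple(item for item in path if item in frequent)
        if kept:  # paths that lose all of their items disappear
            filtered[kept] += count
    return dict(item_counts), dict(filtered)


item_counts, filtered_base = filter_pattern_base(r_pattern_base, MIN_SUPPORT)
print(item_counts)    # support of each item within the paths that lead to r
print(filtered_base)  # the paths that remain as input for the conditional tree
```

With these inputs, z and x each reach a count of 2, while y, t, and s stay at 1 and are removed, which leaves the paths {z}, {z, x}, and {x}.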

The filtered conditional tree of r is as follows:

Repeating the preceding steps, the conditional pattern base of r is now {z, x} and {x}, and no path meets the minimum support any longer, so r has only one conditional tree. Note that although {z, x} and {x} contain x twice, z is the parent node of x in {z, x}. When a conditional FP tree is constructed, a parent node cannot be removed directly; nodes can only be removed level by level, starting from the child nodes. If the path were {x, z} instead, a conditional FP tree containing only the node {x} could be constructed in this round. This is exactly the point made in the previous article: the order of the items affects the final result.

The code is as follows:

# createTree, loadSimpDat and createInitSet are the tree-building helpers
# from the previous article; they are not repeated here.

def ascendTree(leafNode, prefixPath):
    # Walk from leafNode up towards the root, collecting item names.
    # The root itself (the node without a parent) is not added.
    if leafNode.parent is not None:
        prefixPath.append(leafNode.name)
        ascendTree(leafNode.parent, prefixPath)

def findPrefixPath(basePat, headTable):
    # Collect the conditional pattern base of basePat as {prefix path: count}.
    condPats = {}
    treeNode = headTable[basePat][1]  # first node of this item in the tree
    while treeNode is not None:
        prefixPath = []
        ascendTree(treeNode, prefixPath)
        if len(prefixPath) > 1:
            # prefixPath[0] is the item itself, so it is skipped
            condPats[frozenset(prefixPath[1:])] = treeNode.count
        treeNode = treeNode.nodeLink  # next node holding the same item
    return condPats

def mineTree(inTree, headerTable, minSup=1, preFix=None, freqItemList=None):
    # Use None instead of mutable default arguments, which would be shared between calls
    if preFix is None:
        preFix = set()
    if freqItemList is None:
        freqItemList = []
    # order by support ascending, then by item name ascending
    bigL = [v[0] for v in sorted(headerTable.items(), key=lambda p: (p[1][0], p[0]))]
    for basePat in bigL:
        newFreqSet = preFix.copy()
        newFreqSet.add(basePat)
        freqItemList.append(newFreqSet)
        # Build the conditional FP tree from the conditional pattern base
        condPattBases = findPrefixPath(basePat, headerTable)
        myCondTree, myHead = createTree(condPattBases, minSup)
        if myHead is not None:
            print('condPattBases:', basePat, condPattBases)
            myCondTree.disp()
            print('*' * 30)
            # Recurse into the conditional tree
            mineTree(myCondTree, myHead, minSup, newFreqSet, freqItemList)
    return freqItemList

simpDat = loadSimpDat()
dictDat = createInitSet(simpDat)
myFPTree, myheader = createTree(dictDat, 3)
myFPTree.disp()

# Print the conditional pattern base of every frequent item
for item in ['z', 'x', 'y', 't', 's', 'r']:
    condPats = findPrefixPath(item, myheader)
    print(item, condPats)

mineTree(myFPTree, myheader, 2)

Console output:

In this example, two frequent item sets are found: {z, x} and {x}.

Once the frequent item sets have been obtained, association rules can be derived from them based on confidence. This step is simple; refer to the relevant content in the previous article.
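As a reminder of what that step involves, the confidence of a rule A → B is the support of A ∪ B divided by the support of A. The sketch below only illustrates this formula; the support counts are hypothetical values picked for the example, and the function name confidence is an assumption rather than code from the previous article.

```python
def confidence(support_counts, antecedent, consequent):
    """Confidence of antecedent -> consequent: support(A and B) / support(A)."""
    antecedent = frozenset(antecedent)
    combined = antecedent | frozenset(consequent)
    return support_counts[combined] / support_counts[antecedent]


# Hypothetical support counts (an assumption for illustration only).
support_counts = {
    frozenset({"z"}): 5,
    frozenset({"x"}): 4,
    frozenset({"z", "x"}): 3,
}

conf_z_to_x = confidence(support_counts, {"z"}, {"x"})  # 3 / 5
conf_x_to_z = confidence(support_counts, {"x"}, {"z"})  # 3 / 4
print(conf_z_to_x, conf_x_to_z)
```

A rule is kept only when its confidence reaches the chosen threshold, so the same item set can yield a rule in one direction but not in the other.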

References: Machine Learning in Action

Author: I am 8-digit

Source: http://www.cnblogs.com/bigmonkey

This article is intended for learning, research, and sharing. If you wish to reprint it, please contact me, credit the author and source, and use it for non-commercial purposes only.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
