FP-growth: a frequent itemset mining algorithm

Last Update:2018-07-26 Source: Internet

Author: User


Background: Frequent itemset mining algorithms find collections of items that often occur together in transactions (called frequent itemsets). Once these itemsets are known, a transaction that contains some items of a frequent itemset can be used to recommend the remaining items of that itemset. The classic example is the beer-and-diapers story from shopping basket analysis: beer and diapers often appear in the same basket, so mining yields the frequent itemset {beer, diapers}. When a customer buys beer, the store can recommend diapers, which makes an additional purchase more likely and serves the goal of bundled marketing.
There are two common frequent itemset mining algorithms: Apriori and FP-growth. Apriori repeatedly generates candidate sets and filters them to find the frequent itemsets, which requires many scans of the original data; when the data is large, the repeated disk I/O makes it inefficient. FP-growth scans the original data only twice and compresses it into an FP-tree data structure, which is more efficient.
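To make the idea of a frequent itemset and a recommendation concrete, here is a minimal brute-force sketch in Python. It is neither Apriori nor FP-growth: it simply counts every pair of items. The basket contents, the function names `pair_support` and `recommend`, and the threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Count in how many baskets each unordered pair of items appears."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def recommend(item, baskets, min_support):
    """Return the items that form a frequent pair with `item`."""
    partners = []
    for (a, b), n in pair_support(baskets).items():
        if n >= min_support and item in (a, b):
            partners.append(b if a == item else a)
    return sorted(partners)


# Hypothetical shopping baskets, invented for this example.
baskets = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["milk", "bread"],
    ["beer", "diapers", "milk"],
]
print(recommend("beer", baskets, min_support=3))  # ['diapers']
```

Counting every combination this way does not scale beyond tiny inputs, which is why dedicated algorithms such as Apriori and FP-growth exist.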
FP-growth consists of two main steps: constructing the FP-tree and recursively mining it. Construction takes two scans of the data and compresses the transactions into an FP-tree, which resembles a prefix tree: transactions with the same prefix share a path, and this sharing is what compresses the data. Mining then uses the FP-tree to find each item's conditional pattern base, builds a conditional FP-tree from it, and recursively mines the conditional FP-trees to obtain all frequent itemsets. The main computational bottleneck is the recursive mining of the FP-trees. The main steps are described in detail below.

FP-growth algorithm steps:
FP-tree construction
Scan the data a first time to find the frequent 1-itemsets, and sort them in descending order of frequency to obtain the list L.
Scan the data a second time. For each transaction, filter out the infrequent items, sort the remaining frequent items in L order, and insert them into the FP-tree; transactions with the same prefix share a path.
At the same time, maintain a Header table that links together all nodes of the same item in the FP-tree. The Header table is also sorted in descending order.
Frequent itemset mining
Mining starts from the bottom item of the Header table and builds a conditional pattern base (CPB) for each item. Follow the item's links in the Header table to find every prefix path that leads to a node of that item. These prefix paths form the item's CPB, and the count of each prefix path is the count of the item's node at the end of that path. For example, if one of the paths containing p is fcamp and the count of p on this path is 2, then the prefix path fcam in p's CPB has count 2.
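The two scans, the Header table, and the conditional pattern base can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the class `Node`, the functions `build_fp_tree` and `conditional_pattern_base`, and the sample transactions are all hypothetical. The transactions were chosen so that, with a threshold of 3, the frequent items come out as f, c, a, b, m, p and the paths match the fcamp example in the text. The Header table is kept as a plain list of nodes per item rather than a linked list, items are assumed to occur at most once per transaction, and ties in frequency are broken by first appearance in the data.

```python
from collections import Counter, defaultdict


class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    # First scan: count items and keep the frequent ones in descending order (list L).
    # Counter keeps first-seen order and sorted() is stable, so ties keep that order.
    counts = Counter(item for t in transactions for item in t)
    order = [i for i in sorted(counts, key=lambda i: -counts[i])
             if counts[i] >= min_support]
    rank = {item: r for r, item in enumerate(order)}

    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    # Second scan: filter infrequent items, sort in L order, insert into the tree.
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, order


def conditional_pattern_base(header, item):
    # For every node of `item`, walk up to the root and record the prefix path
    # together with the count of that node.
    base = {}
    for node in header[item]:
        path = []
        parent = node.parent
        while parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base[tuple(reversed(path))] = node.count
    return base


# Illustrative transactions (an assumption); each character is one item.
transactions = ["fcabmlo", "facdgimp", "bfhjo", "bcksp", "afcelpmn"]
root, header, order = build_fp_tree(transactions, min_support=3)
print(order)                                  # ['f', 'c', 'a', 'b', 'm', 'p']
print(conditional_pattern_base(header, "p"))  # {('f', 'c', 'a', 'm'): 2, ('c', 'b'): 1}
```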
Build the conditional FP-tree: accumulate the count of each item across all prefix paths in the CPB, filter out the items below the threshold, and build an FP-tree from what remains. For example, the CPB of m is {<fca:2>, <fcab:1>}, which gives f:3, c:3, a:3, b:1. With a threshold of 3, b is filtered out.
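The accumulate-and-filter step for the CPB of m can be checked with a few lines of Python. This is a sketch only: the function names `conditional_item_counts` and `prune_pattern_base` are hypothetical, and the CPB is represented as a dictionary from prefix path to count, which is an assumption about the data layout.

```python
from collections import Counter


def conditional_item_counts(pattern_base):
    """Sum each item's count over all prefix paths in a conditional pattern base."""
    totals = Counter()
    for path, count in pattern_base.items():
        for item in path:
            totals[item] += count
    return totals


def prune_pattern_base(pattern_base, min_support):
    """Drop items below the threshold and merge prefix paths that become identical."""
    totals = conditional_item_counts(pattern_base)
    pruned = Counter()
    for path, count in pattern_base.items():
        kept = tuple(i for i in path if totals[i] >= min_support)
        if kept:
            pruned[kept] += count
    return dict(pruned)


# CPB of m from the text: {<fca:2>, <fcab:1>}
cpb_m = {("f", "c", "a"): 2, ("f", "c", "a", "b"): 1}
print(dict(conditional_item_counts(cpb_m)))     # {'f': 3, 'c': 3, 'a': 3, 'b': 1}
print(prune_pattern_base(cpb_m, min_support=3)) # {('f', 'c', 'a'): 3}
```

After pruning, the two prefix paths collapse into the single path fca with count 3, which is the conditional FP-tree of m in this example.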
FP-growth: recursively mine each conditional FP-tree, accumulating the suffix frequent itemset, until the FP-tree is empty or has only one path (when there is only one path, every combination of the items on that path is a frequent itemset).
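The recursion can be illustrated with a simplified Python sketch. To stay short, it works directly on weighted prefix paths instead of materializing each conditional FP-tree, and it omits the single-path shortcut, so it shows the pattern-growth recursion rather than a full FP-growth implementation. The function name `mine`, the input format, and the sample data (already filtered and sorted in L order f, c, a, b, m, p) are assumptions.

```python
from collections import Counter


def mine(weighted_paths, min_support, suffix=()):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    weighted_paths: list of (path, count), where each path is a tuple of
    items sorted in the same global order (the list L).
    """
    totals = Counter()
    for path, count in weighted_paths:
        for item in path:
            totals[item] += count

    result = {}
    for item, support in totals.items():
        if support < min_support:
            continue
        itemset = (item,) + suffix  # grow the suffix by one item
        result[frozenset(itemset)] = support
        # Conditional pattern base of `item`: the prefix before it on each path.
        base = [(path[:path.index(item)], count)
                for path, count in weighted_paths if item in path]
        result.update(mine(base, min_support, itemset))
    return result


# Illustrative transactions, already filtered and sorted in L order (an assumption).
ordered = ["fcabm", "fcamp", "fb", "cbp", "fcamp"]
frequent = mine([(tuple(t), 1) for t in ordered], min_support=3)
print(frequent[frozenset("fcam")])  # 3
print(len(frequent))                # 18
```

Each itemset is produced exactly once, because it is only ever built by starting from its last item in L order and extending toward earlier items.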

Note: why the FP-tree and the Header table are sorted by descending item frequency
Shared prefixes: without a consistent sort order, transactions that contain the same items would not share prefixes.
More shared prefixes: in descending order, the most frequent items sit in the upper levels of the tree, where more transactions share them. In ascending order, those frequent items would end up down in the branches of the tree and fewer prefixes would be shared.
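The effect of the sort order on prefix sharing can be measured by counting tree nodes. The following Python sketch builds a bare prefix tree (nested dictionaries, no counts or Header table) under both orders. The function name `count_tree_nodes` and the sample transactions are assumptions, and ties in frequency are broken alphabetically here.

```python
from collections import Counter


def count_tree_nodes(transactions, descending=True):
    """Insert transactions into a prefix tree and return the number of nodes."""
    counts = Counter(item for t in transactions for item in t)
    sign = -1 if descending else 1
    root = {}
    nodes = 0
    for t in transactions:
        node = root
        # Sort items by frequency (descending or ascending), ties alphabetically.
        for item in sorted(t, key=lambda i: (sign * counts[i], i)):
            if item not in node:
                node[item] = {}
                nodes += 1
            node = node[item]
    return nodes


# Illustrative transactions containing only frequent items (an assumption).
transactions = ["fcabm", "fcamp", "fb", "cbp", "fcamp"]
print(count_tree_nodes(transactions, descending=True))   # 11
print(count_tree_nodes(transactions, descending=False))  # 13
```

On this small input the descending order needs 11 nodes and the ascending order 13; the gap tends to grow with larger data, since the most frequent items are the ones most worth sharing near the root.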

References: Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach; PPT; Mahout parallel FP-growth implementation

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
