The FP-growth algorithm for frequent itemset mining

Source: Internet
Author: User
Background: Frequent itemset mining algorithms find collections of items that often occur together in transactions (called frequent itemsets). Once these itemsets have been mined, they can drive recommendations: when a transaction already contains some items of a frequent itemset, the remaining items of that itemset can be recommended. The classic example is the beer and diapers story from shopping basket analysis. Beer and diapers often appear together in a customer's basket, so mining yields the frequent itemset {beer, diapers}; when a customer buys beer, diapers can be recommended, which makes an additional purchase more likely and serves the goal of bundled marketing. There are two common frequent itemset mining algorithms: Apriori and FP-growth. Apriori mines frequent itemsets by repeatedly generating candidate sets and filtering them, which requires scanning the original data many times; when the data is large, the repeated disk I/O makes it inefficient. FP-growth scans the original data only twice and compresses it into the FP-tree data structure, which is more efficient.
FP-growth consists of two main steps: FP-tree construction and recursive mining of the FP-tree. The FP-tree is built in two scans of the data, which compress the transactions of the raw data into a tree. The FP-tree is similar to a prefix tree: transactions with the same prefix share one path, and this sharing is what compresses the data. Mining then uses the FP-tree to find the conditional pattern base of each item, builds a conditional FP-tree from it, and recursively mines the conditional FP-trees to obtain all frequent itemsets. The main computational bottleneck of the algorithm is the recursive mining of the FP-tree. The main steps of FP-growth are described in detail below.
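To make the contrast with Apriori concrete, here is a minimal sketch of level-wise candidate generation that also counts how many times the data is scanned. The function name `apriori`, the basket data, and the threshold are hypothetical and chosen only for illustration; this is not a tuned implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori sketch.

    Returns (frequent itemsets with their counts, number of counting scans).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level-1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    passes = 0
    k = 1
    while candidates:
        passes += 1  # one full scan of the data per itemset size
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, passes


# Hypothetical shopping baskets.
baskets = [
    ["beer", "diapers", "milk"],
    ["beer", "diapers"],
    ["beer", "diapers", "bread"],
    ["milk", "bread"],
]
itemsets, scans = apriori(baskets, min_support=3)
print(itemsets[frozenset({"beer", "diapers"})], scans)  # prints "3 2"
```

The number of scans grows with the size of the longest frequent itemset, which is the cost FP-growth avoids by fixing the number of scans at two.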

FP-growth algorithm steps

FP-tree construction:
First scan: find the frequent 1-itemsets and sort them in descending order of frequency to obtain the list L.
Second scan: for each transaction, filter out the infrequent items, sort the remaining frequent items in L order, and insert them into the FP-tree. Transactions with the same prefix share one path.
At the same time, build a header table, also sorted in descending order, that links together all nodes of the same item in the FP-tree.

Frequent itemset mining starts from the item at the bottom of the header table.

Conditional pattern base construction: for each item, follow its node links from the header table and collect every prefix path that leads to a node of that item. These prefix paths form the conditional pattern base (CPB) of the item. The count of each prefix path in the CPB is the count of the item's node on that path. For example, if one of the paths containing p is fcamp and the count of p on this path is 2, then fcam enters the CPB of p with count 2.
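The construction and the conditional pattern base lookup described above can be sketched as follows. The names `Node`, `build_fp_tree` and `conditional_pattern_base` are hypothetical, and the transactions are toy data chosen so that the tree contains the path fcamp from the example. Ties between equally frequent items are broken by first appearance, which is an arbitrary choice of this sketch.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    # First scan: count items and keep the frequent ones, sorted in
    # descending order of count (stable sort, so ties keep first-seen order).
    counts = Counter(i for t in transactions for i in dict.fromkeys(t))
    frequent = [(i, n) for i, n in counts.items() if n >= min_support]
    frequent.sort(key=lambda pair: -pair[1])
    order = {item: rank for rank, (item, _) in enumerate(frequent)}

    root = Node(None, None)
    header = {item: [] for item in order}  # item -> all nodes of that item

    # Second scan: filter, sort in L order, insert along shared prefixes.
    for t in transactions:
        items = sorted((i for i in dict.fromkeys(t) if i in order),
                       key=order.__getitem__)
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


def conditional_pattern_base(header, item):
    """Prefix paths above every node of `item`, each with that node's count."""
    base = []
    for node in header[item]:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((tuple(reversed(path)), node.count))
    return base


# Toy data (an assumption of this sketch); items are listed so that the
# first-appearance tie-break yields the order f, c, a, b, m, p.
transactions = [
    ["f", "c", "a", "b", "m", "l", "o"],
    ["f", "c", "a", "m", "p", "d", "g", "i"],
    ["f", "b", "h", "j", "o"],
    ["c", "b", "p", "k", "s"],
    ["f", "c", "a", "m", "p", "e", "l", "n"],
]
root, header = build_fp_tree(transactions, min_support=3)
print(conditional_pattern_base(header, "p"))
# prints [(('f', 'c', 'a', 'm'), 2), (('c', 'b'), 1)]
```

In this sketch the header table is a plain dictionary of node lists rather than a chain of node-link pointers; both give access to all nodes of an item.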
Conditional FP-tree construction: accumulate the count of each item over all paths in the CPB, filter out the items whose total falls below the threshold, and build an FP-tree from what remains. For example, the CPB of m is {<fca:2>, <fcab:1>}, which gives f:3, c:3, a:3, b:1. Assuming a threshold of 3, b is filtered out.
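The accumulation and filtering for the CPB of m can be checked with a few lines of code. This is a small sketch; the function names `conditional_item_counts` and `prune_cpb` are hypothetical, and the CPB is typed in by hand rather than derived from a tree.

```python
from collections import Counter


def conditional_item_counts(cpb):
    """Sum each item's count over all prefix paths of a CPB."""
    counts = Counter()
    for path, count in cpb:
        for item in path:
            counts[item] += count
    return counts


def prune_cpb(cpb, min_support):
    """Drop infrequent items from every path and merge identical paths."""
    counts = conditional_item_counts(cpb)
    pruned = Counter()
    for path, count in cpb:
        kept = tuple(i for i in path if counts[i] >= min_support)
        if kept:
            pruned[kept] += count
    return dict(pruned)


# CPB of m from the example: {<fca:2>, <fcab:1>}
cpb_m = [(("f", "c", "a"), 2), (("f", "c", "a", "b"), 1)]
print(dict(conditional_item_counts(cpb_m)))  # {'f': 3, 'c': 3, 'a': 3, 'b': 1}
print(prune_cpb(cpb_m, min_support=3))       # {('f', 'c', 'a'): 3}
```

After pruning, both paths collapse into fca with count 3, so the conditional FP-tree of m is a single path f:3, c:3, a:3.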
FP-growth: recursively mine each conditional FP-tree, appending the current item to the accumulated suffix to form frequent itemsets, until the FP-tree is empty or has only one path. When only one path remains, every combination of the items on that path, joined with the suffix, is a frequent itemset.
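Putting the steps together, the recursion can be sketched as below. This is a compact illustration under stated assumptions, not a reference implementation: the names `Node`, `build_tree`, `single_path` and `fp_growth` are hypothetical, transactions carry a weight so that a CPB can be fed back in as input, and ties between equally frequent items are broken alphabetically (the mined itemsets do not depend on the tie-break).

```python
from collections import defaultdict
from itertools import combinations


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}


def build_tree(weighted, min_support):
    """Build an FP-tree from (items, weight) pairs; return root, header, counts."""
    counts = defaultdict(int)
    for items, weight in weighted:
        for i in set(items):
            counts[i] += weight
    keep = sorted((i for i in counts if counts[i] >= min_support),
                  key=lambda i: (-counts[i], i))
    rank = {i: r for r, i in enumerate(keep)}
    root, header = Node(None, None), {i: [] for i in keep}
    for items, weight in weighted:
        node = root
        for i in sorted((i for i in set(items) if i in rank), key=rank.get):
            child = node.children.get(i)
            if child is None:
                child = node.children[i] = Node(i, node)
                header[i].append(child)
            child.count += weight
            node = child
    return root, header, counts


def single_path(root):
    """Return [(item, count), ...] if the tree has no branching, else None."""
    path, node = [], root
    while len(node.children) == 1:
        (node,) = node.children.values()
        path.append((node.item, node.count))
    return path if not node.children else None


def fp_growth(weighted, min_support, suffix=frozenset()):
    root, header, counts = build_tree(weighted, min_support)
    result = {}
    path = single_path(root)
    if path is not None:
        # Empty tree or one path: every combination of its items is frequent,
        # with the support of the deepest (smallest-count) node chosen.
        for k in range(1, len(path) + 1):
            for combo in combinations(path, k):
                itemset = suffix | {i for i, _ in combo}
                result[itemset] = min(n for _, n in combo)
        return result
    for item in reversed(list(header)):  # start from the bottom of the header
        pattern = suffix | {item}
        result[pattern] = counts[item]
        cpb = []
        for node in header[item]:
            prefix, parent = [], node.parent
            while parent.item is not None:
                prefix.append(parent.item)
                parent = parent.parent
            if prefix:
                cpb.append((prefix, node.count))
        if cpb:
            result.update(fp_growth(cpb, min_support, pattern))
    return result


# Toy data (an assumption of this sketch), each transaction with weight 1.
transactions = [
    ["f", "c", "a", "b", "m", "l", "o"],
    ["f", "c", "a", "m", "p", "d", "g", "i"],
    ["f", "b", "h", "j", "o"],
    ["c", "b", "p", "k", "s"],
    ["f", "c", "a", "m", "p", "e", "l", "n"],
]
patterns = fp_growth([(t, 1) for t in transactions], min_support=3)
print(len(patterns), patterns[frozenset("fcam")])  # prints "18 3"
```

Passing the CPB back into the same function as weighted transactions keeps the sketch short; it rebuilds a conditional FP-tree at each level, which is where the recursion cost mentioned earlier comes from.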

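Note: why the FP-tree and header table are sorted by descending item frequency
Shared prefixes: a fixed ordering is what makes prefixes shareable. Without sorting, transactions containing the same items could be inserted in different orders and would not share a prefix.
More shared prefixes: in descending order, the most frequent items sit in the upper levels of the tree, where many transactions share them. In ascending order, the frequent items end up lower in the branches of the tree, where they are duplicated across branches, so fewer prefixes are shared.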
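The effect of the ordering on tree size can be measured with a small experiment. The sketch below inserts the same filtered transactions into a prefix tree in descending and in ascending frequency order and counts the nodes created. The function names and the toy data are hypothetical; ties are broken alphabetically.

```python
from collections import Counter


def count_trie_nodes(sequences):
    """Insert each sequence into a prefix tree and count the nodes created."""
    root, nodes = {}, 0
    for seq in sequences:
        node = root
        for item in seq:
            if item not in node:
                node[item] = {}
                nodes += 1
            node = node[item]
    return nodes


def tree_size(transactions, min_support, descending=True):
    """Number of FP-tree nodes when items are sorted by frequency."""
    counts = Counter(i for t in transactions for i in set(t))
    frequent = sorted((i for i in counts if counts[i] >= min_support),
                      key=lambda i: (-counts[i], i))
    if not descending:
        frequent.reverse()
    rank = {i: r for r, i in enumerate(frequent)}
    ordered = [sorted((i for i in set(t) if i in rank), key=rank.get)
               for t in transactions]
    return count_trie_nodes(ordered)


# Toy data (an assumption of this sketch).
transactions = [
    ["f", "c", "a", "b", "m", "l", "o"],
    ["f", "c", "a", "m", "p", "d", "g", "i"],
    ["f", "b", "h", "j", "o"],
    ["c", "b", "p", "k", "s"],
    ["f", "c", "a", "m", "p", "e", "l", "n"],
]
print(tree_size(transactions, 3, descending=True))   # prints 11
print(tree_size(transactions, 3, descending=False))  # prints 14
```

On this small data set the difference is only three nodes; the gap depends on the data and is not guaranteed in every case, but descending order is the heuristic that tends to maximize sharing.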


References: Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach; PPT on the Mahout parallel FPGrowth implementation
