Data Mining Algorithms: Mining Association Rules (II): The FP-Growth Algorithm

Source: Internet
Author: User

The Apriori algorithm described earlier has many drawbacks, such as a large number of full-table scans and expensive self-join (natural join) computations, and it is now rarely used.

The Mahout algorithm library uses the PFP algorithm, a distributed version of FP-Growth. Its internal structure differs little from that of the original FP-Growth algorithm.

So this article first introduces the FP-Growth algorithm as it runs in memory on a single machine.


As before, we use the shopping cart data from the Apriori article as the example, as shown in:


TID is the ID of the transaction (shopping cart), and I1-I5 are the item IDs.

The basic idea of the FP-Growth algorithm is to first scan the entire shopping cart table, count the support of each item, and sort the items in descending order of support, as shown in the table below.
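The counting and sorting step can be sketched as follows. This is a minimal illustration, not the author's code: the transaction list is an assumption (the nine-cart example from the reference book, chosen because it matches the node counts quoted in this article), and the names support_table and table are hypothetical.

```python
from collections import Counter

# Assumed data: the nine-transaction example from the reference book.
transactions = [
    ["I1", "I2", "I5"],
    ["I2", "I4"],
    ["I2", "I3"],
    ["I1", "I2", "I4"],
    ["I1", "I3"],
    ["I2", "I3"],
    ["I1", "I3"],
    ["I1", "I2", "I3", "I5"],
    ["I1", "I2", "I3"],
]

def support_table(transactions):
    """Count the carts containing each item, sorted by descending support."""
    counts = Counter(item for t in transactions for item in set(t))
    # Ties are broken by item name so the order is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

table = support_table(transactions)
for item, support in table:
    print(item, support)
```

Under the assumed data, I2 comes first with a support of 7, which agrees with the I2:7 root child mentioned later in the article.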


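Next, build the FP tree: scan the data a second time, drop the items that fall below the minimum support, sort the remaining items of each transaction in descending order of support, and insert them into the tree starting from the root.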

The build process is as follows:


The finished FP tree looks like this:


Link this FP tree to the support table, as follows:

Each row of the support table holds a pointer to the corresponding node in the FP tree. For example, the first row points to I2:7 and the second row points to I1:4. Because an I1 node also appears elsewhere in the FP tree, the I1:4 node in turn stores a pointer to the I1:2 node.

Building the FP tree takes only two full-table scans. It turns the irregular shopping cart data into a tree structure that can be traversed, and it eliminates the expensive join computations that Apriori requires.
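The tree and its support-table pointers can be sketched as below. This is a hedged illustration under two assumptions: the data is the nine-cart example from the reference book, and the minimum support count is 2. The names FPNode, build_fp_tree, header and next are hypothetical.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}   # item -> child FPNode
        self.next = None     # next node in the tree that carries the same item

def build_fp_tree(transactions, min_support):
    counts = Counter(i for t in transactions for i in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    def rank(item):
        # Descending support, ties broken by name.
        return (-frequent[item], item)

    root = FPNode(None, None)
    header = {}  # item -> first node of that item's chain (the table pointer)
    tails = {}   # item -> last node of the chain, for cheap appends
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in frequent), key=rank):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                if item in tails:
                    tails[item].next = child
                else:
                    header[item] = child
                tails[item] = child
            child.count += 1
            node = child
    return root, header

# Assumed data: the nine-transaction example from the reference book.
transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
root, header = build_fp_tree(transactions, min_support=2)
```

With this data, header["I1"] is the I1:4 node under I2:7, and its next pointer leads to the I1:2 node directly under the root, mirroring the pointer chain described above.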



Mining association rules from the FP tree:

From the FP tree we can derive, for each item, its conditional pattern base, its conditional FP tree, and the frequent patterns it generates.

Take I5 as an example.

As the FP tree shows, there are two paths from the root node to an I5:1 node:

I2:7 --> I1:4 --> I5:1

I2:7 --> I1:4 --> I3:2 --> I5:1

The prefixes I2:7 --> I1:4 and I2:7 --> I1:4 --> I3:2 form the conditional pattern base of I5. Every path ends at I5, so I5 itself is omitted.

These are written as {I2, I1: 1} and {I2, I1, I3: 1}. Why is the count of each entry 1? Although I2 and I1 have large counts, each I5 node has a count of only 1, so each path reaches I5 only once. The count of a conditional pattern base entry is therefore the count of the suffix node, which is also the smallest count on the path.
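The rule for the counts can be checked with a few lines of code. This sketch uses only the two paths quoted above; the names paths_to_i5 and conditional_pattern_base are hypothetical.

```python
# The two root-to-I5 paths read off the FP tree, as (item, count) pairs.
paths_to_i5 = [
    [("I2", 7), ("I1", 4), ("I5", 1)],
    [("I2", 7), ("I1", 4), ("I3", 2), ("I5", 1)],
]

def conditional_pattern_base(paths):
    """Drop the suffix item from each path; the prefix inherits the suffix node's count."""
    base = []
    for path in paths:
        *prefix, (_, suffix_count) = path
        base.append(([item for item, _ in prefix], suffix_count))
    return base

base = conditional_pattern_base(paths_to_i5)
for prefix, count in base:
    print(prefix, count)
```

Each entry carries the count of its I5 node, which is also the minimum count along that path.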

From the conditional pattern base we can build the conditional FP tree for that item, for example for I5:


From the conditional FP tree, we enumerate every combination of its items to obtain the frequent patterns. The item being mined, such as I5 here, is counted in as well: every frequent pattern mined for an item must include that item itself.
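The enumeration step for I5 might look like the sketch below. It assumes a minimum support count of 2 (the article's figure with the threshold is not available), starts from the conditional pattern base {I2, I1: 1}, {I2, I1, I3: 1}, and counts supports directly in that base rather than materializing the conditional tree. The name patterns_from_base is hypothetical.

```python
from collections import Counter
from itertools import combinations

MIN_SUPPORT = 2  # assumed threshold, as a count of carts

# Conditional pattern base of I5: (prefix items, count).
base = [(["I2", "I1"], 1), (["I2", "I1", "I3"], 1)]

def patterns_from_base(base, suffix, min_support):
    # Support of each item inside the conditional pattern base.
    counts = Counter()
    for prefix, n in base:
        for item in prefix:
            counts[item] += n
    # Items below the threshold never enter the conditional FP tree.
    kept = [item for item in counts if counts[item] >= min_support]

    patterns = {}
    for size in range(1, len(kept) + 1):
        for combo in combinations(kept, size):
            support = sum(n for prefix, n in base if set(combo) <= set(prefix))
            if support >= min_support:
                # Every pattern includes the suffix item itself.
                patterns[frozenset(combo) | {suffix}] = support
    return patterns

patterns = patterns_from_base(base, "I5", MIN_SUPPORT)
for itemset, support in patterns.items():
    print(sorted(itemset), support)
```

I3 appears only once in the base, so it is dropped; the remaining items I2 and I1 yield {I2, I5}, {I1, I5} and {I2, I1, I5}, each with support 2.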

The complete table derived from the FP tree is as follows:


At this point, the frequent patterns are the output of the FP-Growth algorithm. The algorithm works by divide and conquer: instead of processing one potentially huge tree at once, it builds a conditional FP subtree for each item and processes each subtree separately.

However, when the data set is very large, the FP tree built by the FP-Growth algorithm may be too large to fit in memory. In that case the distributed version of FP-Growth, the PFP algorithm, is used instead.
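The divide-and-conquer recursion can be sketched compactly. Note that this is a simplified pattern-growth variant, not a full FP-Growth implementation: it recurses on conditional pattern bases held as plain lists instead of building physical conditional FP trees. The data (the nine-cart example from the reference book) and the minimum support count of 2 are assumptions, and the name fp_growth is hypothetical.

```python
from collections import Counter

def fp_growth(base, min_support, suffix=frozenset()):
    """base: list of (items in global support order, count). Returns {itemset: support}."""
    counts = Counter()
    for items, n in base:
        for item in items:
            counts[item] += n
    result = {}
    for item, support in counts.items():
        if support < min_support:
            continue
        pattern = suffix | {item}
        result[pattern] = support
        # Conditional pattern base of `item`: what precedes it in each entry.
        cond = [(items[:items.index(item)], n) for items, n in base if item in items]
        # Solve the smaller sub-problem independently, then merge.
        result.update(fp_growth(cond, min_support, pattern))
    return result

# Assumed data: the nine-transaction example from the reference book.
transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
totals = Counter(i for t in transactions for i in set(t))
# Order every cart by descending support so prefixes are consistent.
base = [(sorted(set(t), key=lambda i: (-totals[i], i)), 1) for t in transactions]
patterns = fp_growth(base, min_support=2)
```

Because every cart uses the same item order, each frequent itemset is produced exactly once, with its lowest-ranked item chosen first as the suffix.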

Reference book: "Data Mining: Concepts and Techniques"
