PFP (Parallel FP-Growth)

Source: Internet
Author: User

Distributed FP-tree

1. First, count the support of every item in the shopping basket data. Assuming min_sup=3, remove the items whose support is less than 3, then sort the remaining items in each basket by support.

2. Build the FP-tree from the sorted baskets in the second column (fcamp, fcabm, fb, cbp, fcamp), following the usual FP-tree construction rules.

3. The third column is obtained by traversing each entry of the second column from right to left and recording the path that leads to each item. For example, in fcamp the path to p is fcam, the path to m is fca, the path to a is fc, and the path to c is f. This step runs on the map side, where the basket data is stored across the nodes, and each result in the third column is emitted as a <k,v> pair.

4. The shuffle phase sends these pairs to the reducers, grouped by key, so the frequent patterns are easy to find on the reduce side.
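
The four steps above can be sketched in plain Python, without a real MapReduce cluster. This is a minimal illustration under assumptions: the function names (map_basket, shuffle, reduce_item) are hypothetical, the input is the already sorted and filtered second column, and the reduce step only counts the items in each key's paths instead of recursively building conditional FP-trees.

```python
from collections import Counter, defaultdict

# Sorted, filtered baskets from the second column (min_sup = 3).
BASKETS = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
MIN_SUP = 3


def map_basket(basket):
    """Map side: for each item, emit <item, path leading to that item>."""
    # Walk right to left; the first item has an empty path and emits nothing.
    for i in range(len(basket) - 1, 0, -1):
        yield basket[i], basket[:i]


def shuffle(pairs):
    """Group values by key, standing in for the MapReduce shuffle."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_item(paths):
    """Reduce side: count items across one key's paths, keep frequent ones."""
    counts = Counter(item for path in paths for item in path)
    return {item: n for item, n in counts.items() if n >= MIN_SUP}


pairs = [kv for basket in BASKETS for kv in map_basket(basket)]
grouped = shuffle(pairs)
frequent = {item: reduce_item(paths) for item, paths in grouped.items()}

for item, result in frequent.items():
    print(item, grouped[item], result)
```

For key p the reducer receives the paths fcam, cb, and fcam, and only c reaches a count of 3. The counts here cover the items that co-occur with the key; the key's own count (p:3) is the number of paths received.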

To verify the results above, mine the frequent patterns directly from the FP-tree:

p: On the first path (f-c-a-m-p) p has a count of 2, so f, c, a, and m each contribute 2. On the rightmost path (c-b-p), c and p each appear once. Adding these to p:2 and c:2 from the first path gives p:3, c:3. The items f, a, and m stay at 2, below min_sup, and are removed.

m: The first path is f2-c2-a2-m2 and the second path is f1-c1-a1-b1-m1. After filtering out b, the totals are f3-c3-a3-m3, so the final pattern is f:3, c:3, a:3, m:3.

b: No frequent pattern can be mined.

Similarly:

a: f:3, c:3, a:3

c: f:3, c:3

This confirms the result.
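
The verification above can be reproduced with a small FP-tree. The sketch below is an illustration under assumptions, not the article's implementation: the Node class and the helper names are hypothetical, and conditional_counts only sums the ancestor counts for one item rather than running the full recursive FP-growth procedure.

```python
from collections import Counter

# Sorted, filtered baskets from the second column (min_sup = 3).
BASKETS = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
MIN_SUP = 3


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(baskets):
    """Insert each sorted basket as a path from the root, sharing prefixes."""
    root = Node(None, None)
    nodes = []  # every non-root node, so an item's occurrences can be found
    for basket in baskets:
        node = root
        for item in basket:
            if item not in node.children:
                node.children[item] = Node(item, node)
                nodes.append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, nodes


def conditional_counts(nodes, item):
    """Sum ancestor counts over every path that ends in `item`."""
    counts = Counter()
    for node in nodes:
        if node.item != item:
            continue
        ancestor = node.parent
        while ancestor.item is not None:  # stop at the root
            counts[ancestor.item] += node.count
            ancestor = ancestor.parent
    return {i: n for i, n in counts.items() if n >= MIN_SUP}


root, nodes = build_tree(BASKETS)
for item in "pmbac":
    print(item, conditional_counts(nodes, item))
```

The output lists, for each item, the ancestors that reach min_sup: only c for p, all of f, c, a for m, and nothing for b.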

PFP Algorithm Bottleneck:

The bottleneck is on the reduce side: the shuffle sends all of the map output to the reducers, so a reduce node can easily run out of memory.

http://infolab.stanford.edu/~echang/recsys08-69.pdf gives a method that addresses this:

1. Assume the items are divided into two groups, G1 and G2. G1 contains items c, a, p; G2 contains items f, b, m.

2. Process each basket:

First basket: f,c,a,m,p, split across the two groups G1 and G2. Previously the map step emitted many <k,v> pairs keyed by item; here it is keyed by group instead. For group G1, the rightmost item of the basket that belongs to G1 is p, so the basket is written out from the start up to p: the key is G1 and the value is fcamp. For group G2, the rightmost item that belongs to G2 is m, so the value is fcam. The output is <G1,fcamp><G2,fcam>.

Second basket: f,c,a,b,m, split across G1 and G2: <G1,fca><G2,fcabm>

Third basket: f,b, split across G1 and G2: <G1,null><G2,fb>

Fourth basket: c,b,p, split across G1 and G2: <G1,cbp><G2,cb>

Fifth basket: f,c,a,m,p, split across G1 and G2: <G1,fcamp><G2,fcam>

3. The <k,v> pairs above are sent to two machines, one for G1 and one for G2, and each machine rebuilds an FP-tree from the values it receives.
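
The group-based map step can be sketched as follows. This is a minimal illustration under assumptions: map_by_group and shards are hypothetical names, the grouping is the G1/G2 split from step 1, and a group with no item in the basket simply emits nothing where the text above writes <G1,null>.

```python
from collections import defaultdict

# Sorted, filtered baskets and the two item groups from step 1.
BASKETS = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
GROUPS = {"G1": set("cap"), "G2": set("fbm")}


def map_by_group(basket):
    """Emit at most one <group, prefix> pair per group for this basket."""
    for group, members in GROUPS.items():
        # Scan right to left for the rightmost item that belongs to the group.
        for i in range(len(basket) - 1, -1, -1):
            if basket[i] in members:
                # Keep everything up to and including that item.
                yield group, basket[: i + 1]
                break


# Stand-in for the shuffle: collect the values that reach each group's machine.
shards = defaultdict(list)
for basket in BASKETS:
    for group, prefix in map_by_group(basket):
        shards[group].append(prefix)

for group, prefixes in shards.items():
    print(group, prefixes)
```

Each machine then builds its FP-tree only from its own list, so no single reducer has to hold the output for every item.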
