Frequent Pattern Mining II (FP growth algorithm)

Source: Internet
Author: User

FP Tree construction

The FP-growth algorithm relies on a clever data structure to avoid most of the cost of the Apriori algorithm: it does not need to repeatedly generate candidate itemsets or repeatedly scan the entire database to check them. To achieve this, it uses a compact data structure called the frequent-pattern tree (FP tree). How this tree is constructed is discussed in detail below, and a worked example is the best way to explain it. Consider the following example:

This table describes a list of sales transactions, in which each letter (a, b, c, ...) represents an item. The "(ordered) frequent items" column lists the frequent items of each transaction in descending order of support count. This ordering is very important: every operation on the items must follow it. Determining the order is simple, since it can be obtained with a single scan of the database. Infrequent items are excluded from this column because they have no effect on the mining result. In this example, the minimum support threshold is set to 3.
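The first scan described above can be sketched as follows. This is only a minimal illustration: the function name `order_frequent_items`, the toy data, and the alphabetical tie-break are assumptions made for the example and are not taken from the article's table. Any tie-break works as long as it is applied consistently to every transaction.

```python
from collections import Counter

def order_frequent_items(transactions, min_support):
    """Reduce each transaction to its frequent items, sorted by descending support."""
    # Single database scan: count how many transactions contain each item.
    support = Counter(item for t in transactions for item in set(t))
    # Drop items whose support count is below the threshold.
    frequent = {item: n for item, n in support.items() if n >= min_support}
    # Global rank: higher support first; ties broken alphabetically (an assumption).
    ranked = sorted(frequent, key=lambda item: (-frequent[item], item))
    rank = {item: position for position, item in enumerate(ranked)}
    # Rewrite every transaction using that one global order.
    return [
        sorted((item for item in set(t) if item in rank), key=rank.__getitem__)
        for t in transactions
    ]

# Hypothetical toy data, not the article's table.
toy = [["x", "y", "z"], ["y", "x"], ["y", "w"], ["x", "y", "z"]]
ordered = order_frequent_items(toy, min_support=2)
print(ordered)  # [['y', 'x', 'z'], ['y', 'x'], ['y'], ['y', 'x', 'z']]
```

Here `w` appears only once and is dropped, while `y` (support 4) always comes before `x` (support 3) and `z` (support 2).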

Our goal is to build one tree for the entire transaction list. We first create the root node of the tree and label it NULL, and then scan the database record by record, inserting each transaction's ordered frequent items into the tree.

Step one: Scan the first transaction in the database, the one with TID 100. This gives the first branch of the tree: < (f:1), (c:1), (a:1), (m:1), (p:1) >. Note that the items on this branch must appear in descending order of frequency.

Step two: Scan the second transaction (TID 200), whose ordered frequent items are <f,c,a,b,m>. Looking closely, the first three items <f,c,a> are the same as the first three items of the path <f,c,a,m,p> created in step one, so the two transactions can share a prefix. We therefore add 1 to the count of each of the three nodes f, c and a on the existing path, and then attach < (b:1), (m:1) > as a new branch under the (a:2) node, so that (b:1) becomes its child. The node (a:2) now has two children, (m:1) and (b:1).

Step three: Scan the third transaction (TID 300), whose ordered frequent items are <f,b>. Compared with the existing paths, only f is a common prefix, so the count of the f node is increased by 1 and a new child node (b:1) is created under it. The f node is now (f:3) and has two children, (c:2) and (b:1).

Step four: Move on to the fourth transaction, whose ordered frequent items are <c,b,p>. This one is different again. Its first item is c, which differs from f, the first node of every existing path, so there is no common prefix and no need to look further down the tree. The whole list is attached directly as a new path < (c:1), (b:1), (p:1) > under the root node. The result is shown in Figure 1.

Step five: The last transaction has the ordered frequent items <f,c,a,m,p>. This is exactly the left-most path of the tree, so the whole path is a common prefix and the count of every node on it is increased by 1, giving < (f:4), (c:3), (a:3), (m:2), (p:2) >. The final tree is shown in Figure 2.
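The five steps can be replayed with a short sketch. The class and function names (`FPNode`, `insert_transaction`, `build_tree`, `dump`) are hypothetical and chosen for illustration only. The input is the ordered frequent-item list of each transaction exactly as given in the walkthrough, and TID 400 and TID 500 are inferred from the order in which the transactions are processed.

```python
class FPNode:
    """One node of the FP tree: an item, its count, and its children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode

def insert_transaction(root, items):
    """Walk down from the root, sharing any existing prefix and extending it if needed."""
    node = root
    for item in items:
        child = node.children.get(item)
        if child is None:
            # No matching child: start a new branch at this point.
            child = FPNode(item, parent=node)
            node.children[item] = child
        child.count += 1  # shared prefix nodes are simply incremented
        node = child

def build_tree(ordered_transactions):
    root = FPNode(None)  # the NULL root
    for items in ordered_transactions:
        insert_transaction(root, items)
    return root

def dump(node, depth=0):
    """Return the tree as indented 'item:count' lines, children in insertion order."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(dump(child, depth + 1))
    return lines

# Ordered frequent items of the five transactions from the walkthrough.
walkthrough = [
    ["f", "c", "a", "m", "p"],  # TID 100
    ["f", "c", "a", "b", "m"],  # TID 200
    ["f", "b"],                 # TID 300
    ["c", "b", "p"],            # TID 400
    ["f", "c", "a", "m", "p"],  # TID 500
]
tree = build_tree(walkthrough)
print("\n".join(dump(tree)))
```

Printing the dump gives the left-most path f:4, c:3, a:3, m:2, p:2, the side branch b:1, m:1 under a:3, the child b:1 under f:4, and the separate path c:1, b:1, p:1 under the root.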

At this point the FP tree is almost built, but not quite: the tree above still lacks one piece before it can be called a complete FP tree. To make later traversal of the tree easier, we add one more structure, the header table. The header table holds all frequent items in descending order of frequency, and each entry contains the head of a node-link list that chains together all the nodes in the tree carrying the same item. For example, the entry for b leads to the three b nodes in the tree: the one under a, the one under f, and the one under c.
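One possible way to maintain the header table while the tree is built is sketched below. The names (`FPNode.link`, `build_tree_with_header`, `chain`) and the auxiliary `tails` dictionary are assumptions made for this illustration. The header here is a plain dictionary, so it is not kept in descending order of frequency and would need to be sorted by support if that order matters.

```python
class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}
        self.link = None  # next node in the tree that carries the same item

def build_tree_with_header(ordered_transactions):
    root = FPNode(None)
    header = {}  # item -> first node of that item's node-link chain
    tails = {}   # item -> last node of the chain, so appending stays cheap
    for items in ordered_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                # Every newly created node is appended to its item's chain.
                if item in tails:
                    tails[item].link = child
                else:
                    header[item] = child
                tails[item] = child
            child.count += 1
            node = child
    return root, header

def chain(header, item):
    """Follow the node links of one item and collect the count stored at each node."""
    counts = []
    node = header.get(item)
    while node is not None:
        counts.append(node.count)
        node = node.link
    return counts

# Ordered frequent items of the five transactions from the walkthrough.
walkthrough = [
    ["f", "c", "a", "m", "p"],
    ["f", "c", "a", "b", "m"],
    ["f", "b"],
    ["c", "b", "p"],
    ["f", "c", "a", "m", "p"],
]
root, header = build_tree_with_header(walkthrough)
supports = {item: sum(chain(header, item)) for item in header}
print(chain(header, "b"))  # [1, 1, 1]
print(supports)
```

Summing the counts along an item's chain recovers that item's total support, which is why the links make it cheap to collect every path containing a given item.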

That completes the construction of the FP tree. From the example above, readers should find it easy to summarize the general FP tree construction algorithm, so it is not repeated here.
