Several related directions for parallel FP-growth algorithms

Source: Internet
Author: User

1 Parallel FP-tree algorithm on a cluster system (many machines working on one task, or, as in cloud computing, one system serving many tasks?)


A computer cluster uses a network to connect a group of high-performance workstations or PCs in a certain topology, forming an efficient parallel computing system. Nodes communicate by message passing, and clusters are often used to improve on the computational speed and reliability of a single computer.

In the FP-growth algorithm, the mining of each conditional pattern base is independent of the others, and no data or information is exchanged between them. This independence allows FP-growth to be turned into a parallel algorithm: if mining one conditional pattern base is regarded as a sub-task, the overall frequent pattern mining task can be divided into as many sub-tasks as there are frequent items.

These sub-tasks are then assigned to the nodes of the cluster. Each node completes its sub-tasks and transfers the results to the central node, which merges them into a unified result.
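
The split into one sub-task per frequent item can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used by any particular cluster system: the function names are hypothetical, a thread pool stands in for the cluster nodes, the main thread plays the central node, and the conditional pattern bases are built by projecting the transaction list directly instead of walking an FP-tree. Transactions are assumed to contain no duplicate items.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def frequent_items(transactions, min_support):
    """Support count of every item that reaches min_support."""
    counts = Counter(item for t in transactions for item in set(t))
    return {i: c for i, c in counts.items() if c >= min_support}


def conditional_pattern_base(transactions, item, order):
    """Prefixes (items ranked before `item`) of the transactions containing `item`."""
    base = []
    for t in transactions:
        if item in t:
            prefix = [i for i in t if i in order and order[i] < order[item]]
            if prefix:
                base.append(prefix)
    return base


def rank_items(freq):
    """Map each frequent item to its rank, most frequent first (ties by name)."""
    ranked = sorted(freq, key=lambda i: (-freq[i], i))
    return ranked, {item: rank for rank, item in enumerate(ranked)}


def mine(transactions, min_support, suffix=frozenset()):
    """Sequential pattern growth: returns {itemset: support}."""
    freq = frequent_items(transactions, min_support)
    ranked, order = rank_items(freq)
    patterns = {}
    for item in ranked:
        pattern = suffix | {item}
        patterns[pattern] = freq[item]
        base = conditional_pattern_base(transactions, item, order)
        patterns.update(mine(base, min_support, pattern))
    return patterns


def mine_subtask(args):
    """One sub-task: mine the conditional pattern base of a single frequent item."""
    transactions, item, order, count, min_support = args
    suffix = frozenset([item])
    result = {suffix: count}
    base = conditional_pattern_base(transactions, item, order)
    result.update(mine(base, min_support, suffix))
    return result


def parallel_mine(transactions, min_support, workers=4):
    """Run one sub-task per frequent item and merge the results centrally."""
    freq = frequent_items(transactions, min_support)
    ranked, order = rank_items(freq)
    tasks = [(transactions, item, order, freq[item], min_support) for item in ranked]
    merged = {}
    # The thread pool is only a stand-in for the worker nodes of a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(mine_subtask, tasks):
            merged.update(partial)
    return merged
```

Because every itemset is produced by exactly one sub-task (the one for its lowest-ranked item), the partial results never overlap and the merge step is a plain dictionary update.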

2 Parallel computing method that divides the FP-tree into small FP-trees


For a given association rule mining task, how can it be broken down into multiple independent sub-tasks so that it can be processed in a parallel, distributed way? One approach is to divide the FP-tree into small FP-trees and then perform the computations in parallel.

For this to be valid, it must be proved that the combination of all the local trees is equivalent to the global tree.

The method is as follows: group the items of the FP-tree's header table according to the total length of each item's prefix paths, so that the small FP-trees built from the groups have roughly equal numbers of nodes. To construct a small FP-tree, follow the node links of the header table entries in a group, find the conditional pattern bases of the corresponding nodes, and then generate a new FP-tree and header table from all the conditional pattern bases of that group. When constructing the new FP-tree and header table for one part of the header table, items other than those contained in that part do not have to be placed in the new header table. This divides the large FP-tree into multiple small FP-trees, which can then be processed by multiple processes or by multiple machines in parallel.
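
A rough Python sketch of the grouping step is given below. Several things here are assumptions and not part of the original description: the class and function names are hypothetical, and the balancing rule (heaviest item first, always into the currently lightest group) is one simple greedy heuristic for getting "roughly equal" groups; the text does not say which rule is used. The sketch stops at extracting each group's conditional pattern bases, which would then be inserted into a new small FP-tree.

```python
from collections import Counter, defaultdict
import heapq


class Node:
    """One FP-tree node; `children` maps item -> child Node."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build the global FP-tree; header maps item -> list of its nodes."""
    counts = Counter(i for t in transactions for i in set(t))
    freq = {i: c for i, c in counts.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)
    for t in transactions:
        # Keep frequent items only, most frequent first (ties by name).
        items = sorted((i for i in set(t) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


def prefix_path(node):
    """Items on the path from the root down to (not including) `node`."""
    path = []
    node = node.parent
    while node is not None and node.item is not None:
        path.append(node.item)
        node = node.parent
    return path[::-1]


def total_prefix_length(header):
    """Weight of each header item: summed length of all its prefix paths."""
    return {item: sum(len(prefix_path(n)) for n in nodes)
            for item, nodes in header.items()}


def group_header_items(weights, n_groups):
    """Greedy balancing (an assumed heuristic): heaviest item into lightest group."""
    heap = [(0, g, []) for g in range(n_groups)]
    heapq.heapify(heap)
    for item in sorted(weights, key=lambda i: (-weights[i], i)):
        load, g, members = heapq.heappop(heap)
        members.append(item)
        heapq.heappush(heap, (load + weights[item], g, members))
    return [members for _, _, members in sorted(heap, key=lambda e: e[1])]


def conditional_pattern_bases(header, group):
    """For each item in the group: its (prefix path, count) pairs."""
    return {item: [(prefix_path(n), n.count) for n in header[item]]
            for item in group}
```

Each group returned by `group_header_items` can be handed to a separate process or machine together with its conditional pattern bases.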


3 Parallel FP-growth algorithm that partitions the database transactions (based on the Hadoop platform, which can distribute the data automatically; each map handles 64 MB by default. Details to be continued.)


Among parallel FP-growth algorithms, one approach is to partition the records of the database by count and then run the computation in parallel across multiple processes.
The basic steps of the algorithm are as follows:
1) Partition the transactions in the database and assign an equal number of transactions to each processing process;
2) Each process counts the items in its own transactions, and the counts are then combined to obtain the frequent 1-itemsets;
3) Each process builds a local frequent pattern tree from its assigned transactions; each item in the global list of frequent 1-itemsets is connected by a node chain to the nodes for that item in every local FP-tree;
4) The global list of frequent 1-itemsets, the local FP-trees, and the links between them together form an interconnected frequent pattern tree, on which frequent patterns can be mined in parallel.
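
Steps 1) and 2) above can be sketched in Python as follows. This is a single-process illustration with hypothetical function names, not Hadoop code: each chunk stands for the transactions given to one process, `local_counts` plays the role of the per-process counting, and the summation in `global_frequent_items` plays the role of the combining step. Steps 3) and 4), which build and link the local FP-trees, are not covered.

```python
from collections import Counter


def split_transactions(transactions, n_parts):
    """Step 1: split into n_parts chunks whose sizes differ by at most one."""
    size, extra = divmod(len(transactions), n_parts)
    chunks, start = [], 0
    for p in range(n_parts):
        end = start + size + (1 if p < extra else 0)
        chunks.append(transactions[start:end])
        start = end
    return chunks


def local_counts(chunk):
    """Step 2, per process: count each item once per transaction."""
    return Counter(item for t in chunk for item in set(t))


def global_frequent_items(chunks, min_support):
    """Step 2, combined: sum the local counts and keep the frequent items."""
    total = Counter()
    for chunk in chunks:
        total.update(local_counts(chunk))
    frequent = [(item, count) for item, count in total.items() if count >= min_support]
    # Most frequent first, ties broken by item name, as a global item order.
    return sorted(frequent, key=lambda ic: (-ic[1], ic[0]))
```

The sorted list returned by `global_frequent_items` gives every process the same item order, which each one needs in order to build its local FP-tree consistently with the others.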
