Data Mining algorithm: Association analysis, Part Two (FP-tree algorithm)

Source: Internet
Author: User

Three. FP-tree algorithm

This part introduces the FP-tree algorithm, which discovers frequent itemsets in a completely different way from Apriori. Unlike Apriori, the FP-tree algorithm does not generate candidate sets. Instead, it organizes the data into a more compact tree structure and extracts the frequent itemsets directly from that structure. The process of the FP-tree algorithm is:

First, the support of each item in the transactions is calculated and infrequent items are discarded. The remaining items are sorted in descending order of support, and the items within each transaction are rearranged in that same order.

Each transaction is then inserted, in its new item order, into a tree whose root is a null node, and a support count is recorded on every node along the way. When this process is complete, we have the FP-tree.

For each item in the completed FP-tree, the prefix paths leading from the top of the tree down to that item are converted into a conditional FP-tree.

Find all frequent itemsets from each conditional FP-tree.

This description of the FP-tree algorithm is abstract, so the following example shows how the algorithm finds frequent itemsets.

(Source: Data Mining: Concepts and Techniques, Jiawei Han)

First, the support of every item in the transaction database is calculated and the items are sorted in descending order of support, as shown in the green table in the figure. The items in each transaction are then rearranged in this order. For example, transaction T100 originally lists its items as I1, I2, I5, but because I2 has higher support than I1 and therefore comes first in the sorted order, it is reordered as I2, I1, I5. The reordered item set of each transaction is shown in the third column of the following table.
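The counting and reordering step can be sketched in a few lines of Python. The table itself is not reproduced in this text, so the nine transactions below are an assumption taken from the cited textbook example, and the minimum support count of 2 and the tie-breaking rule (by item name) are likewise assumptions for illustration.

```python
from collections import Counter

# Assumed example data: the nine-transaction table from the cited textbook.
transactions = {
    "T100": ["I1", "I2", "I5"],
    "T200": ["I2", "I4"],
    "T300": ["I2", "I3"],
    "T400": ["I1", "I2", "I4"],
    "T500": ["I1", "I3"],
    "T600": ["I2", "I3"],
    "T700": ["I1", "I3"],
    "T800": ["I1", "I2", "I3", "I5"],
    "T900": ["I1", "I2", "I3"],
}
MIN_SUPPORT = 2  # assumed minimum support count

# First scan: count how many transactions contain each item.
support = Counter(item for items in transactions.values() for item in set(items))

# Drop infrequent items, then rank by descending support (ties broken by name).
frequent = {item: n for item, n in support.items() if n >= MIN_SUPPORT}
ranked_items = sorted(frequent, key=lambda i: (-frequent[i], i))
rank = {item: pos for pos, item in enumerate(ranked_items)}

# Rewrite each transaction using only frequent items, in rank order.
reordered = {
    tid: sorted((i for i in items if i in rank), key=rank.get)
    for tid, items in transactions.items()
}

print(ranked_items)        # ['I2', 'I1', 'I3', 'I4', 'I5']
print(reordered["T100"])   # ['I2', 'I1', 'I5']
```

With this data, I2 has the highest support (7), so it moves to the front of every transaction that contains it.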

Scan the transaction database a second time, inserting each reordered transaction into a tree whose root is null. For transaction T100, create the three nodes I2, I1, I5 in sequence, forming the path null→I2→I1→I5, and set the frequency count of every node on that path to 1. For transaction T200, the FP-tree already contains an I2 node, so only a new I4 node is created, forming the path null→I2→I4. The frequency count of the I2 node is increased by 1 to 2, and the frequency count of the I4 node is set to 1. Following the same procedure for every transaction in the database yields the complete tree structure.
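The insertion procedure can be illustrated with a minimal sketch. The class name FPNode and the helpers insert and show are hypothetical names chosen for this example, and the input is assumed to be the textbook's nine transactions after filtering and reordering.

```python
class FPNode:
    """One node of the FP-tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def insert(root, items):
    """Walk down from the root, reusing shared prefixes and bumping counts."""
    node = root
    for item in items:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item, parent=node)
            node.children[item] = child
        child.count += 1
        node = child


def show(node, depth=0):
    """Print the tree, one node per line, indented by depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


# Assumed input: the textbook's transactions, already filtered and reordered.
reordered = [
    ["I2", "I1", "I5"],        # T100
    ["I2", "I4"],              # T200
    ["I2", "I3"],              # T300
    ["I2", "I1", "I4"],        # T400
    ["I1", "I3"],              # T500
    ["I2", "I3"],              # T600
    ["I1", "I3"],              # T700
    ["I2", "I1", "I3", "I5"],  # T800
    ["I2", "I1", "I3"],        # T900
]

root = FPNode(None)  # the "null" root
for items in reordered:
    insert(root, items)

show(root)
```

Because transactions that share a prefix share nodes, the tree is usually much smaller than the raw transaction list; here seven of the nine transactions pass through the single I2 node under the root.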

For the FP-tree that has been built, the conditional FP-tree of each item is constructed in turn, starting from the bottom of the tree. First, we locate the I5 nodes in the tree and find that two paths reach I5: {I2,I1,I5:1} and {I2,I1,I3,I5:1}.
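Finding the paths that reach an item is easier if the tree keeps a header table that records every node of each item. The sketch below assumes such a table (a plain dictionary of node lists rather than the linked node chain used in the textbook) and the same assumed nine transactions; build_tree and prefix_paths are hypothetical helper names.

```python
from collections import defaultdict


class FPNode:
    def __init__(self, item, parent=None):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(ordered_transactions):
    """Build the tree and a header table mapping each item to its nodes."""
    root = FPNode(None)
    header = defaultdict(list)
    for items in ordered_transactions:
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item, parent=node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header


def prefix_paths(header, item):
    """Return (path, count) for every branch that ends in `item`."""
    paths = []
    for node in header[item]:
        path, parent = [], node.parent
        # Climb towards the null root, collecting the items on the way.
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            paths.append((path[::-1], node.count))
    return paths


# Assumed input: the textbook's transactions, already filtered and reordered.
reordered = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I2", "I1", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]

root, header = build_tree(reordered)
print(prefix_paths(header, "I5"))
# [(['I2', 'I1'], 1), (['I2', 'I1', 'I3'], 1)]
```

Each returned path carries the count of the I5 node it ends in, not the counts of the nodes above it, which is why both paths have a count of 1.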

The conditional FP-tree of I5 is constructed from these two paths. I3 is dropped because its count here is 1, which does not satisfy the minimum support required of a frequent itemset. The prefix path {I2,I1:2} is then used to enumerate all combinations with the suffix I5, yielding three frequent itemsets: {I2,I5}, {I1,I5} and {I2,I1,I5}.
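The enumeration for I5 is small enough to check by hand or with a short sketch. The code below starts from the two prefix paths of I5 and assumes a minimum support count of 2. Enumerating plain combinations is only valid because the conditional tree for I5 is a single path; a branching conditional tree would need recursive mining instead.

```python
from collections import Counter
from itertools import combinations

MIN_SUPPORT = 2  # assumed minimum support count
suffix = "I5"
# Conditional pattern base of I5: its two prefix paths with their counts.
pattern_base = [(["I2", "I1"], 1), (["I2", "I1", "I3"], 1)]

# Sum each item's count across the prefix paths.
counts = Counter()
for path, n in pattern_base:
    for item in path:
        counts[item] += n

# Keep only items that are still frequent; I3 (count 1) is dropped.
kept = [item for item in counts if counts[item] >= MIN_SUPPORT]

# The conditional tree is a single path here, so every non-empty
# combination of its items, joined with the suffix, is frequent.
patterns = {}
for size in range(1, len(kept) + 1):
    for combo in combinations(kept, size):
        patterns[combo + (suffix,)] = min(counts[i] for i in combo)

for itemset, n in patterns.items():
    print(itemset, n)
# ('I2', 'I5') 2
# ('I1', 'I5') 2
# ('I2', 'I1', 'I5') 2
```

Every itemset produced this way contains the suffix I5, and its support is the smallest count among the path items it uses.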

Repeating the above steps for every item gives the frequent itemsets generated by all the items.
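To see the whole procedure end to end, here is a compact pattern-growth sketch. It is a simplification, not the textbook implementation: each conditional pattern base is kept as a plain list of (items, count) pairs instead of being compressed into a conditional FP-tree, which produces the same itemsets but without the tree's memory savings. The function name mine, the data, and the minimum support count of 2 are assumptions for illustration.

```python
from collections import Counter


def mine(weighted_transactions, min_support, suffix=()):
    """Recursively yield (itemset, support) from (items, count) pairs."""
    counts = Counter()
    for items, n in weighted_transactions:
        for item in items:
            counts[item] += n
    # Frequent items in descending support order (ties broken by name).
    order = sorted((i for i in counts if counts[i] >= min_support),
                   key=lambda i: (-counts[i], i))
    rank = {item: pos for pos, item in enumerate(order)}
    ordered = [(sorted((i for i in items if i in rank), key=rank.get), n)
               for items, n in weighted_transactions]
    # Work from the least frequent item upward, as in the walkthrough.
    for item in reversed(order):
        itemset = (item,) + suffix
        yield itemset, counts[item]
        # Conditional pattern base: whatever precedes `item` in each transaction.
        base = [(items[:items.index(item)], n)
                for items, n in ordered if item in items]
        yield from mine(base, min_support, itemset)


# Assumed example data: the nine transactions from the cited textbook.
transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]

result = {frozenset(itemset): n
          for itemset, n in mine([(t, 1) for t in transactions], 2)}

for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Because each conditional base only contains items ranked ahead of the current suffix item, every frequent itemset is generated exactly once.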

https://www.cnblogs.com/zhengxingpeng/p/6679280.html

Evaluation of advantages and disadvantages:

Compared with the Apriori algorithm, the FP-tree algorithm considerably reduces time and space cost, since it generates no candidate sets. For massive data sets, however, the time and space cost is still very high, and database partitioning techniques are then needed.
