FreeSpan and PrefixSpan Algorithm Study Notes

Source: Internet
Author: User

Today I came across two well-known algorithms that are often used in frequent pattern mining: FreeSpan and PrefixSpan. I had only a rough understanding of both and could not explain them clearly, so I took the opportunity to look up some material and tidy it into today's study notes. Links to the material I consulted are given at the end of the article.

FreeSpan is a sequential pattern mining algorithm based on frequent pattern projection. Its basic idea is to use frequent items to recursively project the sequence database into a set of smaller projected databases, and to grow subsequence fragments within each projected database. This process partitions both the data and the set of frequent patterns to be tested, and confines each test to the smaller projected database that corresponds to it.

The simple process is as follows:

(1) Take as input the sequence database S and the minimum support threshold ζ.

(2) Scan the sequence database S, find the frequent items in S, and sort them in descending order of support to produce the list f_list.

(3) Based on f_list, divide the search into disjoint subsets, each with its own projected database:
    - patterns that contain only the first item;
    - patterns that contain the second item, but no item after it in f_list;
    - patterns that contain the n-th item, but no item after the n-th;
    - patterns that contain the last item.

(4) Scan each projected database to count how often each item occurs together with the items that precede it in f_list, and delete the combinations whose support is below the minimum support threshold.

(5) For the combinations that meet the minimum support threshold, mine longer frequent sequences recursively, until the projected databases yield no longer frequent sequences.

Analysis of the FreeSpan algorithm: it integrates the mining of frequent sequences with the mining of frequent patterns, confines the mining work to the projected databases, and restricts the growth of subsequence fragments. It can effectively discover the complete set of sequential patterns and greatly reduces the overhead of generating candidate sequences, which makes it much faster than the Apriori-based GSP algorithm. Its deficiencies: it may generate many projected databases, and if a pattern appears in every sequence of the database, the projected database for that pattern does not shrink. In addition, a subsequence of length k may grow at any position, so finding candidate sequences of length k+1 requires checking every possible combination, which is relatively expensive.

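The first steps of the process above (building f_list and splitting the database by it) can be sketched in Python. This is only a minimal illustration under my own assumptions, not a full FreeSpan implementation: the function names `build_f_list` and `project_by_f_list` and the toy database are made up for this note, a sequence is represented as a list of itemsets (tuples), and the later counting and recursive growth steps are left out.

```python
from collections import Counter


def build_f_list(db, min_support):
    """Return the frequent items, sorted by descending support.

    Ties are broken alphabetically (an arbitrary choice for this sketch).
    """
    counts = Counter()
    for seq in db:
        # An item counts once per sequence, however often it occurs in it.
        items = {item for itemset in seq for item in itemset}
        counts.update(items)
    frequent = [(item, c) for item, c in counts.items() if c >= min_support]
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in frequent]


def project_by_f_list(db, f_list):
    """Build one projected database per item of f_list.

    The projection for the i-th item keeps the sequences that contain it,
    reduced to the items f_list[0..i]; later items are dropped.
    """
    projections = {}
    for i, item in enumerate(f_list):
        allowed = set(f_list[: i + 1])
        projected = []
        for seq in db:
            if not any(item in itemset for itemset in seq):
                continue  # this sequence does not contain the item
            reduced = [tuple(x for x in itemset if x in allowed) for itemset in seq]
            projected.append([s for s in reduced if s])  # drop emptied itemsets
        projections[item] = projected
    return projections


# Toy database (hypothetical data): four sequences of itemsets.
db = [
    [("a",), ("a", "b"), ("c",)],
    [("a", "c"), ("b",)],
    [("b",), ("c", "d")],
    [("a",), ("d", "e")],
]
f_list = build_f_list(db, min_support=2)
print(f_list)  # ['a', 'b', 'c', 'd']  ('e' occurs in only one sequence)
projections = project_by_f_list(db, f_list)
print(projections["d"])  # [[('b',), ('c', 'd')], [('a',), ('d',)]]
```

Note how the projection for the last item "d" still contains every sequence in which "d" occurs: this is the situation mentioned in the analysis, where a projected database does not get smaller.
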
The PrefixSpan algorithm is also a sequential pattern mining algorithm. Unlike the approach above, PrefixSpan does not generate any candidate set, and in this respect it can be said to be much better than GSP. PrefixSpan can mine all sequential patterns that satisfy the support threshold, and it can be regarded as a classic algorithm. Sequences are written in a form such as <a, b, (de)>, where the items in parentheses belong to the same itemset.

Reference blogs:
http://blog.csdn.net/androidlushangderen/article/details/43766253
http://blog.csdn.net/u011860731/article/details/47685557
http://blog.csdn.net/shuke1991/article/details/52526913
http://zzkshaanxi.blog.163.com/blog/static/19761892009101921142779/
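
As a supplement to the PrefixSpan description above, here is a minimal Python sketch of the prefix-projection idea: patterns are grown one item at a time from the projected database of the current prefix, so no candidate set is generated. It rests on simplifying assumptions of my own: every element of a sequence is a single item (itemsets such as (de) are not handled), the function name `prefixspan` and the toy data are made up for this note, and projected databases are stored as (sequence index, start position) pairs rather than copied suffixes.

```python
from collections import defaultdict


def prefixspan(db, min_support):
    """Return {pattern (tuple of items): support} for single-item sequences."""
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pairs; each pair
        # stands for the suffix db[sid][start:] that follows the prefix.
        next_projections = defaultdict(list)
        for sid, start in projected:
            seen = set()
            seq = db[sid]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:
                    # Only the first occurrence matters: it leaves the
                    # longest suffix for further growth.
                    seen.add(item)
                    next_projections[item].append((sid, pos + 1))
        for item, next_proj in sorted(next_projections.items()):
            support = len(next_proj)  # number of suffixes containing item
            if support >= min_support:
                pattern = prefix + (item,)
                results[pattern] = support
                mine(pattern, next_proj)  # grow the pattern recursively

    mine((), [(sid, 0) for sid in range(len(db))])
    return results


# Toy database (hypothetical data): three sequences of single items.
db = [list("abcb"), list("acb"), list("bc")]
patterns = prefixspan(db, min_support=2)
for pattern, support in sorted(patterns.items()):
    print("".join(pattern), support)
```

With this toy data and a minimum support of 2, the sketch reports the patterns a, b, c, ab, ac, acb, bc and cb; a pattern such as bb is left out because it occurs in only one sequence.
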

