Introduction to the SPMF open source data mining platform: how to use the MaxSP algorithm

Source: Internet
Author: User

Some time ago one of my projects needed a sequential pattern mining algorithm, and a senior colleague recommended SPMF to me. These are my notes.

Let's start with a brief introduction to SPMF:

SPMF is an open source data mining platform written in Java.

It provides 51 data mining algorithm implementations for:

  • Sequential pattern mining,
  • Association rule mining,
  • Frequent itemset mining,
  • Sequential rule mining,
  • Clustering

Home page: http://www.philippe-fournier-viger.com/spmf/

Next, the basic concepts of sequential pattern mining:

A sequential pattern, as I would define it, is a pattern of sub-sequences that appears frequently in a dataset made up of ordered data. Unlike the association rule mining we are familiar with, both the input and the output of sequential pattern mining are ordered: the entries of each sequence in the dataset are ordered in time or space, and so are the patterns in the output.

A simple example: a classic application of association rules is finding the items that are bought together in a supermarket. It treats each customer purchase as one transaction and looks for regularities in how items are combined across transactions. If we instead consider one user's many visits to the supermarket, the transactions at different points in time form a purchase sequence, and the purchase sequences of N users form a sequence dataset of size N. Taking time into account lets us find rules that can be more valuable than association rules. Association mining often turns up co-occurrence rules such as beer and diapers, while sequential pattern mining can turn up rules with a causal flavor, such as a parenting guide followed later by a baby stroller. In this sense, sequential pattern mining can yield deeper knowledge than association mining.

In practice, sequential pattern mining is applied to many kinds of sequence data: microarray data in bioinformatics, to find gene combination patterns that occur frequently in certain patients; document sequences with words as items, to study how word sequences occur across documents; and user clickstream data, to mine users' frequent click patterns, build user models, and improve website functionality and UI structure. More generally, wherever there is a sequence dataset, sequential pattern mining can be considered as a way to find regularities.
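To make the step from transactions to purchase sequences concrete, here is a minimal sketch. The transaction log, the user names, and the function `build_sequences` are all made up for illustration; nothing here comes from SPMF.

```python
from collections import defaultdict

# Hypothetical transaction log: (user, day of visit, items bought on that visit).
transactions = [
    ("u1", 3, {"baby stroller"}),
    ("u1", 1, {"parenting guide", "milk"}),
    ("u2", 2, {"beer", "diapers"}),
    ("u2", 5, {"beer"}),
]

def build_sequences(log):
    """Group visits by user and order them by time: one purchase sequence per user."""
    by_user = defaultdict(list)
    for user, day, items in log:
        by_user[user].append((day, frozenset(items)))
    return {
        user: [items for _, items in sorted(visits, key=lambda v: v[0])]
        for user, visits in by_user.items()
    }

sequences = build_sequences(transactions)
for user, seq in sequences.items():
    print(user, [sorted(itemset) for itemset in seq])
```

Association rule mining would look at each of the four visits on its own; sequential pattern mining works on the two ordered sequences instead.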
Figure 1 shows a sequence database and its frequent sequential patterns, with 0.75 as the minimum support threshold (min_sup). Several key concepts of sequence mining are introduced below.


Figure 1 Simple sequence database, basic concepts

  • Sequence: a complete stream of information, identified by a sequence ID (SID).
  • Item: the smallest constituent unit of a sequence, such as A, B and C in this sample.
  • Event (also called an itemset): a set of items, identified in the sample by an event ID (EID). A timestamp is typically used to mark which events come before and after others.
  • k-frequent sequence: a frequent sequence that contains k items, denoted F_k (F1, F2, F3 in Figure 1).
  • Containment: for sequences x and y, if there is an order-preserving mapping under which each event of x is contained in an event of y, then x is contained in y (x is a subsequence of y). For example, the sequence B->AC is a subsequence of the sequence AB->E->ACD.
  • Support: the support of a sequence x is the fraction of sequences in the whole sequence set that contain x.

With these concepts in place, the problem of sequential pattern mining is defined as: given a sequence database and a minimum support min_sup, find all sequential patterns whose support is at least min_sup. (citation: http://www.360doc.com/content/11/0924/09/7511080_150810319.shtml)
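The containment and support definitions can be sketched in a few lines. This is only an illustration: the function names and the four-sequence database are my own assumptions, not the database of Figure 1 and not SPMF code. Only the B->AC and AB->E->ACD pair is taken from the text.

```python
def is_subsequence(x, y):
    """True if the events of x map, in order, to distinct events of y that contain them."""
    pos = 0
    for event in x:
        # Greedy scan: take the earliest remaining event of y that is a superset.
        while pos < len(y) and not set(event) <= set(y[pos]):
            pos += 1
        if pos == len(y):
            return False
        pos += 1
    return True

def support(x, database):
    """Fraction of sequences in the database that contain x."""
    return sum(is_subsequence(x, s) for s in database) / len(database)

x = [{"B"}, {"A", "C"}]               # B->AC
y = [{"A", "B"}, {"E"}, {"A", "C", "D"}]  # AB->E->ACD
print(is_subsequence(x, y))  # True

# Hypothetical database, one sequence per SID.
database = [
    y,
    [{"B"}, {"A", "C"}],
    [{"A", "C"}, {"B"}],          # wrong order, does not contain x
    [{"B"}, {"C"}, {"A", "C"}],
]
print(support(x, database))  # 0.75
```

With min_sup = 0.75, B->AC would count as frequent in this made-up database, since three of the four sequences contain it.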
Introduction to the MaxSP algorithm (based on the example in the MaxSP slides from SPMF):
 
 
That is, given a database like the one in the example, the algorithm mines the patterns whose support is at least minsup.
As you can see, what the algorithm finds are the maximal sequential patterns.
 
  
 
 
First, patterns fall into three classes: 1: sequential patterns; 2: closed patterns; 3: maximal patterns
 
 
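How do we decide whether a pattern is maximal? A frequent sequential pattern is maximal if it is not contained in any other frequent sequential pattern.

What does that mean? Here's an example:

Because {{a},{g}} is frequent, {{a}} is not output even though its support is also at least minsup: it is contained in {{a},{g}}, so it is not maximal.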
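The difference between closed and maximal patterns can be sketched as a filter over a list of frequent patterns. This is a naive illustration under my own assumptions, not how MaxSP works internally: the support counts below are invented, and the helper names are hypothetical. A closed pattern has no super-pattern with the same support; a maximal pattern has no frequent super-pattern at all.

```python
def is_subsequence(x, y):
    """True if the events of x map, in order, to distinct events of y that contain them."""
    pos = 0
    for event in x:
        while pos < len(y) and not event <= y[pos]:
            pos += 1
        if pos == len(y):
            return False
        pos += 1
    return True

def closed_patterns(frequent):
    """Keep patterns that have no strict super-pattern with equal support."""
    return [(p, s) for p, s in frequent
            if not any(q != p and t == s and is_subsequence(p, q) for q, t in frequent)]

def maximal_patterns(frequent):
    """Keep patterns that have no strict super-pattern in the frequent list."""
    return [(p, s) for p, s in frequent
            if not any(q != p and is_subsequence(p, q) for q, _ in frequent)]

def fs(*items):
    return frozenset(items)

# Hypothetical frequent patterns with invented absolute support counts.
frequent = [
    ((fs("a"),), 3),
    ((fs("g"),), 2),
    ((fs("a"), fs("g")), 2),
]

print(len(closed_patterns(frequent)))   # 2: {{a}} and {{a},{g}}
print(len(maximal_patterns(frequent)))  # 1: only {{a},{g}}
```

Here {{a}} is still closed, because its super-pattern {{a},{g}} has a lower support, but it is not maximal, which is why a maximal pattern miner would not output it.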

The following uses the MaxSP algorithm as an example to show how to use SPMF:

Algorithms page: download the algorithm description (PDF and PPT)

Opening the PPT shows the introduction to MaxSP (the PPT and PDF contents are basically the same).

Download page: source code and JAR package

Documentation page: examples

Find MaxSP there to see the description of the algorithm:

It says there are three ways to run an algorithm:

1: Graphical interface: http://www.philippe-fournier-viger.com/spmf/how_to_run_graphical_interface.php

2: Command line;

3: IDE: copy MainTestMaxSP_saveToFile.java directly

If you are using Eclipse, just convert your input to the required format and then make a few simple changes to that file.
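Converting your own data to the required input format might look like the sketch below. It assumes the sequence database format as I recall it from the SPMF documentation: one sequence per line, items written as positive integers separated by spaces, -1 after each itemset and -2 at the end of the sequence. Please check this against the documentation before relying on it. The function name `to_spmf_line` and the item-to-ID mapping are hypothetical.

```python
def to_spmf_line(sequence, item_ids):
    """Render one sequence (a list of itemsets) as a line of an SPMF-style input file."""
    parts = []
    for itemset in sequence:
        # Items inside an itemset are written in ascending ID order.
        parts.extend(str(i) for i in sorted(item_ids[item] for item in itemset))
        parts.append("-1")  # assumed end-of-itemset marker
    parts.append("-2")      # assumed end-of-sequence marker
    return " ".join(parts)

# Hypothetical mapping from item names to positive integer IDs.
item_ids = {"a": 1, "b": 2, "c": 3}
sequence = [{"a"}, {"a", "b", "c"}, {"a", "c"}]

line = to_spmf_line(sequence, item_ids)
print(line)  # 1 -1 1 2 3 -1 1 3 -1 -2
```

Writing one such line per sequence to a text file gives an input file that can then be pointed to from the test class.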

A demo is attached here for reference: https://github.com/XBWer/SPMF_MaxSP
