Introduction to the PrefixSpan sequential pattern mining algorithm

Source: Internet
Author: User

Sequence

A sequence is an ordered list of itemsets. A subsequence does not have to be made of contiguous elements, but it must preserve their order. Each element of a sequential pattern can itself be an itemset, as in a sequence of page groups. Sequential pattern mining can uncover deeper knowledge than association mining.

Sequential pattern

Sequential pattern mining looks for frequent sequences. Typical applications are still limited to discrete sequences. It captures the happens-after relationship between elements, not just consecutive subsequences.

It can be used for purchase behavior prediction, fraud screening, fault prediction, web user access prediction, discovery of human behavior patterns, etc.

There is a variety of algorithms: the Apriori-like algorithms AprioriAll, AprioriSome, and GSP (Generalized Sequential Patterns), as well as SPADE (Sequential PAttern Discovery using Equivalence classes) and PrefixSpan.



Difference from time series

Sequential pattern mining is different from time series mining. A time series (or dynamic series) is the sequence of values of the same statistical indicator, arranged in the chronological order in which they occur. The main purpose of time series analysis is to predict the future from historical data. Common models are MA, AR, ARMA, and GARCH.


Example

<a(abc)(ac)d(cf)>: 9 items, 5 itemsets, 1 sequence

<a(abc)(ac)d(cf)> = <a(cba)(ac)d(cf)> (the order of items inside an itemset does not matter)

<a(abc)(ac)d(cf)> ≠ <a(ac)(abc)d(cf)> (the order of the itemsets does matter)
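
As a minimal sketch, assuming each itemset is modeled as a Python frozenset and a sequence as a list of itemsets (a representation chosen here for illustration, not taken from the original code), the three statements above can be checked directly:

```python
# A sequence is an ordered list of itemsets; an itemset is an unordered set of items.
seq = [frozenset("a"), frozenset("abc"), frozenset("ac"), frozenset("d"), frozenset("cf")]

n_items = sum(len(itemset) for itemset in seq)  # 1 + 3 + 2 + 1 + 2
n_itemsets = len(seq)

# Reordering the items inside an itemset gives the same sequence.
reordered_items = [frozenset("a"), frozenset("cba"), frozenset("ac"), frozenset("d"), frozenset("cf")]
# Reordering the itemsets themselves gives a different sequence.
reordered_itemsets = [frozenset("a"), frozenset("ac"), frozenset("abc"), frozenset("d"), frozenset("cf")]

print(n_items, n_itemsets)         # 9 5
print(seq == reordered_items)      # True
print(seq == reordered_itemsets)   # False
```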

Minimum support (min_support) threshold: a subsequence is frequent if its occurrence frequency in the set of sequences is no less than min_support. The goal is to find all frequent subsequences.

Supersequence: <a(abc)(ac)d(cf)>

Subsequence: <aa(ac)d(c)>

Subsequence: <(ac)(ac)d(cf)>

Subsequence: <ac>

Sequence database:

<a(abc)(ac)d(cf)>
<(ad)c(bc)(ae)>
<(ef)(ab)(df)cb>
<eg(af)cbc>

Supports in this database:

α1 = <a>, support(α1) = 4
α2 = <ac>, support(α2) = 4
α3 = <(ab)c>, support(α3) = 2
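
The support values above can be reproduced with a short sketch. It assumes the list-of-frozensets representation and two illustrative helpers, seq_of and is_subsequence, which are not part of the original code. The support of a pattern is the number of database sequences that contain it as a subsequence.

```python
def seq_of(*itemsets):
    """Hypothetical helper: seq_of("a", "abc") builds the sequence <a(abc)>."""
    return [frozenset(s) for s in itemsets]


def is_subsequence(pattern, sequence):
    """True if every itemset of pattern is contained, in order, in a
    distinct itemset of sequence (gaps are allowed)."""
    pos = 0
    for wanted in pattern:
        # advance to the first remaining itemset that contains `wanted`
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(pattern, database):
    """Number of sequences in database that contain pattern."""
    return sum(is_subsequence(pattern, s) for s in database)


db = [
    seq_of("a", "abc", "ac", "d", "cf"),
    seq_of("ad", "c", "bc", "ae"),
    seq_of("ef", "ab", "df", "c", "b"),
    seq_of("e", "g", "af", "c", "b", "c"),
]

print(support(seq_of("a"), db))        # 4
print(support(seq_of("a", "c"), db))   # 4
print(support(seq_of("ab", "c"), db))  # 2
```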


=================

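PrefixSpan

Prefix

The sequence <a(abc)a> is a prefix of the sequence <a(abc)(ac)d(cf)>, but <a(abc)c> is not. Likewise, <a>, <aa>, <a(ab)>, and <a(abc)> are prefixes of <a(abc)(ac)d(cf)>, while <ab> and <a(bc)> are not. Items inside an itemset are assumed to be listed alphabetically, and the last itemset of a prefix must consist of the leading items of the itemset it matches.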
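
The prefix test can be sketched as follows, under the assumption that items are single characters ordered alphabetically. The helper names seq_of and is_prefix are illustrative only.

```python
def seq_of(*itemsets):
    """Hypothetical helper: seq_of("a", "abc") builds the sequence <a(abc)>."""
    return [frozenset(s) for s in itemsets]


def is_prefix(beta, alpha):
    """Check whether beta is a prefix of alpha."""
    m = len(beta)
    if m == 0 or m > len(alpha):
        return False  # the empty pattern is not treated as a prefix in this sketch
    if beta[:m - 1] != alpha[:m - 1]:
        return False  # every itemset except the last must match exactly
    last, target = beta[-1], alpha[m - 1]
    if not last <= target:
        return False  # the last itemset must be contained in the matching itemset
    # every leftover item must come alphabetically after the items of `last`
    return all(item > max(last) for item in target - last)


alpha = seq_of("a", "abc", "ac", "d", "cf")
print(is_prefix(seq_of("a", "abc", "a"), alpha))  # True
print(is_prefix(seq_of("a", "abc", "c"), alpha))  # False
print(is_prefix(seq_of("a", "ab"), alpha))        # True
print(is_prefix(seq_of("a", "bc"), alpha))        # False
```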


Suffix (postfix)

The sequence β = <a(abc)a> is a prefix of α = <a(abc)(ac)d(cf)>, and the sequence γ = <(_c)d(cf)> is the postfix of α with respect to β. This is denoted α = β⋅γ or γ = α/β.

For the sequence <a(abc)(ac)d(cf)>:

<(abc)(ac)d(cf)> is the suffix with respect to the prefix <a>;

<(_bc)(ac)d(cf)> is the suffix with respect to the prefix <aa>;

<(_c)(ac)d(cf)> is the suffix with respect to the prefix <a(ab)>;

The "_" marks the position taken by the last item of the prefix, which belongs to the same itemset as the items that follow it.
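
A minimal sketch of the suffix computation, assuming the prefix has already been verified and that a suffix is stored as a pair (leftover items of the split itemset, remaining itemsets). The names seq_of, suffix, and show are illustrative and not taken from the original code.

```python
def seq_of(*itemsets):
    """Hypothetical helper: seq_of("a", "abc") builds the sequence <a(abc)>."""
    return [frozenset(s) for s in itemsets]


def suffix(alpha, beta):
    """Return alpha / beta as (partial, rest); beta must be a prefix of alpha."""
    m = len(beta)
    partial = alpha[m - 1] - beta[-1]  # items left over in the split itemset
    return partial, alpha[m:]


def show(partial, rest):
    """Render a suffix in the <(_c)d(cf)> notation used in the text."""
    parts = ["(_" + "".join(sorted(partial)) + ")"] if partial else []
    for itemset in rest:
        items = "".join(sorted(itemset))
        parts.append(items if len(itemset) == 1 else "(" + items + ")")
    return "<" + "".join(parts) + ">"


alpha = seq_of("a", "abc", "ac", "d", "cf")
print(show(*suffix(alpha, seq_of("a"))))              # <(abc)(ac)d(cf)>
print(show(*suffix(alpha, seq_of("a", "a"))))         # <(_bc)(ac)d(cf)>
print(show(*suffix(alpha, seq_of("a", "ab"))))        # <(_c)(ac)d(cf)>
print(show(*suffix(alpha, seq_of("a", "abc", "a"))))  # <(_c)d(cf)>
```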

Projection

The projected database of a prefix α is the collection of all suffixes, relative to α, of the sequences in the sequence database S.
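
For a prefix made of a single item, the projected database can be sketched as below. This is an illustration under stated assumptions (single-character items, alphabetical order inside itemsets, hypothetical helper names), not the original implementation. Only the first occurrence of the item in each sequence is used, and sequences that do not contain the item are dropped.

```python
def seq_of(*itemsets):
    """Hypothetical helper: seq_of("a", "abc") builds the sequence <a(abc)>."""
    return [frozenset(s) for s in itemsets]


def show(partial, rest):
    """Render a suffix in the <(_c)d(cf)> notation used in the text."""
    parts = ["(_" + "".join(sorted(partial)) + ")"] if partial else []
    for itemset in rest:
        items = "".join(sorted(itemset))
        parts.append(items if len(itemset) == 1 else "(" + items + ")")
    return "<" + "".join(parts) + ">"


def project(database, item):
    """Suffixes of the sequences in database relative to the prefix <item>."""
    projected = []
    for seq in database:
        for pos, itemset in enumerate(seq):
            if item in itemset:
                # keep only the items that follow `item` in the same itemset
                partial = frozenset(x for x in itemset if x > item)
                projected.append((partial, seq[pos + 1:]))
                break  # only the first occurrence matters
    return projected


db = [
    seq_of("a", "abc", "ac", "d", "cf"),
    seq_of("ad", "c", "bc", "ae"),
    seq_of("ef", "ab", "df", "c", "b"),
    seq_of("e", "g", "af", "c", "b", "c"),
]
projected_a = [show(p, r) for p, r in project(db, "a")]
print(projected_a)
```

With the four-sequence example database, the printed suffixes for the prefix a are <(abc)(ac)d(cf)>, <(_d)c(bc)(ae)>, <(_b)(df)cb>, and <(_f)cbc>.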

Algorithm

Subroutine: PrefixSpan(α, l, S|α)

Parameters:

α is a sequential pattern used as the prefix;

l is the length of α;

S|α is the projected database of α.

Algorithm:

1) Scan S|α once and find every frequent item b such that:
a) b can be added to the last itemset of α to form a sequential pattern (e.g. ab + _c => a(bc)), or
b) b can be appended to α as a new itemset to form a sequential pattern (e.g. ab + c => abc);

2) For each frequent item b, add it to α to form a new sequential pattern α' (such as abc or a(bc));

3) For each α', construct the projected database S|α' and call PrefixSpan(α', l+1, S|α'). The process is a depth-first search.
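
The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code that produced the output in this article: sequences are lists of frozensets of single-character items, and each projected database is stored as (sequence id, position) pairs rather than as copied suffixes. The names prefixspan, mine, seq_of, and fmt are hypothetical.

```python
from collections import defaultdict


def prefixspan(database, min_support):
    """Return {pattern: support}; each pattern is a tuple of frozensets."""
    patterns = {}

    def mine(prefix, projection):
        # projection: (sequence id, position) pairs, where position is the
        # earliest itemset at which the last itemset of prefix can be matched.
        last = prefix[-1] if prefix else None
        appended = defaultdict(list)  # item opens a new itemset: ab + c => abc
        merged = defaultdict(list)    # item joins the last itemset: ab + _c => a(bc)
        for sid, pos in projection:
            seq = database[sid]
            seen = set()
            for j in range(pos + 1, len(seq)):
                for item in seq[j] - seen:
                    seen.add(item)
                    appended[item].append((sid, j))
            if last is None:
                continue
            seen = set()
            for j in range(pos, len(seq)):
                if last <= seq[j]:
                    for item in seq[j] - seen:
                        if item > max(last):
                            seen.add(item)
                            merged[item].append((sid, j))
        # Steps 2 and 3: grow the prefix with each frequent item and recurse.
        for item in sorted(appended):
            if len(appended[item]) >= min_support:
                grown = prefix + (frozenset([item]),)
                patterns[grown] = len(appended[item])
                mine(grown, appended[item])
        for item in sorted(merged):
            if len(merged[item]) >= min_support:
                grown = prefix[:-1] + (last | {item},)
                patterns[grown] = len(merged[item])
                mine(grown, merged[item])

    mine((), [(sid, -1) for sid in range(len(database))])
    return patterns


def seq_of(*itemsets):
    """Hypothetical helper: seq_of("a", "abc") builds the sequence <a(abc)>."""
    return [frozenset(s) for s in itemsets]


def fmt(pattern):
    """Render a pattern such as a(bc)a."""
    return "".join(
        "".join(sorted(s)) if len(s) == 1 else "(" + "".join(sorted(s)) + ")"
        for s in pattern
    )


db = [
    seq_of("a", "abc", "ac", "d", "cf"),
    seq_of("ad", "c", "bc", "ae"),
    seq_of("ef", "ab", "df", "c", "b"),
    seq_of("e", "g", "af", "c", "b", "c"),
]
result = {fmt(p): s for p, s in prefixspan(db, 2).items()}
print(len(result))
print(result["a"], result["(ab)dc"], result["efcb"])
```

Each entry of a projection records where the prefix ends in one sequence, so the support of a grown pattern is simply the number of entries collected for it. Patterns are discovered in a different order than in the printed trace further down, but the set of patterns should be the same.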


Advantages:

1) No candidate sequences are generated, which saves space;

2) The projected databases keep shrinking (because a projection keeps only the suffix part associated with the prefix);

3) The divide-and-conquer approach improves the efficiency of the algorithm, and its memory use is more stable than that of the SPADE and GSP algorithms.

Disadvantages:

1) The main cost of the algorithm is the construction of the projected databases. If there are many sequences and a projected database is built for each sequential pattern, the cost is relatively large. It can be reduced through (a) pseudo-projection, which reduces the number and size of the projected databases, and (b) bi-level projection;

2) It is difficult to implement.


Example

[Figure: part of the demonstration process]

[Figure: final result]

Printed output of the code (can be used for debugging)

Min_support:2


Input sequence:a (ABC) (AC) d (CF)
Input Sequence: (AD) c (BC) (AE)
Input Sequence: (EF) (AB) (DF) C B
Input sequence:e g (AF) c b C
frequence:a=4 b=4 c=4 d=3 e=3 f=3 g=1
support:a=4 b=4 c=4 d=3 e=3 f=3
Fullprefix~~~: A
Lastprefix:a, PostFix: (ABC) (AC) d (CF)
Lastprefix:a, PostFix: (_d) C (BC) (AE)
Lastprefix:a, PostFix: (_b) (DF) C B
Lastprefix:a, PostFix: (_f) c b C


Input Sequence: (ABC) (AC) d (CF)
Input Sequence: (_d) C (BC) (AE)
Input Sequence: (_b) (DF) C B
Input Sequence: (_f) c b C
frequence:a=2 b=4 _b=2 c=4 _c=1 d=2 _d=1 e=1 f=2 _e=1 _f=1
support:a=2 b=4 _b=2 c=4 d=2 f=2
fullprefix~~~: AA
Lastprefix:a, PostFix: (_BC) (AC) d (CF)
Lastprefix:a, PostFix: (_e)


Input Sequence: (_BC) (AC) d (CF)
Input Sequence: (_e)
Frequence:a=1 _b=1 c=1 _c=1 d=1 f=1 _e=1
Support
fullprefix~~~: AB
Lastprefix:b, PostFix: (_c) (AC) d (CF)
Lastprefix:b, PostFix: (_c) (AE)
Lastprefix:b, PostFix:
Lastprefix:b, Postfix:c


Input Sequence: (_c) (AC) d (CF)
Input Sequence: (_c) (AE)
Input Sequence:
Input Sequence:c
frequence:a=2 c=2 _c=2 d=1 e=1 f=1
support:a=2 c=2 _c=2
fullprefix~~~: ABA
Lastprefix:a, PostFix: (_c) d (CF)
Lastprefix:a, PostFix: (_e)


Input Sequence: (_c) d (CF)
Input Sequence: (_e)
Frequence:c=1 _c=1 d=1 f=1 _e=1
Support
fullprefix~~~: ABC
LASTPREFIX:C, postfix:d (CF)
LASTPREFIX:C, PostFix:


Input sequence:d (CF)
Input Sequence:
Frequence:c=1 d=1 f=1 _f=1
Support
fullprefix~~~: A (BC)
Lastprefix: _c, PostFix: (AC) d (CF)
Lastprefix: _c, PostFix: (AE)


Input Sequence: (AC) d (CF)
Input Sequence: (AE)
frequence:a=2 c=1 d=1 e=1 f=1
support:a=2
fullprefix~~~: A (BC) a
Lastprefix:a, PostFix: (_c) d (CF)
Lastprefix:a, PostFix: (_e)


Input Sequence: (_c) d (CF)
Input Sequence: (_e)
Frequence:c=1 _c=1 d=1 f=1 _e=1
Support
fullprefix~~~: (AB)
Lastprefix: _b, PostFix: (_c) (AC) d (CF)
Lastprefix: _b, PostFix: (DF) c B


Input Sequence: (_c) (AC) d (CF)
Input Sequence: (DF) c B
Frequence:a=1 b=1 c=2 _c=1 d=2 f=2
support:c=2 d=2 f=2
fullprefix~~~: (AB) c
LASTPREFIX:C, postfix:d (CF)
LASTPREFIX:C, Postfix:b


Input sequence:d (CF)
Input sequence:b
Frequence:b=1 c=1 d=1 f=1 _f=1
Support
fullprefix~~~: (AB) d
Lastprefix:d, PostFix: (CF)
Lastprefix:d, PostFix: (_f) c b


Input Sequence: (CF)
Input Sequence: (_f) c b
Frequence:b=1 c=2 f=1 _f=1
support:c=2
fullprefix~~~: (AB) DC
Lastprefix:c, PostFix: (_f)
LASTPREFIX:C, Postfix:b


Input Sequence: (_f)
Input sequence:b
Frequence:b=1 _f=1
Support
fullprefix~~~: (AB) F
Lastprefix:f, PostFix:
Lastprefix:f, Postfix:c b


Input Sequence:
Input Sequence:c b
Frequence:b=1 c=1
Support
fullprefix~~~: AC
Lastprefix:c, PostFix: (AC) d (CF)
Lastprefix:c, PostFix: (BC) (AE)
LASTPREFIX:C, Postfix:b
LASTPREFIX:C, Postfix:b C


Input Sequence: (AC) d (CF)
Input Sequence: (BC) (AE)
Input sequence:b
Input sequence:b C
frequence:a=2 b=3 c=3 d=1 e=1 f=1 _f=1
support:a=2 b=3 c=3
fullprefix~~~: ACA
Lastprefix:a, PostFix: (_c) d (CF)
Lastprefix:a, PostFix: (_e)


Input Sequence: (_c) d (CF)
Input Sequence: (_e)
Frequence:c=1 _c=1 d=1 f=1 _e=1
Support
fullprefix~~~: ACB
Lastprefix:b, PostFix: (_c) (AE)
Lastprefix:b, PostFix:
Lastprefix:b, Postfix:c


Input Sequence: (_c) (AE)
Input Sequence:
Input Sequence:c
Frequence:a=1 c=1 _c=1 e=1
Support
fullprefix~~~: ACC
LASTPREFIX:C, postfix:d (CF)
Lastprefix:c, PostFix: (AE)
LASTPREFIX:C, PostFix:


Input sequence:d (CF)
Input Sequence: (AE)
Input Sequence:
Frequence:a=1 c=1 d=1 e=1 f=1 _f=1
Support
fullprefix~~~: Ad
Lastprefix:d, PostFix: (CF)
Lastprefix:d, PostFix: (_f) c b


Input Sequence: (CF)
Input Sequence: (_f) c b
Frequence:b=1 c=2 f=1 _f=1
support:c=2
Fullprefix~~~: ADC
Lastprefix:c, PostFix: (_f)
LASTPREFIX:C, Postfix:b


Input Sequence: (_f)
Input sequence:b
Frequence:b=1 _f=1
Support
fullprefix~~~: AF
Lastprefix:f, PostFix:
Lastprefix:f, Postfix:c b


Input Sequence:
Input Sequence:c b
Frequence:b=1 c=1
Support
Fullprefix~~~: b
Lastprefix:b, PostFix: (_c) (AC) d (CF)
Lastprefix:b, PostFix: (_c) (AE)
Lastprefix:b, PostFix: (DF) c B
Lastprefix:b, Postfix:c


Input Sequence: (_c) (AC) d (CF)
Input Sequence: (_c) (AE)
Input Sequence: (DF) c B
Input Sequence:c
frequence:a=2 b=1 c=3 _c=2 d=2 e=1 f=2
support:a=2 c=3 _c=2 d=2 f=2
fullprefix~~~: BA
Lastprefix:a, PostFix: (_c) d (CF)
Lastprefix:a, PostFix: (_e)


Input Sequence: (_c) d (CF)
Input Sequence: (_e)
Frequence:c=1 _c=1 d=1 f=1 _e=1
Support
fullprefix~~~: BC
LASTPREFIX:C, postfix:d (CF)
LASTPREFIX:C, Postfix:b
LASTPREFIX:C, PostFix:


Input sequence:d (CF)
Input sequence:b
Input Sequence:
Frequence:b=1 c=1 d=1 f=1 _f=1
Support
fullprefix~~~: (BC)
Lastprefix: _c, PostFix: (AC) d (CF)
Lastprefix: _c, PostFix: (AE)


Input Sequence: (AC) d (CF)
Input Sequence: (AE)
frequence:a=2 c=1 d=1 e=1 f=1
support:a=2
fullprefix~~~: (BC) A
Lastprefix:a, PostFix: (_c) d (CF)
Lastprefix:a, PostFix: (_e)


Input Sequence: (_c) d (CF)
Input Sequence: (_e)
Frequence:c=1 _c=1 d=1 f=1 _e=1
Support
fullprefix~~~: BD
Lastprefix:d, PostFix: (CF)
Lastprefix:d, PostFix: (_f) c b


Input Sequence: (CF)
Input Sequence: (_f) c b
Frequence:b=1 c=2 f=1 _f=1
support:c=2
fullprefix~~~: BDC
Lastprefix:c, PostFix: (_f)
LASTPREFIX:C, Postfix:b


Input Sequence: (_f)
Input sequence:b
Frequence:b=1 _f=1
Support
fullprefix~~~: BF
Lastprefix:f, PostFix:
Lastprefix:f, Postfix:c b


Input Sequence:
Input Sequence:c b
Frequence:b=1 c=1
Support
Fullprefix~~~: C
Lastprefix:c, PostFix: (AC) d (CF)
Lastprefix:c, PostFix: (BC) (AE)
LASTPREFIX:C, Postfix:b
LASTPREFIX:C, Postfix:b C


Input Sequence: (AC) d (CF)
Input Sequence: (BC) (AE)
Input sequence:b
Input sequence:b C
frequence:a=2 b=3 c=3 d=1 e=1 f=1 _f=1
support:a=2 b=3 c=3
Fullprefix~~~: CA
Lastprefix:a, PostFix: (_c) d (CF)
Lastprefix:a, PostFix: (_e)


Input Sequence: (_c) d (CF)
Input Sequence: (_e)
Frequence:c=1 _c=1 d=1 f=1 _e=1
Support
Fullprefix~~~: CB
Lastprefix:b, PostFix: (_c) (AE)
Lastprefix:b, PostFix:
Lastprefix:b, Postfix:c


Input Sequence: (_c) (AE)
Input Sequence:
Input Sequence:c
Frequence:a=1 c=1 _c=1 e=1
Support
fullprefix~~~: CC
LASTPREFIX:C, postfix:d (CF)
Lastprefix:c, PostFix: (AE)
LASTPREFIX:C, PostFix:


Input sequence:d (CF)
Input Sequence: (AE)
Input Sequence:
Frequence:a=1 c=1 d=1 e=1 f=1 _f=1
Support
fullprefix~~~: D
Lastprefix:d, PostFix: (CF)
Lastprefix:d, Postfix:c (BC) (AE)
Lastprefix:d, PostFix: (_f) c b


Input Sequence: (CF)
Input sequence:c (BC) (AE)
Input Sequence: (_f) c b
Frequence:a=1 b=2 c=3 e=1 f=1 _f=1
support:b=2 c=3
fullprefix~~~: DB
Lastprefix:b, PostFix: (_c) (AE)
Lastprefix:b, PostFix:


Input Sequence: (_c) (AE)
Input Sequence:
Frequence:a=1 _c=1 e=1
Support
Fullprefix~~~: DC
Lastprefix:c, PostFix: (_f)
Lastprefix:c, PostFix: (BC) (AE)
LASTPREFIX:C, Postfix:b


Input Sequence: (_f)
Input Sequence: (BC) (AE)
Input sequence:b
Frequence:a=1 b=2 c=1 e=1 _f=1
support:b=2
fullprefix~~~: DCB
Lastprefix:b, PostFix: (_c) (AE)
Lastprefix:b, PostFix:


Input Sequence: (_c) (AE)
Input Sequence:
Frequence:a=1 _c=1 e=1
Support
Fullprefix~~~: E
Lastprefix:e, PostFix:
Lastprefix:e, PostFix: (_f) (AB) (DF) C B
Lastprefix:e, Postfix:g (AF) c b C


Input Sequence:
Input Sequence: (_f) (AB) (DF) C B
Input sequence:g (AF) c b C
frequence:a=2 b=2 c=2 d=1 f=2 _f=1 g=1
support:a=2 b=2 c=2 f=2
fullprefix~~~: ea
Lastprefix:a, PostFix: (_b) (DF) C B
Lastprefix:a, PostFix: (_f) c b C


Input Sequence: (_b) (DF) C B
Input Sequence: (_f) c b C
frequence:b=2 _b=1 c=2 d=1 f=1 _f=1
support:b=2 c=2
fullprefix~~~: EAB
Lastprefix:b, PostFix:
Lastprefix:b, Postfix:c


Input Sequence:
Input Sequence:c
Frequence:c=1
Support
fullprefix~~~: EAC
LASTPREFIX:C, Postfix:b
LASTPREFIX:C, Postfix:b C


Input sequence:b
Input sequence:b C
frequence:b=2 c=1
support:b=2
fullprefix~~~: EACB
Lastprefix:b, PostFix:
Lastprefix:b, Postfix:c


Input Sequence:
Input Sequence:c
Frequence:c=1
Support
Fullprefix~~~: EB
Lastprefix:b, PostFix: (DF) c B
Lastprefix:b, Postfix:c


Input Sequence: (DF) c B
Input Sequence:c
Frequence:b=1 c=2 d=1 f=1
support:c=2
fullprefix~~~: EBC
LASTPREFIX:C, Postfix:b
LASTPREFIX:C, PostFix:


Input sequence:b
Input Sequence:
Frequence:b=1
Support
fullprefix~~~: EC
LASTPREFIX:C, Postfix:b
LASTPREFIX:C, Postfix:b C


Input sequence:b
Input sequence:b C
frequence:b=2 c=1
support:b=2
fullprefix~~~: ECB
Lastprefix:b, PostFix:
Lastprefix:b, Postfix:c


Input Sequence:
Input Sequence:c
Frequence:c=1
Support
fullprefix~~~: EF
Lastprefix:f, Postfix:c b
Lastprefix:f, Postfix:c b C


Input Sequence:c b
Input Sequence:c b C
frequence:b=2 c=2
support:b=2 c=2
fullprefix~~~: EFB
Lastprefix:b, PostFix:
Lastprefix:b, Postfix:c


Input Sequence:
Input Sequence:c
Frequence:c=1
Support
fullprefix~~~: EFC
LASTPREFIX:C, Postfix:b
LASTPREFIX:C, Postfix:b C


Input sequence:b
Input sequence:b C
frequence:b=2 c=1
support:b=2
fullprefix~~~: EFCB
Lastprefix:b, PostFix:
Lastprefix:b, Postfix:c


Input Sequence:
Input Sequence:c
Frequence:c=1
Support
Fullprefix~~~: F
Lastprefix:f, PostFix:
Lastprefix:f, PostFix: (AB) (DF) C B
Lastprefix:f, Postfix:c b C


Input Sequence:
Input Sequence: (AB) (DF) C B
Input Sequence:c b C
Frequence:a=1 b=2 c=2 d=1 f=1
support:b=2 c=2
fullprefix~~~: FB
Lastprefix:b, PostFix: (DF) c B
Lastprefix:b, Postfix:c


Input Sequence: (DF) c B
Input Sequence:c
Frequence:b=1 c=2 d=1 f=1
support:c=2
fullprefix~~~: FBC
LASTPREFIX:C, Postfix:b
LASTPREFIX:C, PostFix:


Input sequence:b
Input Sequence:
Frequence:b=1
Support
Fullprefix~~~: FC
LASTPREFIX:C, Postfix:b
LASTPREFIX:C, Postfix:b C


Input sequence:b
Input sequence:b C
frequence:b=2 c=1
support:b=2
fullprefix~~~: FCB
Lastprefix:b, PostFix:
Lastprefix:b, Postfix:c


Input Sequence:
Input Sequence:c
Frequence:c=1
Support
FULLPREFIXDB:
a, aa, ab, aba, abc, a(bc), a(bc)a, (ab), (ab)c, (ab)d, (ab)dc, (ab)f, ac, aca, acb, acc, ad, adc, af, b, ba, bc, (bc), (bc)a, bd, bdc, bf, c, ca, cb, cc, d, db, dc, dcb, e, ea, eab, eac, eacb, eb, ebc, ec, ecb, ef, efb, efc, efcb, f, fb, fbc, fc, fcb

