Construction of the height array of a suffix array

Http://churuimin425.blog.163.com/blog/static/34129877201141005542104/

A brief summary of suffix arrays:

A suffix array is the sorted order of all suffixes of a string S. Let Suffix(i) denote S[i..len(S)], the suffix of S that starts at position i. Two arrays record the sorted order of all suffixes:

· rank[i] records the position of Suffix(i) in sorted order, that is, Suffix(i) is the rank[i]-th smallest suffix.

· SA[i] records the starting position of the i-th smallest suffix, that is, Suffix(SA[i]) is the i-th smallest of all suffixes.

SA and rank are inverse permutations of each other: rank[SA[i]] = i and SA[rank[i]] = i.
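To make the two definitions concrete, here is a minimal sketch that builds SA by naively sorting whole suffixes (far slower than the doubling method this post describes, so for illustration only) and then derives rank from it. It assumes 0-based indices, so rank[i] = r means Suffix(i) is the (r+1)-th smallest; the function names naiveSuffixArray and rankFromSa are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Naive suffix array (0-based): sort start positions by comparing whole suffixes.
// Worst case O(n^2 log n); for illustration only.
std::vector<int> naiveSuffixArray(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n);
    for (int i = 0; i < n; ++i) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        // true if Suffix(a) < Suffix(b) lexicographically
        return std::lexicographical_compare(s.begin() + a, s.end(), s.begin() + b, s.end());
    });
    return sa;
}

// rank is the inverse permutation of sa: rank[sa[r]] == r.
std::vector<int> rankFromSa(const std::vector<int>& sa) {
    std::vector<int> rank(sa.size());
    for (int r = 0; r < static_cast<int>(sa.size()); ++r) rank[sa[r]] = r;
    return rank;
}
```

For "banana" the sorted suffixes are a, ana, anana, banana, na, nana, so SA = {5, 3, 1, 0, 4, 2} and rank = {3, 2, 5, 1, 4, 0}.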

The next question is how to sort all suffixes quickly. The key is to reduce the cost of comparing two suffixes.
The method used is prefix doubling (the "multiplication method"). Define the k-prefix of a string as its first k characters (the whole string if it is shorter), and let Suffix(k, i) be the k-prefix of Suffix(i). SA[k, i] and Rank[k, i] are defined as before, but for the order of the k-prefixes. Then:

· If Rank[k, i] = Rank[k, j] and Rank[k, i + k] = Rank[k, j + k], then Suffix(2k, i) = Suffix(2k, j)

· If Rank[k, i] = Rank[k, j] and Rank[k, i + k] < Rank[k, j + k], then Suffix(2k, i) < Suffix(2k, j)

· If Rank[k, i] < Rank[k, j], then Suffix(2k, i) < Suffix(2k, j)

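Starting from k = 1 (single characters) and doubling k in each round, we can sort the Suffix(2k, i) using only the ranks of the Suffix(k, i); when i + k runs past the end of the string, the second half is empty and sorts before everything. Once k >= n, Suffix(k, i) is the whole suffix, so its order is the order of all suffixes. With a counting (radix) sort, each round takes O(n) and there are O(log n) rounds, for O(n log n) in total.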
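A compact sketch of the doubling idea follows. It assumes std::sort in place of the counting sort used in the listing further down, which makes each round O(n log n) and the total O(n log^2 n). The pair (rank[i], rank[i + k]) is exactly the key the three rules compare, with -1 standing in for an empty second half. The function name doublingSuffixArray is hypothetical, and the loop stops as soon as all ranks are distinct, which can happen before k reaches n.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Prefix doubling (0-based). Each round sorts suffixes by their 2k-prefix using
// the pair (rank of the k-prefix at i, rank of the k-prefix at i + k).
// std::sort makes this O(n log^2 n); a counting sort per round gives O(n log n).
std::vector<int> doublingSuffixArray(const std::string& s) {
    const int n = static_cast<int>(s.size());
    if (n == 0) return {};
    std::vector<int> sa(n), rank(n), newRank(n);
    for (int i = 0; i < n; ++i) {
        sa[i] = i;
        rank[i] = static_cast<unsigned char>(s[i]);  // k = 1: rank by first character
    }
    for (int k = 1;; k *= 2) {
        // -1 marks an empty second half (i + k past the end); it sorts first.
        auto key = [&](int i) {
            return std::make_pair(rank[i], i + k < n ? rank[i + k] : -1);
        };
        std::sort(sa.begin(), sa.end(), [&](int a, int b) { return key(a) < key(b); });
        // Equal keys mean equal 2k-prefixes, so they get the same new rank.
        newRank[sa[0]] = 0;
        for (int i = 1; i < n; ++i)
            newRank[sa[i]] = newRank[sa[i - 1]] + (key(sa[i - 1]) < key(sa[i]) ? 1 : 0);
        rank = newRank;
        if (rank[sa[n - 1]] == n - 1) break;  // all ranks distinct: order is final
    }
    return sa;
}
```

For "banana", two rounds suffice: after k = 1 the ranks still tie "ana..." with "ana..." and "na..." with "na...", and the k = 2 keys separate every suffix, giving {5, 3, 1, 0, 4, 2}.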

Now we have the sorted order of all suffixes. What is it good for? Mainly for finding the longest common prefix (LCP) between suffixes.

Let LCP(i, j) be the length of the longest common prefix of the i-th and j-th smallest suffixes (that is, of Suffix(SA[i]) and Suffix(SA[j])), with i <= j. It has two properties:

1. For any i <= k <= j, LCP(i, j) = min(LCP(i, k), LCP(k, j))

2. For i < j, LCP(i, j) = min{ LCP(k-1, k) : i < k <= j }
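Property 2 can be sanity-checked by brute force. The sketch below, with hypothetical helper names suffixLcp and minPropertyHolds and 0-based ranks, compares every LCP(i, j) against the running minimum of adjacent LCPs for a given suffix array. It is deliberately slow and only meant as a check, not as a way to compute LCP.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Length of the common prefix of Suffix(a) and Suffix(b), by direct comparison.
int suffixLcp(const std::string& s, int a, int b) {
    const int n = static_cast<int>(s.size());
    int l = 0;
    while (a + l < n && b + l < n && s[a + l] == s[b + l]) ++l;
    return l;
}

// Brute-force check of property 2 on a suffix array sa (0-based ranks):
// for every i < j, LCP(i, j) must equal min{ LCP(k-1, k) : i < k <= j }.
bool minPropertyHolds(const std::string& s, const std::vector<int>& sa) {
    const int n = static_cast<int>(sa.size());
    for (int i = 0; i < n; ++i) {
        int adjacentMin = n;  // no LCP exceeds n, so n acts as "infinity"
        for (int j = i + 1; j < n; ++j) {
            adjacentMin = std::min(adjacentMin, suffixLcp(s, sa[j - 1], sa[j]));
            if (suffixLcp(s, sa[i], sa[j]) != adjacentMin) return false;
        }
    }
    return true;
}
```

The property depends on the order being sorted: for "banana" with the correct SA {5, 3, 1, 0, 4, 2} the check passes, while the unsorted order {0, 1, 2, 3, 4, 5} fails (for example "anana" and "ana" share 3 characters, but "nana" sits between them in that order).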

The first property is intuitive; its main use is to prove the second. The second property turns LCP into a range-minimum query (RMQ) problem:
define height[i] = LCP(i-1, i), that is, height[i] is the LCP of the i-th smallest suffix and the one ranked just before it. Then for i < j, LCP(i, j) is the minimum of height[i+1..j], which an RMQ algorithm answers in O(1) per query after O(n log n) preprocessing.
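One standard RMQ structure with exactly these costs is the sparse table. The sketch below assumes 0-based ranks and a height array with height[0] = 0; the struct name LcpRmq is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Sparse table over height[] (0-based ranks, height[0] = 0):
// O(n log n) preprocessing, O(1) per LCP(i, j) query.
struct LcpRmq {
    std::vector<std::vector<int>> table;  // table[p][x] = min(height[x .. x + 2^p - 1])
    std::vector<int> floorLog;            // floorLog[len] = floor(log2(len))

    explicit LcpRmq(const std::vector<int>& height) {
        const int n = static_cast<int>(height.size());
        floorLog.assign(n + 1, 0);
        for (int len = 2; len <= n; ++len) floorLog[len] = floorLog[len / 2] + 1;
        table.push_back(height);
        for (int p = 1; (1 << p) <= n; ++p) {
            std::vector<int> row(n - (1 << p) + 1);
            for (int x = 0; x + (1 << p) <= n; ++x)
                row[x] = std::min(table[p - 1][x], table[p - 1][x + (1 << (p - 1))]);
            table.push_back(row);
        }
    }

    // LCP of the i-th and j-th smallest suffixes, i < j: min of height[i+1 .. j].
    int lcp(int i, int j) const {
        const int lo = i + 1;
        const int p = floorLog[j - lo + 1];
        // Two overlapping blocks of length 2^p cover [lo, j].
        return std::min(table[p][lo], table[p][j - (1 << p) + 1]);
    }
};
```

For "banana", height = {0, 1, 3, 0, 0, 2}, so lcp(1, 2) = 3 ("ana" vs "anana") and lcp(0, 2) = min(1, 3) = 1 ("a" vs "anana").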

To compute height, introduce another array h[i] = height[rank[i]]: h[i] is the height value of Suffix(i), while height[i] is the height value of Suffix(SA[i]). Equivalently, height[i] = h[SA[i]].
The array h has the following property:

· h[i] >= h[i-1] - 1

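Why it holds: if h[i-1] > 1, let Suffix(j) be the suffix ranked just before Suffix(i-1); they share their first h[i-1] characters. Dropping that shared first character, Suffix(j+1) sorts before Suffix(i) and shares h[i-1] - 1 characters with it. The suffix ranked just before Suffix(i) lies between Suffix(j+1) and Suffix(i) in sorted order, so by property 1 it shares at least h[i-1] - 1 characters with Suffix(i). (If h[i-1] <= 1 the bound is trivial.) With this property, computing h[i] does not need to compare from the first character: it can start at offset h[i-1] - 1. Because h drops by at most 1 per step and never exceeds n, the total number of character comparisons is O(n), so the whole h array is computed in O(n). The relation height[i] = h[SA[i]] then gives the height array in O(n) time, and the whole LCP problem is solved. ^_^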
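This procedure is usually known as Kasai's algorithm. A standalone sketch with 0-based indices follows; buildHeight is a hypothetical name, and unlike the listing below it compares std::string characters directly instead of stopping at '\0'. The variable h carries h[i-1] - 1 forward as the starting offset for h[i].

```cpp
#include <string>
#include <vector>

// height[r] = LCP of the r-th and (r-1)-th smallest suffixes (height[0] = 0).
// Suffixes are visited in text order i = 0, 1, ..., and h[i] >= h[i-1] - 1
// means matching can resume at offset h - 1 instead of 0.
std::vector<int> buildHeight(const std::string& s, const std::vector<int>& sa) {
    const int n = static_cast<int>(s.size());
    std::vector<int> rank(n), height(n, 0);
    for (int r = 0; r < n; ++r) rank[sa[r]] = r;
    int h = 0;  // characters already known to match for the current suffix
    for (int i = 0; i < n; ++i) {
        if (rank[i] == 0) { h = 0; continue; }  // smallest suffix: no predecessor
        const int j = sa[rank[i] - 1];          // suffix ranked just before Suffix(i)
        while (i + h < n && j + h < n && s[i + h] == s[j + h]) ++h;
        height[rank[i]] = h;                     // this is h[i]
        if (h > 0) --h;                          // lower bound for h[i + 1]
    }
    return height;
}
```

For "banana" with SA {5, 3, 1, 0, 4, 2} this yields height = {0, 1, 3, 0, 0, 2}.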

Applications of suffix arrays then use the LCP to reduce the cost wherever strings must be compared. And because the suffixes are stored in sorted order, binary search is easy to apply.
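As an illustration of the binary-search use, the sketch below counts occurrences of a pattern P in a text. All suffixes that start with P form one contiguous block of SA, so two binary searches find its ends in O(|P| log n). The function name countOccurrences is hypothetical, SA is assumed to be already built (0-based), and this plain version does not use LCP to speed up the comparisons.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Number of occurrences of pat in s, given s's suffix array sa (0-based).
// Only the first |pat| characters of each suffix are compared, and truncating
// sorted suffixes keeps them sorted, so binary search is valid.
int countOccurrences(const std::string& s, const std::vector<int>& sa,
                     const std::string& pat) {
    const auto m = pat.size();
    // First suffix whose m-character prefix is not less than pat.
    auto first = std::lower_bound(sa.begin(), sa.end(), pat,
        [&](int start, const std::string& p) { return s.compare(start, m, p) < 0; });
    // First suffix whose m-character prefix is greater than pat.
    auto last = std::upper_bound(sa.begin(), sa.end(), pat,
        [&](const std::string& p, int start) { return s.compare(start, m, p) > 0; });
    return static_cast<int>(last - first);
}
```

With "banana" and SA {5, 3, 1, 0, 4, 2}, the pattern "ana" matches the block of ranks 1..2 ("ana", "anana"), so the count is 2.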

So, to summarize the key points:

· The doubling algorithm sorts all suffixes in O(n log n) time.

· The property of the h array gives the LCP array height in O(n) time.

· Property 2 of LCP turns a general LCP query into an RMQ problem on the height array.

 

The problem here is to find the longest common substring of two strings (note: a substring, not a subsequence). Join the two strings with a separator and construct SA, rank and height; the answer is the largest height value. The largest height, however, may come from two suffixes of the same string, so take the largest height[i] for which SA[i-1] and SA[i] start in different strings.

Code:

 

// Constructing a suffix array with the doubling algorithm, O(n log n).
/////////////////////////////////////////////////////////////////
#include <algorithm> // sort
#include <cstring>   // memset

using namespace std;

const int MAX_SFX = 210000;  // maximum text length

// One entry per suffix: its start position i and a two-part sort key.
struct Sfx {
    int i; int key[2];
    bool operator<(const Sfx& s) const
    { return key[0] < s.key[0] || (key[0] == s.key[0] && key[1] < s.key[1]); }
};

int g_buf[MAX_SFX + 1];  // scratch: counting-sort buckets, also reused for ranks
Sfx g_tempSfx[2][MAX_SFX], *g_sa = g_tempSfx[0];

// Stable counting sort of in[0..n-1] by key[key], written to out.
// Keys must lie in [0, n]: ranks are 1..n and 0 means "past the end".
void cSort(Sfx* in, int n, int key, Sfx* out) {
    int* cnt = g_buf; memset(cnt, 0, sizeof(int) * (n + 1));
    for (int i = 0; i < n; i++) { cnt[in[i].key[key]]++; }
    for (int i = 1; i <= n; i++) { cnt[i] += cnt[i - 1]; }
    for (int i = n - 1; i >= 0; i--)
    { out[--cnt[in[i].key[key]]] = in[i]; }
}

// Build a suffix array from string 'text' whose length is 'len'.
// Write the result into global array 'g_sa'.
void buildSA(const char* text, int len) {
    Sfx* temp = g_tempSfx[1];
    int* rank = g_buf;  // shares g_buf with cSort; fully consumed before cSort runs
    // First round: sort by the first character (key[1] = i only fixes ties).
    for (int i = 0; i < len; i++)
    { g_sa[i].i = g_sa[i].key[1] = i; g_sa[i].key[0] = text[i]; }
    sort(g_sa, g_sa + len);
    for (int i = 0; i < len; i++) { g_sa[i].key[1] = 0; }
    int wid = 1;
    while (wid < len) {
        // Rank by the current wid-prefix: equal keys share a rank, starting at 1.
        rank[g_sa[0].i] = 1;
        for (int i = 1; i < len; i++)
        { rank[g_sa[i].i] = rank[g_sa[i - 1].i];
          if (g_sa[i - 1] < g_sa[i]) { rank[g_sa[i].i]++; } }
        // Key for the 2*wid-prefix: (rank[i], rank[i + wid]), 0 if past the end.
        for (int i = 0; i < len; i++)
        { g_sa[i].i = i; g_sa[i].key[0] = rank[i];
          g_sa[i].key[1] = i + wid < len ? rank[i + wid] : 0; }
        // Radix sort: by the second key, then stably by the first.
        cSort(g_sa, len, 1, temp); cSort(temp, len, 0, g_sa);
        wid *= 2;
    }
}

// LCP of two C strings; stops at the first mismatch or '\0'.
int getLCP(const char* a, const char* b)
{ int l = 0; while (*a && *b && *a == *b) { l++; a++; b++; } return l; }

// Fill lcp[r] = height[r], the LCP of the r-th and (r-1)-th smallest suffixes.
// Suffixes are visited in text order, so h[i] >= h[i-1] - 1 lets each
// comparison skip the first lcp[rank[i-1]] - 1 characters.
void getLCP(const char* text, const Sfx* sfx, int len, int* lcp) {
    int* rank = g_buf;
    for (int r = 0; r < len; r++) { rank[sfx[r].i] = r; }
    lcp[0] = 0;
    if (rank[0])
    { lcp[rank[0]] = getLCP(text, text + sfx[rank[0] - 1].i); }
    for (int i = 1; i < len; i++) {
        if (!rank[i]) { continue; }  // smallest suffix: no predecessor
        if (lcp[rank[i - 1]] <= 1)
        { lcp[rank[i]] = getLCP(text + i, text + sfx[rank[i] - 1].i); }
        else
        { int l = lcp[rank[i - 1]] - 1;
          lcp[rank[i]] = l + getLCP(text + i + l, text + sfx[rank[i] - 1].i + l); }
    }
}

// Test suite and usage example
#include <iostream>
using namespace std;
int main() {
    // "aabbaa" and "ababab" joined by a '\0' separator: 13 characters in total.
    char str[] = "aabbaa\0ababab";
    // Source string (0 or 1) of each position; the separator is counted as 0.
    int from[] = {0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1};
    int lcp[13];
    buildSA(str, 13); getLCP(str, g_sa, 13, lcp);
    for (int i = 1; i < 13; i++)  // The first suffix starts at '\0' and prints as empty; skip it.
    { cout << from[g_sa[i].i] << ' ' << str + g_sa[i].i << ' ' << lcp[i] << endl; }
    // Output (source string, suffix, height):
    // 0 a 0
    // 0 aa 1
    // 0 aabbaa 2
    // 1 ab 1
    // 1 abab 2
    // 1 ababab 4
    // 0 abbaa 2
    // 1 b 0
    // 0 baa 1
    // 1 bab 2
    // 1 babab 3
    // 0 bbaa 1
    return 0;
}
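The listing prints the height values but leaves the final answer step to the reader. A self-contained sketch of that step follows, assuming neither input contains '\0' (the same separator the listing uses) and building the suffix array naively for brevity; longestCommonSubstring is a hypothetical name. It keeps the largest height whose two suffixes start on different sides of the separator.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Longest common substring of a and b (assumes neither contains '\0').
// Joins them as a + '\0' + b, builds the suffix array naively, then runs the
// h[i] >= h[i-1] - 1 height computation, keeping the largest height whose two
// suffixes start in different strings.
std::string longestCommonSubstring(const std::string& a, const std::string& b) {
    const std::string s = a + '\0' + b;
    const int n = static_cast<int>(s.size());
    const int sep = static_cast<int>(a.size());  // index of the separator
    std::vector<int> sa(n), rank(n);
    for (int i = 0; i < n; ++i) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&](int x, int y) {
        return std::lexicographical_compare(s.begin() + x, s.end(), s.begin() + y, s.end());
    });
    for (int r = 0; r < n; ++r) rank[sa[r]] = r;
    int best = 0, bestStart = 0, h = 0;
    for (int i = 0; i < n; ++i) {
        if (rank[i] == 0) { h = 0; continue; }
        const int j = sa[rank[i] - 1];  // suffix ranked just before Suffix(i)
        while (i + h < n && j + h < n && s[i + h] == s[j + h]) ++h;
        const bool crossPair = (i < sep) != (j < sep);  // one from a, one from b
        if (crossPair && h > best) { best = h; bestStart = i; }
        if (h > 0) --h;
    }
    return s.substr(bestStart, best);
}
```

For "aabbaa" and "ababab" it returns "ab", matching the largest cross-string height of 2 in the output above; a unique separator is needed so that no common prefix can run from one string into the other.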
