# Suffix array: "It's hard to understand."

Source: Internet
Author: User

The suffix array template most commonly found on the web comes from the paper "Suffix array: a powerful tool for handling strings". Its O(n log n) complexity is very powerful, but for someone meeting the algorithm for the first time (like me) it is hard to understand (it took me a good two days). After reading a lot of material, I find the explanation of suffix arrays in the book "Challenge Programming Contest" comparatively easy to follow, but its complexity is O(n log^2 n). The main difference is how the substrings are sorted in each round: the former uses radix sort, O(n), while the latter uses quicksort, O(n log n), so the latter ends up slower than the former by a factor of O(log n).

This post first walks through the approach from the "Challenge" book; the O(n log n) version will be written up later.

Here is a clean version of the SA (suffix array) template first. The main idea is, of course, still the doubling method.

```cpp
/* O(n log^2 n) */
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>

// std:: is spelled out on purpose: with `using namespace std;`
// the global array `rank` would clash with std::rank.
const int MAXN = 10001;
int n, k;
int rank[MAXN + 1], tmp[MAXN + 1];

// Compare suffixes i and j by the pair (rank[i], rank[i + k])
bool comp_sa(int i, int j)
{
    if (rank[i] != rank[j])
        return rank[i] < rank[j];
    int ri = i + k <= n ? rank[i + k] : -1;
    int rj = j + k <= n ? rank[j + k] : -1;
    return ri < rj;
}

// Computes the suffix array of the string s
void calc_sa(const std::string &s, int *sa)
{
    n = s.size();
    // Initial length is 1: rank is the character code, -1 for the empty suffix
    for (int i = 0; i <= n; i++)
    {
        sa[i] = i;
        rank[i] = i < n ? s[i] : -1;
    }

    for (k = 1; k <= n; k *= 2)
    {
        std::sort(sa, sa + n + 1, comp_sa); // two-key quicksort

        // Temporarily store the newly computed ranks in tmp, then copy back to rank
        tmp[sa[0]] = 0;
        for (int i = 1; i <= n; i++)
        {
            tmp[sa[i]] = tmp[sa[i - 1]] + (comp_sa(sa[i - 1], sa[i]) ? 1 : 0);
        }
        for (int i = 0; i <= n; i++)
        {
            rank[i] = tmp[i];
        }
    }
}

int main()
{
    std::string s = "abracadabra";
    int *sa = new int[s.size() + 1];
    calc_sa(s, sa);
    // Print the suffix array (index 0 is the empty suffix)
    for (std::size_t i = 0; i <= s.size(); i++)
        std::cout << sa[i] << (i < s.size() ? ' ' : '\n');
    delete[] sa;
    sa = NULL;
}
```

Of course, anyone seeing this code for the first time will have no idea what it is doing. I suggest finding a few blog posts with commented code (many experts have written very detailed, very good comments) and simulating the algorithm by hand. After a few passes it will click.

Definitions (make sure you understand them!):

Suffix array (suffix_array): sa[]

An array holding all suffixes of a string, sorted in lexicographic order.

sa[i] = k means that the i-th smallest suffix after sorting is s[k ... n] (string indices are taken as 1..n here to make this easier to follow).

In the end we want the sa array to reach the state shown in the following example (excerpted from the Juno PPT):

Rank array: rank[]

Stores the rank of the suffix s[i ... n] in the sorted order.

rank[i] = k means that the suffix s[i ... n] is the k-th smallest among all suffixes in lexicographic order.

Figure 1 shows that sa[1] = 4 and rank[4] = 1. In other words, the sa array and the rank array are inverses of each other.
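The inverse relationship can be sketched in a few lines. This is only an illustration: the helper name `rank_from_sa` is made up for this example, and it uses 0-based indices like the template above rather than the 1-based indices of the definition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: derive rank[] from sa[] using rank[sa[i]] = i.
// Assumes sa is a permutation of 0 .. sa.size() - 1.
std::vector<int> rank_from_sa(const std::vector<int>& sa) {
    std::vector<int> rank(sa.size());
    for (std::size_t i = 0; i < sa.size(); ++i) {
        assert(sa[i] >= 0 && static_cast<std::size_t>(sa[i]) < sa.size());
        rank[sa[i]] = static_cast<int>(i);  // the suffix starting at sa[i] is the i-th smallest
    }
    return rank;
}
```

Applying the same helper to the resulting rank array gives back the original sa, which is exactly what "inverse of each other" means.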

Computing the suffix array

The basic idea of the algorithm: doubling

What does doubling mean?

First compute the order of the length-1 substrings starting at each position. Use that result to compute the order of the length-2 substrings, then use the length-2 result to compute the order of the length-4 substrings. Keep doubling until the length is greater than or equal to the length of the original string, at which point the suffix array is obtained.

To compute the order of the substrings of length 2, simply sort the two-character pairs.

For example, take the original string aaba (the number after each letter is its position in the original string):

a1 = a2 = a4 < b3, so a1a2 must be less than a2b3.

To get the order of the substrings of length 2k, it is enough to know the order of the substrings of length k.

For example, take the original string aabac:

a1a2 < a2b3 < a4c5 < b3a4, so a1a2 b3a4 must be less than a2b3 a4c5.

Let rank_k(i) denote the rank of s[i, k] (the substring of length k starting at i) among all substrings of length k in sorted order.

To compute the order of the substrings of length 2k, simply sort pairs of ranks: compare the pair (rank_k(i), rank_k(i+k)) with the pair (rank_k(j), rank_k(j+k)) instead of comparing s[i, 2k] and s[j, 2k] directly. Comparing rank_k(i) with rank_k(j) is equivalent to comparing s[i, k] with s[j, k], and comparing rank_k(i+k) with rank_k(j+k) is equivalent to comparing s[i+k, k] with s[j+k, k].
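To make the pair comparison concrete, here is a small sketch. The function name `key_2k` is hypothetical, indices are 0-based, and a position past the end of the string is assumed to contribute -1 as its second key, as in the template above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: the sort key of s[i, 2k], built from ranks of length-k substrings.
// rank_k[i] is assumed to hold the rank of s[i, k]; a missing second half counts as -1.
std::pair<int, int> key_2k(const std::vector<int>& rank_k, int i, int k) {
    int n = static_cast<int>(rank_k.size());
    assert(i >= 0 && i < n && k > 0);
    int second = (i + k < n) ? rank_k[i + k] : -1;
    return std::make_pair(rank_k[i], second);
}
```

For aabac with k = 2, the length-2 substrings aa, ab, ba, ac, c (positions 0 to 4) have ranks 0, 1, 3, 2, 4. The key for position 0 is then (0, 3) and the key for position 1 is (1, 2). Since `std::pair` compares lexicographically, (0, 3) < (1, 2), which agrees with aaba < abac.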

Initialization: sa[i] = i;

rank[i] = s[i]; At the start we are sorting the single characters of the string, so rank can be initialized directly to the ASCII code of each character. Note that at this point rank is not an actual position in the order, only a relative one: s[i] > s[j] simply implies rank[i] > rank[j].

There is also a small trick worth noting when working with the string: define rank[n] as -1 (the rank of the empty suffix). The empty suffix then always sorts first, so the ranks of the real suffixes start from 1.
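The initialization step on its own might look like the sketch below. The function name `init_sa_rank` and the use of `std::vector` are choices made for this example, not part of the original template.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: fill sa and rank for the length-1 round.
// Index n stands for the empty suffix and gets the sentinel rank -1.
void init_sa_rank(const std::string& s, std::vector<int>& sa, std::vector<int>& rank) {
    int n = static_cast<int>(s.size());
    sa.assign(n + 1, 0);
    rank.assign(n + 1, 0);
    for (int i = 0; i <= n; ++i) {
        sa[i] = i;
        // Character code as the initial rank; the cast keeps non-ASCII bytes non-negative.
        rank[i] = i < n ? static_cast<unsigned char>(s[i]) : -1;
    }
    assert(rank[n] == -1);  // the sentinel is smaller than every real character
}
```

For abracadabra this yields rank[0] = 97 ('a'), rank[2] = 114 ('r') and rank[11] = -1.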

k = 0 (initialization): rank[i] describes the order of s[i, 1]; sa is not sorted yet.

| i | sa[i] | rank[i] |
|---|-------|---------|
| 0 | 0 | 97 |
| 1 | 1 | 98 |
| 2 | 2 | 114 |
| 3 | 3 | 97 |
| 4 | 4 | 99 |
| 5 | 5 | 97 |
| 6 | 6 | 100 |
| 7 | 7 | 97 |
| 8 | 8 | 98 |
| 9 | 9 | 114 |
| 10 | 10 | 97 |
| 11 | 11 | -1 |

k = 1: suffixes ordered by s[i, 2].

| i | sa[i] | s[sa[i], 2] | rank[sa[i]] |
|---|-------|-------------|-------------|
| 0 | 11 | (empty) | 0 |
| 1 | 10 | a | 1 |
| 2 | 0 | ab | 2 |
| 3 | 7 | ab | 2 |
| 4 | 3 | ac | 3 |
| 5 | 5 | ad | 4 |
| 6 | 1 | br | 5 |
| 7 | 8 | br | 5 |
| 8 | 4 | ca | 6 |
| 9 | 6 | da | 7 |
| 10 | 2 | ra | 8 |
| 11 | 9 | ra | 8 |

k = 2: suffixes ordered by s[i, 4].

| i | sa[i] | s[sa[i], 4] | rank[sa[i]] |
|---|-------|-------------|-------------|
| 0 | 11 | (empty) | 0 |
| 1 | 10 | a | 1 |
| 2 | 0 | abra | 2 |
| 3 | 7 | abra | 2 |
| 4 | 3 | acad | 3 |
| 5 | 5 | adab | 4 |
| 6 | 8 | bra | 5 |
| 7 | 1 | brac | 6 |
| 8 | 4 | cada | 7 |
| 9 | 6 | dabr | 8 |
| 10 | 9 | ra | 9 |
| 11 | 2 | raca | 10 |

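k = 4: suffixes ordered by s[i, 8].

| i | sa[i] | s[sa[i], 8] | rank[sa[i]] |
|---|-------|-------------|-------------|
| 0 | 11 | (empty) | 0 |
| 1 | 10 | a | 1 |
| 2 | 7 | abra | 2 |
| 3 | 0 | abracada | 3 |
| 4 | 3 | acadabra | 4 |
| 5 | 5 | adabra | 5 |
| 6 | 8 | bra | 6 |
| 7 | 1 | bracadab | 7 |
| 8 | 4 | cadabra | 8 |
| 9 | 6 | dabra | 9 |
| 10 | 9 | ra | 10 |
| 11 | 2 | racadabr | 11 |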

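k = 8: suffixes ordered by s[i, 16], which covers every suffix s[i ... n] in full.

| i | sa[i] | s[sa[i] ... n] | rank[sa[i]] |
|---|-------|----------------|-------------|
| 0 | 11 | (empty) | 0 |
| 1 | 10 | a | 1 |
| 2 | 7 | abra | 2 |
| 3 | 0 | abracadabra | 3 |
| 4 | 3 | acadabra | 4 |
| 5 | 5 | adabra | 5 |
| 6 | 8 | bra | 6 |
| 7 | 1 | bracadabra | 7 |
| 8 | 4 | cadabra | 8 |
| 9 | 6 | dabra | 9 |
| 10 | 9 | ra | 10 |
| 11 | 2 | racadabra | 11 |

This gives the final sa array.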
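A handy way to check a hand simulation like this is to compare it against a brute-force construction that sorts the suffixes directly. The sketch below is not the doubling algorithm: it is a deliberately naive check (up to O(n^2 log n)), and the name `naive_suffix_array` is made up for this example. It includes the empty suffix at index n, like the template above.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: brute-force suffix array, empty suffix included.
std::vector<int> naive_suffix_array(const std::string& s) {
    int n = static_cast<int>(s.size());
    std::vector<int> sa(n + 1);
    std::iota(sa.begin(), sa.end(), 0);  // 0, 1, ..., n
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        // Compare the suffixes s[a ... n) and s[b ... n) character by character.
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    assert(sa.front() == n);  // the empty suffix always sorts first
    return sa;
}
```

For abracadabra this returns 11 10 7 0 3 5 8 1 4 6 9 2, the same order as the sa[i] values of the last pass.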

Each pass sorts the suffixes with a two-key quicksort, which costs O(n log n); with O(log n) passes this gives the O(n log^2 n) total. When the rank of each suffix s[i ... n] is recomputed after sorting, the suffix is compared with the one just before it in sorted order: if the two are equal they get the same rank, otherwise the rank is the previous rank plus 1.

```cpp
// Temporarily store the newly computed ranks in tmp, then copy back to rank
tmp[sa[0]] = 0;
for (int i = 1; i <= n; i++)
    tmp[sa[i]] = tmp[sa[i - 1]] + (comp_sa(sa[i - 1], sa[i]) ? 1 : 0);

for (int i = 0; i <= n; i++)
    rank[i] = tmp[i];
```
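The same rank update can be tried in isolation on the data of the k = 1 pass. The sketch below is a self-contained variant under a few assumptions: the function name `update_rank` is hypothetical, it works on `std::vector` instead of the global arrays, and it expects sa to be already sorted by the pair (rank[i], rank[i + k]).

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: one rank-update step after sorting with step k.
// sa holds n + 1 entries (the empty suffix included) and is already sorted.
std::vector<int> update_rank(const std::vector<int>& sa, const std::vector<int>& rank, int k) {
    assert(!sa.empty() && sa.size() == rank.size());
    int n = static_cast<int>(sa.size()) - 1;
    // Second key of suffix i; -1 when i + k runs past the end of the string.
    auto second = [&](int i) { return i + k <= n ? rank[i + k] : -1; };
    std::vector<int> tmp(n + 1);
    tmp[sa[0]] = 0;
    for (int i = 1; i <= n; ++i) {
        int a = sa[i - 1], b = sa[i];
        bool less = rank[a] != rank[b] ? rank[a] < rank[b] : second(a) < second(b);
        tmp[b] = tmp[a] + (less ? 1 : 0);  // equal keys share a rank
    }
    return tmp;
}
```

With sa = 11 10 0 7 3 5 1 8 4 6 2 9 and the initial character-code ranks of abracadabra, calling it with k = 1 gives rank[0] = rank[7] = 2 and rank[2] = rank[9] = 8, because the prefixes ab and ra each occur twice.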

The O(n log n) algorithm will be written up later.