Suffix Array Summary

Source: Internet
Author: User
Tags: arrays


suffix array http://blog.csdn.net/u013480600/article/details/44763865

A suffix array is an array holding all suffixes of a text string (in practice, their starting positions), arranged from smallest to largest in dictionary order. For a detailed introduction, see Rujia's "Algorithm Contest Training Guide".
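To make the definition concrete, here is a minimal sketch that builds a suffix array the slow way, by sorting the suffix start positions with direct string comparison. The function name `naive_suffix_array` is made up for illustration and is not part of the original article; the construction costs O(n^2 log n) in the worst case and is only meant to show what the array contains.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper (not from the original article).
// Returns sa such that sa[i] is the start index of the suffix with
// dictionary rank i. Sorting compares whole suffixes, so this is slow
// and intended for illustration only.
std::vector<int> naive_suffix_array(const std::string& text) {
    std::vector<int> sa(text.size());
    std::iota(sa.begin(), sa.end(), 0);  // 0, 1, ..., n-1
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        // Compare suffix starting at a with suffix starting at b.
        return text.compare(a, std::string::npos,
                            text, b, std::string::npos) < 0;
    });
    return sa;
}
```

For the text "banana" the suffixes in sorted order are "a", "ana", "anana", "banana", "na", "nana", so the call `naive_suffix_array("banana")` yields the start positions 5, 3, 1, 0, 4, 2.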

The AC automaton (Aho-Corasick automaton) can handle text matching against multiple templates, and the suffix array can handle multi-template text matching as well. So what is the difference between them?

The AC automaton needs to know all the templates in advance, and then performs multi-template matching on a text string that may be input online. In other words, all templates must be known beforehand, while the text to be matched can be entered dynamically.

The suffix array, in contrast, needs to know the entire text string beforehand, while the templates can be entered dynamically, one at a time. In practice, you often cannot know in advance which template will be queried (a search engine is an example). If you want to check whether a phrase (template) occurs in an article (or a set of articles), you can first preprocess the text, compute its suffix array, and then binary-search the suffix array of the text for the phrase (template) you entered. Because all suffixes are already sorted in dictionary order, whether and where the phrase (template) occurs can be determined in O(m log n) time, where n is the length of the text and m is the length of the template; an algorithm with O(m + log n) time complexity is also described later. (If you use KMP to find a match, the complexity of each query is O(n + m), which is too high when the text length n is much greater than the template length m.)
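The binary search described above can be sketched as follows. This is an illustration under stated assumptions, not the original author's code: the function name `find_occurrence` is hypothetical, and `sa` is assumed to be a valid suffix array of `text` (start positions of the suffixes in dictionary order). Each step compares at most m characters, and there are O(log n) steps, which gives the O(m log n) bound.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original article).
// Returns the start index of one occurrence of `pattern` in `text`,
// or -1 if there is none. `sa` must be the suffix array of `text`.
int find_occurrence(const std::string& text, const std::vector<int>& sa,
                    const std::string& pattern) {
    int lo = 0, hi = static_cast<int>(sa.size());  // search window [lo, hi)
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        // Compare the first pattern.size() characters of suffix sa[mid]
        // with the pattern (the substring is clamped at the end of text).
        int cmp = text.compare(sa[mid], pattern.size(), pattern);
        if (cmp == 0) return sa[mid];  // this suffix starts with the pattern
        if (cmp < 0) lo = mid + 1;     // suffix is smaller: look to the right
        else hi = mid;                 // suffix is larger: look to the left
    }
    return -1;
}
```

With the text "banana" and its suffix array 5, 3, 1, 0, 4, 2, searching for "nan" returns 2, and searching for "xyz" returns -1. When a template occurs several times, this sketch returns whichever occurrence the search hits first; reporting all occurrences would need the lower and upper bounds of the matching range instead.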

The code for the suffix array is given below:

Suffix array (annotated version)

    #include <cstdio>
    #include <cstring>
    #include <algorithm>
    using namespace std;

    const int maxn = 20000 + 1000;

    struct SuffixArray
    {
        // Holds the string formed by the original string + '\0',
        // i.e. the original string occupies s[0..n-2],
        // and s[n-1] is the artificially appended '\0' character.
        char s[maxn];

        // Suffix array: sa[i]==j means the suffix with dictionary rank i is suffix j,
        // where both i and j range from 0 to n-1.
        int sa[maxn];

        // Rank array: rank[i]==j means the dictionary rank of suffix i is j.
        int rank[maxn];
        int height[maxn];

        // Auxiliary storage for the x and y arrays.
        int t1[maxn], t2[maxn];

        // c[i]==j means there are j keys whose value is <= i.
        int c[maxn];

        // Length of s after the '\0' has been appended.
        // Because of the trailing 0, n is generally >= 2.
        int n;

        // Requires n >= 2 (n must not equal 1), otherwise build_height() may misbehave.
        // m must be greater than the int value of every character that appears in s[].
        void build_sa(int m)
        {
            int i, *x = t1, *y = t2;

            // Preprocess the length-1 prefix of every suffix to obtain x and sa.
            // At this point x[i]==j is the raw value of character i (x can be viewed
            // as a rank array), but ties are possible: x[1]==2 and x[3]==2 means that
            // character 1 and character 3 are identical.
            // The sa computed here is the ranking for the current prefix length 1,
            // and its entries are never equal: even if x[1]==x[3]==2, we get
            // sa[1]=1 and sa[2]=3. That is, even when character 1 and character 3
            // are identical, character 1 takes place 1 and character 3 takes place 2.
            for (i = 0; i < m; i++) c[i] = 0;
            for (i = 0; i < n; i++) c[x[i] = s[i]]++;

            // After this loop, c[i] is the total number of keys with value <= i.
            for (i = 1; i < m; i++) c[i] += c[i - 1];

            // Compute sa for the current prefix length (1).
            for (i = n - 1; i >= 0; i--) sa[--c[x[i]]] = i;

            // ... (the source listing breaks off here)
        }
    };
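The listing above stops after the first counting-sort pass. For orientation, here is a self-contained sketch of how the commonly used prefix-doubling construction proceeds from that point. It is an assumption that the original code continues this way; the function name `build_suffix_array`, the use of `std::vector`, and the default alphabet size of 256 are choices made for this sketch, not taken from the article. As in the listing, a '\0' sentinel is appended, so the text itself must not contain '\0'.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical sketch (not the original author's code).
// Builds the suffix array of text + '\0' by prefix doubling with counting
// sort, O(n log n). The result has text.size() + 1 entries; entry 0 is
// always the position of the sentinel.
std::vector<int> build_suffix_array(const std::string& text, int m = 256) {
    std::string s = text;
    s.push_back('\0');  // unique, smallest character
    int n = static_cast<int>(s.size());
    std::vector<int> sa(n), x(n), y(n), c(std::max(m, n));
    int i;

    // Round 0: sort suffixes by their first character.
    for (i = 0; i < m; i++) c[i] = 0;
    for (i = 0; i < n; i++) c[x[i] = static_cast<unsigned char>(s[i])]++;
    for (i = 1; i < m; i++) c[i] += c[i - 1];
    for (i = n - 1; i >= 0; i--) sa[--c[x[i]]] = i;

    // Each round doubles the compared prefix length from k to 2k.
    for (int k = 1; k <= n; k <<= 1) {
        int p = 0;
        // y lists suffixes ordered by their second key (rank of suffix i+k).
        for (i = n - k; i < n; i++) y[p++] = i;  // no second half: smallest
        for (i = 0; i < n; i++) if (sa[i] >= k) y[p++] = sa[i] - k;

        // Stable counting sort of y by the first key x.
        for (i = 0; i < m; i++) c[i] = 0;
        for (i = 0; i < n; i++) c[x[y[i]]]++;
        for (i = 1; i < m; i++) c[i] += c[i - 1];
        for (i = n - 1; i >= 0; i--) sa[--c[x[y[i]]]] = y[i];

        // Recompute ranks for prefix length 2k; y now holds the old ranks.
        // The sentinel guarantees sa[i] + k stays in range whenever the
        // first comparison succeeds.
        std::swap(x, y);
        p = 1;
        x[sa[0]] = 0;
        for (i = 1; i < n; i++)
            x[sa[i]] = (y[sa[i - 1]] == y[sa[i]] &&
                        y[sa[i - 1] + k] == y[sa[i] + k]) ? p - 1 : p++;
        if (p >= n) break;  // all ranks distinct: done
        m = p;              // next round only needs p distinct keys
    }
    return sa;
}
```

For the text "banana" this returns 6, 5, 3, 1, 0, 4, 2: the sentinel suffix at position 6 comes first, followed by the suffixes of the text in dictionary order.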
