Suffix Array (I): hiho120, the longest overlappable substring repeated K times


I wrote this article after reading the hihoCoder problem and its explanation.


Problem Analysis

This problem is known as the "longest overlappable substring repeated K times" problem. It asks for the maximum length among all substrings that occur at least K times in the string, where the occurrences are allowed to (partially) overlap.

The hints of the original problem give the solution: use the suffix array and the height array, both of which can be computed efficiently.

Suffix array and height array

Suffix array: the array of all suffixes of a string, in sorted order. It can be used to solve problems on a single string, on two strings, and on multiple strings.
For example, take the string banana$ ($ marks the end of the string). suffix(p) denotes the suffix that starts at position p of the original string and runs to its end, rank[p] is the rank of suffix(p) among all suffixes in ascending order, and the sorted array is denoted sa, where sa[i] is the start position of the suffix ranked i.

char  b a n a n a $
p     1 2 3 4 5 6 7

i  suffix(p)  sa[i]=p  rank[p]    height[i]
1  $          7        rank[7]=1  x
2  a$         6        rank[6]=2  0
3  ana$       4        rank[4]=3  1
4  anana$     2        rank[2]=4  3
5  banana$    1        rank[1]=5  0
6  na$        5        rank[5]=6  0
7  nana$      3        rank[3]=7  2
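
The table can be reproduced with a direct, unoptimized construction. The following is a minimal sketch, not the method the article goes on to describe: it sorts the suffixes by plain string comparison and measures each common prefix character by character. The names NaiveSuffixData and buildNaive are hypothetical, and all indices are 0-based, so every position and rank is one less than in the table.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical container for the three arrays (0-based, unlike the table).
struct NaiveSuffixData {
    std::vector<int> sa;      // sa[i] = start position of the suffix ranked i
    std::vector<int> rank;    // rank[p] = rank of the suffix starting at p
    std::vector<int> height;  // height[i] = common prefix length of the suffixes
                              // ranked i-1 and i; height[0] is set to 0
};

NaiveSuffixData buildNaive(const std::string& s) {
    assert(!s.empty());  // assumption of this sketch: non-empty input
    const int n = static_cast<int>(s.size());
    NaiveSuffixData d;
    d.sa.resize(n);
    std::iota(d.sa.begin(), d.sa.end(), 0);
    // Sort the start positions by comparing the suffixes themselves.
    std::sort(d.sa.begin(), d.sa.end(), [&s](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    d.rank.assign(n, 0);
    for (int i = 0; i < n; ++i) d.rank[d.sa[i]] = i;
    d.height.assign(n, 0);
    for (int i = 1; i < n; ++i) {
        int a = d.sa[i - 1], b = d.sa[i], len = 0;
        while (a + len < n && b + len < n && s[a + len] == s[b + len]) ++len;
        d.height[i] = len;
    }
    return d;
}
```

For banana$, buildNaive yields sa = 6 5 3 1 0 4 2 and height = 0 0 1 3 0 0 2, which is the table above shifted to 0-based indexing.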

Height array: height[i] is the length of the longest common prefix of suffix(sa[i-1]) and suffix(sa[i]), that is, of two suffixes adjacent in rank. For example, height[4] is the longest common prefix of anana$ and ana$, namely ana, of length 3. The height values are included in the table above.
The height array has two properties that are useful for computing it efficiently.
Property 1: if rank[j] < rank[k], then the longest common prefix of suffix(j) and suffix(k) has length
min{height[rank[j]+1], height[rank[j]+2], ..., height[rank[k]]}.
This is evident because the suffixes are sorted in lexicographic order.
Property 2: height[rank[i]] ≥ height[rank[i-1]] - 1.
Proof: take suffix(i-1) and let suffix(k) be the suffix ranked immediately before it; their longest common prefix has length height[rank[i-1]].
① If height[rank[i-1]] ≤ 1, then height[rank[i-1]] - 1 ≤ 0 ≤ height[rank[i]].
② If height[rank[i-1]] > 1, that is, at least 2, then suffix(k) and suffix(i-1) agree on at least their first 2 characters, and their lexicographic order is decided by a later character. Dropping the first character of each gives suffix(k+1) and suffix(i), which still agree on at least their first character and compare in the same order. So suffix(k+1) is ranked before suffix(i), and their longest common prefix has length height[rank[i-1]] - 1. Hence the longest common prefix of suffix(i) and the suffix ranked immediately before it is at least height[rank[i-1]] - 1: any suffixes ranked between suffix(k+1) and suffix(i) can only make that common prefix longer or leave it the same.

Problem transformation

The problem asks for the longest overlappable substring repeated K times; with the height array it becomes much easier.
A repeated substring is a common prefix of two suffixes, so the longest repeated substring is the maximum, over all pairs of suffixes, of their longest common prefix. (This maximum is always attained by two suffixes adjacent in rank.)
A substring that occurs at least K times is a common prefix of K suffixes, which can be taken consecutive in rank, and by the range-minimum property of the height array their longest common prefix is the minimum of the K-1 height entries between them. The problem therefore becomes: over all runs of K-1 consecutive entries of the height array, find the largest minimum.
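
The transformed problem is a sliding-window minimum. The following is a minimal sketch using a monotonic queue, under these assumptions: the height array is already computed and stored 0-based with height[0] unused (height[i] belongs to the suffixes ranked i-1 and i), K is at least 2, and the function name longestRepeatedAtLeastK is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

// Returns the largest minimum over all windows of k-1 consecutive entries
// height[1..n-1], i.e. the longest length of a substring occurring at least
// k times (occurrences may overlap). Returns 0 if no such substring exists.
int longestRepeatedAtLeastK(const std::vector<int>& height, int k) {
    assert(k >= 2);  // assumption of this sketch
    const int n = static_cast<int>(height.size());  // number of suffixes
    if (k > n) return 0;
    const int w = k - 1;  // k adjacent suffixes span w adjacent height entries
    std::deque<int> q;    // indices whose height values increase front to back
    int best = 0;
    for (int i = 1; i < n; ++i) {
        // Drop entries that can never again be a window minimum.
        while (!q.empty() && height[q.back()] >= height[i]) q.pop_back();
        q.push_back(i);
        // The window is (i - w, i]; discard the front once it leaves it.
        if (q.front() <= i - w) q.pop_front();
        if (i >= w) best = std::max(best, height[q.front()]);
    }
    return best;
}
```

With the banana$ heights 0 0 1 3 0 0 2 (0-based), K = 2 gives 3 (ana occurs twice, overlapping) and K = 3 gives 1 (a occurs three times). Each index enters and leaves the queue at most once, so the scan is linear in the length of the height array.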
In the words of the original hint: "Little Hi: Haha, good! The transformed problem is easy for me; a monotonic queue or binary search will do."

Computing the suffix array

If we build the suffix array by sorting the suffixes directly, then for a string of length n the array has n entries. Quicksort needs O(n log n) comparisons on average, but comparing two suffixes is not constant time, it is O(n), so the overall complexity is O(n^2 log n), which is unusable when n is large. The doubling algorithm runs in O(n log n), and DC3 runs in O(n).
There are many ways to compute a suffix array; the two best known are the doubling algorithm and the DC3 algorithm. DC3 has the better time complexity but is more complicated, so the doubling algorithm is the more practical choice.
The idea of the doubling algorithm is: first compute the rank of every suffix by its k-prefix (its first k characters), then sort by the 2k-prefix using those ranks as a pair of keys. The steps of the doubling algorithm are:

Sort all substrings of length 2^0 = 1, that is, sort the single characters.
Using the ranks for length 2^0 = 1, sort the substrings of length 2^1 = 2 by a pair of keys. For time efficiency, radix sort is generally used.
In general, use the ranks for length 2^(k-1) to sort the substrings of length 2^k by a pair of keys.
Repeat until 2^k ≥ n, or until the rank array already contains the distinct ranks 1 to n; the final suffix array is then obtained.

Computing the height array

height[rank[i]] ≥ height[rank[i-1]] - 1
Compute height[rank[1]], height[rank[2]], ..., height[rank[n]] in that order. Using the property of the height array above, the time complexity drops to O(n): no value in the height array exceeds n, and the running match length decreases by only 1 after each computation, so the total number of character comparisons does not exceed 2n.

Procedural-style implementation

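// Assumed setup (declarations are not part of the original listing): ch[1..n] holds
// the text with ch[0] = ch[n + 1] = 0 and every character nonzero; sa, rank, height,
// tsa, A, B, cntA, cntB are global int arrays with at least max(n + 2, 256) entries.
void solve() {
    // Counting sort of the suffixes by first character. The bound 256 (8-bit
    // characters) is an assumption; the original bound was lost.
    for (int i = 0; i < 256; i++) cntA[i] = 0;
    for (int i = 1; i <= n; i++) cntA[ch[i]]++;
    for (int i = 1; i < 256; i++) cntA[i] += cntA[i - 1];
    for (int i = n; i; i--) sa[cntA[ch[i]]--] = i;
    rank[sa[1]] = 1;
    for (int i = 2; i <= n; i++) {
        rank[sa[i]] = rank[sa[i - 1]];
        if (ch[sa[i]] != ch[sa[i - 1]]) rank[sa[i]]++;
    }
    // Doubling: sort by the key pair (rank[i], rank[i + l]) until all ranks differ.
    for (int l = 1; rank[sa[n]] < n; l <<= 1) {
        for (int i = 0; i <= n; i++) cntA[i] = 0;
        for (int i = 0; i <= n; i++) cntB[i] = 0;
        for (int i = 1; i <= n; i++) {
            cntA[A[i] = rank[i]]++;
            cntB[B[i] = (i + l <= n) ? rank[i + l] : 0]++;
        }
        for (int i = 1; i <= n; i++) cntB[i] += cntB[i - 1];
        for (int i = n; i; i--) tsa[cntB[B[i]]--] = i;   // sort by second key
        for (int i = 1; i <= n; i++) cntA[i] += cntA[i - 1];
        for (int i = n; i; i--) sa[cntA[A[tsa[i]]]--] = tsa[i];   // stable sort by first key
        rank[sa[1]] = 1;
        for (int i = 2; i <= n; i++) {
            rank[sa[i]] = rank[sa[i - 1]];
            if (A[sa[i]] != A[sa[i - 1]] || B[sa[i]] != B[sa[i - 1]]) rank[sa[i]]++;
        }
    }
    // Height array, using height[rank[i]] >= height[rank[i - 1]] - 1.
    for (int i = 1, j = 0; i <= n; i++) {
        if (rank[i] == 1) { height[1] = 0; j = 0; continue; }   // no predecessor
        if (j) j--;
        while (ch[i + j] == ch[sa[rank[i] - 1] + j]) j++;
        height[rank[i]] = j;
    }
}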
Object-oriented style implementation
/**
 * Suffix Array using Prefix Doubling Algorithm
 *
 * See also: Udi Manber and Gene Myers' seminal paper (1991):
 * "Suffix arrays: A new method for on-line string searches"
 *
 * Copyright (c) ljs (http://blog.csdn.net/ljsspace/)
 * Licensed under GPL (http://www.opensource.org/licenses/gpl-license.php)
 *
 * @author ljs
 * 2011-07-17
 */
public class PrefixDoubling {
    public static final char MAX_CHAR = '\u00FF';

    class Suffix {
        int[] sa;
        // Note: the p-th suffix in sa: sa[rank[p]-1];
        // p is the index of the array "rank", starting with 0;
        // a text S's p-th suffix is S[p..n], n = S.length-1.
        int[] rank;
        boolean done;
    }

    // a prefix of suffix[isuffix] represented with digits
    class Tuple {
        int isuffix; // the p-th suffix
        int[] digits;

        public Tuple(int suffix, int[] digits) {
            this.isuffix = suffix;
            this.digits = digits;
        }

        public String toString() {
            StringBuffer sb = new StringBuffer();
            sb.append(isuffix);
            sb.append("(");
            for (int i = 0; i < digits.length; i++) {
                sb.append(digits[i]);
                if (i < digits.length - 1) sb.append("-");
            }
            sb.append(")");
            return sb.toString();
        }
    }

    // the plain counting sort algorithm, for comparison
    // A: input array
    // B: output array (sorted)
    // max: A value's range is 0...max
    public void countingSort(int[] A, int[] B, int max) {
        // init the counter array
        int[] C = new int[max + 1];
        for (int i = 0; i <= max; i++) {
            C[i] = 0;
        }
        // stat the count in A
        for (int j = 0; j < A.length; j++) {
            C[A[j]]++;
        }
        // process the counter array C
        for (int i = 1; i <= max; i++) {
            C[i] += C[i - 1];
        }
        // distribute the values in A to array B
        for (int j = A.length - 1; j >= 0; j--) {
            // C[A[j]] <= A.length
            B[--C[A[j]]] = A[j];
        }
    }

    // d: the digit to do counting sort on
    // max: a value's range is 0...max
    private void countingSort(int d, Tuple[] tA, Tuple[] tB, int max) {
        // init the counter array
        int[] C = new int[max + 1];
        for (int i = 0; i <= max; i++) {
            C[i] = 0;
        }
        // stat the count
        for (int j = 0; j < tA.length; j++) {
            C[tA[j].digits[d]]++;
        }
        // process the counter array C
        for (int i = 1; i <= max; i++) {
            C[i] += C[i - 1];
        }
        // distribute the values
        for (int j = tA.length - 1; j >= 0; j--) {
            // C[A[j]] <= A.length
            tB[--C[tA[j].digits[d]]] = tA[j];
        }
    }

    // tA: input
    // tB: output for rank calculation
    private void radixSort(Tuple[] tA, Tuple[] tB, int max, int digitsLen) {
        int len = tA.length;
        int digitsTotalLen = tA[0].digits.length;
        for (int d = digitsTotalLen - 1, j = 0; j < digitsLen; d--, j++) {
            this.countingSort(d, tA, tB, max);
            // assign tB to tA
            if (j < digitsLen - 1) {
                for (int i = 0; i < len; i++) {
                    tA[i] = tB[i];
                }
            }
        }
    }

    // max is the maximum value in any digit of tA.digits[], used for counting sort
    // tA: input
    // tB: the placeholder, reused between iterations
    private Suffix rank(Tuple[] tA, Tuple[] tB, int max, int digitsLen) {
        int len = tA.length;
        radixSort(tA, tB, max, digitsLen);

        int digitsTotalLen = tA[0].digits.length;
        // calculate rank and sa
        int[] sa = new int[len];
        sa[0] = tB[0].isuffix;
        int[] rank = new int[len];
        int r = 1; // rank starts with 1
        rank[tB[0].isuffix] = r;
        for (int i = 1; i < len; i++) {
            sa[i] = tB[i].isuffix;
            boolean equalLast = true;
            for (int j = digitsTotalLen - digitsLen; j < digitsTotalLen; j++) {
                if (tB[i].digits[j] != tB[i - 1].digits[j]) {
                    equalLast = false;
                    break;
                }
            }
            if (!equalLast) {
                r++;
            }
            rank[tB[i].isuffix] = r;
        }
        Suffix suffix = new Suffix();
        suffix.rank = rank;
        suffix.sa = sa;
        // judge if we are done
        if (r == len) {
            suffix.done = true;
        } else {
            suffix.done = false;
        }
        return suffix;
    }

    // Precondition: the last char in text must be less than the other chars.
    public Suffix solve(String text) {
        if (text == null) return null;
        int len = text.length();
        if (len == 0) return null;

        int k = 1;
        char base = text.charAt(len - 1); // the smallest char
        Tuple[] tA = new Tuple[len];
        Tuple[] tB = new Tuple[len]; // placeholder
        for (int i = 0; i < len; i++) {
            tA[i] = new Tuple(i, new int[]{0, text.charAt(i) - base});
        }
        Suffix suffix = rank(tA, tB, MAX_CHAR - base, 1);
        while (!suffix.done) { // no need to decide if: k<=len
            k <<= 1;
            int offset = k >> 1;
            for (int i = 0, j = i + offset; i < len; i++, j++) {
                tA[i].isuffix = i;
                tA[i].digits = new int[]{suffix.rank[i], (j < len) ? suffix.rank[i + offset] : 0};
            }
            int max = suffix.rank[suffix.sa[len - 1]];
            suffix = rank(tA, tB, max, 2);
        }
        return suffix;
    }

    public void report(Suffix suffix) {
        int[] sa = suffix.sa;
        int[] rank = suffix.rank;
        int len = sa.length;

        System.out.println("suffix array:");
        for (int i = 0; i < len; i++) {
            System.out.format(" %s", sa[i]);
        }
        System.out.println();
        System.out.println("rank array:");
        for (int i = 0; i < len; i++) {
            System.out.format(" %s", rank[i]);
        }
        System.out.println();
    }

    public static void main(String[] args) {
        /*
        // plain counting sort test:
        int[] A = {2, 5, 3, 0, 2, 3, 0, 3};
        PrefixDoubling pd = new PrefixDoubling();
        int[] B = new int[A.length];
        pd.countingSort(A, B, 5);
        for (int i = 0; i < B.length; i++)
            System.out.format(" %d", B[i]);
        System.out.println();
        */
        String[] texts = {
            "gacccaccacc#",
            "mississippi#",
            "abcdefghijklmmnopqrstuvwxyz#",
            "yabbadabbado#",
            "dfdlkjljldfasdlfjasdfkldjasfldafjdajfdsfjalkdsfaewefsdafdsfa#"
        };
        for (int t = 0; t < texts.length; t++) {
            if (t > 0) System.out.println("********************************");
            PrefixDoubling pd = new PrefixDoubling();
            Suffix suffix = pd.solve(texts[t]);
            System.out.format("Text: %s%n", texts[t]);
            pd.report(suffix);
        }
    }
}
Problem Solving Code
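
A minimal end-to-end sketch of the approach described in this article, under loud assumptions: the input is taken as a plain string and an integer K ≥ 2 (the judge's actual input format is not shown here), the function name solveLongestRepeated is hypothetical, and the suffix array is built by direct sorting for brevity, which is slower than the doubling algorithm above. It uses the binary-search alternative mentioned in the hint: a length is feasible if K-1 consecutive height entries are all at least that length.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical entry point (name and signature are assumptions).
// Returns the longest length of a substring of s occurring at least k times.
int solveLongestRepeated(const std::string& s, int k) {
    assert(k >= 2);  // assumption of this sketch
    const int n = static_cast<int>(s.size());
    if (k > n) return 0;
    // 1. Suffix array by direct sorting (0-based); for a sketch only.
    std::vector<int> sa(n), rank(n), height(n, 0);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    for (int i = 0; i < n; ++i) rank[sa[i]] = i;
    // 2. Height array in O(n), using height[rank[i]] >= height[rank[i-1]] - 1.
    for (int i = 0, h = 0; i < n; ++i) {
        if (rank[i] == 0) { h = 0; continue; }  // smallest suffix: no predecessor
        const int j = sa[rank[i] - 1];
        while (i + h < n && j + h < n && s[i + h] == s[j + h]) ++h;
        height[rank[i]] = h;
        if (h > 0) --h;
    }
    // 3. Binary search on the answer.
    auto feasible = [&](int len) {
        int run = 0;  // current run of consecutive heights >= len
        for (int i = 1; i < n; ++i) {
            run = (height[i] >= len) ? run + 1 : 0;
            if (run >= k - 1) return true;
        }
        return false;
    };
    int lo = 0, hi = n;  // length 0 is always feasible when k <= n
    while (lo < hi) {
        const int mid = lo + (hi - lo + 1) / 2;
        if (feasible(mid)) lo = mid; else hi = mid - 1;
    }
    return lo;
}
```

For example, solveLongestRepeated("banana", 2) returns 3 (ana, with overlapping occurrences) and solveLongestRepeated("aaaa", 3) returns 2. Feasibility is monotone in the length, since any prefix of a substring repeated K times is itself repeated at least K times, which is what makes the binary search valid.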

References

http://hihocoder.com/contest/hiho120/problem/1
https://my.oschina.net/fangshaowei/blog/176330
