Link: http://imlazy.ycool.com/post.1861423.html
Let L be the total length of all strings. The algorithm described below runs in O(L log L) time on average; in the worst case this bound may be multiplied by a factor of O(l), where l is the average length of a string.
First, for each string, take every substring that starts at some character and runs to the end of the string (every suffix). For example, the string "ACB" yields "ACB", "CB" and "B". If the total length of all strings is L, there are L such substrings in total. We store them in an array named sub. (Note: it is best to use C-style strings, so that each substring can be referenced directly by its starting address instead of being copied somewhere else.)
The next step is to sort these substrings lexicographically. With quicksort this takes O(L log L) comparisons. For example, take the three strings "ACB", "CBB" and "DCBA". The substrings extracted from them are "ACB", "CB", "B", "CBB", "BB", "B", "DCBA", "CBA", "BA" and "A". After sorting, they are:
sub:    0     1     2     3     4     5     6     7     8     9
suffix: A     ACB   B     B     BA    BB    CB    CBA   CBB   DCBA
from:   DCBA  ACB   CBB   ACB   DCBA  CBB   ACB   DCBA  CBB   DCBA
But there is a catch: every comparison in this sort is a string comparison. Does that take a lot of time? In fact, the cost of comparing two strings depends on how many leading characters they share; if they share only a few, the comparison finishes quickly. Assume every character is one of the 26 English letters, chosen independently at random. Then the probability that two strings start with the same character is $1/26$, the probability that their first two characters agree is $1/26^2$, and so on. By elementary probability theory, the expected number of matching leading characters is at most $1/26 + 2/26^2 + 3/26^3 + \cdots < 1$. In other words, comparing two strings takes constant time on average.
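Working the sum out, under the same assumption of independent, uniformly random letters (and ignoring the ends of the strings), with LCP denoting the length of the common prefix of two strings: the exact expectation is 1/25, while the series quoted above, which weights each k by Pr[LCP ≥ k] rather than Pr[LCP = k], is a slightly larger upper bound. Either way, a comparison inspects only about one character on average.

```latex
\begin{aligned}
\Pr[\mathrm{LCP} \ge k] &= 26^{-k},\\
\mathbb{E}[\mathrm{LCP}] &= \sum_{k \ge 1} \Pr[\mathrm{LCP} \ge k]
  = \sum_{k \ge 1} 26^{-k} = \frac{1/26}{1 - 1/26} = \frac{1}{25},\\
\sum_{k \ge 1} \frac{k}{26^{k}} &= \left.\frac{x}{(1 - x)^{2}}\right|_{x = 1/26}
  = \frac{26}{625} \approx 0.042 < 1.
\end{aligned}
```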
However, that is only the average. In the most extreme case (its probability is practically zero, but problem setters will certainly construct such data), every string may consist of a single repeated character, for example "aaaa", in which case the longest common substring is simply the shortest string. Then every comparison in the sort must read at least one of the two substrings from beginning to end, which costs time on the order of the average string length l. As stated at the beginning of this article, the time complexity then rises to O(l · L log L).
Once all the substrings are sorted, we are close to the answer. It is not hard to see that the longest common substring we want must be the longest common prefix of several adjacent entries in the array sub. In the example above, the longest common substring is "CB", which is the longest common prefix of entries 6, 7 and 8 of sub.
In fact, the array sub must store not only the starting address of each substring but also which original string it came from (in the example above, entries 6, 7 and 8 come from "ACB", "DCBA" and "CBB" respectively); this information is needed in the last step.
Next comes the final search step. In the array sub, consider every segment of adjacent entries that covers all of the original strings and is minimal, i.e. removing its first or its last entry would leave some original string uncovered. The longest common prefix of the first and last entries of such a segment is a common substring of all the original strings; because sub is sorted, it is also the prefix shared by every entry in between. Enumerating all such segments therefore finds the longest common substring of all the original strings. In the example above, the qualifying segments are [0, 2], [2, 4], [3, 5], [4, 6], [5, 7] and [6, 8]; comparing them, segment [6, 8] gives the longest common substring, "CB".
This step takes O(L) time on average: each end of the segment moves forward at most L times, and each prefix comparison takes constant expected time, as argued above. The procedure is not difficult, but it is tedious to describe in words, so please refer to the code below. This step is cheaper than the O(L log L) sort.
That is all for the explanation. UVa 11107 ("Life Forms") is a similar problem: there a substring only has to appear in more than half of the strings, but the idea is the same. To make up for not explaining the last step clearly above, my code for that problem is attached below.
#include <cstdio>
#include <cstdlib>
#include <cstring>
using namespace std;

const int max_len = 1004;  // Notice: the test data is wrong.
                           // The bound is 1004, not 1000.
const int max_str = 100;

struct substr {
    const char *addr;  // start address of the suffix
    int num;           // index of the original string it belongs to
};

char g_str[max_str][max_len + 1];
int g_strcnt;
substr g_substr[max_str * max_len];
int g_substrcnt;

int substrcmp(const void *a, const void *b) {
    return strcmp(((const substr *)a)->addr, ((const substr *)b)->addr);
}

// Length of the common prefix of two suffixes.
int commonlen(const substr &a, const substr &b) {
    const char *i = a.addr;
    const char *j = b.addr;
    int len = 0;
    while (*i && *j && *i == *j) {
        len++;
        i++;
        j++;
    }
    return len;
}

void printstr(const char *str, int len) {
    for (int i = 0; i < len; i++) {
        printf("%c", *str);
        str++;
    }
    printf("\n");
}

// Collect every suffix of every string, then sort them.
void initsubstr() {
    g_substrcnt = 0;
    for (int i = 0; i < g_strcnt; i++) {
        for (const char *j = g_str[i]; *j; j++) {
            g_substr[g_substrcnt].addr = j;
            g_substr[g_substrcnt].num = i;
            g_substrcnt++;
        }
    }
    qsort(g_substr, g_substrcnt, sizeof(substr), substrcmp);
}

int findlongest() {
    int longest = 0;
    substr *head = g_substr;
    substr *tail = g_substr;
    const substr *end = g_substr + g_substrcnt;
    int half = g_strcnt / 2;
    int covercnt = 0;
    int cover[max_str];
    memset(cover, 0, sizeof(cover));
    while (head != end) {
        // Find every pair of head and tail such that the range
        // [tail - 1, head - 1] covers exactly half + 1 strings.
        while (covercnt <= half && head != end) {
            if (cover[head->num] == 0) {
                covercnt++;
            }
            cover[head->num]++;
            head++;
        }
        while (covercnt > half) {
            cover[tail->num]--;
            if (cover[tail->num] == 0) {
                covercnt--;
            }
            tail++;
        }
        if (covercnt == half) {
            int len = commonlen(*(tail - 1), *(head - 1));
            if (len > longest) {
                longest = len;
            }
        }
    }
    return longest;
}

// The work flow of this function is just like findlongest().
void printcommon(int longest) {
    const substr *head = g_substr;
    const substr *tail = g_substr;
    const substr *pre = NULL;  // last answer printed, used to skip duplicates
    const substr *const end = g_substr + g_substrcnt;
    int half = g_strcnt / 2;
    int covercnt = 0;
    int cover[max_str];
    memset(cover, 0, sizeof(cover));
    while (head != end) {
        while (covercnt <= half && head != end) {
            if (cover[head->num] == 0) {
                covercnt++;
            }
            cover[head->num]++;
            head++;
        }
        while (covercnt > half) {
            cover[tail->num]--;
            if (cover[tail->num] == 0) {
                covercnt--;
            }
            tail++;
        }
        if (covercnt == half) {
            int len = commonlen(*(tail - 1), *(head - 1));
            if (len == longest &&
                (pre == NULL || commonlen(*(tail - 1), *pre) < longest)) {
                printstr((tail - 1)->addr, longest);
                pre = tail - 1;
            }
        }
    }
}

bool input() {
    // Stop on a count of 0 or at end of input.
    if (scanf("%d", &g_strcnt) != 1 || g_strcnt <= 0) {
        return false;
    }
    for (int i = 0; i < g_strcnt; i++) {
        scanf("%s", g_str[i]);
    }
    return true;
}

void solve() {
    initsubstr();
    int len = findlongest();
    if (len == 0) {
        printf("?\n");
    } else {
        printcommon(len);
    }
}

int main() {
    int cnt = 0;
    while (input()) {
        if (cnt > 0) {
            printf("\n");
        }
        solve();
        cnt++;
    }
    return 0;
}