Source: hihoCoder
http://hihocoder.com/problemset/problem/1015
#1015: KMP Algorithm
Time limit: 1000ms
Per-test time limit: 1000ms
Memory limit: 256MB
Description
Little Hi and Little Ho are a pair of good friends. Born in the information age, they share a great interest in programming and have agreed to help each other and walk the road of learning to program together.
One day they met a river crab, who put a classic question to them: "Little Hi, Little Ho, can you tell whether a piece of text (the original string) contains certain... special... pieces of text (the pattern strings)?"
Little Hi and Little Ho thought it over carefully and could only come up with a very simple approach. They also figured that since the crab had asked, it could not be that easy to answer, so all they could say was: "Sorry, Mr. Crab, the only method we can think of has time complexity O(text length * total length of the special texts): handle each pattern string separately, enumerate every starting position, and check whether it matches there. But that is not what you are looking for, is it?"
The crab nodded and said: "It seems your skills still need work. Then what if I said there was only one special text? Could you do it?"
Little Ho was a little dizzy by now, but Little Hi spoke up quickly: "I know! This is the classic pattern matching problem! It can be solved with the KMP algorithm!"
The crab nodded in satisfaction and said to Little Hi: "Since you know how to do it, go and teach Little Ho. I have an important task for you two next week!"
"We will get it done!" Little Hi nodded.
Hint One: The idea behind KMP
Back at school, Little Hi immediately dragged Little Ho into the computer lab and started the lesson, so that they could complete the great mission the crab had entrusted to them.
"Little Ho, take a look at this original string and pattern string," Little Hi said, handing over a note.
Original string: |
bababababababababb |
Pattern string: |
bababb |
"Well, in this example the pattern string bababb occurs in the original string starting at the 13th character," Little Ho replied after a look.
"Suppose we still use the most ordinary method: enumerate a starting position in the original string, then check whether the string starting there matches the pattern string," said Little Hi. "Then let's see whether any of the computation in that process can be saved."
"Okay!" Little Ho nodded.
"Look, with the starting point at 1, the match fails at the 6th character. At this point, shouldn't we move the pattern string one position to the right and start checking from the beginning again, like this?" Little Hi drew another picture on the paper and handed it to Little Ho.
Original string: |
bababababababababb |
Pattern string: |
bababb |
Original string: |
bababababababababb |
Pattern string: |
 bababb |
"Yes, and then we find that the very first character does not match," Little Ho answered honestly.
"Then we move the pattern string one more position to the right and start from the beginning again. This time we get past the 7th character of the original string, and the mismatch shows up at the 8th character," Little Hi continued.
Original string: |
bababababababababb |
Pattern string: |
  bababb |
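The brute-force procedure traced in the pictures above can be sketched in a few lines. This is only an illustration in Python, not code from the original problem; the name naive_find is made up here, and positions are reported 1-based to match the dialogue.

```python
def naive_find(ori, par):
    """Return every 1-based start position where par occurs in ori."""
    positions = []
    for start in range(len(ori) - len(par) + 1):
        # Compare character by character from this alignment point.
        q = 0
        while q < len(par) and ori[start + q] == par[q]:
            q += 1
        if q == len(par):
            positions.append(start + 1)  # 1-based, as in the dialogue
    return positions


print(naive_find("bababababababababb", "bababb"))  # prints [13]
```

In the worst case this performs about len(ori) * len(par) character comparisons, which is the cost the rest of this hint tries to avoid.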
"And the rest of the story is much the same: either the last character fails to match or the first one does, until the final alignment, where the match succeeds," Little Ho summed up.
"Do you think any of the computation in this process could be saved?" Little Hi asked.
"I think so. Look at this line." Little Ho drew a line at the same position across the two strings.
Original string: |
babab | ababababababb |
Pattern string: |
babab | b |
"Hm?"
"This is where the first mismatch happened. From here on, one of two things will occur. Either the alignment point of the pattern string and the original string (that is, the enumerated starting position in the original string) moves past this line and the match has still not succeeded, or the character of the original string just after this line gets matched by the character at some position of the pattern string. Let's set the first case aside and see what happens in the second," Little Ho analyzed.
Original string: |
babab | ababababababb |
Pattern string (alignment point = 1): |
babab | b |
Pattern string (alignment point = 3): |
  bab | a |
"Well, look at that, Little Ho, you've gotten smart today!" Little Hi praised him sincerely.
"Of course. I've been answering a lot of questions on the discussion board lately, and that is great practice!" Little Ho answered with a smile.
"Then let me take over from here. You have probably only thought it through this far anyway, right?" said Little Hi, who knew Little Ho's level well. Little Ho nodded, so Little Hi went on.
"I believe one fact is very easy to notice. Let i be the position where the original string and the pattern string disagree (a position on the pattern string, mind you! This is not the same thing as the alignment point: one is on the original string, the other on the pattern string), and let j be the position the alignment point of the pattern string is moved to in order to match the character that caused the disagreement at position i. Then we find that the segment [1, i-j] of the pattern string is the same as the segment [j, i-1]. In this example i=6 and j=3, and indeed the segments [1, 3] and [3, 5] of the pattern string are the same," said Little Hi, organizing his thoughts.
Original string: |
ba | bab | ababababababb |
Pattern string (alignment point = 1): |
ba | bab | b |
Pattern string (alignment point = 3): |
   | bab | a |
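To check the claim on the example, the following sketch lists every alignment point j (with 1 < j < i) for which the segment [1, i-j] of the pattern string equals the segment [j, i-1]. It assumes, as in the pictures, that the pattern string was first aligned at position 1, so positions on the original string and on the pattern string coincide. The function name valid_alignments is hypothetical.

```python
def valid_alignments(par, i):
    """Alignment points j (1 < j < i) with par[1..i-j] == par[j..i-1], 1-based."""
    result = []
    for j in range(2, i):
        prefix = par[0:i - j]        # par[1..i-j] in 1-based terms
        segment = par[j - 1:i - 1]   # par[j..i-1] in 1-based terms
        if prefix == segment:
            result.append(j)
    return result


print(valid_alignments("bababb", 6))  # prints [3, 5]
```

For i=6 the candidates are j=3 and j=5. The picture uses j=3, the smaller shift, so no possible match is skipped.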
"And we will also find that only in the existence of a length k, so that the pattern string [1, I-k] and [K, I-1] The two sections of the same case, the pattern string to the position k, in order to ensure that the original string and pattern string matching process can enter into the original string position I is the same as the corresponding character of the pattern string In other cases, there will be inconsistencies in the judgment of not entering position I at all. "Say Little hi and throw another proposition."
"I'm starting to get a little dizzy!" Little Ho made a protest.
"Then you should read what I just said!" Then you can calculate the sample on the draft paper, and you will get the result soon! "said Little hi. "All we need now is a figure of what the length k is, and we're going to calculate this value for each position I of the pattern string." "And that's the most important point in KMP.--next array.
Hint Two: Using the next array
"To understand the next array thoroughly, let's first look at how it is used," said Little Hi, putting on a teacher's air. "Let's start with a mathematical definition of the next array."
next[0] = -1
next[i] = max{ 0 <= k < i | str.substring(1, k) == str.substring(i - k + 1, i) }
where str.substring(i, j) denotes the substring of str from position i to position j; if i > j, the substring is empty.
"Solving this for the pattern string from the earlier example gives its next array," said Little Hi, scribbling and drawing on the paper.
Pattern string: |
b a b a b b |
next: |
0 0 1 2 3 1 |
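As a sanity check on the table, here is a direct and deliberately slow transcription of the definition. The function name next_by_definition is an assumption of this sketch. Index 0 of the result holds next[0] = -1, so the values from the table appear at indices 1 to 6.

```python
def next_by_definition(s):
    """next[0] = -1; next[i] = largest k < i with s[1..k] == s[i-k+1..i] (1-based)."""
    nxt = [-1]
    for i in range(1, len(s) + 1):
        best = 0
        for k in range(i):             # 0 <= k < i
            # s[:k] is s[1..k]; s[i-k:i] is s[i-k+1..i] in 1-based terms.
            if s[:k] == s[i - k:i]:
                best = k               # k only grows, so the last hit is the max
        nxt.append(best)
    return nxt


print(next_by_definition("bababb"))  # prints [-1, 0, 0, 1, 2, 3, 1]
```

Each position tries every k and compares two substrings, so this takes roughly cubic time in the length of the pattern string. It is useful only for checking small examples by hand.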
"Then look at how this next array is used!" to show you all the use of next, we'll change the original string. Then first, we first match, if using the Ori to represent the original string, with par for the pattern string, p for the original string subscript (starting from 1), with Q to denote the pattern string subscript (starting from 1), you will find the most match to p=5, q=5 can not be matched, because at this time ori[p +1] Not equal to Par[q + 1]"Little hi in order to make the instructions more concise, first a bunch of definitions.
"Good!" Little hi teacher is very good! Little Ho was fanning his way.
Original string (p=5): |
babab | abcbababababb |
Pattern string (q=5): |
babab | b |
"Now set q = next[q] and align par[1..q] with the end of ori[1..p]. You will find that the aligned characters of ori[1..p] and par[1..q] still correspond one to one."
Original string (p=5): |
babab | abcbababababb |
Pattern string (q=3): |
  bab | abb |
"Now ori[p+1] and par[q+1] are the same, so matching can continue. But at p=7, q=5 it gets stuck again."
Original string (p=7): |
bababab | cbababababb |
Pattern string (q=5): |
  babab | b |
"Again set q = next[q] and align par[1..q] with the end of ori[1..p]. The aligned characters still correspond one to one, just as before."
Original string (p=7): |
bababab | cbababababb |
Pattern string (q=3): |
    bab | abb |
"This time ori[p+1] and par[q+1] are still different, so we have to set q = next[q] again."
Original string (p=7): |
bababab | cbababababb |
Pattern string (q=1): |
      b | ababb |
"ori[p+1] and par[q+1] are still different, so again q = next[q]."
Original string (p=7): |
bababab | cbababababb |
Pattern string (q=0): |
        | bababb |
"ori[p+1] and par[q+1] are still different, so once more q = next[q]."
Original string (p=7): |
bababab | cbababababb |
Pattern string (q=-1): |
        | bababb |
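The walkthrough above, including the final fallback to q = -1 (after which p and q both advance by one), can be replayed with a short sketch. It is an illustration under assumptions, not the problem's reference code: the name kmp_trace is made up, the next values are copied from the table for bababb, and the loop stops at the first full match.

```python
def kmp_trace(ori, par, nxt):
    """Run one KMP search and return every (p, q) state it passes through.

    p and q are the same numbers as the 1-based subscripts in the
    walkthrough; nxt[0] must be -1.
    """
    p, q = 0, 0
    states = [(p, q)]
    while p < len(ori) and q < len(par):
        # ori[p] and par[q] (0-based) are ori[p+1] and par[q+1] in the text.
        if q == -1 or ori[p] == par[q]:
            p += 1
            q += 1
        else:
            q = nxt[q]  # fall back without moving p
        states.append((p, q))
    return states


NEXT = [-1, 0, 0, 1, 2, 3, 1]  # next[0..6] for "bababb"
states = kmp_trace("babababcbababababb", "bababb", NEXT)
print(states[5:14])
# prints [(5, 5), (5, 3), (6, 4), (7, 5), (7, 3), (7, 1), (7, 0), (7, -1), (8, 0)]
print(states[-1])  # prints (18, 6)
```

The last state (18, 6) means the whole pattern string has been matched, ending at position 18 of the changed original string and therefore starting at position 13. Note that p never decreases, which is where the time saving comes from.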
"At this point, it is equivalent to what we said earlier that the alignment of the pattern string and the original string (that is, the starting position in the original string of the enumeration) crossed the line (that is, the line at the right of c), in which case p and Q should both be +1 and then proceed with the previous operation. "Little hi rubbed a sweat," said.
"So I can roughly understand how the next array is used to solve the pattern-matching problem, but how does it work?" he said. Is the general approach not to O (the cubic length of the pattern string)? Little Ho asked.
"That's what I'm going to tell you next!" "Little hi smiled:" But let me drink saliva first! ”
Hint Three: How to compute the next array
"Let's not try to compute the entire next array at first. Suppose we already know next[1..4] for the pattern string from the earlier example. How do we find next[5]?" Little Hi proposed.
"Sure! Then a quadratic algorithm is enough to work out its value!" said Little Ho happily.
"Have a little ambition, will you!" Little Hi took a deep breath. "How is that any different from your previous approach?"
"It doesn't seem to be. So what do you suggest? My head is mush anyway," said Little Ho gloomily.
"What if we take par.substring(1, 5) as a new original string ori_new, and par.substring(1, 4) as a new pattern string par_new?" Little Hi smiled.
"Hmm... let me give it a try!" Little Ho took the pen and paper from Little Hi's hand and started working: "First we match straight through to p=4, q=4. Strictly speaking the match is already complete, but of course we don't stop there. Now par_new[q+1] is the empty character, so it certainly cannot match ori_new[p+1]. So q = next[q]."
Original string (p=4): |
baba | b |
Pattern string (q=4): |
baba | |
Original string (p=4): |
baba | b |
Pattern string (q=2): |
  ba | b |
"Now ori_new[p+1] matches par_new[q+1] right away, so the new state is p=5, q=3. Could it be... that this final q is next[5]?!" Little Ho had a sudden flash of insight.
"Yes, exactly! Now think about how to find next[6]," Little Hi kept guiding him.
"First, there's no need to restart the matching. We just append the 6th character to both the original string and the pattern string," Little Ho analyzed.
Original string (p=5): |
babab | b |
Pattern string (q=3): |
  bab | abb |
"We can't continue matching, so q = next[q]."
Original string (p=5): |
babab | b |
Pattern string (q=1): |
    b | ababb |
"Still can't continue matching, so q = next[q] again."
Original string (p=5): |
babab | b |
Pattern string (q=0): |
      | bababb |
"Now it matches, and the new state is p=6, q=1, so next[6] is 1!" Little Ho exclaimed. "I never expected that the next array itself could be computed in this recursive way. How ingenious!"
"Then hurry up and write the code. KMP code can be written very short and very neatly!" Little Hi suggested.
"Okay!"
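The recursive idea Little Ho describes, matching the pattern string against itself and reusing the values already computed, might look like the sketch below. The name build_next and the list layout (index 0 holds next[0] = -1) are assumptions of this example rather than anything fixed by the problem.

```python
def build_next(par):
    """Compute next[0..m] for par, with next[0] = -1 as in the definition."""
    m = len(par)
    nxt = [-1] * (m + 1)
    p, q = 0, -1  # p: length of the prefix handled so far; q: next[p]
    while p < m:
        # par[p] and par[q] (0-based) are the characters p+1 and q+1 in the text.
        if q == -1 or par[p] == par[q]:
            p += 1
            q += 1
            nxt[p] = q
        else:
            q = nxt[q]  # fall back using values computed earlier
    return nxt


print(build_next("bababb"))  # prints [-1, 0, 0, 1, 2, 3, 1]
```

While computing next[6], q falls back 3, 1, 0 and then advances to 1, exactly as in the pictures. Since p only moves forward and every fallback undoes an earlier advance of q, the loop runs in time linear in the length of the pattern string.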
Input
The first line contains an integer n, the number of test cases.
The next n*2 lines describe the test cases, two lines each. In each test case, the first line is the pattern string, consisting of no more than 10^4 uppercase letters, and the second line is the original string, consisting of no more than 10^6 uppercase letters.
Here n<=20.
Output
For each test case, in the order given in the input, output one line containing ans, the number of times the pattern string occurs in the original string.
Sample input
5
HA
HAHAHA
WQN
WQN
ADA
ADADADA
BABABB
BABABABABABABABABB
DAD
ADDAADAADDAAADAAD
Sample output
3
1
3
1
0
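Putting the pieces together, one possible way to count occurrences (overlapping ones included) is sketched below. It is an independent illustration, not the Kmpmatcher code discussed in the notes that follow. The names build_next, count_occurrences and solve are hypothetical, the sample is embedded with one string per line as the input format describes, and nothing here has been checked against the judge's time limits.

```python
def build_next(par):
    """next[0..m] for par, with next[0] = -1."""
    nxt = [-1] * (len(par) + 1)
    p, q = 0, -1
    while p < len(par):
        if q == -1 or par[p] == par[q]:
            p += 1
            q += 1
            nxt[p] = q
        else:
            q = nxt[q]
    return nxt


def count_occurrences(ori, par):
    """Number of (possibly overlapping) occurrences of par in ori."""
    if not par:
        return 0  # assumption: an empty pattern string is not counted
    nxt = build_next(par)
    count, q = 0, 0
    for ch in ori:
        while q != -1 and par[q] != ch:
            q = nxt[q]      # fall back until ch can extend the match
        q += 1
        if q == len(par):
            count += 1
            q = nxt[q]      # keep going so overlapping matches are found
    return count


def solve(text):
    """Answer all test cases in text; each case is pattern line, then original line."""
    tokens = text.split()
    n = int(tokens[0])
    answers = []
    for t in range(n):
        par, ori = tokens[1 + 2 * t], tokens[2 + 2 * t]
        answers.append(str(count_occurrences(ori, par)))
    return "\n".join(answers)


SAMPLE = """5
HA
HAHAHA
WQN
WQN
ADA
ADADADA
BABABB
BABABABABABABABABB
DAD
ADDAADAADDAAADAAD
"""
print(solve(SAMPLE))
```

On a judge one would pass the whole of standard input to solve instead of SAMPLE.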
The Kmpmatcher function returns a vector whose elements are the positions where a match succeeded. Using the int type may overflow; I am still thinking about a fix.
I am posting this here for future reference, so that I can find the code when I need it later, and also to ask for suggestions on improving this not very robust code that I wrote as a beginner.
By the way, how do I use this blog??