A Daily Walkthrough of Classic Algorithms, Part 7: The KMP Algorithm
You have probably met the KMP algorithm in a university data structures course. I don't know how many teachers skim over it in a single stroke; ours certainly did.
The KMP algorithm really is rather convoluted. If the red-black tree is insanely hard, KMP is harder still. Apologies in advance: every time I type "KMP", my
input method suggests a three-character phrase meaning "watching adult films". Oh well, let's call it "watching algorithm films" instead.
One: BF algorithm
If you are asked to write string pattern matching, you will probably come up with the naive BF (brute force) algorithm right away. It does at least solve the problem, and I think everyone knows that its time
complexity is O(mn). The reason is simple: whenever the main string and the pattern string mismatch, we go back and compare the first character of the pattern string with the character just after the position where the current attempt started in the main string.
The complexity is high because the main string has to backtrack on every mismatch. I have omitted the figure.
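The backtracking described above can be sketched as follows. This is only a minimal illustration, not code from the original post; the names `BruteForceDemo`, `BruteForceSearch`, `text` and `pattern` are assumptions made for this example.

```csharp
public static class BruteForceDemo
{
    // Returns the index of the first occurrence of pattern in text, or -1 if there is none.
    public static int BruteForceSearch(string text, string pattern)
    {
        int i = 0; // current position in the main string
        int j = 0; // current position in the pattern string

        while (i < text.Length && j < pattern.Length)
        {
            if (text[i] == pattern[j])
            {
                // Characters agree: advance in both strings.
                i++;
                j++;
            }
            else
            {
                // Mismatch: the main string backtracks to one past the start
                // of this attempt, and the pattern restarts at its first character.
                i = i - j + 1;
                j = 0;
            }
        }

        // The whole pattern was consumed, so the match started pattern.Length characters back.
        return j == pattern.Length ? i - pattern.Length : -1;
    }
}
```

Calling `BruteForceDemo.BruteForceSearch("ABABCABABABDC", "BABDC")` finds the match at index 8. The line `i = i - j + 1` is the backtracking step that KMP is designed to avoid.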
Two: KMP algorithm
As we just said, the main string backtracks on every mismatch, and that is what drives up the time complexity. So when the main string and the pattern string mismatch, can we avoid backtracking in the main string
and instead slide the pattern string some distance to the right before starting the next round of comparison, so that the time complexity becomes O(m+n)? That is exactly the problem the KMP algorithm
sets out to solve. Let's look at a simple example.
From this example we can work out the general reasoning. Let the main string be S and the pattern string be P, and suppose a mismatch occurs at $s_i \neq p_j$. The j characters before the mismatch have all matched, so the following relationship holds:
$s_{i-j} s_{i-j+1} \dots s_{i-1} = p_0 p_1 \dots p_{j-1}$. So how far should the pattern P slide to the right? In other words, which character of the pattern string should the i-th character of the main string be compared with next?
Suppose it should be compared with position k of the pattern string. This works when the matched part of the pattern has a proper prefix and a proper suffix of length k that are equal, that is, $p_0 p_1 \dots p_{k-1} = p_{j-k} p_{j-k+1} \dots p_{j-1}$.
In other words, the first k characters of P are the same as the k characters just before position j. For example, the longest proper prefix of "abad" is "aba" and its longest
proper suffix is "bad"; these two are of course not equal. Here 0<k<j, and we want k to be as close to j as possible, because the pattern slides by j-k and a larger k means the smallest possible slide. From now on we use
next[j] to record which character of the pattern string should be compared with $s_i$ after a mismatch at $p_j$.
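Set next[j]=k. According to the formula we have

$$
\mathrm{next}[j] =
\begin{cases}
-1, & j = 0 \\
\max\{\, k \mid 0 < k < j \ \text{and}\ p_0 p_1 \dots p_{k-1} = p_{j-k} p_{j-k+1} \dots p_{j-1} \,\}, & \text{if such a } k \text{ exists} \\
0, & \text{otherwise}
\end{cases}
$$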
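To check the definition on small inputs, next can be computed directly from the formula by trying every k from the largest down. This is a deliberately slow sketch for illustration only and not the method KMP itself uses; the names `NextByDefinition` and `ComputeNext` are assumptions made for this example.

```csharp
public static class NextByDefinition
{
    // Computes next[] straight from the definition:
    // next[0] = -1; otherwise the largest k with 0 < k < j such that
    // p[0..k-1] equals p[j-k..j-1]; 0 if no such k exists.
    public static int[] ComputeNext(string p)
    {
        int[] next = new int[p.Length];

        for (int j = 0; j < p.Length; j++)
        {
            if (j == 0)
            {
                next[j] = -1;
                continue;
            }

            next[j] = 0; // the "otherwise" branch of the formula

            // Try the largest k first so that the first hit is the maximum.
            for (int k = j - 1; k > 0; k--)
            {
                // Compare the prefix p[0..k-1] with the suffix p[j-k..j-1].
                if (string.CompareOrdinal(p, 0, p, j - k, k) == 0)
                {
                    next[j] = k;
                    break;
                }
            }
        }

        return next;
    }
}
```

For the pattern "BABDC" this gives -1, 0, 0, 1, 0: only at j=3 does the matched part "BAB" have an equal proper prefix and suffix ("B").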
OK, the next question is how to compute next[j]. This is the core of the KMP idea. We compute next[j] by recurrence: suppose we already know
next[j]=k, and we want to find next[j+1]. There are in fact two cases:
①: $p_k = p_j$. Then $p_0 p_1 \dots p_k = p_{j-k} p_{j-k+1} \dots p_j$, so we know:
next[j+1]=k+1.
And because next[j]=k,
next[j+1]=next[j]+1.
②: $p_k \neq p_j$, so $p_0 p_1 \dots p_k \neq p_{j-k} p_{j-k+1} \dots p_j$. This case is a bit of a headache. In fact, we can turn it into the same pattern-matching problem
between a "main string" and a "pattern string" that we described above: here the pattern string plays both roles, and we look for next[j+1] by matching its suffix against its prefix. So the idea is
to find a $k_2$ such that $p_{k_2} = p_j$, and then plug $k_2$ into case ①.
Set $k_2$=next[k]. Then $p_0 p_1 \dots p_{k_2-1} = p_{j-k_2} p_{j-k_2+1} \dots p_{j-1}$.
If $p_j = p_{k_2}$, then next[j+1]=$k_2$+1=next[k]+1.
If $p_j \neq p_{k_2}$, keep applying next recursively in the same way until no such $k_2$ remains (that is, until it reaches -1), in which case next[j+1]=0.
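To make case ② concrete, here is a small worked example. The pattern P = ABACABABC is an assumption chosen for illustration and does not come from the original post. The example takes next[7]=3 and next[3]=1 as already computed and derives next[8].

```latex
% Pattern assumed for illustration: P = ABACABABC, indices 0..8
% p_0=A, p_1=B, p_2=A, p_3=C, p_4=A, p_5=B, p_6=A, p_7=B, p_8=C
\begin{align*}
&\text{Known: } \mathrm{next}[7] = 3 \quad (p_0 p_1 p_2 = p_4 p_5 p_6 = \text{ABA}) \\
&j = 7,\ k = 3: \quad p_3 = \text{C} \neq p_7 = \text{B} \quad \text{(case 2)} \\
&k_2 = \mathrm{next}[3] = 1 \quad (p_0 = p_6 = \text{A}) \\
&p_{k_2} = p_1 = \text{B} = p_7 \ \Rightarrow\ \mathrm{next}[8] = k_2 + 1 = 2
\end{align*}
```

This agrees with the definition: the longest proper prefix of "ABACABAB" that is also a suffix of it is "AB", which has length 2.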
OK, on to the code. It may be a bit convoluted; whether or not you follow it, I do, anyway.
    using System;

    namespace SupportCenter.Test
    {
        public class Program
        {
            static void Main(string[] args)
            {
                string zstr = "ABABCABABABDC"; // main string

                string mstr = "BABDC";         // pattern string

                var index = KMP(zstr, mstr);

                if (index == -1)
                    Console.WriteLine("There is no matching string!");
                else
                    Console.WriteLine("Haha, found it; the position is: " + index);

                Console.Read();
            }

            static int KMP(string bigstr, string smallstr)
            {
                // Guard added here: an empty pattern would otherwise make GetNextVal
                // index into an empty array. Returning 0 is an assumed convention.
                if (smallstr.Length == 0)
                    return 0;

                int i = 0; // position in the main string; it never backtracks
                int j = 0; // position in the pattern string

                // Compute next from the pattern's "prefix strings" and "suffix strings"
                int[] next = GetNextVal(smallstr);

                while (i < bigstr.Length && j < smallstr.Length)
                {
                    if (j == -1 || bigstr[i] == smallstr[j])
                    {
                        i++;
                        j++;
                    }
                    else
                    {
                        // Mismatch: slide the pattern to the right; i stays where it is
                        j = next[j];
                    }
                }

                if (j == smallstr.Length)
                    return i - smallstr.Length;

                return -1;
            }

            /// <summary>
            /// p0,p1....pk-1 (prefix string)
            /// pj-k,pj-k+1....pj-1 (suffix string)
            /// </summary>
            /// <param name="smallstr">the pattern string</param>
            /// <returns>the next array of the pattern string</returns>
            static int[] GetNextVal(string smallstr)
            {
                // Prefix string position (starting at -1 makes the computation convenient)
                int k = -1;

                // Suffix string position
                int j = 0;

                int[] next = new int[smallstr.Length];

                // According to the formula: when j=0, next[j]=-1
                next[j] = -1;

                while (j < smallstr.Length - 1)
                {
                    if (k == -1 || smallstr[k] == smallstr[j])
                    {
                        // Case pk == pj: next[j+1] = k+1, i.e. next[j+1] = next[j]+1
                        next[++j] = ++k;
                    }
                    else
                    {
                        // Case pk != pj: recurse with k = next[k];
                        // either a match is found, or k reaches -1 and we stop.
                        k = next[k];
                    }
                }

                return next;
            }
        }
    }