KMP algorithm for pattern matching

Source: Internet
Author: User

This improved pattern matching algorithm was discovered by D. E. Knuth, J. H. Morris, and V. R. Pratt, and is known as the KMP algorithm for short. Anyone who has studied computer science has probably met it and found it one of the harder algorithms to understand; today I set out to get it thoroughly clear.

Note that this is an improved algorithm, so we first need to bring out the original pattern matching algorithm. In fact, the key to understanding KMP lies right there. The ordinary matching algorithm:

    int Index(String S, String T, int pos) // after the program in the data structures textbook
    {
        int i = pos, j = 1; // strings are 1-based here: the first character has subscript 1
        while (i <= S.length && j <= T.length) {
            if (S[i] == T[j]) { ++i; ++j; }
            else { i = i - j + 2; j = 1; } // ************** (1)
        }
        if (j > T.length) return i - T.length; // match succeeded
        else return 0;
    }

The matching process is clear enough. The key question is what the program does on a mismatch. It backtracks: look at statement (1). Why backtrack? Consider this example, with S: aaaaabababcaaa and T: ababc.

    aaaaabababcaaa
        ababc.        (the . marks the character before it as mismatched)

The result of backtracking is

    aaaaabababcaaa
         a.(babc)

Without backtracking we would get

    aaaaabababcaaa
            aba.bc

which skips over a possible successful match:

    aaaaabababcaaa
          ababc

Why does this happen? It is determined by the nature of T itself: T partially matches itself, with the beginning of the string reappearing later in it. If T were abcdef, there would be no need to backtrack.

This is where the improvement comes in. We start from T itself and work out in advance where T partially matches itself; then the algorithm can be improved.

If i does not backtrack, from which position of T should matching resume? Take the example above, where T is ababc. If the c mismatches, T can slide to the right so that it starts at the last a of aba, like this:

    ...ababd...
       ababc
         ababc

This way i does not backtrack, j falls back two positions (from 5 to 3), and matching continues. This is the heart of the KMP algorithm.
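To make the backtracking argument concrete, here is a minimal C sketch of the ordinary algorithm next to a deliberately broken variant that never moves i back. The function names naive_index and no_backtrack_index are made up for this illustration, and the 1-based positions are mapped onto C's 0-based strings by indexing with i - 1 and j - 1.

```c
#include <assert.h>
#include <string.h>

/* Ordinary matching with backtracking. Positions are 1-based, as in the
 * article; returns 0 when t does not occur in s at or after pos. */
int naive_index(const char *s, const char *t, int pos)
{
    assert(pos >= 1);
    int n = (int)strlen(s), m = (int)strlen(t);
    int i = pos, j = 1;
    while (i <= n && j <= m) {
        if (s[i - 1] == t[j - 1]) { ++i; ++j; }
        else { i = i - j + 2; j = 1; }   /* statement (1): move i back */
    }
    return (j > m) ? i - m : 0;
}

/* Hypothetical broken variant: on a mismatch it restarts the pattern
 * but keeps i where it is, so it can step over a real match. */
int no_backtrack_index(const char *s, const char *t, int pos)
{
    assert(pos >= 1);
    int n = (int)strlen(s), m = (int)strlen(t);
    int i = pos, j = 1;
    while (i <= n && j <= m) {
        if (s[i - 1] == t[j - 1]) { ++i; ++j; }
        else if (j == 1) ++i;   /* nothing matched yet: just advance */
        else j = 1;             /* keep i, restart the pattern */
    }
    return (j > m) ? i - m : 0;
}
```

With S = "aaaaabababcaaa" and T = "ababc", naive_index(S, T, 1) returns 7, while no_backtrack_index(S, T, 1) returns 0 because it skips the match exactly as in the diagrams above. For a pattern with no self-overlap such as "abcdef", the broken variant still finds the match in "abcabcdefg" at position 4.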
When T[j] mismatches, the position that j should fall back to is the next value of j. It is determined by T alone and has nothing to do with S. The data structures textbook defines next like this:

    next[j] = 0                                                             if j = 1
    next[j] = max{ k | 1 < k < j and 'p1...p(k-1)' = 'p(j-k+1)...p(j-1)' }  if that set is not empty
    next[j] = 1                                                             otherwise

The first time I saw this it made my head spin, but it only describes the situation shown earlier. next[1] = 0 is a convention that keeps the program simple; any other value would do, as long as it cannot collide with the later values. And what does the max mean? For example, with T: aaab

    ...aaaab...
       aaab
        aaab
         aaab
          aaab

For a T like this, the beginning of the string partially matches itself in several ways, so which position should j fall back to? The nearest one: slide T to the right by the shortest possible distance.

OK, at this point you have seen most of what KMP is about. The remaining key question is how to compute the next values. Set that aside for now and first look at how they are used in matching; that is, assume the next values are already available. Rewrite the first program as follows:

    int Index_KMP(String S, String T, int pos)
    {
        int i = pos, j = 1; // strings are 1-based here: the first character has subscript 1
        while (i <= S.length && j <= T.length) {
            if (j == 0 || S[i] == T[j]) { ++i; ++j; } // note the j == 0 test together with ++j: this is why next[1] = 0 is a convenient convention
            else j = next[j]; // i stays put (no backtracking); j falls back
        }
        if (j > T.length) return i - T.length; // match succeeded
        else return 0;
    }

OK, simple, isn't it? What remains is computing the next values, which is what the whole algorithm hinges on. Working them out straight from the definition looks frightening, so how do we go about it?
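As a rough check of the definition and of the matching loop, the sketch below fills next[] by brute force straight from the definition and then runs the KMP loop with it. The names next_by_definition and index_kmp are hypothetical, and the brute-force table is only for illustration; it is not the efficient way to compute next. Positions stay 1-based and are mapped onto C strings with i - 1 and j - 1.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Fill next[1..m] directly from the definition (brute force).
 * next must have room for m + 1 ints; index 0 is unused. */
void next_by_definition(const char *t, int *next)
{
    int m = (int)strlen(t);
    if (m >= 1) next[1] = 0;
    for (int j = 2; j <= m; ++j) {
        next[j] = 1;                          /* the "otherwise" case */
        for (int k = j - 1; k > 1; --k) {     /* try the largest k first */
            /* 'p1...p(k-1)' vs 'p(j-k+1)...p(j-1)', both k - 1 long */
            if (strncmp(t, t + (j - k), (size_t)(k - 1)) == 0) {
                next[j] = k;
                break;
            }
        }
    }
}

/* KMP matching loop; returns the 1-based position of the first match
 * at or after pos, or 0 if there is none. */
int index_kmp(const char *s, const char *t, int pos)
{
    assert(pos >= 1);
    int n = (int)strlen(s), m = (int)strlen(t);
    int *next = malloc((size_t)(m + 1) * sizeof *next);
    if (next == NULL) return 0;   /* simplification for this sketch */
    next_by_definition(t, next);
    int i = pos, j = 1;
    while (i <= n && j <= m) {
        if (j == 0 || s[i - 1] == t[j - 1]) { ++i; ++j; }
        else j = next[j];         /* i never moves back */
    }
    free(next);
    return (j > m) ? i - m : 0;
}
```

For T = "ababc" the table comes out as next[1..5] = 0, 1, 1, 2, 3, which agrees with the earlier observation that a mismatch on c sends j from 5 back to 3. index_kmp("aaaaabababcaaa", "ababc", 1) returns 7, the match that the no-backtracking attempt missed.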
As I said earlier, the next values express how T partially matches itself, so all I need to do is match T against T. The matching here does not start with both copies at the beginning; it starts by comparing T[1] with T[2]. The algorithm is as follows:

    void get_next(String T, int next[])
    {
        int i = 1, j = 0;
        next[1] = 0;
        while (i < T.length) { // i < T.length keeps next[i] within 1..T.length
            if (j == 0 || T[i] == T[j]) { ++i; ++j; next[i] = j; /********** (2) */ }
            else j = next[j];
        }
    }

Doesn't this function look a lot like the KMP matching function? Yes, it does! Note that statement (2) is reached when T[i] == T[j] and the characters before i already match the characters before j, so we increment both first and then record next[i] = j. Every time i is incremented, one more next[i] is found. Since j is always less than i, the next values already computed can be used to compute the following ones, and next[1] = 0 is known, so the whole table is built up step by step. The method is very ingenious.

This improvement is already very good, but the algorithm can be improved further. Note the following matching situation:

    ...aaac...
       aaaa.

The 'a' in T mismatches the 'c' in S, and the next value of that 'a' points at another 'a', so the same comparison will fail again; such comparisons are redundant. If we know this ahead of time, then whenever T[i] == T[j] we can set next[i] to next[j]. The comparison has in effect already been made while computing the next values, so the redundant comparisons disappear. That gives a slightly improved version:

    void get_nextval(String T, int next[])
    {
        int i = 1, j = 0;
        next[1] = 0;
        while (i < T.length) { // i < T.length keeps T[i] and next[i] within 1..T.length
            if (j == 0 || T[i] == T[j]) {
                ++i; ++j;
                if (T[i] != T[j]) next[i] = j;
                else next[i] = next[j]; // skip the redundant comparison: fall back one step further
            }
            else j = next[j];
        }
    }

The matching algorithm does not change. With this, everything is clear. I used to think the KMP algorithm was mysterious, not something a person could simply come up with; in fact, it is just an improvement on the original algorithm.
This shows how important the classic fundamentals still are: only when you are able to "scrap" a classic can you create progress.
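The two table-building routines described above can be sketched in C roughly as follows. This is an illustration under a few assumptions rather than the textbook's exact code: the pattern is an ordinary C string, the tables are caller-provided arrays with index 0 unused, and the loops run while i < m so that every write stays within next[1..m].

```c
#include <assert.h>
#include <string.h>

/* next[1..m] by matching t against itself; next needs m + 1 slots. */
void get_next(const char *t, int *next)
{
    assert(t != NULL && next != NULL);
    int m = (int)strlen(t);
    if (m == 0) return;
    int i = 1, j = 0;
    next[1] = 0;
    while (i < m) {
        if (j == 0 || t[i - 1] == t[j - 1]) {
            ++i; ++j;
            next[i] = j;                      /* statement (2) */
        } else {
            j = next[j];
        }
    }
}

/* Improved table: when t[i] equals t[j], retrying at j would fail the
 * same way, so reuse the fallback already stored for j. */
void get_nextval(const char *t, int *nextval)
{
    assert(t != NULL && nextval != NULL);
    int m = (int)strlen(t);
    if (m == 0) return;
    int i = 1, j = 0;
    nextval[1] = 0;
    while (i < m) {
        if (j == 0 || t[i - 1] == t[j - 1]) {
            ++i; ++j;
            nextval[i] = (t[i - 1] != t[j - 1]) ? j : nextval[j];
        } else {
            j = nextval[j];
        }
    }
}
```

For T = "aaaa", get_next produces 0, 1, 2, 3 while get_nextval produces 0, 0, 0, 0, so a mismatch on any 'a' skips all the comparisons against earlier 'a' characters that would be bound to fail. For T = "ababc", get_next produces 0, 1, 1, 2, 3.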
Reprinted from: http://blog.csdn.net/jixingzhong/article/details/1383135
