KMP algorithm: from beginner, to confused, to understanding

Source: Internet
Author: User

This post references http://blog.csdn.net/v_july_v/article/details/7041827
See http://blog.csdn.net/WINCOL/article/details/4795369 for other string matching algorithms


Brute Force matching algorithm

The idea behind brute-force matching: suppose the text string S has been matched up to position i and the pattern string P up to position j. Then:

    • If the current characters match (that is, S[i] == P[j]), then i++, j++, and matching continues with the next character;
    • If they mismatch (that is, S[i] != P[j]), set i = i - (j - 1) and j = 0. In other words, on every failed match i backtracks and j is reset to 0.
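
The two rules above can be sketched in C roughly as follows. This is a minimal sketch, not code from the original post: the function name brute_force_search and the "return -1 when not found" convention are assumptions made for illustration.

```c
#include <string.h>

/* Brute-force matching: returns the index in s where p first occurs,
 * or -1 if p does not occur. (Name and return convention are illustrative.) */
int brute_force_search(const char *s, const char *p)
{
    int s_len = (int)strlen(s);
    int p_len = (int)strlen(p);
    int i = 0; /* current position in the text string */
    int j = 0; /* current position in the pattern string */

    while (i < s_len && j < p_len) {
        if (s[i] == p[j]) {
            /* characters match: move on to the next pair */
            i++;
            j++;
        } else {
            /* mismatch: i backtracks to one past where this attempt began,
             * and j starts over from 0 */
            i = i - (j - 1);
            j = 0;
        }
    }
    /* the whole pattern was consumed, so the match began at i - j */
    return (j == p_len) ? i - j : -1;
}
```

For instance, brute_force_search("aaab", "ab") returns 2, after i has backtracked twice along the way.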

KMP Algorithm Overview
The KMP algorithm is used for string matching. Its core idea is to find the runs of characters that repeat inside the pattern string (prefixes that reappear as suffixes) and record them in an array. This shortens the backtracking needed after a mismatch and so reduces the number of comparisons.
The KMP algorithm is essentially brute-force matching plus the information stored in the next array.
For an easy-to-read explanation, see the linked post first; it is the basis for understanding what follows.
KMP Algorithm Process
Suppose the text string S has been matched up to position i and the pattern string P up to position j. Then:
    • If j == -1, or the current characters match (that is, S[i] == P[j]), then i++, j++, and matching continues with the next character;
    • If j != -1 and they mismatch (that is, S[i] != P[j]), leave i unchanged and set j = next[j]. In other words, on every failed match j falls back to next[j]. (j can only become -1 by falling back from position 0, because the code later in this post sets next[0] = -1.)
So the difference between the KMP algorithm and the brute-force matching algorithm (BF algorithm) is that in KMP i never backtracks, and j falls back only as far as next[j].
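
As a rough sketch of this process in C, the search loop might look like the code below. The names kmp_search and get_next are assumptions for illustration. The table is built the same way as the GetNext routine given at the end of this post, with next[0] = -1. Returning -1 both for "not found" and for a failed allocation is a simplification.

```c
#include <stdlib.h>
#include <string.h>

/* Fills next[0..p_len-1], using the next[0] = -1 convention. */
static void get_next(const char *p, int next[])
{
    int p_len = (int)strlen(p);
    int k = -1;
    int j = 0;

    if (p_len == 0)
        return;
    next[0] = -1;
    while (j < p_len - 1) {
        if (k == -1 || p[j] == p[k]) {
            ++k;
            ++j;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
}

/* KMP search: returns the index in s where p first occurs, or -1. */
int kmp_search(const char *s, const char *p)
{
    int s_len = (int)strlen(s);
    int p_len = (int)strlen(p);
    int i = 0, j = 0, result;
    int *next;

    if (p_len == 0)
        return 0;
    next = malloc((size_t)p_len * sizeof *next);
    if (next == NULL)
        return -1; /* simplification: allocation failure is reported as "not found" */
    get_next(p, next);

    while (i < s_len && j < p_len) {
        if (j == -1 || s[i] == p[j]) {
            /* match, or nothing left to fall back to: advance both */
            i++;
            j++;
        } else {
            /* mismatch: i stays where it is, only j falls back */
            j = next[j];
        }
    }
    result = (j == p_len) ? i - j : -1;
    free(next);
    return result;
}
```

Note that i is never decreased anywhere in the loop, which is exactly the difference from the brute-force version.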
After reading the popular explanation in the blog post above, we roughly understand the core idea of the KMP algorithm and can probably write out a partial match table by hand. Now let's work out the relationship between the next array and the partial match table (the table used in that blog post).
The shift formula we use is:
Number of positions to shift = number of matched characters - partial match value of the last matched character
So on each fallback, j moves back by (number of matched characters - corresponding partial match value). For a given pattern string this information is fixed for every character, so we can compute it once and store it in an array. Whenever a shift is needed, j is set directly to the position stored there, which is far more convenient. Because this array always tells j where to go next, we call it the next array, and each fallback is simply j = next[j].
Now we need to consider how to construct the next array.
The catch is that the number of positions we shift (that is, where j goes next) has nothing to do with the partial match value of the current, mismatched character. It depends only on the characters before it, which have already been matched. So the current character's match value can be set aside when working out the shift for the current position.
However, although the current character's match value does not affect the shift at the current position, it does affect the shift at the next position, so we associate it with the next position. The shift at each position therefore depends on the previous position's match value, the shift at the next position depends on the current match value, and so on. In other words, we get the next table by shifting the partial match table one position to the right. The first entry has no predecessor and is set to -1, as in the code later in this post.
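
To make the "shift the table right by one" relationship concrete, here is a small C sketch that computes partial match values directly from their definition (the longest proper prefix that is also a suffix) and then derives next from them. It is deliberately naive and slow. The helper names partial_match_value and next_from_partial_match are made up for this illustration.

```c
#include <string.h>

/* Partial match value of p[0..len-1]: the length of its longest proper
 * prefix that is also a suffix, found by trying every candidate length. */
static int partial_match_value(const char *p, int len)
{
    int k;

    for (k = len - 1; k > 0; k--) {
        if (memcmp(p, p + len - k, (size_t)k) == 0)
            return k;
    }
    return 0;
}

/* next[j] is the partial match value of the j characters before position j,
 * i.e. the partial match table shifted right by one, with next[0] = -1. */
void next_from_partial_match(const char *p, int next[])
{
    int p_len = (int)strlen(p);
    int j;

    if (p_len == 0)
        return;
    next[0] = -1;
    for (j = 1; j < p_len; j++)
        next[j] = partial_match_value(p, j);
}
```

For the pattern "ABCDABD" the partial match table is 0 0 0 0 1 2 0, and this sketch fills next with -1 0 0 0 0 1 2.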
So now, given a pattern string, we can derive its next array by hand. Half of the groundwork is done; the remaining task is to compute the next array in code. The basic method is as follows:
From the discussion above, the next array can be computed recursively:
1. If k is the largest value for which p0 p1 ... pk-1 = pj-k pj-k+1 ... pj-1, then next[j] = k.
That is, the first k characters of the pattern equal the k characters immediately before position j, so the position j falls back to is k (that prefix is already known to match, so falling back any further would only repeat comparisons).
2. Given next[0..j], compute next[j+1]:
1) If p[k] == p[j], then next[j+1] = next[j] + 1 = k + 1;
This is easy to understand: when the two characters are equal, the matched length simply grows by 1. For example, C and C are equal, so next[j+1] = 2 + 1 = 3;



2) If p[k] != p[j], repeatedly set k = next[k]. As soon as p[k] == p[j], we have next[j+1] = k + 1 (next[j+1] = next[j] + 1 cannot be used here, because k has changed and no longer equals next[j]). If no such k is found, next[j+1] = 0.
The idea is: if the current characters do not match, look for a shorter common prefix and suffix, and let j fall back to the corresponding position.



Why does recursing on the prefix index with k = next[k] find a shorter common prefix and suffix?
This again comes down to the meaning of the next array. We are matching the prefix p0 ... pk-1 pk against the suffix pj-k ... pj-1 pj. If pk mismatches pj, the next step is to compare p[next[k]] with pj. If p[next[k]] and pj still do not match, we need an even shorter common prefix and suffix, so next we compare p[next[next[k]]] with pj. This process is the pattern string matching against itself: we keep recursing with k = next[k] until we find a shorter common prefix and suffix, or until none exists.



Now let's check that falling back with k really does find a shorter common prefix:
Here C and D do not match, so k falls back to next[k], that is, k = 0. Now p[0] == p[j], so next[j+1] = k + 1 = 1. In other words, the string before the character E, "DABCDABD", has a common prefix and suffix of length 1.
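
A single step of this recursion can be sketched in C as below. This is only an illustration: the function name next_step is made up, and it assumes the next[0] = -1 convention used by the code in this post.

```c
/* Given next[0..j] (with next[0] = -1), returns next[j+1] for pattern p. */
int next_step(const char *p, const int next[], int j)
{
    int k = next[j];

    /* keep falling back to shorter prefixes until p[k] matches p[j],
     * or until there is no prefix left to try (k == -1) */
    while (k != -1 && p[k] != p[j])
        k = next[k];

    /* k == -1 gives 0: no common prefix and suffix at all */
    return k + 1;
}
```

Assuming the pattern in the example is "DABCDABDE", the first eight entries are next[0..7] = -1 0 0 0 0 1 2 3. Calling next_step with j = 7 starts at k = 3 (p[3] is C, p[7] is D, mismatch), falls back to k = next[3] = 0 (p[0] is D, match), and returns 1, which agrees with the result above.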
At this point we finally understand where the next array comes from, and can write the code ourselves. The code is as follows:
    #include <string.h>

    void GetNext(char* p, int next[])
    {
        int pLen = strlen(p);
        next[0] = -1;
        int k = -1;
        int j = 0;
        while (j < pLen - 1) {
            // p[k] represents the prefix, p[j] represents the suffix
            if (k == -1 || p[j] == p[k]) {
                ++k;
                ++j;
                next[j] = k;
            }
            else {
                // mismatch: fall back to a shorter common prefix and suffix
                k = next[k];
            }
        }
    }
Of course, this is only the basic version of the KMP algorithm. For the optimized version, see the optimization section of:
http://baike.baidu.com/link?url= 7tyifauf53azp6xofec5owivgep5gt9dxnmhy5usg7eyizzebwibvljle1zpsbumu-zqgeh9qqpiqwrdn4lqeq
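
For reference, one widely used refinement of the next array is sketched below. The linked page is not reproduced here, so treat this as an assumption about what "optimized" means rather than as that page's code. The name get_next_optimized is made up. The idea is that if p[j] equals p[next[j]], falling back to next[j] would compare the same character against the text again and fail again, so we can skip straight to next[next[j]].

```c
#include <string.h>

/* Builds the next array, skipping fallbacks that are certain to fail. */
void get_next_optimized(const char *p, int next[])
{
    int p_len = (int)strlen(p);
    int k = -1;
    int j = 0;

    if (p_len == 0)
        return;
    next[0] = -1;
    while (j < p_len - 1) {
        if (k == -1 || p[j] == p[k]) {
            ++k;
            ++j;
            if (p[j] != p[k])
                next[j] = k;       /* same as the basic version */
            else
                next[j] = next[k]; /* p[k] would mismatch too, so skip it */
        } else {
            k = next[k];
        }
    }
}
```

For the pattern "abab" the basic version gives -1 0 0 1, while this sketch gives -1 0 -1 0.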
