On KMP in Data Structures (the Pattern-Matching Algorithm for Strings)

Source: Internet
Author: User

The KMP algorithm improves on the naive pattern-matching algorithm. It reduces the number of character comparisons, and it never moves the pointer into the main string backward, which lowers the algorithm's cost. No method works well everywhere, however, and KMP's effectiveness has certain limitations. I discuss them at the end of this article.

General matching algorithm:
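The figure for the naive algorithm did not survive. The C code below is a minimal sketch of it. It assumes the 1-based string layout implied by Get_next later in the article, where element 0 holds the length. The names SString, MAXSTRLEN, str_assign and index_naive are my own, chosen for illustration, and are not taken from the original. Note that i is backed up after every mismatch, which is exactly what KMP avoids.

```c
#include <assert.h>
#include <string.h>

#define MAXSTRLEN 255
/* Assumed textbook-style string: element 0 holds the length,
   the characters occupy elements 1..length. */
typedef unsigned char SString[MAXSTRLEN + 1];

/* Hypothetical helper: copy a C string into an SString. */
void str_assign(SString T, const char *chars) {
    size_t len = strlen(chars);
    assert(len <= MAXSTRLEN);
    T[0] = (unsigned char)len;
    memcpy(T + 1, chars, len);
}

/* Naive matching: return the 1-based position of the first occurrence of T
   in S at or after pos, or 0 if there is none. */
int index_naive(const SString S, const SString T, int pos) {
    int i = pos, j = 1;
    while (i <= S[0] && j <= T[0]) {
        if (S[i] == T[j]) {
            ++i;
            ++j;
        } else {
            i = i - j + 2; /* back up i: restart one past the previous start */
            j = 1;         /* and compare from the first pattern character again */
        }
    }
    return j > T[0] ? i - T[0] : 0;
}
```

For example, take S = "ababcabcacbab" and T = "abcac" (strings chosen for illustration). Then index_naive(S, T, 1) returns 6.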

Basic concepts of KMP:

In fact, however, the two intermediate matching attempts in the example above are unnecessary. In each of them the very first letter already fails to match, so they cannot succeed. By the fourth attempt, we can see from the pattern string alone that sliding the pattern to this position is the only worthwhile move.

The first attempt told us two things. The last letter of the pattern, c, failed to match the b in the main string. The letter just before that c is a (as noted earlier, keep an eye on this a). The pattern string also tells us that the letter before its second letter, b, is a, and that no other a occurs in between. So we can skip the two intermediate attempts entirely. Readers who worry that skipping them might miss a position that could match need not be concerned: the pattern string itself records that none of the skipped letters can even match a.

Then, when a match fails, how do we decide where the pattern slides to? In other words, which item of the pattern should be aligned with the b in the main string (the yellow cell in the figure), so that the intermediate comparisons are skipped? Let the index of this item be k; for the pattern above, k = 2. Note that in these strings the first element of the array records the length of the string, so characters are numbered from 1.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

As mentioned above, the key to the KMP algorithm is to determine k from the pattern string alone. This is needed when the i-th character of the main string mismatches the j-th character of the pattern. The conditions k must satisfy are given below.

Let the main string be s1s2...sn and the pattern be p1p2...pm.

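The value of k should satisfy the following, with k as large as possible:

$$ p_1 p_2 \cdots p_{k-1} = p_{j-k+1} p_{j-k+2} \cdots p_{j-1}, \qquad 1 < k < j $$

Writing next[j] = k for every position j of the pattern gives the next function:

$$
\text{next}[j] =
\begin{cases}
0, & j = 1 \\
\max\{\, k \mid 1 < k < j \text{ and } p_1 \cdots p_{k-1} = p_{j-k+1} \cdots p_{j-1} \,\}, & \text{if this set is nonempty} \\
1, & \text{otherwise}
\end{cases}
$$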


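In the final analysis, this amounts to finding where the pattern string repeats itself. We need the longest part at the start of p1...p(j-1) that also appears at its end; k is the length of that part plus 1.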
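To make the definition concrete, the sketch below computes next[j] by brute force, trying every candidate k from the largest down. It runs in O(m^3) time, so it is only a cross-check for Get_next later in the article, not a replacement. It assumes the same 1-based layout (element 0 holds the length). The names SString, MAXSTRLEN, str_assign and next_by_definition are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAXSTRLEN 255
/* Assumed textbook-style string: T[0] = length, characters in T[1..T[0]]. */
typedef unsigned char SString[MAXSTRLEN + 1];

/* Hypothetical helper: copy a C string into an SString. */
void str_assign(SString T, const char *chars) {
    size_t len = strlen(chars);
    assert(len <= MAXSTRLEN);
    T[0] = (unsigned char)len;
    memcpy(T + 1, chars, len);
}

/* next[1] = 0; for j >= 2, the largest k with 1 < k < j such that
   p1..p(k-1) equals p(j-k+1)..p(j-1), or 1 if no such k exists. */
void next_by_definition(const SString T, int next[]) {
    next[1] = 0;
    for (int j = 2; j <= T[0]; ++j) {
        next[j] = 1;                      /* default: no repeated part */
        for (int k = j - 1; k > 1; --k) { /* longest candidate first */
            /* prefix of length k-1 vs. the k-1 characters ending at p(j-1) */
            if (memcmp(&T[1], &T[j - k + 1], (size_t)(k - 1)) == 0) {
                next[j] = k;
                break;
            }
        }
    }
}
```

For the pattern abaabcac (a pattern of my own choosing), this yields next = 0 1 1 2 2 3 1 2 for j = 1..8.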

Example:
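The figure for the original example did not survive. As a substitute, here is a pattern chosen for illustration: abaabcac, which is not necessarily the one in the lost figure. Its next values are worked out below, using the fact that next[j] - 1 is the length of the longest part at the start of p1...p(j-1) that also appears at its end. For j = 6, for instance, p1...p5 = abaab. Its longest such part is ab (aba does not work), so next[6] = 3:

$$ p_1 p_2 = ab = p_4 p_5, \qquad p_1 p_2 p_3 = aba \neq aab = p_3 p_4 p_5 \;\Rightarrow\; \text{next}[6] = 3 $$

$$
\begin{array}{c|cccccccc}
j & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline
p_j & a & b & a & a & b & c & a & c \\
\text{next}[j] & 0 & 1 & 1 & 2 & 2 & 3 & 1 & 2
\end{array}
$$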

So how do we find k for every position of the pattern string, that is, compute the next array?

Here is the code:

#define MAXSTRLEN 255
typedef unsigned char SString[MAXSTRLEN + 1]; /* T[0] stores the length; characters are T[1..T[0]] */

void Get_next(SString T, int next[]) {
    /* Compute the next-function values of pattern string T and store them in next (Algorithm 4.7) */
    int i = 1, j = 0;
    next[1] = 0;
    while (i < T[0]) {
        if (j == 0 || T[i] == T[j]) {
            ++i; ++j; next[i] = j;
        } else {
            /* The key step: when a similar comparison has already been made, reuse its
               result. Two similar contexts share the same matching situation, and that
               situation is already recorded in next. Skipping over a substring that has
               already matched would make k smaller than it should be; in other words,
               each later step depends on the results of the earlier ones. */
            j = next[j];
        }
    }
}

Explanation of the code above:

1: If j == 0, increment i and j, then set next[i] = j.

2: If T[i] == T[j], increment i and j, then set next[i] = j.

If neither condition holds, j has to move backward. The question, then, is why we don't simply set j = 1, and instead assign next[j] to j. The reason is the one given in the comment in the code, and it can also be explained with an example.
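One way to see why j = next[j] is the right fallback is sketched here in the notation of the next function, where p_i stands for T[i]. At the top of each loop iteration with j >= 1, the code keeps the following invariant:

$$ p_1 \cdots p_{j-1} = p_{i-j+1} \cdots p_{i-1} $$

If p_i = p_j, the repeated part grows by one character. Because candidates are tried from longest to shortest, the first j that extends is the largest:

$$ p_1 \cdots p_j = p_{i-j+1} \cdots p_i \;\Rightarrow\; \text{next}[i+1] = j+1 $$

If p_i differs from p_j, any shorter candidate must be a repeated part of p_1 ... p_{j-1} itself, and the longest of those has length next[j] - 1. Taking j' = next[j] keeps the invariant:

$$ p_1 \cdots p_{j'-1} = p_{j-j'+1} \cdots p_{j-1} = p_{i-j'+1} \cdots p_{i-1} $$

So no longer candidate is skipped. Jumping straight to j = 1 would throw away such candidates and could make next[i+1] smaller than it should be, which is the point the code comment makes.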

To sum up: the purpose of k is to locate the offset at which matching resumes. Suppose the few characters just before the mismatch equal the same number of characters at the head of the pattern. Then k is that number plus 1, and those characters are exactly the comparisons we want to skip.
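The article shows how next is built but not the search loop that uses it. The C code below is a minimal sketch of that loop under the same assumed 1-based layout. It repeats Get_next's logic as get_next so that the block stands alone. SString, MAXSTRLEN, str_assign, get_next and index_kmp are hypothetical names. Compared with naive matching, only j changes on a mismatch; i never moves backward.

```c
#include <assert.h>
#include <string.h>

#define MAXSTRLEN 255
/* Assumed textbook-style string: T[0] = length, characters in T[1..T[0]]. */
typedef unsigned char SString[MAXSTRLEN + 1];

/* Hypothetical helper: copy a C string into an SString. */
void str_assign(SString T, const char *chars) {
    size_t len = strlen(chars);
    assert(len <= MAXSTRLEN);
    T[0] = (unsigned char)len;
    memcpy(T + 1, chars, len);
}

/* Same logic as Get_next in the article. */
void get_next(const SString T, int next[]) {
    int i = 1, j = 0;
    next[1] = 0;
    while (i < T[0]) {
        if (j == 0 || T[i] == T[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
}

/* KMP search: 1-based position of the first occurrence of T in S at or
   after pos, or 0. On a mismatch the pattern slides so p(next[j]) faces s(i). */
int index_kmp(const SString S, const SString T, int pos) {
    int next[MAXSTRLEN + 1];
    int i = pos, j = 1;
    get_next(T, next);
    while (i <= S[0] && j <= T[0]) {
        if (j == 0 || S[i] == T[j]) {
            ++i; /* j == 0 means even p1 failed: move on to s(i+1) vs p1 */
            ++j;
        } else {
            j = next[j]; /* i stays put */
        }
    }
    return j > T[0] ? i - T[0] : 0;
}
```

For example, with S = "acabaabaabcacaabc" and T = "abaabcac" (strings chosen for illustration), index_kmp(S, T, 1) returns 6.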

The drawback of the KMP algorithm: as the process of finding k shows, KMP depends heavily on the pattern string containing repeated substrings. Otherwise its advantage over the naive algorithm largely disappears.
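To put numbers on this limitation, the sketch below counts character comparisons in both algorithms up to the first match. It uses the same assumed 1-based layout, and all names (SString, MAXSTRLEN, str_assign, get_next, count_naive, count_kmp) are hypothetical. The test strings are my own choice.

Take the repetitive text a^9 b (n = 10) with pattern aaab (m = 4). The naive count is m(n - m + 1) = 28, and it grows with the pattern length, while KMP needs 2n - m = 16. For the pattern abcd, next is 0 1 1 1. Here the naive algorithm's extra work is mostly single-comparison restarts, so on abcabcabcabd it needs 23 comparisons against KMP's 16. The gap stays a small constant factor.

```c
#include <assert.h>
#include <string.h>

#define MAXSTRLEN 255
/* Assumed textbook-style string: T[0] = length, characters in T[1..T[0]]. */
typedef unsigned char SString[MAXSTRLEN + 1];

/* Hypothetical helper: copy a C string into an SString. */
void str_assign(SString T, const char *chars) {
    size_t len = strlen(chars);
    assert(len <= MAXSTRLEN);
    T[0] = (unsigned char)len;
    memcpy(T + 1, chars, len);
}

/* Same logic as Get_next in the article. */
void get_next(const SString T, int next[]) {
    int i = 1, j = 0;
    next[1] = 0;
    while (i < T[0]) {
        if (j == 0 || T[i] == T[j]) { ++i; ++j; next[i] = j; }
        else j = next[j];
    }
}

/* Comparisons made by naive matching until the first match or the end of S. */
int count_naive(const SString S, const SString T) {
    int i = 1, j = 1, cmp = 0;
    while (i <= S[0] && j <= T[0]) {
        ++cmp;
        if (S[i] == T[j]) { ++i; ++j; }
        else { i = i - j + 2; j = 1; }
    }
    return cmp;
}

/* Comparisons made by KMP; a j == 0 step compares nothing, so it is not counted. */
int count_kmp(const SString S, const SString T) {
    int next[MAXSTRLEN + 1];
    int i = 1, j = 1, cmp = 0;
    get_next(T, next);
    while (i <= S[0] && j <= T[0]) {
        if (j == 0) { ++i; ++j; continue; }
        ++cmp;
        if (S[i] == T[j]) { ++i; ++j; }
        else j = next[j];
    }
    return cmp;
}
```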
