KMP Pattern Matching for Strings, and Going from next[] to nextVal[]

Source: Internet
Author: User

The KMP pattern matching algorithm for strings


The general (brute-force) matching algorithm:
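The brute-force algorithm appears here only as a figure. As a rough sketch, assuming the textbook-style SString layout used later in this article (element 0 holds the length, characters start at index 1), it could look like the C code below; `sstring_from` and `index_bf` are hypothetical names chosen for illustration.

```c
#include <string.h>

#define MAXSTRLEN 255
/* Assumed layout: s[0] holds the length, s[1..s[0]] the characters */
typedef unsigned char SString[MAXSTRLEN + 1];

/* Hypothetical helper: copy a C string into an SString (truncated to MAXSTRLEN) */
void sstring_from(SString s, const char *chars)
{
    size_t n = strlen(chars);
    if (n > MAXSTRLEN)
        n = MAXSTRLEN;
    s[0] = (unsigned char)n;
    memcpy(s + 1, chars, n);
}

/* Brute-force matching: return the 1-based position of the first occurrence
   of T in S at or after position pos, or 0 if there is none. */
int index_bf(SString S, SString T, int pos)
{
    int i = pos, j = 1;
    while (i <= S[0] && j <= T[0]) {
        if (S[i] == T[j]) {   /* characters match: advance both */
            ++i;
            ++j;
        } else {              /* mismatch: back up i, restart the pattern */
            i = i - j + 2;
            j = 1;
        }
    }
    return j > T[0] ? i - T[0] : 0;
}
```

On every mismatch, i backs up to one past the start of the current attempt (i - j + 2). That backtracking of the main-string pointer is exactly what KMP avoids.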


Basic concepts of KMP:


However, the two matching steps above turn out to be unnecessary: their very first characters do not match, so they cannot possibly succeed. Only the fourth match, with the pattern slid to that position, is worth performing. We can tell this from the pattern string alone. The letter in front of the last c in the pattern is a, and the letter in front of b (the pattern's second letter) is also a; there is no other such a. The first round of matching told us that the last letter c of the pattern failed to match b in the main string (note, as pointed out above, that the letter in front of c is a). So the pattern string alone tells us the two steps above can be skipped. There is no need to worry about missing a match in between, because that information is recorded in the pattern string itself: none of the intermediate alignments can match.


So, when a round of matching fails, how do we decide where the pattern should slide to, that is, which character of the pattern should be aligned with the b of the main string (the yellow cell in the figure), so that the comparisons in between are skipped? Let the index of that pattern character be k. For the pattern above, k = 2. Note that in this string representation, the first entry of the array (index 0) stores the length of the string, so characters are numbered from 1.

Proof:

As shown above, the key to the KMP algorithm is finding k from the pattern string alone, for the case where the i-th character of the main string does not match the j-th character of the pattern. The conditions k must satisfy are listed below.


Let the main string be s1 s2 ... sn and the pattern be p1 p2 ... pm.

Then k must satisfy the following conditions:
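The conditions themselves did not survive in this copy. A reconstruction of the usual derivation, using the same symbols (s_i mismatches p_j, and the pattern slides so that p_k lines up with s_i), is:

```latex
\begin{aligned}
&p_1 p_2 \cdots p_{k-1} = s_{i-k+1} s_{i-k+2} \cdots s_{i-1}
  && \text{(after sliding, } p_1 \cdots p_{k-1} \text{ must already match)}\\
&p_{j-k+1} p_{j-k+2} \cdots p_{j-1} = s_{i-k+1} s_{i-k+2} \cdots s_{i-1}
  && \text{(known from the partial match } p_1 \cdots p_{j-1} = s_{i-j+1} \cdots s_{i-1}\text{)}\\
&\Longrightarrow\ p_1 p_2 \cdots p_{k-1} = p_{j-k+1} p_{j-k+2} \cdots p_{j-1},
  && 1 < k < j,\ k \text{ as large as possible}
\end{aligned}
```

The last line involves only the pattern, which is why k can be computed in advance without looking at the main string. Taking k as large as possible ensures no potential match is skipped.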





In short, we need to find, within the pattern, the substring ending just before position j that is identical to the head of the pattern.
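Stated formally (the standard textbook definition, given here as an assumption because the original formula is missing): next[1] = 0; for j > 1, next[j] is the largest k with 1 < k < j such that p1...p(k-1) = p(j-k+1)...p(j-1), or 1 if no such k exists. The C sketch below computes next[] straight from that definition. It runs in O(m^3) and is meant only for checking small cases by hand; `next_by_definition` and `sstring_from` are hypothetical helpers.

```c
#include <string.h>

#define MAXSTRLEN 255
typedef unsigned char SString[MAXSTRLEN + 1]; /* T[0] = length, T[1..T[0]] = characters */

/* Hypothetical helper: build an SString from a C string (truncated to MAXSTRLEN) */
void sstring_from(SString s, const char *chars)
{
    size_t n = strlen(chars);
    if (n > MAXSTRLEN)
        n = MAXSTRLEN;
    s[0] = (unsigned char)n;
    memcpy(s + 1, chars, n);
}

/* Hypothetical helper: compute next[1..T[0]] directly from the definition */
void next_by_definition(SString T, int next[])
{
    int m = T[0];
    if (m >= 1)
        next[1] = 0;
    for (int j = 2; j <= m; ++j) {
        next[j] = 1;                        /* default: no prefix matches */
        for (int k = j - 1; k > 1; --k) {   /* try the largest k first */
            /* compare p1..p(k-1) with p(j-k+1)..p(j-1) */
            if (memcmp(T + 1, T + j - k + 1, (size_t)(k - 1)) == 0) {
                next[j] = k;
                break;
            }
        }
    }
}
```

For the pattern "abaabcac" this gives next[1..8] = 0 1 1 2 2 3 1 2, which the get_next code in this article should reproduce.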

Example:


So how do we find k for every character of the pattern, that is, the next array shown in the figure?

Code:


#define MAXSTRLEN 255
typedef unsigned char SString[MAXSTRLEN + 1]; /* T[0] holds the length, T[1..T[0]] the characters */

void get_next(SString T, int next[])
{   /* Compute the next function of pattern T and store it in next[] (Algorithm 4.7) */
    int i = 1, j = 0;
    next[1] = 0;
    while (i < T[0])
    {
        if (j == 0 || T[i] == T[j])
        {
            ++i;
            ++j;
            next[i] = j;   /* p1..p(j-1) equals p(i-j+1)..p(i-1) */
        }
        else
            j = next[j];   /* Mismatch: fall back to next[j] rather than to 1.
                              T[1..next[j]-1] already matches the characters just
                              before T[i], so those comparisons need not be repeated;
                              restarting from 1 could yield a k smaller than the true value. */
    }
}


Explanation of the code above:




1: When j == 0, increment i and j, then set next[i] = j.

2: When T[i] == T[j], likewise increment i and j, then set next[i] = j.

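If neither condition holds, j must move back, which raises a question: why not simply set j = 1, instead of assigning next[j] to j? As the comment in the code explains, the first next[j] - 1 characters of the pattern are already known to match the characters just before T[i], so they need not be compared again. Resetting to j = 1 would throw that information away, and the resulting k could come out smaller than the true value.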
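Written as a recurrence (a reconstruction of the standard argument, using the same p and next notation as above): if next[j] = k, then p1...p(k-1) = p(j-k+1)...p(j-1) already holds, and any shorter prefix that also ends at p(j-1) must be a suffix of p1...p(k-1), so the next candidate to try is next[k]. Repeating this gives:

```latex
\mathit{next}[j+1] = k' + 1,
\quad\text{where } k' \text{ is the first value in the chain }
k,\ \mathit{next}[k],\ \mathit{next}[\mathit{next}[k]],\ \ldots
\text{ such that } k' = 0 \text{ or } p_{k'} = p_j .
```

When the chain runs all the way down to 0, next[j+1] = 1, which corresponds to the j == 0 branch of get_next.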




In summary, k locates the position the pattern slides to. The characters just before the mismatched position that equal the head of the pattern number k - 1, so k is that count plus 1; these are exactly the comparisons we get to skip.
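Putting next[] to work, a KMP search in the same style might look like the sketch below. The names `index_kmp` and `sstring_from` and the parameter order are assumptions for illustration; get_next is repeated (with its syntax fixed) so the block compiles on its own.

```c
#include <string.h>

#define MAXSTRLEN 255
typedef unsigned char SString[MAXSTRLEN + 1]; /* s[0] = length, s[1..s[0]] = characters */

/* Hypothetical helper: build an SString from a C string (truncated to MAXSTRLEN) */
void sstring_from(SString s, const char *chars)
{
    size_t n = strlen(chars);
    if (n > MAXSTRLEN)
        n = MAXSTRLEN;
    s[0] = (unsigned char)n;
    memcpy(s + 1, chars, n);
}

/* next[] computed as in get_next (Algorithm 4.7) */
void get_next(SString T, int next[])
{
    int i = 1, j = 0;
    next[1] = 0;
    while (i < T[0]) {
        if (j == 0 || T[i] == T[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
}

/* Hypothetical KMP search: 1-based position of the first occurrence of T
   in S at or after pos, or 0 if none. */
int index_kmp(SString S, SString T, int pos, const int next[])
{
    int i = pos, j = 1;
    while (i <= S[0] && j <= T[0]) {
        if (j == 0 || S[i] == T[j]) {
            ++i;
            ++j;
        } else {
            j = next[j];   /* slide the pattern: align p_next[j] with s_i */
        }
    }
    return j > T[0] ? i - T[0] : 0;
}
```

Unlike the brute-force version, i never moves backwards; on a mismatch only j falls back to next[j], so the search runs in O(n + m) time overall.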

The drawback of the KMP algorithm can be seen from the process of finding k: its benefit depends heavily on the pattern and the main string sharing many partial matches. When they do not, KMP saves little over the brute-force algorithm.


KMP algorithm: from next[] to nextVal[]


As discussed above, next[] records the k value for each position of the pattern string (see the first part of this article for details). However, the next[] array defined there has a flaw, illustrated by the following scenario:


In the scenario shown in the figure above, if the next[] array computed by the previous method is used, matching proceeds as shown in the following figure:


Steps 1 through 3 are wasted: each compares the same letter a with b, a mismatch the computer could easily predict. This is what the improvement to next[] addresses.


Why are the three steps in the white circle redundant? Because the pattern has three consecutive identical a's, so we can jump straight from step 1 to step 4. In general, if next[j] = k and the pattern satisfies p[j] = p[k], then when s[i] in the main string fails to match p[j], there is no need to compare s[i] with p[k]; compare it directly with p[next[k]] instead, and this can be repeated further back.


That is:
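A reconstruction of the usual rule, with k = next[j]:

```latex
\mathit{nextVal}[1] = 0, \qquad
\mathit{nextVal}[j] =
\begin{cases}
\mathit{nextVal}[k], & p_j = p_k,\\
k, & p_j \neq p_k,
\end{cases}
\qquad k = \mathit{next}[j],\ j > 1.
```

When p_j = p_k, comparing s_i with p_k would fail for the same reason it failed with p_j, so nextVal[j] inherits nextVal[k] instead.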


The code is as follows:

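void get_nextVal(SString T, int nextVal[])
{   /* Compute the improved next function (nextVal) of pattern T */
    int i = 1, j = 0;
    nextVal[1] = 0;

    while (i < T[0])   /* '<', not '<=': otherwise nextVal[T[0] + 1] would be written */
    {
        if (j == 0 || T[i] == T[j])
        {
            ++i;
            ++j;
            if (T[i] != T[j])
            {
                /* p_i differs from p_j: fall back to j, as with plain next[] */
                nextVal[i] = j;
            }
            else
            {
                /* p_i == p_j: comparing with p_j would fail the same way,
                   so reuse the value already computed for j */
                nextVal[i] = nextVal[j];
            }
        }
        else
        {
            j = nextVal[j];
        }
    }
}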
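To see the difference on a pattern with repeated letters, the following self-contained sketch computes both arrays for "aaaab". Both functions follow the algorithms described in this article; `sstring_from` is a hypothetical helper, and the values quoted after the block were worked out by hand.

```c
#include <string.h>

#define MAXSTRLEN 255
typedef unsigned char SString[MAXSTRLEN + 1]; /* T[0] = length, T[1..T[0]] = characters */

/* Hypothetical helper: build an SString from a C string (truncated to MAXSTRLEN) */
void sstring_from(SString s, const char *chars)
{
    size_t n = strlen(chars);
    if (n > MAXSTRLEN)
        n = MAXSTRLEN;
    s[0] = (unsigned char)n;
    memcpy(s + 1, chars, n);
}

/* Plain next[]: after a mismatch at p_j, try p_next[j] */
void get_next(SString T, int next[])
{
    int i = 1, j = 0;
    next[1] = 0;
    while (i < T[0]) {
        if (j == 0 || T[i] == T[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
}

/* Improved nextVal[]: skip p_k whenever p_k == p_j, since it would fail the same way */
void get_nextVal(SString T, int nextVal[])
{
    int i = 1, j = 0;
    nextVal[1] = 0;
    while (i < T[0]) {
        if (j == 0 || T[i] == T[j]) {
            ++i;
            ++j;
            nextVal[i] = (T[i] != T[j]) ? j : nextVal[j];
        } else {
            j = nextVal[j];
        }
    }
}
```

For "aaaab", next[1..5] = 0 1 2 3 4 while nextVal[1..5] = 0 0 0 0 4: after a mismatch on any of the first four a's, nextVal sends the comparison straight back to the start instead of trying the other a's one by one.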


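Note that the value each position takes always comes from an earlier position: nextVal[i] is either j or a copy of nextVal[j], with j < i, so the value it needs has already been computed.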

