Improved string pattern matching algorithm (KMP algorithm)

Source: Internet
Author: User

The KMP (Knuth-Morris-Pratt) algorithm is a string matching algorithm. It is efficient because, when a comparison fails at some position, it uses what has already been matched to resume from a suitable position in the pattern string, instead of moving the main string cursor back and matching again from the beginning of the pattern.
The key to the algorithm is therefore knowing where in the pattern string to resume after a mismatch. Assume this value is stored in an array called next, whose elements satisfy the following condition: next[j] = k means that when the comparison fails at character j of the pattern string (indices start from 0, as with arrays in standard C; this is not repeated below), the next comparison should start from character k of the pattern string. A value of -1 means there is no position to fall back to, so both cursors simply move forward. Once the next array of the pattern string has been obtained, the KMP algorithm can be implemented as follows:


// KMP string pattern matching algorithm
// Input: S is the main string, T is the pattern string, pos is the start position in S
// Output: the start position of the match if matching succeeds, otherwise -1
int KMP(pstring S, pstring T, int pos)
{
    assert(NULL != S);
    assert(NULL != T);
    assert(pos >= 0);
    assert(pos < S->length);

    if (S->length < T->length)
        return -1;

    printf("Main string\t = %s\n", S->str);
    printf("Pattern string\t = %s\n", T->str);

    int *next = (int *)malloc(T->length * sizeof(int));
    // obtain the next array of the pattern string
    getnextarray(T, next);

    int i, j;
    for (i = pos, j = 0; i < S->length && j < T->length; )
    {
        // i is the main string cursor, j is the pattern string cursor
        if (-1 == j ||                  // the pattern cursor has fallen back past the first position
            S->str[i] == T->str[j])     // the current characters match
        {
            // in either case, both cursors move forward
            ++i;
            ++j;
        }
        else    // mismatch: the pattern cursor falls back to the next value of the current character
        {
            j = next[j];
        }
    }

    free(next);

    if (j >= T->length)
    {
        // match found
        return i - T->length;
    }
    else
    {
        // no match
        return -1;
    }
}

Next, let's look at how to obtain the next array.
This is a recursive process. The initial condition is next[0] = -1.
Assume that at some point next[j] = k, which means the following equation holds: str[0...k-1] = str[j-k...j-1]. On this premise, compare the next pair of characters, str[k] and str[j].
1) If str[k] = str[j], that is, str[0...k] = str[j-k...j], then next[j+1] = next[j] + 1 = k + 1.
2) Otherwise, retry the comparison with the shorter prefix given by next[k], comparing str[next[k]] with str[j]. If they match, then:
next[j+1] = next[next[j]] + 1 = next[k] + 1;
If the comparison still fails, retry from next[next[k]], and so on, until a match is found. If no match is found before the fallback value reaches -1, then next[j+1] = 0, and matching starts from the first position of the pattern string.
Written as a formula, the algorithm above is:
next[j] =
1) -1, when j = 0;
2) max{k | 0 < k < j and str[0..k-1] = str[j-k..j-1]}, when this set is not empty;
3) 0, in all other cases; matching then starts from the first position of the pattern string.
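The formula can be checked directly, without the recursive shortcut, by trying every candidate k from largest to smallest. The following is only a sketch for verification: the function name next_by_definition is hypothetical, it works on a plain NUL-terminated C string rather than the string type used in this article, and it reads case 2 as ranging over 0 < k < j so that case 3 applies when no prefix matches.

```c
#include <assert.h>
#include <string.h>

/* Brute-force next table taken straight from the formula (0-based indices).
   next[0] = -1; otherwise next[j] is the largest k with 0 < k < j such that
   the first k characters of p equal the k characters ending at p[j-1],
   or 0 if no such k exists. Roughly cubic time, so for checking only. */
void next_by_definition(const char *p, int next[])
{
    assert(p != NULL && next != NULL);

    int n = (int)strlen(p);
    if (n == 0)
        return;                 /* nothing to fill for an empty pattern */

    next[0] = -1;               /* case 1 */
    for (int j = 1; j < n; ++j) {
        next[j] = 0;            /* case 3, unless a larger k is found */
        for (int k = j - 1; k > 0; --k) {
            /* compare p[0..k-1] with p[j-k..j-1] */
            if (memcmp(p, p + (j - k), (size_t)k) == 0) {
                next[j] = k;    /* case 2: first hit is the maximum */
                break;
            }
        }
    }
}
```

For the pattern "abaabcac" this fills next with -1, 0, 0, 1, 1, 2, 0, 1. Comparing this output with the output of the faster routine on a few patterns is a simple way to test the latter.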
The algorithm for finding the next array is as follows:

 

// Obtain the next array of the string
void getnextarray(pstring pstr, int next[])
{
    assert(NULL != pstr);
    assert(NULL != next);
    assert(pstr->length > 0);

    // The next value of the first character is -1, because arrays in C start from 0
    next[0] = -1;
    for (int i = 0, j = -1; i < pstr->length - 1; )
    {
        // i is the main string cursor, j is the pattern string cursor
        // here the main string and the pattern string are the same string
        if (-1 == j ||                          // the pattern cursor has fallen back past the first character
            pstr->str[i] == pstr->str[j])       // the characters match
        {
            // both cursors take a step forward
            ++i;
            ++j;
            // store the current pattern cursor value as the next value at position i
            next[i] = j;
        }
        else    // mismatch: j falls back to its own next value
        {
            j = next[j];
        }
    }
}

 

The complete algorithm is as follows:

 

/********************************************************************
    created:    2006/07/02
    filename:   KMP.cpp
    author:     Li Chuang

                http://www.cppblog.com/converse/

    reference:  Yan Weimin, <Data Structure>

    purpose:    demonstration of the KMP string matching algorithm
*********************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>

#define MAX_LEN_OF_STR 30       // maximum length of the string

typedef struct string           // the string type used here: a character array and its length
{
    char str[MAX_LEN_OF_STR];   // character array
    int length;                 // the actual length of the string
} string, *pstring;

// Obtain the next array of the string
void getnextarray(pstring pstr, int next[])
{
    assert(NULL != pstr);
    assert(NULL != next);
    assert(pstr->length > 0);

    // The next value of the first character is -1, because arrays in C start from 0
    next[0] = -1;
    for (int i = 0, j = -1; i < pstr->length - 1; )
    {
        // i is the main string cursor, j is the pattern string cursor
        // here the main string and the pattern string are the same string
        if (-1 == j ||                          // the pattern cursor has fallen back past the first character
            pstr->str[i] == pstr->str[j])       // the characters match
        {
            // both cursors take a step forward
            ++i;
            ++j;
            // store the current pattern cursor value as the next value at position i
            next[i] = j;
        }
        else    // mismatch: j falls back to its own next value
        {
            j = next[j];
        }
    }
}

// KMP string pattern matching algorithm
// Input: S is the main string, T is the pattern string, pos is the start position in S
// Output: the start position of the match if matching succeeds, otherwise -1
int KMP(pstring S, pstring T, int pos)
{
    assert(NULL != S);
    assert(NULL != T);
    assert(pos >= 0);
    assert(pos < S->length);

    if (S->length < T->length)
        return -1;

    printf("Main string\t = %s\n", S->str);
    printf("Pattern string\t = %s\n", T->str);

    int *next = (int *)malloc(T->length * sizeof(int));
    // obtain the next array of the pattern string
    getnextarray(T, next);

    int i, j;
    for (i = pos, j = 0; i < S->length && j < T->length; )
    {
        // i is the main string cursor, j is the pattern string cursor
        if (-1 == j ||                  // the pattern cursor has fallen back past the first position
            S->str[i] == T->str[j])     // the current characters match
        {
            // in either case, both cursors move forward
            ++i;
            ++j;
        }
        else    // mismatch: the pattern cursor falls back to the next value of the current character
        {
            j = next[j];
        }
    }

    free(next);

    if (j >= T->length)
    {
        // match found
        return i - T->length;
    }
    else
    {
        // no match
        return -1;
    }
}
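The listing above stops at the KMP function and does not show a caller. As a rough illustration of how the two routines fit together, here is a compact sketch that works on ordinary NUL-terminated C strings instead of the article's string type. The names build_next and kmp_find are hypothetical, and returning -1 for an empty pattern or an out-of-range start position is an assumption made here, not something the original code specifies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Fill next[0..n-1] for pattern p of length n (n > 0), with next[0] = -1. */
void build_next(const char *p, int n, int next[])
{
    int i = 0, j = -1;
    next[0] = -1;
    while (i < n - 1) {
        if (j == -1 || p[i] == p[j]) {
            ++i;
            ++j;
            next[i] = j;        /* longest proper prefix that is also a suffix of p[0..i-1] */
        } else {
            j = next[j];        /* fall back to a shorter candidate prefix */
        }
    }
}

/* Return the index of the first occurrence of pat in text at or after pos,
   or -1 if there is none (also -1 for an empty pattern: an assumption). */
int kmp_find(const char *text, const char *pat, int pos)
{
    assert(text != NULL && pat != NULL && pos >= 0);

    int n = (int)strlen(text);
    int m = (int)strlen(pat);
    if (m == 0 || pos > n - m)
        return -1;

    int *next = (int *)malloc((size_t)m * sizeof *next);
    if (next == NULL)
        return -1;              /* out of memory is reported as "not found" in this sketch */
    build_next(pat, m, next);

    int i = pos, j = 0;
    while (i < n && j < m) {
        if (j == -1 || text[i] == pat[j]) {
            ++i;                /* the main string cursor never moves backwards */
            ++j;
        } else {
            j = next[j];        /* only the pattern cursor falls back */
        }
    }

    free(next);
    return (j >= m) ? i - m : -1;
}
```

With these definitions, kmp_find("ababcabcacbab", "abcac", 0) evaluates to 5, and starting the same search at position 6 gives -1 because no further occurrence exists.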

 

Original article: http://www.cppblog.com/converse/archive/2006/07/05/9447.html
