Proof and implementation of KMP algorithm

Source: Internet
Author: User

KMP algorithm

I. Ordinary string matching

In an ordinary string-matching algorithm, we line up the string we are looking for with the string being searched and compare them character by character. When a mismatch is found, the pointer into the searched string goes back to the position just after where the current attempt began, and the pointer into the other string goes back to its start. We call the string we are looking for the pattern string p and the string being searched the main string s. In other words, we match p against s to see whether p is a substring of s.

For example, let the main string be s = "abcabdsfabcdfrt" and the pattern be p = "abcd". At the start of matching, the characters at positions 0, 1 and 2 of s and p are equal, and a mismatch occurs at position 3 (s[3] = 'a', p[3] = 'd'). Following the method above, we move the main-string pointer back to the position just after where this attempt began, that is, position 1 of s (the character b), and compare it with position 0 of the pattern. This repeats: every time a mismatch occurs, the main-string pointer goes back to the position after the start of the current attempt, and the pattern pointer returns to position 0.
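The brute-force procedure just described can be sketched in Java as below. This is an illustrative sketch, not the author's code: the class name BruteForce and the method name bruteForceIndexOf are hypothetical, and indices are 0-based. It returns the index of the first match, or -1 if p does not occur in s.

```java
public class BruteForce {
    // Hypothetical helper: naive matching that moves the main-string pointer back on a mismatch.
    static int bruteForceIndexOf(String s, String p) {
        int i = 0; // pointer into the main string s
        int j = 0; // pointer into the pattern string p
        while (i < s.length() && j < p.length()) {
            if (s.charAt(i) == p.charAt(j)) {
                i++;
                j++;
            } else {
                i = i - j + 1; // back to the position after where this attempt started
                j = 0;         // pattern pointer returns to position 0
            }
        }
        return j == p.length() ? i - j : -1;
    }

    public static void main(String[] args) {
        System.out.println(bruteForceIndexOf("abcabdsfabcdfrt", "abcd")); // prints 8
    }
}
```

The line `i = i - j + 1` is exactly the backtracking step that KMP is designed to avoid: in the worst case each character of s is compared up to |p| times.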

II. Why use the KMP algorithm

Consider a pattern that contains a repeated substring, for example p = "abcabx", matched against the main string s = "abcabqqeeabcabxxxaxxaa". At the start, positions 0 through 4 of s and p are equal; when both pointers move on to position 5, s[5] = 'q' does not match p[5] = 'x'. The ordinary method would move the pointer into s back to position 1 and the pointer into p back to position 0. Notice, however, that positions 0-1 and 3-4 of p hold the same substring ("ab"), and that s[0..4] has already matched p[0..4]. Therefore positions 3 and 4 of s are equal to positions 0 and 1 of p.

Restarting the match at position 1 or 2 of the main string would fail immediately, since s[1] = 'b' and s[2] = 'c' both differ from p[0] = 'a'. And because s[3..4] already equals p[0..1], there is no need to move the main-string pointer back at all: we can compare position 2 of the pattern directly with the current position of the main string (s[5]).
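The observation above can be checked with a small brute-force sketch. Assuming 0-based indices, the hypothetical helper shiftTarget below returns, for a mismatch at pattern index j, the length of the longest proper prefix of p[0..j-1] that is also a suffix of it; that length is the pattern index to compare next against the same main-string character. It is written for clarity, not speed, and is not the author's code.

```java
public class ShiftDemo {
    // Hypothetical helper: after a mismatch at pattern index j (0-based),
    // return the pattern index to compare next against the same main-string
    // character, i.e. the length of the longest proper prefix of p[0..j-1]
    // that is also a suffix of p[0..j-1] (0 if there is none). Brute force.
    static int shiftTarget(String p, int j) {
        for (int len = j - 1; len > 0; len--) {
            if (p.substring(0, len).equals(p.substring(j - len, j))) {
                return len;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String s = "abcabqqeeabcabxxxaxxaa";
        String p = "abcabx";
        int i = 5, j = 5;             // first mismatch: s[5] = 'q', p[5] = 'x'
        int newJ = shiftTarget(p, j); // 2, since p[0..1] = p[3..4] = "ab"
        System.out.println("compare s[" + i + "] with p[" + newJ + "]");
        // s[3..4] already equals p[0..1], so i does not need to move back
        System.out.println(s.substring(i - newJ, i).equals(p.substring(0, newJ)));
    }
}
```

Running it prints "compare s[5] with p[2]" followed by "true", matching the argument in the text.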

III. Mathematical derivation of the KMP algorithm

We now generalize from this case. Let i be the pointer into the main string and j the pointer into the pattern string, with positions counted from 1. When the i-th character of the main string and the j-th character of the pattern mismatch, which character of the pattern should be compared next with the i-th character of the main string (the pointer i does not move back)? Suppose it is the k-th character of the pattern (k < j). Then the first k-1 characters of the pattern must equal the k-1 characters of the main string at positions i-k+1 through i-1, i.e.

$$p_1 p_2 \cdots p_{k-1} = s_{i-k+1} s_{i-k+2} \cdots s_{i-1}$$

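On the other hand, since the mismatch occurred at $p_j$, the characters $p_1 \cdots p_{j-1}$ have already matched $s_{i-j+1} \cdots s_{i-1}$. Taking the last k-1 of them gives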

$$p_{j-k+1} p_{j-k+2} \cdots p_{j-1} = s_{i-k+1} s_{i-k+2} \cdots s_{i-1}$$

Combining the two equations gives

$$p_1 p_2 \cdots p_{k-1} = p_{j-k+1} p_{j-k+2} \cdots p_{j-1}$$

Conversely, if the pattern contains two substrings satisfying this equation, then whenever the i-th character of the main string differs from the j-th character of the pattern during matching, we only need to slide the pattern to the right until its k-th character is aligned with the i-th character of the main string. At that point the first k-1 characters of the pattern, $p_1 \cdots p_{k-1}$, are guaranteed to equal the k-1 characters $s_{i-k+1} \cdots s_{i-1}$ just before $s_i$, so matching continues by comparing the k-th character of the pattern with the i-th character of the main string.
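As a worked instance, take the example from the previous section in 1-based positions: p = abcabx and s = abcabqqeeabcabxxxaxxaa. The first mismatch is at i = j = 6 ($s_6$ = q, $p_6$ = x), and k = 3 satisfies the condition, so the pattern slides right until $p_3$ is aligned with $s_6$:

```latex
$$
p_1 p_2 = p_4 p_5 = \mathtt{ab} = s_4 s_5,
\qquad \text{so } p_3 \text{ is compared next with } s_6 .
$$
```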

IV. Finding k for each position (the next array)

We store the k value for each position of the pattern in an array next: next[j] is the position of the pattern character to be compared with the current main-string character when the j-th character of the pattern mismatches.

Based on the derivation in the previous section, the next function is defined as follows (positions start at 1):

$$
\mathrm{next}[j] =
\begin{cases}
0, & j = 1,\\
\max\{\, k \mid 1 < k < j \text{ and } p_1 \cdots p_{k-1} = p_{j-k+1} \cdots p_{j-1} \,\}, & \text{if this set is nonempty},\\
1, & \text{otherwise}.
\end{cases}
$$

Example: the next array of the pattern string "abaabcac" (positions start at 1):

j          1  2  3  4  5  6  7  8
pattern    a  b  a  a  b  c  a  c
next[j]    0  1  1  2  2  3  1  2
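To check the table, the next array can be computed directly from the definition above. The sketch below is a brute-force illustration under these assumptions: positions are 1-based (index 0 of the returned array is unused), and the class and method names NextByDefinition / nextByDefinition are hypothetical. It is far slower than the incremental computation used in the implementation, and is meant only for checking small examples.

```java
import java.util.Arrays;

public class NextByDefinition {
    // Computes next[1..m] straight from the definition, with 1-based positions:
    // next[1] = 0; otherwise next[j] is the largest k with 1 < k < j and
    // p_1..p_{k-1} = p_{j-k+1}..p_{j-1}, or 1 if no such k exists.
    static int[] nextByDefinition(String p) {
        int m = p.length();
        int[] next = new int[m + 1]; // index 0 unused so next[j] matches the text
        for (int j = 1; j <= m; j++) {
            if (j == 1) {
                next[j] = 0;
                continue;
            }
            next[j] = 1; // value used when the set of valid k is empty
            for (int k = j - 1; k > 1; k--) {
                // p_1..p_{k-1} is p.substring(0, k - 1);
                // p_{j-k+1}..p_{j-1} is p.substring(j - k, j - 1)
                if (p.substring(0, k - 1).equals(p.substring(j - k, j - 1))) {
                    next[j] = k; // first hit while counting down is the maximum
                    break;
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        int[] next = nextByDefinition("abaabcac");
        // prints [0, 1, 1, 2, 2, 3, 1, 2]
        System.out.println(Arrays.toString(Arrays.copyOfRange(next, 1, next.length)));
    }
}
```

For j = 6, for instance, k = 5 and k = 4 fail, and k = 3 succeeds because $p_1 p_2 = p_4 p_5$ = "ab", giving next[6] = 3 as in the table.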

V. Code implementation

public class KMP {

    /*
     * kmp returns the position (0-based index) of the first occurrence of the
     * pattern string str2 in the main string str1, or -1 if str2 is not a
     * substring of str1.
     */
    private static int kmp(String str1, String str2) {
        if (str2.isEmpty()) return 0; // an empty pattern matches at position 0

        // Step 1: compute the next array (0-based, so next[0] = -1)
        char[] strkey = str2.toCharArray();
        int[] next = new int[strkey.length];

        // Initialization
        int j1 = 0;
        int k = -1;
        next[0] = -1;

        // Derive next[j1 + 1] from the already known values next[0..j1]
        while (j1 < strkey.length - 1) {
            if (k == -1 || strkey[j1] == strkey[k]) {
                next[++j1] = ++k;
            } else {
                k = next[k]; // fall back to a shorter prefix that is also a suffix
            }
        }

        // Print the next array, shifted by +1 to match the 1-based table above
        System.out.print("next[] values: ");
        for (int i = 0; i < next.length; i++) {
            System.out.print((next[i] + 1) + " ");
        }
        System.out.println();

        // Step 2: match the strings using the next array.
        // i points into the main string str1, j into the pattern string str2;
        // i never moves backwards.
        int i = 0;
        int j = 0;
        while (i < str1.length() && j < str2.length()) {
            if (j == -1 || str1.charAt(i) == str2.charAt(j)) {
                i++;
                j++;
            } else {
                j = next[j]; // slide the pattern right; i stays where it is
            }
        }
        return j == str2.length() ? i - j : -1;
    }

    // Test data
    public static void main(String[] args) {
        String str1 = "12345abaabcac2356";
        String str2 = "abaabcac";
        int a = kmp(str1, str2);
        System.out.println(a); // prints 5
    }
}
