KMP algorithm
First, ordinary string matching
In the ordinary string matching algorithm, we hold the string we are looking for against the string being searched and compare them character by character. When a mismatch is found, the pointer into the searched string moves back to the position right after the one where the current attempt started, and the pointer into the string we are looking for returns to its first character. We call the string we are looking for the pattern string p and the string being searched the main string s; that is, we match the pattern string p against the main string s to see whether p is a substring of s.
For example, let the main string s be "abcabdsfabcdfrt" and the pattern string p be "abcd". When matching starts, the characters at positions 0, 1, and 2 of s and p are the same, and a mismatch occurs at position 3 ('a' in s against 'd' in p). Following the method above, we move the pointer of the main string back to the position after the one where this attempt started, that is, to position 1 of the main string (the character 'b'), and compare it with position 0 of the pattern string. The process repeats: each time a mismatch occurs, the pointer of the main string goes back to the position after the start of the current attempt, and the pointer of the pattern string returns to position 0.
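The procedure just described can be sketched in Java as follows. This is a minimal illustration rather than code from the original article; the class name `NaiveMatch` and the method name `indexOf` are chosen here for the example.

```java
final class NaiveMatch {
    // Returns the 0-based index of the first occurrence of p in s, or -1.
    static int indexOf(String s, String p) {
        int i = 0; // pointer into the main string s
        int j = 0; // pointer into the pattern string p
        while (i < s.length() && j < p.length()) {
            if (s.charAt(i) == p.charAt(j)) {
                i++;
                j++;
            } else {
                // Mismatch: the main-string pointer backtracks to the position
                // after the start of this attempt, and the pattern restarts at 0.
                i = i - j + 1;
                j = 0;
            }
        }
        return j == p.length() ? i - j : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("abcabdsfabcdfrt", "abcd")); // prints 8
    }
}
```

The backtracking step `i = i - j + 1` is what makes the worst case proportional to the product of the two lengths, and it is the step that KMP removes.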
Second, why we need the KMP algorithm
The need arises when the pattern string p contains repeated substrings. For example, take p = "abcabx" and match it against the main string s = "abcabqqeeabcabxxxaxxaa". At the start, positions 0 to 4 of s equal positions 0 to 4 of p. Both pointers then advance to position 5, where the character of s ('q') does not match the character of p ('x'). With the ordinary method we would move the pointer of s back to position 1 and the pointer of p back to position 0. But notice that the substrings of p at positions 0-1 and 3-4 are identical ("ab"), and that positions 0-4 of s have already matched positions 0-4 of p, so positions 3-4 of the main string s equal positions 0-1 of the pattern string p.
Attempts that start at position 1 or 2 of the main string fail immediately, because the characters there ('b' and 'c') differ from the first character of the pattern string ('a'), while positions 3-4 of s already equal positions 0-1 of p. So the main-string pointer does not need to backtrack: we leave it at position 5 and compare position 2 of the pattern string directly with the current character of the main string.
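The observation can be checked with a small Java sketch. It is only an illustration under assumed names (`SkipDemo`, `resumeIndex`): it finds the longest proper prefix of the already matched part of p that is also its suffix, by brute force, and reports which pattern position to compare next.

```java
final class SkipDemo {
    // Length of the longest proper prefix of `matched` that is also its suffix.
    // This length is the pattern position at which matching can resume.
    static int resumeIndex(String matched) {
        for (int len = matched.length() - 1; len > 0; len--) {
            if (matched.regionMatches(0, matched, matched.length() - len, len)) {
                return len;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String s = "abcabqqeeabcabxxxaxxaa";
        String p = "abcabx";
        int j = 0;
        // First attempt, starting at position 0 of s.
        while (j < p.length() && s.charAt(j) == p.charAt(j)) {
            j++;
        }
        // Here j == 5: s[5] = 'q' differs from p[5] = 'x'.
        int resume = resumeIndex(p.substring(0, j));
        System.out.println("mismatch at " + j + ", resume with p[" + resume + "]");
        // prints: mismatch at 5, resume with p[2]
    }
}
```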
Third, mathematical derivation of the KMP algorithm
We now generalize the situation above. Let i be the pointer into the main string and j the pointer into the pattern string, with positions counted from 1 in this section. When the i-th character of the main string and the j-th character of the pattern string mismatch, the question is which character of the pattern string the i-th character of the main string should be compared with next, given that the pointer i does not backtrack. Suppose it should be compared with the k-th character of the pattern string (k < j). Then the first k-1 characters of the pattern string must equal the characters at positions i-k+1 to i-1 of the main string, i.e.
$$p_1 p_2 \cdots p_{k-1} = s_{i-k+1} s_{i-k+2} \cdots s_{i-1}$$
The partial match already obtained gives
$$p_{j-k+1} p_{j-k+2} \cdots p_{j-1} = s_{i-k+1} s_{i-k+2} \cdots s_{i-1}$$
Combining the two equations yields
$$p_1 p_2 \cdots p_{k-1} = p_{j-k+1} p_{j-k+2} \cdots p_{j-1}$$
Conversely, if the pattern string contains two substrings that satisfy the equation above, then when the i-th character of the main string and the j-th character of the pattern string differ during matching, it is enough to slide the pattern string to the right until its k-th character lines up with the i-th character of the main string. At that point the first k-1 characters of the pattern string, p_1 p_2 ... p_{k-1}, already equal the k-1 characters s_{i-k+1} s_{i-k+2} ... s_{i-1} that precede the i-th character of the main string, so matching continues by comparing the k-th character of the pattern string with the i-th character of the main string.
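As a concrete instance, the example from the second section can be rewritten in the 1-based notation used here. This is a worked check added for illustration: with s = "abcabqqeeabcabxxxaxxaa" and p = "abcabx", the mismatch occurs at i = 6, j = 6, and k = 3 satisfies the condition.

```latex
\begin{aligned}
p_1 p_2 &= \texttt{ab} = s_4 s_5 && (k = 3,\ i = 6:\ s_{i-k+1} \cdots s_{i-1}) \\
p_4 p_5 &= \texttt{ab} = s_4 s_5 && (j = 6:\ p_{j-k+1} \cdots p_{j-1}) \\
\Rightarrow\quad p_1 p_2 &= p_4 p_5
\end{aligned}
```

So s_6 is compared next with p_3, which in 0-based terms is position 2 of the pattern string against position 5 of the main string.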
Fourth, finding k for each position (the next array)
We use the next array to store the k value for each position in the pattern string: next[j] is the position of the pattern character that is compared with the current main-string character after the j-th character of the pattern string mismatches.
Based on the derivation in the third section, the next function is defined as follows (string positions start at 1):
$$\mathrm{next}[j] = \begin{cases} 0 & \text{if } j = 1 \\ \max\{\, k \mid 1 < k < j \text{ and } p_1 p_2 \cdots p_{k-1} = p_{j-k+1} p_{j-k+2} \cdots p_{j-1} \,\} & \text{if this set is not empty} \\ 1 & \text{otherwise} \end{cases}$$
For example, the next array of the pattern string "abaabcac" is:

| j | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| pattern string | a | b | a | a | b | c | a | c |
| next[j] | 0 | 1 | 1 | 2 | 2 | 3 | 1 | 2 |
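The values in the table can be reproduced by applying the definition literally. The sketch below is a brute-force check and not the efficient construction; the class name `NextByDefinition` is an assumption made for this example. It tries every candidate k from largest to smallest for each j.

```java
import java.util.Arrays;

final class NextByDefinition {
    // Returns next[1..m] computed from the 1-based definition; index 0 is unused.
    static int[] next(String p) {
        int m = p.length();
        int[] next = new int[m + 1];
        for (int j = 1; j <= m; j++) {
            if (j == 1) {
                next[j] = 0;
                continue;
            }
            next[j] = 1; // value used when no k satisfies the condition
            for (int k = j - 1; k > 1; k--) {
                // p_1..p_{k-1} starts at 0-based index 0;
                // p_{j-k+1}..p_{j-1} starts at 0-based index j-k; both have length k-1.
                if (p.regionMatches(0, p, j - k, k - 1)) {
                    next[j] = k; // first hit is the maximum, since k decreases
                    break;
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        int[] next = next("abaabcac");
        System.out.println(Arrays.toString(Arrays.copyOfRange(next, 1, next.length)));
        // prints [0, 1, 1, 2, 2, 3, 1, 2]
    }
}
```

This direct approach does far more comparisons than necessary; the implementation in the code section builds the same values in a single pass by reusing the entries already computed.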
Fifth, code implementation
public class KMP {
    /*
     * Finds the position of the pattern string str2 in the main string str1.
     * Returns the 0-based index of the first occurrence of str2 in str1,
     * or -1 if str2 is not a substring of str1.
     */
    private static int kmp(String str1, String str2) {
        // An empty pattern matches at position 0.
        if (str2.isEmpty()) {
            return 0;
        }
        // Step 1: build the next[] array (0-based here, with next[0] = -1).
        char[] strKey = str2.toCharArray();
        int[] next = new int[strKey.length];
        // Initial values
        int j1 = 0;
        int k = -1;
        next[0] = -1;
        // Derive next[j1 + 1] from the values already known for positions 0..j1.
        while (j1 < strKey.length - 1) {
            if (k == -1 || strKey[j1] == strKey[k]) {
                next[++j1] = ++k;
            } else {
                k = next[k];
            }
        }
        // Print the next array, adding 1 to get the 1-based values defined earlier.
        System.out.print("next[] values: ");
        for (int i = 0; i < next.length; i++) {
            System.out.print((next[i] + 1) + " ");
        }
        System.out.println();
        // Step 2: match the strings using the next array.
        int i = 0; // i points into the main string str1
        int j = 0; // j points into the pattern string str2
        while (i < str1.length() && j < strKey.length) {
            if (j == -1 || str1.charAt(i) == strKey[j]) {
                i++;
                j++;
            } else {
                // i does not backtrack; the same main-string character is
                // compared again with the pattern character at next[j].
                j = next[j];
            }
        }
        return j == strKey.length ? i - j : -1;
    }

    // Test data
    public static void main(String[] args) {
        String str1 = "12345abaabcac2356";
        String str2 = "abaabcac";
        int a = kmp(str1, str2);
        System.out.println(a); // prints 5
    }
}