Definition: Let S be the primary string and T a substring. Substring positioning means finding, in the primary string S, a substring equal to T. The primary string S is usually called the target string and the substring T the pattern string, so positioning is also called pattern matching.
Two algorithms are commonly used:
1. Brute-force Algorithm
Train of Thought: Each time the primary string pointer moves, its position is recorded as the start of an attempt, and the characters from there on are compared with the pattern string one by one. If all of them are equal, the recorded position in the primary string is returned.
Pattern Matching Process
Figure omitted
Disadvantage of the above algorithm: the primary string pointer backtracks. After a mismatch, the pointer returns to the start of the current attempt and moves forward by only one position, so characters that were already compared are compared again.
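The brute-force idea described above can be sketched roughly as follows. This is only an illustration: the class name BruteForceMatch and the method name bfIndex are hypothetical and do not come from the original text.

```java
// Hypothetical names; a minimal sketch of brute-force matching, not a tuned implementation.
class BruteForceMatch {

    /** Returns the index of the first occurrence of t in s, or -1 if there is none. */
    static int bfIndex(String s, String t) {
        int i = 0; // pointer into the primary string
        int j = 0; // pointer into the pattern string
        while (i < s.length() && j < t.length()) {
            if (s.charAt(i) == t.charAt(j)) {
                i++;
                j++;
            } else {
                // Backtracking: return to the start of this attempt and move one position on.
                i = i - j + 1;
                j = 0;
            }
        }
        return (j >= t.length()) ? i - t.length() : -1;
    }

    public static void main(String[] args) {
        System.out.println(bfIndex("abcaabbabcabaacbacba", "abcabaa"));
    }
}
```

The line i = i - j + 1 is the backtracking step that the KMP algorithm is designed to remove.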
2. KMP Algorithm
Idea: analyze the pattern string in advance to avoid unnecessary backtracking, and record the result in a next array.
Process: For the pattern string T = "abcabd", the "partial match" information is stored in the next array.
The first character 'a': next[0] = -1;
The second character 'b': next[1] = 0;
The third character 'c': the suffix "b" of the preceding string does not equal the start of the pattern string T, so next[2] = 0;
The fourth character 'a': the suffixes "bc" and "c" of the preceding string do not match the start of the pattern string T, so next[3] = 0;
The fifth character 'b': of the suffixes "bca", "ca", and "a" of the preceding string, "a" matches the start of the pattern string T, so next[4] = 1;
The sixth character 'd': of the suffixes "bcab", "cab", "ab", and "b" of the preceding string, "ab" matches the start of the pattern string T with two equal characters, so next[5] = 2;
...
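The hand calculation above can be reproduced with a short sketch. The class name NextArrayDemo and the method name buildNext are hypothetical, and the code uses the plain next array (without the later improvement), with next[0] = -1 as in the text.

```java
import java.util.Arrays;

// Hypothetical names; a sketch that computes the plain next array of a pattern.
class NextArrayDemo {

    /** next[j] = length of the longest proper prefix of t[0..j-1] that is also its suffix. */
    static int[] buildNext(String t) {
        int[] next = new int[t.length()];
        if (t.isEmpty()) {
            return next; // nothing to compute for an empty pattern
        }
        next[0] = -1;
        int j = 0;  // position whose successor is being filled in
        int k = -1; // length of the currently matched prefix, -1 means "none"
        while (j < t.length() - 1) {
            if (k == -1 || t.charAt(j) == t.charAt(k)) {
                j++;
                k++;
                next[j] = k;
            } else {
                k = next[k]; // fall back to a shorter matching prefix
            }
        }
        return next;
    }

    public static void main(String[] args) {
        // prints [-1, 0, 0, 0, 1, 2]
        System.out.println(Arrays.toString(buildNext("abcabd")));
    }
}
```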
Finally, during matching, the primary string pointer never moves back: after each mismatch, matching continues from the mismatched position in the primary string, while the pattern string jumps to the position given by the next array. This avoids backtracking.
Matching Process
Figure omitted
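The matching step described above might look like the following sketch, which uses the plain next array. The names KmpPlainSearch, buildNext, and kmpIndex are hypothetical, and a non-empty pattern is assumed.

```java
// Hypothetical names; a sketch of KMP matching with the plain next array.
class KmpPlainSearch {

    static int[] buildNext(String t) {
        int[] next = new int[t.length()];
        next[0] = -1; // assumes t is not empty
        int j = 0, k = -1;
        while (j < t.length() - 1) {
            if (k == -1 || t.charAt(j) == t.charAt(k)) {
                next[++j] = ++k;
            } else {
                k = next[k];
            }
        }
        return next;
    }

    /** Returns the index of the first occurrence of t in s, or -1 if there is none. */
    static int kmpIndex(String s, String t) {
        int[] next = buildNext(t);
        int i = 0; // primary string pointer, never decreases
        int j = 0; // pattern pointer
        while (i < s.length() && j < t.length()) {
            if (j == -1 || s.charAt(i) == t.charAt(j)) {
                i++;
                j++;
            } else {
                j = next[j]; // only the pattern jumps; i stays where it is
            }
        }
        return (j >= t.length()) ? i - t.length() : -1;
    }

    public static void main(String[] args) {
        System.out.println(kmpIndex("abcaabbabcabaacbacba", "abcabaa"));
    }
}
```

Compared with the brute-force version, the only change in the loop is that a mismatch updates j from the next array instead of resetting i.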
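Shortcoming: For S = "aaabaaaab" and T = "aaaab", the primary string pointer stays at the fourth character while the pattern index follows the next array from 3 -> 2 -> 1 -> 0. The three extra comparisons after the first mismatch are bound to fail, because the pattern characters involved are all the same.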
Process
Figure omitted
Improvement Method: For S = "aaabaaaab" and T = "aaaab", the 1st, 2nd, and 3rd characters of the pattern are equal to its 4th character, so once the 4th character has mismatched there is no need to compare them with the fourth character of the primary string. Instead, the pattern slides to the right past that character and comparison resumes directly at i = 4, j = 0.
Process
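To make the effect of the improvement concrete, the sketch below counts character comparisons on this example with the plain next array and with the improved array. All names here (NextvalDemo, buildNext, buildNextval, countComparisons) are hypothetical and chosen only for this illustration.

```java
import java.util.Arrays;

// Hypothetical names; compares the plain next array with the improved one on one example.
class NextvalDemo {

    static int[] buildNext(String t) {
        int[] next = new int[t.length()];
        next[0] = -1; // assumes t is not empty
        int j = 0, k = -1;
        while (j < t.length() - 1) {
            if (k == -1 || t.charAt(j) == t.charAt(k)) {
                next[++j] = ++k;
            } else {
                k = next[k];
            }
        }
        return next;
    }

    static int[] buildNextval(String t) {
        int[] nextval = new int[t.length()];
        nextval[0] = -1; // assumes t is not empty
        int j = 0, k = -1;
        while (j < t.length() - 1) {
            if (k == -1 || t.charAt(j) == t.charAt(k)) {
                j++;
                k++;
                // If both characters are equal, jumping to k would fail again, so skip further.
                nextval[j] = (t.charAt(j) != t.charAt(k)) ? k : nextval[k];
            } else {
                k = nextval[k];
            }
        }
        return nextval;
    }

    /** Runs the KMP loop with the given table and counts character comparisons. */
    static int countComparisons(String s, String t, int[] table) {
        int i = 0, j = 0, count = 0;
        while (i < s.length() && j < t.length()) {
            if (j == -1) {
                i++;
                j = 0;
            } else {
                count++;
                if (s.charAt(i) == t.charAt(j)) {
                    i++;
                    j++;
                } else {
                    j = table[j];
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "aaabaaaab", t = "aaaab";
        System.out.println(Arrays.toString(buildNext(t)));    // [-1, 0, 1, 2, 3]
        System.out.println(Arrays.toString(buildNextval(t))); // [-1, -1, -1, -1, 3]
        System.out.println(countComparisons(s, t, buildNext(t)));    // 12
        System.out.println(countComparisons(s, t, buildNextval(t))); // 9
    }
}
```

The three comparisons saved are exactly the ones made while the primary string pointer waits at the fourth character.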
The final algorithm is as follows:
public class KMP {
    private static final int MAXSIZE = 100;
    private static int[] nextval = new int[MAXSIZE]; // improved next array

    /* Builds the improved next array of the pattern string */
    private static void getNext(String s) {
        int len = s.length();
        char[] ch = s.toCharArray();
        int j = 0, k = -1;
        nextval[0] = -1;
        while (j < len - 1) {
            if (k == -1 || ch[j] == ch[k]) {
                j++;
                k++;
                if (ch[j] != ch[k])
                    nextval[j] = k;
                else
                    nextval[j] = nextval[k]; // equal characters: skip the useless jump
            } else {
                k = nextval[k];
            }
        }
    }

    /* Returns the position of the pattern t in the primary string s, or -1 */
    private static int kmpIndex(String s, String t) {
        int i = 0, j = 0;
        int sl = s.length();
        char[] sc = s.toCharArray();
        int tl = t.length();
        char[] tc = t.toCharArray();
        while (i < sl && j < tl) {
            if (j == -1 || sc[i] == tc[j]) {
                i++;
                j++;
            } else {
                j = nextval[j]; // only the pattern pointer moves back
            }
        }
        if (j >= tl)
            return i - tl;
        else
            return -1;
    }

    public static void main(String[] args) {
        String s = "abcaabbabcabaacbacba";
        String t = "abcabaa";
        getNext(t);
        int i = kmpIndex(s, t);
        System.out.println(i);
    }
}
Conclusion: It's easy to understand.
String Pattern Matching Algorithm